AMD MI300X Guzzles Power, Rated for 750 Watts

The official launch of AMD’s MI300X AI data center accelerator is still some way off. It packs a decidedly non-negligible amount of processing power, and AMD aims to use it as a stick to knock Nvidia from its position as the dominant player in AI acceleration. New architectures typically become more power efficient (the same unit of work consumes less energy), but performance gains can still drive up total power consumption. AMD’s MI300X, built on the OAM (OCP Accelerator Module) form factor, is indeed a power hog: at 750 W, it carries the highest rated TDP in its form factor yet. But don’t worry. OAM solutions are specified to deliver up to 1000 W of power, so there is still room for further performance scaling.
750 W is a staggering amount of power for any single piece of PC hardware to consume (at least from a consumer’s point of view), but keep in mind that these watts feed much faster and more specialized hardware than anything in a PC, including AMD’s most powerful graphics cards. For that wattage, AMD offers what it claims is the highest-performing accelerator for AI workloads, covering both generative AI and large language model (LLM) processing.
Considering that AMD packed in 12 chiplets built on two manufacturing processes (8x 5nm GPU dies and 4x 6nm I/O dies) for a total of 153 billion transistors, there may be some support for that claim. There is also the fact that AMD was able to run a 40-billion-parameter LLM (Falcon-40B) on a single MI300X. That is impressive, especially considering AMD aims to scale the MI300X to up to eight accelerators in a single package.
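To put the single-accelerator claim in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights of a 40-billion-parameter model need against the MI300X’s 192GB. The bytes-per-parameter figures are assumptions for illustration; the precision AMD actually used is not stated here, and the estimate covers weights only, not activations or the KV cache.

```python
# Rough check: do the weights of a 40B-parameter model fit in 192 GB?
# Assumption (not from the article): the numeric precisions listed below.

PARAMS = 40e9            # Falcon-40B parameter count, as cited in the article
HBM_CAPACITY_GB = 192    # MI300X addressable memory, from the spec table


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


# Hypothetical precisions, for illustration only.
for label, nbytes in [("FP32", 4), ("FP16/BF16", 2), ("INT8", 1)]:
    need = weights_gb(PARAMS, nbytes)
    verdict = "fits" if need <= HBM_CAPACITY_GB else "does not fit"
    print(f"{label}: {need:.0f} GB of weights -> {verdict} in {HBM_CAPACITY_GB} GB")
```

Under these assumptions, the weights come to about 80 GB at two bytes per parameter, which leaves headroom in 192GB for runtime state.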
Specification | AMD MI300X | AMD MI300A | AMD MI250X | AMD RX 7900 XTX |
CPU core | 0 | 3x 8-core CCDs (24 cores) [Zen 4] | – | – |
GPU core | 8x GCDs (304 CUs) [CDNA 3] | 6x GCDs (228 CUs) [CDNA 3] | (220CU) [CDNA 2] | (RDNA3) |
addressable memory | 192GB (24GB HBM3 x 8) | 128GB (16GB HBM3 x 8) | 128GB (16GB HBM2e x 8) | 24GB GDDR6 |
memory bandwidth | 5.2TB/s | 5.2TB/s | ~ 3.28 TB/s | 960GB/s (384-bit bus) |
Infinity Fabric bandwidth | 896GB/s | 896GB/s | 800GB/s | – |
number of transistors | 153 billion | 146 billion | ~ 58.2 billion | ~ 57 billion |
TDP | 750W | ? | 560W | 355W |
As the table above shows, AMD is focused on improving power efficiency, but not by enough to offset the growing compute demands of high-performance computing (HPC) scenarios. Right now, those scenarios mean handling the LLMs that seem to be popping up left and right. Given those rising performance requirements, even AMD’s latest power-saving techniques and TSMC’s latest manufacturing technology could not avoid a 190 W increase in the power envelope.
However, that 190 W TDP increase (~34% higher power consumption) powers roughly 2.6x as many transistors as the MI250X. That is an impressive result, showing efficiency gains even before accounting for the MI300X’s improved support for sparse algorithms (very important for LLM and AI processing). And that says nothing of the gap between AMD’s compute accelerator and the company’s flagship gaming GPU, the comparatively modest RX 7900 XTX.
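The generational comparison can be reproduced from the table’s TDP and transistor figures alone. The sketch below is only that arithmetic; note that transistors per watt is a crude proxy chosen here for illustration, not a measure of delivered performance per watt.

```python
# Arithmetic behind the MI250X -> MI300X comparison, using the table's figures.
# Assumption: "transistors per watt" is an illustrative proxy, not a benchmark.

specs = {
    "MI250X": {"tdp_w": 560, "transistors_b": 58.2},
    "MI300X": {"tdp_w": 750, "transistors_b": 153.0},
}

old, new = specs["MI250X"], specs["MI300X"]

tdp_delta_w = new["tdp_w"] - old["tdp_w"]                     # absolute TDP increase
tdp_pct = tdp_delta_w / old["tdp_w"] * 100                    # relative TDP increase
transistor_ratio = new["transistors_b"] / old["transistors_b"]

# Billions of transistors per watt of rated TDP, per generation.
per_watt = {name: s["transistors_b"] / s["tdp_w"] for name, s in specs.items()}
per_watt_gain = per_watt["MI300X"] / per_watt["MI250X"]

print(f"TDP increase: {tdp_delta_w} W ({tdp_pct:.1f}%)")
print(f"Transistor count ratio: {transistor_ratio:.2f}x")
print(f"Transistors per watt ratio: {per_watt_gain:.2f}x")
```

With these inputs the TDP rises by 190 W, or about 34%, while the transistor count grows about 2.6x, so the transistor budget per rated watt comes close to doubling.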