AMD Instinct MI300 Data Center APU Pictured Up Close: 13 Chiplets, 146 Billion Transistors

AMD unveiled its next-generation Instinct MI300 accelerator at CES 2023. We were lucky enough to get some hands-on time and take some close-up images of the mammoth chip.
Without a doubt, the Instinct MI300 is a groundbreaking design. The data center APU blends a total of 13 chiplets, many of them 3D-stacked, to create a single chip package with 24 Zen 4 CPU cores fused with a CDNA 3 graphics engine and eight stacks of HBM3 memory. All in all, the chip packs 146 billion transistors, making it the largest chip AMD has put into production.
The MI300's 146 billion transistors easily beat Intel's 100-billion-transistor Ponte Vecchio, and the package pairs them with 128 GB of HBM3 memory. The bare chip is difficult to photograph given its shiny surface, but you can clearly see the eight stacks of HBM3 (16 GB apiece) flanking the center dies. Small slivers of structural silicon are placed between the HBM stacks to ensure stability when the cooling solution is torqued down onto the package.
The compute portion of the chip comprises nine 5nm chiplets that are either CPU or GPU dies, but AMD has not detailed how many of each it uses. Zen 4 cores typically come eight to a die, so three CPU dies would account for the 24 cores, leaving six GPU dies. The GPU dies use AMD's CDNA 3 architecture, the third revision of the company's data-center-specific graphics architecture; AMD has not specified the CU count.
These nine dies are 3D-stacked atop four 6nm base dies that are more than simple passive interposers: they are active dies that handle I/O and various other functions. An AMD rep showed us a second MI300 sample whose top dies had been sanded off with a belt sander, revealing the four active interposer dies underneath. These house the memory controllers that interface with the HBM3 stacks, along with the communication pathways between the I/O tiles. We were not allowed to photograph that second sample.
The 3D design enables tremendous data throughput between the CPU, GPU, and memory dies, while also allowing the CPU and GPU to work on the same data in memory simultaneously (zero-copy), which saves power, boosts performance, and simplifies programming. It will be interesting to see whether the device can run without standard DRAM at all, as Intel's Xeon Max CPU with its on-package HBM can.
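AMD hasn't detailed the MI300's programming model, but the existing HIP managed-memory API gives a rough feel for what zero-copy sharing looks like in code: the CPU and a GPU kernel touch the same allocation through one pointer, with no explicit host-to-device copies. The sketch below is a minimal, generic HIP example, not MI300-specific code.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Simple kernel that doubles each element in place.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;

    // Managed (unified) allocation: one pointer visible to both CPU and GPU.
    // On an APU with shared physical memory, no copy over PCIe is needed.
    hipMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = float(i);     // CPU writes the buffer

    scale<<<(n + 255) / 256, 256>>>(data, n);            // GPU reads/writes the same buffer
    hipDeviceSynchronize();

    printf("data[42] = %f\n", data[42]);                 // CPU reads the GPU result directly
    hipFree(data);
    return 0;
}
```

On a discrete GPU the runtime still migrates pages behind the scenes; the promise of an APU like the MI300 is that the CPU and GPU genuinely share the same physical memory, so the same code runs without any movement at all.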
AMD officials stayed tight-lipped on the finer details, so it isn't clear whether AMD uses a standard TSV approach to fuse the top and bottom dies or a more advanced hybrid bonding approach. AMD says it will share more about the packaging soon.
AMD claims the MI300 delivers 8x the AI performance and 5x the performance per watt of the Instinct MI250 (measured at FP8 with sparsity). AMD also says it can cut training time for ultra-large AI models such as ChatGPT and DALL-E from months to weeks, saving millions of dollars in electricity.
The current-generation Instinct MI250 powers the world's first exascale machine, the Frontier supercomputer, while the Instinct MI300 will power the upcoming two-exaflop El Capitan supercomputer. AMD says these halo MI300 chips are expensive and relatively rare; they are not mass-produced parts, so they won't see the broad deployment of the EPYC Genoa data center CPUs. However, the technology will filter down into multiple variants in different form factors.
The chip will also compete with Nvidia's Grace Hopper Superchip, which combines a Hopper GPU and a Grace CPU on the same board and is expected to arrive later this year. The Neoverse-based Grace CPU supports the Arm v9 instruction set, and the system pairs the two chips over Nvidia's newly branded NVLink-C2C interconnect. AMD's approach is designed to deliver superior throughput and energy efficiency, because merging these devices into a single package typically allows far higher throughput between the units than connecting two separate devices.
The MI300 will also compete with Intel's Falcon Shores chips, which feature a dizzying number of possible configurations with varying numbers of compute tiles mixing x86 cores, GPU cores, and memory, but those aren't expected to arrive until 2024.
Here you can see the bottom of the MI300 package with the contact pads used for its LGA mounting system. AMD hasn't shared details of the socket mechanism but will reveal more soon. The chip is currently in AMD's labs, and the company plans to ship the Instinct MI300 in late 2023. El Capitan, expected to be the world's fastest supercomputer when it is deployed in 2023, remains on schedule.