AMD’s EPYC ‘Bergamo’ and Zen 4c Detailed: Same as Zen 4, But Denser
Ever-increasing performance demands in cloud data centers have forced CPU developers to rethink their designs to maximize performance per socket while facing cost constraints imposed by the slowdown of Moore’s Law. AMD’s EPYC ‘Bergamo’ is the industry’s first x86 cloud-native CPU. It is based on the specially tuned Zen 4c microarchitecture, which essentially keeps the same feature set as Zen 4 while substantially reducing the size of the core, according to SemiAnalysis.
AMD’s EPYC ‘Bergamo’ processors pack 128 cores, use the same SP5 socket as the 96-core EPYC ‘Genoa’ CPUs, feature the same 12-channel DDR5-4800 memory subsystem, and share the same I/O die (codenamed Floyd), which means they also offer 128 PCIe Gen5 lanes and the other features of the SP5 platform. As a cloud-native system-on-chip (SoC), and in part a response to the rise of Arm-based data center-grade SoCs from Ampere, Amazon, Google, and Microsoft, Bergamo’s design focuses on efficiency, power usage, die size, and low total cost of ownership (TCO) rather than on maximum performance per core.
| Model | EPYC 9654 | EPYC 9754 | EPYC 9734 |
|---|---|---|---|
| Codename | Genoa | Bergamo | Bergamo |
| Microarchitecture/core codename | Zen 4/Persephone | Zen 4c/Dionysus | Zen 4c/Dionysus |
| Cores/threads | 96/192 | 128/256 | 112/224 |
| L1i cache per core | 32KB | 32KB | 32KB |
| L1d cache per core | 32KB | 32KB | 32KB |
| L2 cache per core | 1MB | 1MB | 1MB |
| Total L2 cache | 96MB | 128MB | 112MB |
| L3 cache per CCX | 32MB | 16MB | 16MB |
| Total L3 cache | 384MB | 256MB | 256MB |
| CCD codename | Durango | Vindhya | Vindhya |
| Number of CCDs | 12 | 8 | 8 |
| CCXs per CCD | 1 | 2 | 2 |
| Cores per CCD | 8 | 16 | 14 |
| I/O die | Floyd | Floyd | Floyd |
| Memory channels | 12 | 12 | 12 |
| Rated memory speed | DDR5-4800 | DDR5-4800 | DDR5-4800 |
| Peak memory bandwidth | 460.8GB/s | 460.8GB/s | 460.8GB/s |
| PCIe 5.0 lanes | 128 | 128 | 128 |
| TDP/max TDP | 360W/400W | 360W/400W | 360W/400W |
| Socket | SP5 | SP5 | SP5 |
| Scalability | 2P | 2P | 2P |
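The aggregate rows of the table follow directly from the per-unit figures. The short Python sketch below re-derives them as a sanity check; it assumes 64-bit (8-byte) DDR5 channels, which is the usual basis for quoting theoretical peak bandwidth, and the function names are illustrative only.

```python
# Sketch: re-derive the table's aggregate rows from per-unit specs.
# Assumption: each DDR5 channel is 64 bits (8 bytes) wide, the usual
# basis for quoted theoretical peak bandwidth (not a measured figure).

def peak_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x bytes per transfer / 1000."""
    return channels * mts * bytes_per_transfer / 1000


def cache_totals(cores: int, ccds: int, ccx_per_ccd: int, l3_per_ccx_mb: int,
                 l2_per_core_mb: int = 1) -> tuple[int, int]:
    """Return (total L2 in MB, total L3 in MB) for one processor."""
    total_l2 = cores * l2_per_core_mb              # L2 is private to each core
    total_l3 = ccds * ccx_per_ccd * l3_per_ccx_mb  # L3 is shared within each CCX
    return total_l2, total_l3


chips = {
    "EPYC 9654": dict(cores=96, ccds=12, ccx_per_ccd=1, l3_per_ccx_mb=32),
    "EPYC 9754": dict(cores=128, ccds=8, ccx_per_ccd=2, l3_per_ccx_mb=16),
    "EPYC 9734": dict(cores=112, ccds=8, ccx_per_ccd=2, l3_per_ccx_mb=16),
}

for name, spec in chips.items():
    l2, l3 = cache_totals(**spec)
    bw = peak_bandwidth_gbs(channels=12, mts=4800)
    print(f"{name}: {l2}MB L2, {l3}MB L3, {bw:.1f}GB/s")
```

The bandwidth figure is identical for all three parts because they share the Floyd I/O die and its 12 DDR5-4800 channels; only the cache totals change with the CCD layout.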
At the microarchitectural level, Zen 4c keeps the same design as Zen 4, including identical features and instructions-per-clock (IPC) performance, but it is configured and implemented very differently, SemiAnalysis claims. According to SemiAnalysis, Zen 4c’s ‘Dionysus’ core is about 35.4% smaller than Zen 4’s ‘Persephone’ core. To achieve this, AMD used a number of design tricks, which the analysts believe include:
- The boost clock target was lowered from 3.70 GHz to 3.10 GHz. The lower frequency relaxes timing constraints, which makes timing closure easier and reduces the need for additional buffer cells. Since today’s designs are often limited by wiring density and congestion, lower frequencies also allow signal paths to be packed more tightly, increasing standard-cell density.
- The number of physical partitions on the die was reduced and the logic was packed closer together. This makes debugging and introducing fixes more difficult, but it shrinks the die.
- To reduce SRAM area, Zen 4c uses high-density 6T dual-port SRAM cells instead of the 8T dual-port SRAM cells used in Zen 4. As a result, Zen 4 and Zen 4c cores have the same L1 and L2 cache sizes, but the caches occupy less area in Zen 4c; the trade-off is that they are not as fast as the caches in Zen 4.
- Finally, AMD removed the through-silicon via (TSV) arrays used to attach 3D V-Cache, saving even more silicon.
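The figures in the list above translate into a few simple ratios. This Python sketch only does the arithmetic on the SemiAnalysis numbers; treating the transistor count of an SRAM bit cell as a proxy for its area is a simplifying assumption, since real cell area also depends on layout and process design rules.

```python
# Back-of-the-envelope ratios implied by the SemiAnalysis figures.
# Simple arithmetic only, not a model of real silicon.

ZEN4_BOOST_GHZ = 3.70
ZEN4C_BOOST_GHZ = 3.10
ZEN4C_AREA_REDUCTION = 0.354  # "about 35.4% smaller" per SemiAnalysis


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100


boost_drop = pct_change(ZEN4_BOOST_GHZ, ZEN4C_BOOST_GHZ)  # roughly -16.2%
relative_area = 1 - ZEN4C_AREA_REDUCTION                    # Zen 4c core vs. one Zen 4 core
cores_per_zen4_area = 1 / relative_area                     # Zen 4c cores fitting in one Zen 4 core's area
# Assumption: transistors per bit cell (6T vs. 8T) as a crude area proxy.
sram_transistor_saving = pct_change(8, 6)                   # -25% transistors per bit

print(f"Boost clock change: {boost_drop:.1f}%")
print(f"Zen 4c cores per Zen 4 core area: {cores_per_zen4_area:.2f}")
print(f"Transistors per SRAM bit: {sram_transistor_saving:.0f}%")
```

A roughly 16% lower boost target buys a core that takes about 65% of the area, so around 1.5 Zen 4c cores fit where one Zen 4 core did, before counting any savings in L3.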
These weren’t the only die area reduction methods AMD used. According to SemiAnalysis, Bergamo is based on eight Vindhya core complex dies (CCDs), each packing 16 Zen 4c cores (up from eight Zen 4 cores per CCD in Genoa). Doubling the cores per CCD is made possible by the smaller cores, though it also limits clock-speed potential. Each CCD consists of two eight-core core complexes (CCXs) and carries 32MB of L3 cache (16MB per CCX). In contrast, each Zen 4 CCX has 32MB of L3 cache, twice as much as a Zen 4c CCX.
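The CCD arithmetic can be checked with a small Python model. The `Topology` class is hypothetical, written only for this comparison, and the per-core L3 figure assumes each CCX’s L3 is split evenly among its eight cores, which is a simplification of how a shared cache is actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical model of a chip's CCD/CCX layout (names are illustrative)."""
    ccds: int
    ccx_per_ccd: int
    cores_per_ccx: int
    l3_per_ccx_mb: int

    @property
    def cores(self) -> int:
        return self.ccds * self.ccx_per_ccd * self.cores_per_ccx

    @property
    def cores_per_ccd(self) -> int:
        return self.ccx_per_ccd * self.cores_per_ccx

    @property
    def l3_per_core_mb(self) -> float:
        # Assumes the CCX's shared L3 is divided evenly among its cores.
        return self.l3_per_ccx_mb / self.cores_per_ccx


genoa = Topology(ccds=12, ccx_per_ccd=1, cores_per_ccx=8, l3_per_ccx_mb=32)
bergamo = Topology(ccds=8, ccx_per_ccd=2, cores_per_ccx=8, l3_per_ccx_mb=16)

for name, t in [("Genoa", genoa), ("Bergamo", bergamo)]:
    print(f"{name}: {t.cores} cores, {t.cores_per_ccd} per CCD, "
          f"{t.l3_per_core_mb:.1f}MB L3 per core")
```

Both designs keep eight cores per CCX; Bergamo gets its extra cores by putting two CCXs on each CCD, which is also why the L3 available per core halves from 4MB to 2MB.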
All in all, it appears that AMD had to change the design trajectory of Zen 4c and Bergamo because it needed to fit 128 Zen 4-class cores into the same 360W to 400W power envelope as Genoa. By lowering frequency targets, using denser SRAM cells, and halving the L3 cache per CCX, AMD has clearly managed to increase core counts, but how this affects per-core performance remains to be seen.
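A rough way to see the power constraint is to divide the package power envelope by the core count. This sketch deliberately ignores the I/O die, memory controllers, and other uncore power, so the real per-core budget is lower than these numbers; it only illustrates that each Bergamo core gets about 25% less of the package budget than a Genoa core (96/128 = 0.75).

```python
# Naive per-core share of the package power envelope.
# Assumption: all TDP goes to cores; uncore/I/O die power is ignored.

def watts_per_core(tdp_w: float, cores: int) -> float:
    """Package TDP divided evenly across cores."""
    return tdp_w / cores


for name, cores in [("Genoa (96 cores)", 96), ("Bergamo (128 cores)", 128)]:
    low = watts_per_core(360, cores)   # default TDP from the table
    high = watts_per_core(400, cores)  # maximum TDP from the table
    print(f"{name}: {low:.2f} to {high:.2f} W per core")
```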
According to SemiAnalysis, AMD is gearing up to launch two Bergamo processors next week: the 128-core EPYC 9754 and its slightly cut-down sibling, the 112-core EPYC 9734. One can only wonder how many custom and semi-custom Bergamo products AMD will eventually produce.
“Bergamo will launch next week, and you will hear that it is a cloud-native optimized device that is energy efficient and has exceptional performance per watt for cloud-native computing,” said Dan McNamara, head of AMD’s server business, at the Bank of America 2023 Global Technology Conference (via Seeking Alpha).