Nvidia Details Grace Hopper CPU Superchip Design: 144 Cores on 4N TSMC Process

Ahead of next week’s Hot Chips 34 presentation, Nvidia has released new details about its Grace CPU Superchip, revealing that the chip will be manufactured on the TSMC 4N process. Nvidia also shared more information about the architecture and data fabric, along with performance and efficiency benchmarks. Nvidia hasn’t made its formal presentation at Hot Chips yet (more details will be added after the session), but the information shared today gives the broad strokes as Grace chips and servers make their way to market in the first half of 2023.

As a quick reminder, Nvidia’s Grace CPU is the company’s first CPU-only Arm chip designed for the data center. It comes as two chips on a single motherboard, for a total of 144 cores. The Grace Hopper Superchip, on the other hand, combines a Hopper GPU and a Grace CPU on the same board.

Among the most important disclosures, Nvidia has finally officially confirmed that its Grace CPUs use the TSMC 4N process. TSMC lists its ‘N4’ 4nm process under the 5nm node family and describes it as an enhanced version of the 5nm node. Nvidia uses a specialized variant of this node, called ‘4N’, that is optimized for its GPUs and CPUs.

(Image credit: Nvidia)

These types of specialized nodes are becoming more common as Moore’s Law wanes and transistors become harder and more costly to shrink with each new node. To enable custom process nodes like Nvidia’s 4N, chip designers and foundries work together, using design-technology co-optimization (DTCO), to tune the power, performance, and area (PPA) characteristics for specific products.

Nvidia has previously revealed that it uses off-the-shelf Arm Neoverse cores for its Grace CPUs, but it has yet to identify the specific version. However, Nvidia has disclosed that Grace uses Arm v9 cores that support SVE2, and the Neoverse N2 platform is Arm’s first IP to support Arm v9 and extensions like SVE2. The N2 Perseus platform comes as a 5nm design (remember, N4 belongs to TSMC’s 5nm family) and supports PCIe Gen 5.0, DDR5, HBM3, CCIX 2.0, and CXL 2.0. The Perseus design is optimized for performance per watt and performance per area. Arm has said that its next-gen Poseidon cores won’t hit the market until 2024, which makes them a less likely candidate given Grace’s early 2023 launch date.

Nvidia Grace Hopper CPU Architecture

Nvidia’s new Scalable Coherency Fabric (SCF) is a mesh interconnect that appears very similar to the standard CMN-700 Coherent Mesh Network used with Arm Neoverse cores.

The Nvidia SCF offers 3.2 TB/s of bisection bandwidth between the various Grace chip units, such as the CPU cores, memory, and I/O, not to mention the NVLink-C2C interface that connects the chip to the other unit on the motherboard, whether that is another Grace CPU or a Hopper GPU.

Grace CPU

(Image credit: Nvidia)

The mesh supports more than 72 cores, and each CPU has a total of 117 MB of L3 cache. Nvidia states that the first block diagram in the album above is “a possible topology for illustrative purposes,” and its arrangement does not exactly match the second diagram.
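To put those totals in perspective, here is a back-of-the-envelope sketch in Python. It assumes exactly 72 cores per CPU (Nvidia says "more than 72" for the mesh) and an even split of the shared L3 across cores. Nvidia has not described the cache that way, so the per-core figure is only a rough ratio, and the constant names are illustrative.

```python
# Figures quoted in the article; the even per-core split is an assumption.
CORES_PER_CPU = 72        # assumed: the article says the mesh supports 72+
L3_MB_PER_CPU = 117       # total L3 per Grace CPU
CPUS_PER_SUPERCHIP = 2    # a Grace CPU Superchip carries two Grace CPUs

# Rough L3 capacity per core if the shared cache were divided evenly.
l3_per_core_mb = L3_MB_PER_CPU / CORES_PER_CPU

# Totals for the two-chip Superchip.
superchip_cores = CORES_PER_CPU * CPUS_PER_SUPERCHIP
superchip_l3_mb = L3_MB_PER_CPU * CPUS_PER_SUPERCHIP

print(f"L3 per core (even split): {l3_per_core_mb} MB")
print(f"Superchip: {superchip_cores} cores, {superchip_l3_mb} MB L3")
```

The core count lines up with the 144-core total Nvidia quotes for the Superchip.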

This diagram shows a chip with eight SCF Cache partitions (SCCs), which appear to be L3 cache slices (more on that in the presentation), and eight CPU units, which appear to be clusters of cores. The SCCs and cores are connected in groups of two to Cache Switch Nodes (CSNs), which reside on the SCF mesh fabric and provide the interface between the CPU cores and memory and the rest of the chip. SCF also supports coherence across up to four sockets using Coherent NVLink.

Grace CPU

(Image credit: Nvidia)

Nvidia also shared this diagram, showing that each Grace CPU supports up to 68 PCIe lanes and up to four PCIe 5.0 x16 connections. Each x16 connection supports up to 128 GB/s of bidirectional throughput, and an x16 link can be split into two x8 links. We also see 16 dual-channel LPDDR5X memory controllers (MC).
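The 128 GB/s figure can be checked with quick arithmetic. The sketch below assumes the standard PCIe 5.0 parameters (32 GT/s per lane, 128b/130b line encoding), which come from the PCIe specification rather than from Nvidia's disclosure; the function and constant names are illustrative. It suggests the quoted number is the raw signaling rate, with usable bandwidth slightly lower after encoding overhead.

```python
# PCIe 5.0 parameters (assumed from the PCIe spec, not from Nvidia's slides).
GT_PER_S_PER_LANE = 32.0         # raw transfer rate per lane, per direction
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding

def pcie5_bandwidth_gbs(lanes):
    """Return (per-direction, bidirectional) bandwidth in GB/s after encoding."""
    raw_gbs = GT_PER_S_PER_LANE * lanes / 8  # bits -> bytes, before encoding
    per_direction = raw_gbs * ENCODING_EFFICIENCY
    return per_direction, 2 * per_direction

# Raw bidirectional rate of an x16 link, ignoring encoding overhead: 128.0
raw_x16_bidir = 2 * GT_PER_S_PER_LANE * 16 / 8

x16 = pcie5_bandwidth_gbs(16)
x8 = pcie5_bandwidth_gbs(8)

# Lanes left over once four x16 links are carved out of the 68 total.
lanes_outside_x16 = 68 - 4 * 16

print(f"x16 raw bidirectional: {raw_x16_bidir} GB/s")
print(f"x16 after encoding: {x16[0]:.2f} GB/s per direction, {x16[1]:.2f} GB/s both ways")
print(f"x8 after encoding: {x8[0]:.2f} GB/s per direction")
print(f"Lanes outside the four x16 links: {lanes_outside_x16}")
```

Splitting an x16 link into two x8 links halves the bandwidth of each link, so the aggregate stays the same.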

However, this diagram differs from the first one. It shows the L3 cache as two contiguous blocks attached to quad-core CPU clusters, which makes a lot more sense than the previous figure and adds up to a total of 72 cores in the chip. However, it does not show the separate SCF partitions or the CSN nodes from the first diagram, which is a bit confusing. We will sort this out during the presentation and update as necessary.

Nvidia says its Scalable Coherency Fabric (SCF) is a proprietary design. However, Arm allows partners to customize the CMN-700 mesh by tuning core counts and cache sizes, using different types of memory such as DDR5 and HBM, and choosing from different interfaces such as PCIe 5.0, CXL, and CCIX. This means Nvidia may use a highly customized CMN-700 implementation for the on-die fabric.

Nvidia Grace Hopper Enhanced GPU Memory
