Nvidia Unveils DGX GH200 Supercomputer and MGX Systems, Grace Hopper Superchips in Production
It may be a little late to market, but Nvidia CEO Jensen Huang told Computex 2023 in Taipei, Taiwan that the company’s Grace Hopper superchip is now in full production and that the Grace platform has now earned six supercomputer wins. These chips are the basic building blocks of one of Huang’s other big announcements at Computex 2023: the company’s new DGX GH200 AI supercomputing platform, built for large-scale generative AI workloads, which combines 256 Grace Hopper superchips into a 144 TB supercomputing powerhouse with enough shared memory for the most demanding generative AI training tasks. Nvidia already has customers like Google, Meta, and Microsoft lined up to receive the cutting-edge systems.
Nvidia also announced a new MGX reference architecture that will enable OEMs to build new AI supercomputers faster, with more than 100 possible system configurations. Finally, the company announced a new Spectrum-X Ethernet networking platform designed and optimized specifically for AI servers and supercomputing clusters. Let’s dive in.
Nvidia Grace Hopper superchip goes into production
We have detailed the Grace and Grace Hopper superchips in the past, and these chips are at the heart of the new systems Nvidia announced today. The Grace chip is Nvidia’s Arm-based CPU-only processor, while the Grace Hopper superchip combines a 72-core Grace CPU, a Hopper GPU, 96 GB of HBM3, and 512 GB of LPDDR5X in the same package, for a total of 200 billion transistors. This combination provides incredible data bandwidth between the CPU and GPU, with up to 1 TB/s of throughput, delivering tremendous benefits for certain memory-constrained workloads.
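To put that bandwidth figure in perspective, here’s a back-of-the-envelope comparison against a conventional PCIe-attached GPU. This is a rough sketch of ours, not Nvidia math; the ~64 GB/s figure for PCIe 5.0 x16 is an approximation:

```python
# Rough comparison: time to move a CPU-resident working set to the GPU.
# Assumptions (ours, not Nvidia's): a conventional server uses PCIe 5.0 x16
# at roughly 64 GB/s, while Grace Hopper's NVLink-C2C link hits the quoted
# 1 TB/s between CPU and GPU.

working_set_gb = 512        # the superchip's full LPDDR5X pool
links = {
    "PCIe 5.0 x16 (approx.)": 64,     # GB/s
    "NVLink-C2C (quoted)": 1000,      # GB/s
}

for name, bandwidth_gbps in links.items():
    seconds = working_set_gb / bandwidth_gbps
    print(f"{name}: {seconds:.1f} s to move {working_set_gb} GB")

# PCIe 5.0 x16 (approx.): 8.0 s to move 512 GB
# NVLink-C2C (quoted): 0.5 s to move 512 GB
```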
The Grace Hopper superchip is now in full production, with systems coming from Nvidia’s stable of system partners, including Asus, Gigabyte, ASRock Rack, and Pegatron. More importantly, Nvidia is also rolling out its own systems based on the new chip and has published OxM and hyperscaler reference design architectures, which we cover below.
Nvidia DGX GH200 supercomputer
Nvidia’s DGX systems are its go-to systems and reference architecture for the most demanding AI and HPC workloads, but today’s DGX A100 systems are limited to eight A100 GPUs working together as one cohesive unit. Given the explosive popularity of generative AI, Nvidia’s customers are eager for much larger systems with much more performance, and the DGX GH200 is designed to deliver the ultimate in throughput and scalability for the largest workloads, such as generative AI training, large language models, recommender systems, and data analytics, by using Nvidia’s custom NVLink Switch silicon to sidestep the limitations of standard cluster connectivity options like InfiniBand and Ethernet.
Details on the new DGX GH200 AI supercomputer have yet to be fully revealed, but we know that Nvidia uses a new NVLink Switch System with 36 NVLink switches, now in their third generation of silicon, to combine 256 GH200 Grace Hopper chips and 144 TB of shared memory into one cohesive unit that looks and acts like one giant GPU.
The DGX GH200’s 256 Grace Hopper CPU+GPUs easily outstrip Nvidia’s largest previous NVLink-connected DGX configuration of eight GPUs, and its 144 TB of shared memory is nearly 500 times more than the 320 GB of shared memory spread among the eight A100 GPUs in a DGX A100 system. Additionally, scaling a DGX A100 system into clusters of more than eight GPUs requires adopting InfiniBand as the interconnect between systems, which incurs performance penalties. In contrast, the DGX GH200 is the first time Nvidia has built an entire supercomputer cluster around its NVLink Switch topology, which Nvidia says provides up to 10x the GPU-to-GPU and 7x the CPU-to-GPU bandwidth of the previous generation. It is also designed to deliver 5x the interconnect power efficiency (likely measured in pJ/bit) of competing interconnects, and up to 128 TB/s of bisection bandwidth.
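Those headline numbers are easy to sanity-check. The quick sketch below assumes that roughly 480 GB of each superchip’s 512 GB of LPDDR5X counts toward the shared pool alongside its 96 GB of HBM3 (an assumption on our part, though it is what makes the quoted totals line up):

```python
# Sanity check of the DGX GH200 memory math quoted above.
# Assumption: ~480 GB of each superchip's 512 GB LPDDR5X is exposed to the
# shared pool, alongside its 96 GB of HBM3.

chips = 256
per_chip_gb = 96 + 480                      # HBM3 + usable LPDDR5X = 576 GB

total_gb = chips * per_chip_gb              # 147,456 GB
print(f"{total_gb / 1024:.0f} TB shared")   # -> 144 TB

dgx_a100_gb = 320                           # shared memory across 8 A100s
print(f"~{total_gb / dgx_a100_gb:.0f}x a DGX A100")   # -> ~461x, "nearly 500x"
```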
The system packs 150 miles of optical fiber and weighs 40,000 pounds, yet presents itself as a single GPU. Nvidia says the 256 Grace Hopper superchips propel the DGX GH200 to one exaflops of “AI performance,” meaning the value is measured with the smaller data types that are more relevant to AI workloads than the FP64 measurements used in HPC and supercomputing. That performance comes courtesy of 900 GB/s of GPU-to-GPU bandwidth, which is impressive scalability given that Grace Hopper tops out at 1 TB/s of throughput when connected directly to a Grace CPU on the same board via the NVLink-C2C chip interconnect.
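Nvidia doesn’t say which data type the exaflops figure refers to, but the math pencils out if we assume sparse FP8 throughput on the Hopper GPUs (an inference on our part):

```python
# Where "1 exaflops of AI performance" likely comes from. Assumption (ours):
# the figure refers to FP8 throughput with sparsity on the Hopper GPUs,
# roughly 3.96 PFLOPS per GPU.

gpus = 256
fp8_pflops_per_gpu = 3.96                       # approx. H100 sparse FP8

total_pflops = gpus * fp8_pflops_per_gpu
print(f"~{total_pflops / 1000:.2f} exaflops")   # -> ~1.01 exaflops
```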
Nvidia provided projected benchmarks of the DGX GH200 with the NVLink Switch System going head-to-head against an InfiniBand-connected DGX H100 cluster. Nvidia computed the workloads with varying numbers of GPUs, ranging from 32 to 256, with both systems using the same number of GPUs in each test. The explosion in interconnect performance is expected to yield gains of 2.2x to 6.3x.
Nvidia plans to deliver the DGX GH200 to key customers Google, Meta, and Microsoft by the end of 2023, and will also provide the blueprint as a reference architecture design for cloud service providers and hyperscalers.
Nvidia also eats its own dog food: the company introduced its new Nvidia Helios supercomputer, consisting of four DGX GH200 systems, which it plans to use for its own research and development work. The four systems, containing a total of 1,024 Grace Hopper superchips, are connected with Nvidia’s Quantum-2 InfiniBand 400 Gb/s networking.
Reference Architecture for Nvidia MGX Systems
DGX covers the high end, while Nvidia’s HGX systems serve hyperscalers. The new MGX systems slot in as a middle ground between the two, and DGX and HGX will continue to coexist alongside them.
Nvidia’s OxM partners face new challenges with AI-centric server designs that slow their design and deployment efforts. Nvidia’s new MGX reference architectures are designed to speed up that process with more than 100 reference designs. The MGX systems consist of modular designs spanning Nvidia’s portfolio of CPUs, GPUs, DPUs, and networking systems, including designs based on both the x86 and Arm-based processors common in today’s servers. Nvidia also offers both air-cooled and liquid-cooled design options, giving OxMs a variety of design points for a broad range of applications.
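Nvidia hasn’t published the MGX spec yet, but conceptually the modularity amounts to picking one option along each axis of the design space. The sketch below is purely illustrative; the axis names and options are our placeholders, not Nvidia’s:

```python
# Illustrative sketch of the MGX "mix and match" design space. The axes and
# options below are hypothetical placeholders, not Nvidia's unpublished spec.
from itertools import product

mgx_axes = {
    "cpu":     ["Grace", "x86"],
    "gpu":     ["Hopper", "none"],
    "dpu":     ["BlueField-3", "none"],
    "cooling": ["air", "liquid"],
    "chassis": ["1U", "2U", "4U"],
}

designs = [dict(zip(mgx_axes, combo)) for combo in product(*mgx_axes.values())]
print(f"{len(designs)} combinations from five axes alone")   # -> 48

# Add networking choices, GPU counts, and form factors, and it's easy to see
# how Nvidia gets to 100+ reference designs.
```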
Unsurprisingly, Nvidia points out that the first QCT and Supermicro systems will be powered by Grace and Grace Hopper superchips, but we expect x86 flavors to broaden the range of available systems over time. Asus, Gigabyte, ASRock Rack, and Pegatron all plan to use the MGX reference architectures for systems arriving later this year or early next year.
The MGX reference designs could prove to be the sleeper announcement of Nvidia’s Computex press barrage. These are the systems that mainstream data centers and enterprises will eventually deploy as they adopt AI-centric architectures, and they will ship in far greater volumes than the somewhat exotic and more expensive DGX systems. Nvidia is still finalizing the MGX spec, which will be published along with a white paper soon.
Nvidia Spectrum-X networking platform
Nvidia’s acquisition of Mellanox is proving to be a pivotal move for the company, as it can now optimize and tune its networking components and software for AI-centric needs. The new Spectrum-X networking platform is perhaps the best example of those capabilities, with Nvidia touting it as “the world’s first high-performance Ethernet for AI” networking platform.
One of the key takeaways here is that Nvidia is embracing Ethernet as the interconnect for its high-performance AI platforms, rather than the InfiniBand connections often found in high-performance systems. The Spectrum-X design pairs Nvidia’s 51 Tb/s Spectrum-4 400 GbE Ethernet switches with Nvidia BlueField-3 DPUs, along with software and SDKs that let developers tailor systems to the unique needs of AI workloads. In contrast to other Ethernet-based systems, Spectrum-X is lossless, so it offers superior QoS and latency, Nvidia says. It also includes new adaptive routing technology that is especially useful in multi-tenant environments.
The Spectrum-X networking platform is a foundational piece of Nvidia’s portfolio, as it brings high-performance AI cluster capabilities to Ethernet-based networking, providing new options for deploying AI broadly across hyperscale infrastructure. The platform is fully interoperable with existing Ethernet-based stacks and offers strong scalability, with up to 256 200 Gb/s ports on a single switch, or 16,000 ports in a two-tier leaf-spine topology.
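Those figures hang together on the back of an envelope. The sketch below assumes the common leaf-spine convention of splitting each leaf switch’s ports evenly between hosts and spine uplinks, which is our assumption rather than Nvidia’s stated topology:

```python
# Sanity check of the Spectrum-X scalability figures quoted above.

ports_per_switch = 256
port_speed_gbps = 200

# Single switch: aggregate capacity matches the Spectrum-4 rating.
print(f"{ports_per_switch * port_speed_gbps / 1000:.1f} Tb/s")   # -> 51.2 Tb/s

# Two-tier leaf-spine, assuming (our convention, not Nvidia's stated design)
# each leaf splits its ports 50/50 between hosts and spine uplinks, with a
# fabric of 128 leaf switches:
leaves = 128
host_ports = leaves * (ports_per_switch // 2)
print(f"{host_ports} host-facing ports")   # -> 16384, i.e. the ~16,000 quoted
```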
The Nvidia Spectrum-X platform and its associated components, including 400G LinkX optics, are available now.
Nvidia Grace and Grace Hopper Superchip Supercomputing Wins
Nvidia’s first Arm CPU, Grace, is already in mass production and has notched three new supercomputer wins, including the newly announced Taiwania 4, built by computing vendor Asus for Taiwan’s National Center for High-Performance Computing. The system is powered by 44 Grace CPU nodes, and Nvidia claims it will rank as one of Asia’s most energy-efficient supercomputers when it is deployed. The supercomputer will be used to model climate change issues.
Nvidia also revealed details of its new Taipei-1 supercomputer based in Taiwan. The system will include 64 DGX H100 AI supercomputers and 64 Nvidia OVX systems connected with the company’s networking kit. It will be used for as-yet-unspecified local R&D workloads when it is completed later this year.