Intel Delivers 10,000 Aurora Supercomputer Blades, Benchmarks Against Nvidia and AMD
With 2 exaflops of performance, the Intel-powered Aurora supercomputer is expected to top the Top 500 list of the fastest supercomputers, beating the AMD-powered Frontier supercomputer that currently holds the title. However, due to continued delays in hardware deliveries by Intel, Aurora has yet to submit benchmarks to the Top 500 committee, which is why it was not included in the list announced today. Intel shared new details about the system today, announcing at the ISC conference that it has delivered “more than 10,000” working blades for Aurora. Note, however, that not all of these are the final blades required for full deployment. More on that below.
Intel says the system will be fully operational later this year, however, and it shared benchmarks pitting Aurora hardware against AMD- and Nvidia-powered supercomputers, claiming a 2x performance advantage over AMD’s MI250X GPUs and a 20% improvement over Nvidia’s H100 GPUs.
Intel said it has delivered silicon for “more than 10,000” blades, including both its fourth-generation Sapphire Rapids Xeon chips and its Ponte Vecchio GPUs, to the Argonne Leadership Computing Facility (ALCF).
However, Aurora was designed to run on Intel’s HBM-equipped Sapphire Rapids ‘Xeon Max’ chips, which have been delayed. Because of these delays, Intel initially shipped non-HBM Sapphire Rapids chips to the ALCF, and the facility began installing those standard chips in Aurora as a stopgap measure.
Intel is now supplying the HBM-equipped Xeon Max chips to the ALCF, but not all of the 10,000 blades it touts have Max chips inside. When contacted, a company representative confirmed that not all of the blades feature the final Xeon Max silicon. According to the company, about 75% of the blades contain the final Xeon Max silicon. This may be the bottleneck preventing the system from submitting benchmarks for the Top 500 list.
The system is configured in 166 racks with 64 blades per rack, for a total of 10,624 blades, so the “more than 10,000” blades delivered may be enough to make the system operational, but not at full performance.
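The rack arithmetic above is easy to check. The short Python sketch below recomputes the blade total and the share of the full system that the delivered blades would cover. The figure of exactly 10,000 delivered blades is an assumption for illustration, since Intel says only “more than 10,000,” so the results are bounds rather than exact values.

```python
RACKS = 166
BLADES_PER_RACK = 64
# Assumption: Intel says only "more than 10,000", so treat 10,000 as a lower bound.
DELIVERED_LOWER_BOUND = 10_000

total_blades = RACKS * BLADES_PER_RACK
max_shortfall = total_blades - DELIVERED_LOWER_BOUND
delivered_share = DELIVERED_LOWER_BOUND / total_blades

print(f"Total blades in the full system: {total_blades:,}")
print(f"Blades still missing (at most): {max_shortfall:,}")
print(f"Share delivered (at least): {delivered_share:.1%}")
```

Under that assumption, fewer than 624 blades remain to be delivered, which is consistent with a system that can run but not at its full size.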
Intel also shared detailed specifications for the Aurora supercomputer, shown on the slide above. With 21,248 CPUs and 63,744 Ponte Vecchio GPUs, Aurora will deliver over 2 exaflops of performance when it is fully online by the end of the year. The system also features 10.9 petabytes (PB) of DDR5 memory, 1.36 PB of HBM attached to the CPUs, 8.16 PB of GPU memory, and 230 PB of storage capacity that delivers 31 TB/s of bandwidth (see the slide for other interesting details).
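These totals can be broken down per blade and per chip. The sketch below is a rough sanity check rather than an official breakdown: it assumes decimal units (1 PB = 1,000,000 GB) and that memory is spread evenly across chips, and because the published totals are rounded, the per-chip results are approximate.

```python
TOTAL_BLADES = 166 * 64  # 10,624 blades, from the rack configuration
CPUS = 21_248
GPUS = 63_744

# Memory totals from Intel's specifications, in petabytes.
DDR5_PB = 10.9
HBM_CPU_PB = 1.36
GPU_MEM_PB = 8.16
# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000

cpus_per_blade = CPUS // TOTAL_BLADES
gpus_per_blade = GPUS // TOTAL_BLADES

# Assumption: memory is divided evenly across all chips of each type.
ddr5_gb_per_cpu = DDR5_PB * GB_PER_PB / CPUS
hbm_gb_per_cpu = HBM_CPU_PB * GB_PER_PB / CPUS
mem_gb_per_gpu = GPU_MEM_PB * GB_PER_PB / GPUS

print(f"CPUs per blade: {cpus_per_blade}")
print(f"GPUs per blade: {gpus_per_blade}")
print(f"DDR5 per CPU: about {ddr5_gb_per_cpu:.0f} GB")
print(f"HBM per CPU: about {hbm_gb_per_cpu:.0f} GB")
print(f"Memory per GPU: about {mem_gb_per_gpu:.0f} GB")
```

The CPU and GPU counts divide evenly into the 10,624 blades, giving two CPUs and six GPUs per blade, which matches the six GPUs per node Intel cites for its test systems.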
Intel also revealed that generative AI will be among the workloads Aurora runs. The Aurora GPT large language model is science-oriented, with 1 trillion parameters, and is built on Megatron and DeepSpeed. Intel provided the following summary of the project.
“These generative AI models for science are trained on general text, code, scientific text, and structured scientific data from biology, chemistry, materials science, physics, medicine, and other sources. The resulting model, which contains a trillion parameters, will be used in a variety of scientific applications, from the design of molecules and materials to synthesizing knowledge across millions of sources to suggest new and interesting experiments in systems biology, polymer chemistry and energy materials, climate science, and cosmology. It will also be used to accelerate the identification of biological processes associated with cancer and other diseases and to suggest targets for drug discovery.”
Intel also published some benchmarks from the Sunspot system, a smaller two-rack version of Aurora with a total of 128 nodes. Intel compared Sunspot’s performance to estimated numbers representing “similarly sized” portions of the Polaris supercomputer, which uses Nvidia A100 GPUs, and the Crusher supercomputer, which uses AMD’s MI250X GPUs. Unfortunately, Intel didn’t provide test notes or details for these configurations, so take the results with a grain of salt.
In a single-node test of the Reactor Prediction workload, Intel claims its system is 45% faster than the competing Nvidia system and 12% faster than the AMD system. Turning to scalability metrics, Intel fixed the total number of GPUs used in each test system at 96 (AMD and Nvidia nodes have four GPUs each, compared to six GPUs per node on the Intel system) and claims that Sunspot delivers more than twice the performance of both the AMD and Nvidia systems in a Monte Carlo workload. In an NWChemEx workload spanning 90 nodes, Intel claims it is 72% faster than the Nvidia-powered Polaris system with 90 nodes.
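Because the node designs differ, holding the GPU count at 96 means the systems use different numbers of nodes. The sketch below works out those node counts from the per-node GPU figures quoted above; it assumes every node is fully populated and that 96 GPUs divide evenly across nodes, which Intel’s materials do not confirm.

```python
TOTAL_GPUS = 96

# GPUs per node as quoted in the article: six for Intel, four for AMD and Nvidia.
GPUS_PER_NODE = {
    "Intel (Sunspot)": 6,
    "AMD (Crusher)": 4,
    "Nvidia (Polaris)": 4,
}

# Assumption: nodes are fully populated, so nodes = total GPUs / GPUs per node.
nodes_needed = {
    system: TOTAL_GPUS // per_node for system, per_node in GPUS_PER_NODE.items()
}

for system, nodes in nodes_needed.items():
    print(f"{system}: {nodes} nodes for {TOTAL_GPUS} GPUs")
```

By this arithmetic, the Intel system reaches 96 GPUs with 16 nodes, while the AMD and Nvidia systems each need 24, so a comparison at equal GPU counts is not a comparison at equal node counts.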
The Aurora supercomputer was first announced in 2015, with a projected completion date of 2018. At the time, the system was designed to use the since-canceled Knights Hill processors. Since then, the system has been redesigned and rescheduled many times, with a new Aurora announced in 2019 that was expected to deliver 1 exaflop of performance in 2021. Yet another revision in late 2021 raised the target to 2 exaflops upon completion, which is now scheduled for later this year.
The long and winding road continues, but it seems that the end is finally in sight. Intel has said that it will soon deliver all of the Xeon Max processors needed to complete the system, and that it plans to submit its first Top 500 benchmarks once the system is completed by the end of the year.