India Chooses Arm’s Neoverse for National Chip Design Push

India’s Centre for Development of Advanced Computing (C-DAC) announced this week [PDF] the country’s first self-designed High Performance Computing (HPC) CPU. The chip, dubbed ‘Aum’, is a chiplet design that scales up to 96 cores and is based on Arm’s v8.4 ‘Zeus’ Neoverse V1 design (the same one AWS uses in Graviton3). It is expected to enter the market as early as 2024, built on TSMC’s 5nm process.

Aum was developed as part of the National Supercomputing Mission, a program aimed at reducing India’s exposure to export restrictions. To that end, the goal is to introduce a nationally developed processor architecture. Perhaps alarmingly for both Intel and AMD, Aum is meant to be usable in both high-performance computing scenarios and personal computing chips. And wherever Aum is present, the market left for them is smaller.

The reason is simple. If India has the ability to design its own chips (whether that means mixing and matching parts from Arm’s portfolio or directing specific implementations for the final manufacturable design), possible technology export restrictions become somewhat easier to weather. At the same time, the National Supercomputing Mission also aims to improve security against eventual backdoors. A neutral design provider like Arm fits naturally into these concerns. And while controlling the design process by itself won’t get you there (it offers no protection against a backdoor inserted on the factory floor by a motivated and capable adversary), it’s a head start. The planned use of open source software to underpin a specialized software ecosystem also points to a more diverse software future: given enough time, more than just the hardware could fragment.

Aum’s package and discrete A48Z chiplet design. (Image credit: C-DAC)

At the heart of the 96-core Aum chip are its A48Z chiplets, each featuring 48 Arm Zeus cores (3 GHz base, 3.5 GHz turbo) with 96 MB of immediate-access L2 cache, as well as additional system memory. Overall, each Aum package offers up to 16 DDR5 memory channels (at 5200 MT/s, yielding 332.8 GB/s of bandwidth) and 64 GB of HBM3 memory (6.4 Gbps per pin stock, tuned down to 5.6 Gbps on initial release, which still delivers a staggering 2.87 TB/s). 128 PCIe Gen 5 lanes add further I/O throughput, 64 of which can host additional accelerators (such as GPUs and FPGAs).
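
The quoted bandwidth figures can be sanity-checked with some quick arithmetic. The sketch below is only a back-of-the-envelope reconstruction: the article does not say how wide each DDR5 channel is or how many HBM3 stacks the package carries, so the 32-bit channel width, the four stacks, and the 1024-bit stack interface are all assumptions chosen because they reproduce the published numbers.

```python
# Back-of-the-envelope check of the memory bandwidth figures quoted above.
# ASSUMPTIONS (not stated in the article): each DDR5 channel is 32 bits wide,
# and the HBM3 consists of four stacks with a 1024-bit interface each.

DDR5_CHANNELS = 16                # quoted
DDR5_CHANNEL_WIDTH_BITS = 32      # assumed
DDR5_TRANSFERS_PER_SEC = 5200e6   # quoted data rate

HBM3_STACKS = 4                   # assumed
HBM3_STACK_WIDTH_BITS = 1024      # assumed
HBM3_PIN_RATE = 5.6e9             # quoted launch speed, bits per second per pin


def bandwidth_gb_s(units, width_bits, transfers_per_sec):
    """Peak bandwidth in GB/s for `units` interfaces of `width_bits` each."""
    bytes_per_transfer = width_bits / 8
    return units * bytes_per_transfer * transfers_per_sec / 1e9


ddr5_gb_s = bandwidth_gb_s(DDR5_CHANNELS, DDR5_CHANNEL_WIDTH_BITS,
                           DDR5_TRANSFERS_PER_SEC)
hbm3_gb_s = bandwidth_gb_s(HBM3_STACKS, HBM3_STACK_WIDTH_BITS, HBM3_PIN_RATE)
total_tb_s = (ddr5_gb_s + hbm3_gb_s) / 1000

print(f"DDR5:  {ddr5_gb_s:.1f} GB/s")
print(f"HBM3:  {hbm3_gb_s / 1000:.2f} TB/s")
print(f"Total: {total_tb_s:.2f} TB/s")
```

Under these assumptions the two pools add up to roughly 3.2 TB/s per package.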

C-DAC AUM chip presentation materials

Aum’s interconnect and memory subsystem. (Image credit: C-DAC)

The remaining 64 lanes can be routed to the chip’s inter-socket communication fabric, a mesh of NUMA-style, fully memory-coherent links based on the CCIX protocol. Two Aum sockets use this link to communicate with each other, an approach that takes a design page or two from AMD’s Infinity Fabric.

Spec comparison between C-DAC’s Aum HPC processor and the Fujitsu A64FX used in Fugaku. (Image credit: C-DAC)

According to the documentation, Aum’s design is primarily aimed at increasing the amount of memory bandwidth available for each flop of computing power (the byte/flop ratio). This ratio has proven to be a major limiting factor in performance scaling for HPC workloads. Too few lanes (memory throughput) and too many cars (floating-point operations per second) can only end one way. As a result, Aum and its Arm architecture are targeting 4.6 teraflops per socket and 3 TB/s of total memory bandwidth. This puts the byte/flop ratio at 0.7, much higher than the 0.38 achieved by Japan’s Fugaku, the world’s fastest Arm supercomputer, and it definitively beats the US IBM- and Nvidia-based Summit (<0.2 bytes/flop). However, with an expected 300W TDP, it actually seems to be less energy efficient than Fugaku’s Arm-based A64FX.
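
To make the lanes-and-cars picture concrete, here is a minimal sketch that computes the byte/flop ratio from the two per-socket targets quoted above and then applies the standard roofline model, which is a general rule of thumb rather than anything taken from C-DAC's materials. The function names and the two example kernel intensities are hypothetical, picked purely for illustration.

```python
# Byte/flop ratio and a simple roofline estimate for the quoted Aum targets.
# The roofline model is a generic rule of thumb, not C-DAC's own methodology.

AUM_PEAK_TFLOPS = 4.6     # quoted per-socket target
AUM_BANDWIDTH_TB_S = 3.0  # quoted total memory bandwidth


def bytes_per_flop(bandwidth_tb_s, peak_tflops):
    """Bytes of memory traffic available per floating-point operation."""
    return bandwidth_tb_s / peak_tflops


def attainable_tflops(peak_tflops, bandwidth_tb_s, flops_per_byte):
    """Roofline: throughput is capped by compute or by memory traffic."""
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


ratio = bytes_per_flop(AUM_BANDWIDTH_TB_S, AUM_PEAK_TFLOPS)
print(f"byte/flop ratio: {ratio:.2f}")

# Hypothetical kernels: one bandwidth-hungry, one compute-heavy.
for name, intensity in [("streaming kernel", 0.25), ("dense kernel", 4.0)]:
    tflops = attainable_tflops(AUM_PEAK_TFLOPS, AUM_BANDWIDTH_TB_S, intensity)
    print(f"{name} ({intensity} flop/byte): up to {tflops:.2f} TFLOPS")
```

In this model a kernel performing only 0.25 flops per byte moved is limited by memory to a small fraction of the 4.6 teraflop peak, which is why a higher byte/flop ratio matters for bandwidth-bound HPC codes.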

Byte/flop efficiency metrics for several HPC systems. (Image credit: C-DAC)

If all goes according to plan, India’s Arm-based Aum CPU would be a strong entry into the supercomputing arena. The point is that, at least for this first iteration, it will be homegrown, if not dramatically so. It’s clear that a lot of work has gone into evolving the entire memory subsystem, and memory in general is easier and more affordable to source than the TSMC 5nm chiplets Aum is made from. Customizing the CPU core itself could be the next step for C-DAC, paving the way for India and giving momentum to the “chip nationalization” process in other countries. China, incidentally, is also interested in Arm, but that is a whole other story.
