Ampere Unveils 192-Core CPU, Then Offers Controversial Test Results

Ampere this week introduced its AmpereOne processor, the industry's first cloud data center CPU with up to 192 general-purpose cores, aimed in part at AI inference.

The new chip draws more power than its predecessor, the Ampere Altra (which will remain in Ampere's stable for at least some time), but in exchange the company offers up to 192 cores. It claims that, despite the higher power consumption, its processors deliver higher computational density than CPUs from AMD and Intel. Some of these performance claims are controversial.

192 custom cloud-native cores

Ampere’s AmpereOne processor features 136 to 192 cores (versus 32 to 128 cores in the Ampere Altra) running at up to 3.0 GHz. The cores are the company’s proprietary implementation of the Armv8.6+ instruction set architecture (with two 128-bit vector units supporting FP16, BF16, INT16, and INT8 formats), and each has 2MB of 8-way set-associative L2 cache (up from 1MB). They are interconnected by a mesh network with 64 home nodes and a directory-based snoop filter. In addition to the L1 and L2 caches, the SoC also has 64MB of system-level cache. The new CPUs are rated between 200W and 350W depending on the exact SKU, whereas the Ampere Altra is rated between 40W and 180W.

(Image credit: Ampere)

The company claims that the new cores are further optimized for cloud and AI workloads, featuring “power- and area-efficient” improvements in instructions per clock (IPC). This probably means higher IPC (compared to Arm’s Neoverse N1 cores used in the Altra) without any significant increase in power consumption or die area. As for die area, Ampere has not disclosed it, but it has stated that AmpereOne is built on one of TSMC’s 5nm-class process technologies.

(Image credit: Ampere)

Ampere hasn’t revealed all the details of the AmpereOne core, but it highlights a high-accuracy L1 data prefetcher (which reduces latency, so the CPU spends less time waiting for data, and minimizes memory accesses, helping system power consumption); sophisticated branch misprediction recovery (the faster the CPU can detect and recover from a mispredicted branch, the less latency it incurs and the less power it wastes); and advanced memory disambiguation (which raises IPC, minimizes pipeline stalls, maximizes out-of-order execution, reduces latency, and improves handling of multiple read/write requests in virtualized environments).
