Intel’s New AVX10 Brings AVX-512 Capabilities to E-Cores
Intel today announced its new APX (Advanced Performance Extensions) and, alongside it, the new AVX10 [PDF], which provides converged support for AVX-512 features on both P-cores and E-cores for the first time. This evolution of the AVX instruction set will help Intel avoid the serious problems it encountered with the x86 hybrid architecture found in Alder Lake and Raptor Lake processors.
However, the new AVX10 ISA is not supported on Intel’s current generation CPUs. It will be included in future chips. Intel says AVX10 will be the vector ISA of choice in the future for both consumer and server processors.
Intel AVX10 (Advanced Vector Extensions 10)
At the most basic level, AVX10 will allow Intel chips with both E-cores and P-cores to continue to support AVX-512 capabilities, although 512-bit instructions can only be executed on P-cores. The converged 256-bit AVX10 instructions, on the other hand, can run on either P-cores or E-cores, so the entire chip can still support AVX-512 functionality.
As a result, Intel won’t have to disable the feature set entirely, as it did when it turned off AVX-512 on both Alder Lake and Raptor Lake.
Looking more closely, the AVX10 (Advanced Vector Extensions 10) ISA is a superset of AVX-512 and comes with all the features of the AVX-512 ISA for processors with both 256-bit and 512-bit vector register sizes.
According to Intel, the converged AVX10 ISA includes a version of the AVX-512 vector instructions with the AVX512VL feature flag, a maximum vector register length of 256 bits, eight 32-bit mask registers, and new versions of the 256-bit instructions that support embedded rounding. This converged version runs on both P-cores and E-cores.
However, E-cores are limited to the maximum vector length of 256 bits for converged AVX10, while P-cores can use 512-bit vectors. This feels similar to Arm’s support for variable vector widths in SVE.
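To make the vector-length difference concrete, here is a minimal sketch in Python. It is a toy model rather than anything Intel ships: the function names are hypothetical, and one loop iteration simply stands in for one vector instruction. It shows that the same element-wise operation produces the same result at 256 or 512 bits and differs only in how many vector operations are needed.

```python
import math


def vector_ops_needed(num_elements, element_bits, vector_bits):
    """Return how many vector operations cover num_elements elements."""
    if vector_bits % element_bits != 0:
        raise ValueError("vector width must be a multiple of the element width")
    lanes = vector_bits // element_bits  # elements processed per operation
    return math.ceil(num_elements / lanes)


def vector_add(a, b, element_bits, vector_bits):
    """Add two equal-length lists in lane-sized chunks (toy model)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = vector_bits // element_bits
    out = []
    for start in range(0, len(a), lanes):
        # One iteration stands in for one vector instruction.
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
    return out


if __name__ == "__main__":
    data = list(range(1000))
    narrow = vector_add(data, data, 32, 256)  # width available on every core
    wide = vector_add(data, data, 32, 512)    # width available on P-cores only
    print(narrow == wide)
    print(vector_ops_needed(1000, 32, 256), vector_ops_needed(1000, 32, 512))
```

For 1,000 32-bit elements the model needs 125 operations at 256 bits and 63 at 512 bits. Real throughput also depends on clocks, ports, and memory bandwidth, none of which this sketch models.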
According to Intel, existing AVX-512 applications can deliver the same level of performance with AVX10 as with AVX-512, at least at the same vector lengths. Intel also claims that:
- Applications compiled with Intel AVX2 can be recompiled to Intel AVX10 to achieve improved performance without the need for additional software tuning.
- Vector register pressure-sensitive Intel AVX2 applications can get maximum performance with 16 additional vector registers and new instructions.
- Highly threaded and vectorizable applications can potentially achieve higher aggregate throughput when running on E-core based Intel Xeon processors or Intel products with performance hybrid architectures.
Intel will support AVX10 version 1 (AVX10.1) starting with the 6th Gen Xeon “Granite Rapids” chips, but that generation only supports 512-bit vector instructions and not the new converged 256-bit vector instructions. Instead, this first generation serves as the transition from AVX-512 to AVX10.
Chips coming after Granite Rapids will support AVX10.2. This adds support for unified 256-bit vector lengths and new features such as new AI data types and conversions, data movement optimizations, and standards support. All future Xeon processors will continue to fully support all AVX-512 instructions, allowing legacy apps to work as expected.
To address developer feedback (obviously negative), Intel also plans to greatly simplify feature enumeration in AVX10 compared to AVX-512. Intel also plans to limit version and enumeration bloat by ensuring that each new AVX10 version includes enough new instructions and features to warrant the change.
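A version-based scheme means software can check a version number and a maximum vector length instead of testing a long list of individual AVX-512 feature flags. The sketch below decodes raw CPUID register values that the caller has already read. The bit positions are assumptions recalled from Intel’s AVX10 architecture specification and should be checked against the current revision, and the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMED bit layout, recalled from Intel's AVX10 architecture specification.
# Verify against the current revision before relying on it.
AVX10_SUPPORT_BIT = 19  # CPUID.(EAX=07H, ECX=01H):EDX[19]
VERSION_MASK = 0xFF     # CPUID.(EAX=24H, ECX=00H):EBX[7:0]
VL256_BIT = 17          # CPUID.(EAX=24H, ECX=00H):EBX[17]
VL512_BIT = 18          # CPUID.(EAX=24H, ECX=00H):EBX[18]


@dataclass(frozen=True)
class Avx10Info:
    version: int
    max_vector_bits: int


def decode_avx10(leaf7_sub1_edx: int, leaf24_ebx: int) -> Optional[Avx10Info]:
    """Decode already-read CPUID values; return None if AVX10 is absent."""
    if not (leaf7_sub1_edx >> AVX10_SUPPORT_BIT) & 1:
        return None
    version = leaf24_ebx & VERSION_MASK
    if (leaf24_ebx >> VL512_BIT) & 1:
        max_bits = 512
    elif (leaf24_ebx >> VL256_BIT) & 1:
        max_bits = 256
    else:
        max_bits = 128
    return Avx10Info(version, max_bits)


def supports(info: Optional[Avx10Info], min_version: int, vector_bits: int) -> bool:
    """A single check replaces a list of per-feature flag tests."""
    return (
        info is not None
        and info.version >= min_version
        and info.max_vector_bits >= vector_bits
    )


if __name__ == "__main__":
    # Synthetic register values, not read from real hardware.
    p_core_like = decode_avx10(1 << 19, 2 | (1 << 16) | (1 << 17) | (1 << 18))
    e_core_like = decode_avx10(1 << 19, 2 | (1 << 16) | (1 << 17))
    print(supports(p_core_like, 2, 512), supports(e_core_like, 2, 512))
```

The decoder deliberately takes plain integers, so it runs on any machine. Reading the actual registers requires a platform-specific CPUID helper, which is left out here.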
Intel will freeze the AVX-512 ISA when AVX10 debuts, and all future use of AVX-512 instructions will be through the AVX10 ISA. AMX, on the other hand, is unaffected.
Intel APX (Advanced Performance Extensions)
Intel also announced the new APX (Advanced Performance Extensions) today (not to be confused with the older iAPX 432).
Intel claims that code compiled with APX contains 10% fewer loads and 20% fewer stores than the same code compiled for the Intel 64 baseline. Intel also says that register accesses are faster than complex load and store operations and consume significantly less dynamic power. Interestingly, APX finds a new use for the 128-byte area left unused when Intel abandoned MPX in 2019, reusing it to hold the new registers’ XSAVE state.
APX’s top-level features are:
- 16 Additional General Purpose Registers (GPRs) R16-R31 (also referred to as Extended GPRs (EGPRs) in this document)
- Three-operand instruction format with new data destination (NDD) registers for many integer instructions
- Conditional ISA improvements: new conditional load, store, and compare instructions, combined with an option for the compiler to suppress status flag writes for common instructions
- Optimized register state save/restore operations
- New 64-bit absolute direct jump instruction
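The three-operand format in the list above matters because classic two-operand x86 integer instructions overwrite one of their sources, so keeping both inputs alive costs an extra register copy. The sketch below is a hypothetical toy lowering in Python that contrasts the two forms for `dest = src1 + src2`. The emitted strings are illustrative only and are not guaranteed to match any assembler’s APX syntax.

```python
def lower_add(dest, src1, src2, ndd):
    """Lower dest = src1 + src2 to a list of illustrative instruction strings.

    ndd=True models a three-operand form with a new data destination;
    ndd=False models the classic destructive two-operand form.
    """
    if ndd:
        # The destination is separate, so both sources stay intact.
        return [f"add {dest}, {src1}, {src2}"]
    if dest == src1:
        return [f"add {dest}, {src2}"]
    if dest == src2:
        # Addition is commutative, so the operands can be swapped.
        return [f"add {dest}, {src1}"]
    # Neither source may be clobbered: copy one first, then add in place.
    return [f"mov {dest}, {src1}", f"add {dest}, {src2}"]


if __name__ == "__main__":
    for ndd in (False, True):
        print(ndd, lower_add("rcx", "rax", "rbx", ndd))
```

In this model the two-operand form needs a `mov` plus an `add` whenever the destination differs from both sources, while the three-operand form needs a single instruction.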
Intel claims to have implemented APX in a way that does not affect silicon area or CPU core power consumption. You can learn more about APX on Intel’s site, where Intel lists both APX and AVX10 resources at the bottom of the linked page.
APX and AVX10 follow Intel’s recent announcement that it is investigating slimming down the Intel 64 architecture by migrating to a simplified version of x86 named x86S.