Intel has announced two new x86-64 instruction set extensions designed to boost AVX-based workloads and deliver more performance on its hybrid architecture of performance (P) cores and efficiency (E) cores. The first is Intel Advanced Performance Extensions, also known as Intel APX. It is designed to bring generation-on-generation, instruction-set-driven improvements to load, store, and compare instructions without impacting power consumption or the overall silicon die area of the CPU cores.
Intel has also published a technical paper detailing the new AVX10, which will enable both Intel’s performance (P) and efficiency (E) cores to support the converged AVX10/256-bit instruction set. This means that future generations of Intel’s hybrid desktop, server, and workstation chips will be able to support multiple AVX vector sizes, including 128 and 256 bits and, where the core supports it, 512 bits.
Intel Advanced Performance Extensions (APX): Beyond AVX and AMX
Intel has published details of its new Advanced Performance Extensions, or APX for short. The idea behind APX is to give x86 code access to more registers and improve overall general-purpose performance. Doubling the general-purpose registers (GPRs) from 16 to 32 lets the compiler keep more values in registers, and Intel claims 10% fewer loads and 20% fewer stores when compiling code for APX versus the same code compiled for x86-64 using Intel 64, which is Intel’s implementation of the 64-bit x86 instruction set.
The reasoning behind going from 16 GPRs on x86-64 to 32 GPRs on Intel APX is that more data can be held in registers, avoiding the need to read from and write to the various levels of cache and memory. More GPRs also theoretically mean fewer accesses to slower areas such as DRAM, which take longer and consume more power.
Although Intel has effectively deprecated MPX (Memory Protection Extensions), APX can reuse the area of the XSAVE layout that was set aside for MPX. Intel’s APX general-purpose registers are XSAVE-enabled, which means they can be saved and restored automatically by XSAVE and XRSTOR sequences during context switches. Intel also states that, by default, they do not change the size or layout of the XSAVE area, because they can occupy the space left behind by the deprecated Intel MPX registers.
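The space argument can be checked with simple arithmetic. The sketch below is illustrative only: the 8-byte register width follows from 64-bit GPRs, while the two MPX component sizes (64 bytes each for the bound registers and for the bound configuration/status block) are assumptions based on the standard XSAVE layout rather than figures given in this article.

```python
# Back-of-the-envelope check that the new APX registers fit where MPX state
# used to live. The MPX sizes below are assumptions, not figures from Intel's
# announcement.

GPR_BYTES = 8                 # one 64-bit general-purpose register
NEW_APX_GPRS = 32 - 16        # registers added by APX on top of x86-64's 16

MPX_BNDREGS_BYTES = 4 * 16    # assumed: four 128-bit bound registers
MPX_BNDCSR_BYTES = 64         # assumed: config/status block padded to 64 bytes


def apx_state_bytes() -> int:
    """Bytes needed to save the registers that APX adds."""
    return NEW_APX_GPRS * GPR_BYTES


def mpx_area_bytes() -> int:
    """Bytes previously reserved for MPX state (under the assumptions above)."""
    return MPX_BNDREGS_BYTES + MPX_BNDCSR_BYTES


def fits_in_mpx_area() -> bool:
    """True if the extra APX state needs no more room than MPX did."""
    return apx_state_bytes() <= mpx_area_bytes()
```

Under these assumptions both totals come to 128 bytes, which is consistent with Intel’s statement that the XSAVE area does not need to grow or be rearranged.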
Another important feature of Intel’s APX is support for three-operand instruction formats, in which an instruction names a separate destination register in addition to its two source operands. APX also introduces new instructions, including conditional forms of load, store, and compare instructions and a new 64-bit absolute jump instruction. To encode the extra operand, APX builds on EVEX, the 4-byte prefix that itself extends VEX, so that many integer instructions that previously overwrote one of their source registers can write their result elsewhere, reducing the need for additional register move instructions. As a result, Intel claims that APX-compiled code needs about 10% fewer instructions than the same code compiled for the previous ISA.
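To see why a separate destination operand removes register moves, consider the toy model below. It is only a sketch: the `lower_two_operand` and `lower_three_operand` helpers, the operation tuples, and the mnemonics are all hypothetical and are not real APX assembly syntax or Intel’s compiler behaviour.

```python
# Toy lowering of "dst = a OP b" into two-operand (destructive) and
# three-operand (separate destination) instruction sequences.

COMMUTATIVE = {"add", "and", "or", "xor"}


def lower_two_operand(ops):
    """Two-operand form: 'op dst, src' overwrites dst with dst OP src."""
    out = []
    for dst, op, a, b in ops:
        if dst == a:
            out.append(f"{op} {dst}, {b}")
        elif dst == b and op in COMMUTATIVE:
            out.append(f"{op} {dst}, {a}")
        elif dst == b:
            # Copying a into dst would destroy b; a real compiler would use
            # a scratch register here. The toy model does not handle it.
            raise ValueError("non-commutative op with dst == second source")
        else:
            out.append(f"mov {dst}, {a}")   # extra copy to preserve a
            out.append(f"{op} {dst}, {b}")
    return out


def lower_three_operand(ops):
    """Three-operand form: 'op dst, a, b' leaves both sources intact."""
    return [f"{op} {dst}, {a}, {b}" for dst, op, a, b in ops]


# Hypothetical sequence: r8 = rax + rbx; r9 = rax - rcx; rax = rax + r9
OPS = [
    ("r8", "add", "rax", "rbx"),
    ("r9", "sub", "rax", "rcx"),
    ("rax", "add", "rax", "r9"),
]
```

For this sequence the two-operand lowering needs two `mov` instructions to keep `rax` alive, while the three-operand lowering needs none. The size of the saving on real code depends on the workload; the figure quoted above is Intel’s claim, not something this model demonstrates.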
Intel AVX10: Pushing AVX-512 through 256-bit and 512-bit Vectors
One of the most significant updates to Intel’s consumer instruction sets since the introduction of AVX-512 is Intel’s Advanced Vector Extensions 10 (AVX10). On the surface, it looks like Intel is pushing AVX-512 support to all cores in its heterogeneous processor designs.
The most important and fundamental change AVX10 introduces compared to AVX-512 is that the previously disabled AVX-512 functionality will be incorporated into future heterogeneous core designs, the kind represented today by processors such as the Core i9-12900K and the Core i9-13900K. AVX-512 is currently only supported on Intel Xeon performance (P) cores.
Looking at the core concept of AVX10, it means that consumer desktop chips will gain support for the AVX-512 instruction set. Performance (P) cores have the theoretical ability to support 512-bit wide vectors should Intel wish (Intel has so far confirmed support only up to 256-bit vectors), while efficiency (E) cores are limited to 256-bit vectors. Overall, though, the entire chip will be capable of supporting the full AVX-512 instruction set across all cores, whether they are high-performance cores or lower-powered efficiency cores.
Intel says the following about performance in the AVX10 technical document:
- Applications compiled with Intel AVX2 and recompiled to Intel AVX10 should see performance gains without the need for additional software tuning.
- Intel AVX2 applications that are sensitive to vector register pressure will gain the most, thanks to the 16 additional vector registers and new instructions.
- Highly threaded, vectorizable applications are likely to achieve higher aggregate throughput when running on E-core based Intel Xeon processors or on Intel products with performance hybrid architecture.
Intel further claims that chips already using 256-bit vectors will maintain similar performance levels when code is compiled for AVX10 at the iso-vector length of 256 bits. However, AVX10’s true potential appears when code can exploit the larger 512-bit vector length, which promises the best achievable AVX10 performance. This coincides with the introduction of new AVX10 libraries and enhanced tooling support, enabling application developers to compile new AI and science-focused code for optimal benefit. It also means that existing libraries can be recompiled for AVX10/256 compatibility and, where possible, further optimized to take advantage of larger vector units and improve throughput.
The first version of AVX10 (AVX10.1) is being introduced for early software enablement and supports the subset of Intel’s AVX-512 instruction set available on Granite Rapids. The Granite Rapids (6th Gen Xeon) performance (P) core is the first core to be forward compatible with AVX10. Note that AVX10.1 does not enable 256-bit embedded rounding. As such, AVX10.1 serves as an introduction to AVX10, allowing for forward compatibility and the implementation of a new version enumeration scheme.
Intel’s 6th Gen Xeon (codenamed Granite Rapids) enables AVX10.1, and later chips will offer full AVX10.2 support while retaining AVX-512 support for compatibility with legacy instruction sets and the applications compiled with them. Note that although Intel AVX10/512 includes all of Intel’s AVX-512 instructions, applications compiled for Intel AVX-512 with vector lengths limited to 256 bits are not guaranteed to run on AVX10/256 processors, due to differences in the supported mask register widths.
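The version-plus-vector-length scheme can be pictured as a small decoding step. The sketch below is hedged throughout: it does not execute CPUID itself and only decodes raw register values passed in by the caller. The bit positions (an AVX10 flag at bit 19 of EDX in CPUID leaf 7, sub-leaf 1; a version number in the low byte of EBX in leaf 0x24; and vector-length flags at EBX bits 16 to 18) are assumptions based on a reading of Intel’s AVX10 specification and should be checked against it. The names `Avx10Info`, `decode_avx10`, and `can_run` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

AVX10_FLAG_BIT = 19                            # assumed: leaf 7, sub-leaf 1, EDX
VECTOR_BITS = {16: 128, 17: 256, 18: 512}      # assumed: leaf 0x24 EBX bit -> width


@dataclass(frozen=True)
class Avx10Info:
    version: int            # AVX10 version number (1 for AVX10.1, 2 for AVX10.2)
    max_vector_bits: int    # widest vector length the core reports


def decode_avx10(leaf7_1_edx: int, leaf24_0_ebx: int) -> Optional[Avx10Info]:
    """Decode raw CPUID register values; returns None if AVX10 is absent."""
    if not (leaf7_1_edx >> AVX10_FLAG_BIT) & 1:
        return None
    version = leaf24_0_ebx & 0xFF
    widths = [w for bit, w in VECTOR_BITS.items() if (leaf24_0_ebx >> bit) & 1]
    return Avx10Info(version, max(widths, default=0))


def can_run(info: Optional[Avx10Info], need_version: int, need_bits: int) -> bool:
    """Assumes versions are cumulative, so a higher version covers a lower one."""
    return (info is not None
            and info.version >= need_version
            and info.max_vector_bits >= need_bits)


# Made-up register values for illustration, not readings from real hardware.
P_CORE = decode_avx10(1 << 19, 0x00070001)     # version 1, up to 512-bit
E_CORE = decode_avx10(1 << 19, 0x00030002)     # version 2, up to 256-bit
```

A dispatcher built this way only has to compare two numbers, a version and a maximum vector length, instead of testing a long list of individual AVX-512 feature flags. In this example `can_run(E_CORE, 1, 512)` is false, which mirrors the article’s point that 512-bit code cannot be assumed to run on an AVX10/256 part.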
The initial AVX10.1 release is closer to a transitional phase; it is when AVX10.2 finally rolls out that the performance and efficiency benefits of AVX10 should start to show, at least for software using the instructions associated with AVX10. Because processors limited to AVX10/256 cannot be assumed to run existing AVX-512 binaries as earlier AVX-512 parts did, developers should expect to recompile their existing code for AVX10. Intel is finally starting to look to the future.
The introduction of AVX10 effectively supersedes the AVX-512 instruction set. Once AVX10 becomes widely available through Intel’s future product releases, there will technically be no need to target AVX-512 going forward. One challenge this creates is that software developers who have compiled their libraries specifically for 512-bit wide vectors will need to recompile their code, as described above, so that it properly handles the 256-bit wide vectors that AVX10 supports across all cores.
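What changes for such code when the vector length drops from 512 to 256 bits is mostly loop geometry. The sketch below is a simplified illustration, with a hypothetical `vector_plan` helper and an arbitrary array size; it shows how the same loop over 32-bit elements splits into full vector iterations and a scalar tail at each width.

```python
def vector_plan(n_elements: int, element_bits: int, vector_bits: int):
    """Return (lanes, full_vector_iterations, leftover_elements)."""
    lanes = vector_bits // element_bits          # elements per vector register
    full_iterations, tail = divmod(n_elements, lanes)
    return lanes, full_iterations, tail


# 1000 single-precision (32-bit) values at each vector length in the article.
PLANS = {bits: vector_plan(1000, 32, bits) for bits in (128, 256, 512)}
```

With these inputs, the 512-bit plan uses 16 lanes for 62 iterations and leaves 8 elements over, while the 256-bit plan uses 8 lanes for 125 iterations with no tail. Code that hard-codes 16 lanes per step cannot run unchanged on a core limited to 256-bit vectors, which is why recompiling, or writing the loop in terms of a lane count, is needed.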
AVX-512 isn’t going away as an instruction set, but it’s worth highlighting that AVX10 is backward compatible. This is an important aspect of supporting instruction sets with various vector widths such as 128, 256, and 512 bits (where applicable). Developers can recompile their code and libraries for the upcoming broad migration and convergence to the unified AVX10 instruction set.
Intel is committing to a maximum vector size of at least 256 bits on all Intel processors going forward. However, it remains unclear which SKUs (if any) and underlying architectures will support the full 512-bit vector size in the future, as Intel hasn’t officially confirmed this.
The essence of Intel’s new AVX10 instruction set comes into play when AVX10.2 is phased in and officially introduces support for 256-bit vector instructions on all cores, whether performance or efficiency cores. It also brings 128-bit, 256-bit, and 512-bit integer division to both the performance and efficiency cores, with each core supporting the vector extensions that its specification allows.