AMD Addresses Controversy: RDNA 3 Shader Pre-Fetching Works Fine

According to a statement AMD issued to Tom’s Hardware, reports that shader prefetching is broken on the company’s RDNA 3 GPUs are inaccurate:
“As with previous hardware generations, shader prefetching is supported in RDNA 3. [gitlab link] The code in question controls an experimental feature that is not planned for inclusion in these products and will not be enabled in this generation of the product. It is a common industry practice to include experimental features that allow for research and refinement for deployment in future product generations.” (AMD spokesperson to Tom’s Hardware)
AMD’s statement comes after media reports that the recently launched Navi 31 silicon in its RDNA 3 graphics cards has “non-functioning shader prefetch hardware”. The source of the speculation is a post by @Kepler_L2 that quoted code from the Mesa3D driver. That code appears to indicate that shader prefetching does not work on some GPUs with A0 revisions of the silicon (CHIP_GFX1100, CHIP_GFX1102, and CHIP_GFX110).
However, according to AMD’s statement, the code cited by Kepler_L2 relates to an experimental feature that was never intended for the final RDNA 3 products and is therefore disabled. AMD says that including experimental features in new silicon is a fairly common practice, which is accurate. The same approach is often used in other types of processors, such as CPUs.
For example, AMD shipped every generation of Ryzen products with the through-silicon vias (TSVs) needed to enable 3D V-Cache, but didn’t use the feature until 3rd Gen Ryzen. Similarly, Intel often adds features that may not make it into the final product, with its DLVR feature being a recent example.
Of course, if the “experimental” feature had worked perfectly fine and didn’t require any additional enablement (such as the extra L3 cache slices required for 3D V-Cache), there will be those who assume it would have been included in the final product. So the line between an “experimental” feature and a “nice to have but not important or not necessary to hit the target” feature can be a bit blurred. Either way, AMD maintains that shader prefetching works as intended in RDNA 3.
Another elephant in the room is AMD’s use of the A0 stepping of the RDNA 3 silicon, meaning this is the first, physically unrevised version of the chip. That has led to claims that AMD is shipping “unfinished silicon”, but such speculation is unfounded.
AMD did not answer questions as to whether it used A0 silicon for the first wave of RDNA 3 GPUs, but industry sources say the company did use A0 silicon for Navi 31, and most of the 5000 series is also said to have launched on A0 revision silicon. This does not indicate an “unfinished product”. The goal of every design team is to get the design right on the first spin, with working, shippable silicon. Nvidia, for example, also often ships A0 stepping silicon.
Microprocessors may go through several revisions during their lifetime to fix bugs and errata and to improve performance. Generally, the first revision of silicon out of the fab is A0, with successive “minor” respins classified as A1, A2, etc. More significant revisions to the silicon tend to move to a ‘B’ or later letter (bringing a cadence of B0, B1, and B2, for example). The alphanumeric identifier keeps advancing in this way as the chip is improved.
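The naming convention described above can be sketched in a few lines of Python. This is only an illustration of the general letter-plus-number pattern, not any vendor’s actual scheme: the helper names `parse_stepping`, `classify_respin`, and `is_first_silicon` are hypothetical, and real stepping labels do not always follow the pattern this strictly.

```python
import re


def parse_stepping(name):
    """Split a stepping label such as 'B1' into ('B', 1)."""
    match = re.fullmatch(r"([A-Z])(\d+)", name.strip().upper())
    if match is None:
        raise ValueError(f"not a stepping label: {name!r}")
    return match.group(1), int(match.group(2))


def classify_respin(previous, current):
    """Label the move from one stepping to the next.

    Assumption: a higher number under the same letter is a 'minor'
    respin, and a later letter is a 'major' revision.
    """
    prev_letter, prev_number = parse_stepping(previous)
    cur_letter, cur_number = parse_stepping(current)
    if cur_letter > prev_letter:
        return "major"
    if cur_letter == prev_letter and cur_number > prev_number:
        return "minor"
    raise ValueError(f"{current!r} does not come after {previous!r}")


def is_first_silicon(name):
    """True only for A0, the first revision out of the fab."""
    return parse_stepping(name) == ("A", 0)


if __name__ == "__main__":
    history = ["A0", "A1", "A2", "B0", "B1"]
    for prev, cur in zip(history, history[1:]):
        print(prev, "->", cur, classify_respin(prev, cur))
```

Under these assumptions, moving from A1 to A2 counts as a minor respin, while moving from A2 to B0 counts as a major revision.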
Nearly every complex chip ships with known and unknown errata and bugs, which are addressed with firmware, driver, and software workarounds that can mitigate or eliminate those issues. This is the nature of modern semiconductor design and manufacturing. For example, Intel’s Skylake generation of processors shipped with 53 known errata, and six months later Intel listed 40 more. This is common because chip design cycles are long, often spanning years, and there is often no time to re-spin the chip to address minor issues. Similar trends can be seen for other types and generations of processors.
Not all errata can be fixed with workarounds, so some issues are resolved later in silicon if necessary. Still, the goal of any design team remains the same: to deliver first-spin silicon that meets the design goals for shipping products. In that respect, shipping A0 silicon can be said to be a home run.
There are also many examples of chips whose flawed design or verification processes required multiple steppings to reach the market. For example, Sapphire Rapids was last known to be on its 12th stepping and has not yet shipped in large quantities (A0, A1, B0, C0, C1, C2, D0, E0, E2, E3, E4, and E5 steppings; technically five base spins, A through E). Naturally, that led to serious production delays and slipped release dates.
It’s hard to make chips. They are the most sophisticated class of devices mankind has ever made, and they are built with unimaginably small features. This can lead to problems and errata that may take several revisions to eradicate, but success is often measured by shipping working silicon that meets its goals on the first outing. Pay no mind to those who claim that an A0 stepping is always the equivalent of “unfinished silicon”.