Bug Forces Intel to Halt Some Xeon Sapphire Rapids Shipments

Intel has confirmed to Tom's Hardware that it has paused shipments of some of its 4th Gen Xeon Sapphire Rapids processors due to a newly discovered bug. SemiAnalysis analyst Dylan Patel informed us that shipments of certain SKUs have been paused since mid-June. We followed up with Intel, and the company provided the following statement:

“We are aware of an issue with a subset of 4th Gen Intel Xeon Medium Core Count Processors (SPR-MCC) that may disrupt system operation under certain conditions, and we are actively investigating. This has not been observed when running commercial software. Other 4th Gen Intel Xeon processor SKUs (i.e. XCC and HBM) do not exhibit this issue. Out of an abundance of caution, we have temporarily paused shipments of some SPR-MCC SKUs while we gain confidence in our anticipated firmware mitigation, and we plan to resume the remaining shipments shortly,” an Intel spokesperson told Tom's Hardware.

In response to additional questions, Intel also said it did not expect the firmware mitigation to impact performance.

Intel's Sapphire Rapids processors are built from two basic designs. One is the XCC package, which combines four compute tiles (dies) into a single chip; the other is the MCC package, which uses a single monolithic die. As shown in the slide above, the MCC design is used for chips with up to 32 cores, which account for the bulk of Intel's sales volume, while the XCC variant is used for halo chips with 36 to 60 cores.

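“Intel has a new design issue with Sapphire Rapids MCC, the highest-volume version of Sapphire Rapids. Shipments have been halted,” Patel said. According to Patel, the problem is limited to dual- and quad-socket SKUs and is caused by a timing bug.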

Intel has not confirmed that the issue is limited to dual- and quad-socket SKUs, instead describing it as affecting a “subset” of SKUs, and it did not specify when the shipment pause began. Intel also has not confirmed Patel's claim that the bug is timing-related, nor has it clarified the nature of the issue.

The nature of the bug remains unclear for now, as timing issues can stem from many sources, ranging from the UPI interconnect to instruction timing. Intel appears confident that the issue can be addressed with a firmware fix, which it is presumably validating now, so no redesign or new stepping should be required. And because firmware appears to be an adequate fix, Intel may not have to replace processors already in the field, although the update could create a validation headache for customers.

Intel has taken plenty of criticism over Sapphire Rapids, which was held back not only by process node technology problems but also by design and verification methodology issues that caused further delays and numerous new steppings (a stepping is usually a minor redesign that requires new silicon to fix a problem). Sapphire Rapids has been dogged by reports that design and verification missteps led to 12 steppings. Unsurprisingly, this caused significant production delays and missed launch dates.

The company later announced that it was adopting a different approach to its design, simulation, and verification flow to fix those problems. Intel says these changes will be fully reflected in its next generation of Xeon processors.

Intel says the new Sapphire Rapids bug has not been observed when running commercial software, and it apparently was not caught during testing. This kind of situation is not unprecedented. Nearly every complex chip ships with known and unknown errata and bugs, which are addressed with firmware, driver, and software workarounds that mitigate or eliminate them. That is simply part of modern semiconductor design and manufacturing.

For example, Intel's Skylake generation of processors shipped with 53 known errata, and six months later Intel listed another 40. Another example is the recent discovery that AMD's EPYC Rome chips crash after 1,044 days of uptime. Some bugs are deemed not critical enough to fix, so they are either left alone or worked around with a combination of firmware and software. Fixing the most critical bugs can require a new stepping, but that is the worst-case scenario. Luckily for Intel, that doesn't seem to be the case this time.

However, while bugs are not uncommon, they rarely lead to a halt in shipments, which suggests this is more than a garden-variety erratum. Intel has not said when it will resume shipping the affected Sapphire Rapids SKUs, but we will update our coverage as soon as we know more.
