Nvidia GeForce RTX 4070 Review: Mainstream Ada Arrives

Nvidia positions the new GeForce RTX 4070 as a great upgrade for GTX 1070 and RTX 2070 users, but that doesn’t hide the fact that in many cases it effectively ties the previous generation RTX 3080. The $599 suggested retail price means it’s also a replacement for the RTX 3070 Ti, with 50% more VRAM and dramatically improved efficiency. Is the RTX 4070 one of the best graphics cards? It’s certainly an easier recommendation than the $1,000+ cards, but you’ll inevitably trade performance for those saved pennies.
The RTX 4070 borrows a lot from the RTX 4070 Ti. Both use the AD104 GPU and feature a 192-bit memory interface with 12GB of 21Gbps GDDR6X VRAM. The main difference, other than the $200 price drop, is that the RTX 4070 has 5,888 CUDA cores compared to 7,680 on the 4070 Ti. Clock speeds are also a bit lower in theory, but we’ll talk more about that in our testing.
We’ve covered Nvidia’s Ada Lovelace architecture already, so start there if you want to learn more about the capabilities of the RTX 40-series GPUs. The main question here is how the RTX 4070 stacks up against its more expensive siblings, not to mention the previous generation RTX 30-series. Here are the official specs for the reference cards.
Graphics Card | RTX 4070 | RTX 4080 | RTX 4070 Ti | RTX 3080 Ti | RTX 3080 | RTX 3070 Ti | RTX 3070 |
---|---|---|---|---|---|---|---|
Architecture | AD104 | AD103 | AD104 | GA102 | GA102 | GA104 | GA104 |
Process Technology | TSMC 4N | TSMC 4N | TSMC 4N | Samsung 8N | Samsung 8N | Samsung 8N | Samsung 8N |
Transistors (billion) | 32 | 45.9 | 35.8 | 28.3 | 28.3 | 17.4 | 17.4 |
Die Size (mm^2) | 294.5 | 378.6 | 294.5 | 628.4 | 628.4 | 392.5 | 392.5 |
SMs | 46 | 76 | 60 | 80 | 68 | 48 | 46 |
GPU Cores (Shaders) | 5888 | 9728 | 7680 | 10240 | 8704 | 6144 | 5888 |
Tensor Cores | 184 | 304 | 240 | 320 | 272 | 192 | 184 |
Ray Tracing Cores | 46 | 76 | 60 | 80 | 68 | 48 | 46 |
Boost Clock (MHz) | 2475 | 2505 | 2610 | 1665 | 1710 | 1765 | 1725 |
VRAM Speed (Gbps) | 21 | 22.4 | 21 | 19 | 19 | 19 | 14 |
VRAM (GB) | 12 | 16 | 12 | 12 | 10 | 8 | 8 |
VRAM Bus Width (bits) | 192 | 256 | 192 | 384 | 320 | 256 | 256 |
L2 Cache (MiB) | 36 | 64 | 48 | 6 | 5 | 4 | 4 |
ROPs | 64 | 112 | 80 | 112 | 96 | 96 | 96 |
TMUs | 184 | 304 | 240 | 320 | 272 | 192 | 184 |
TFLOPS FP32 (Boost) | 29.1 | 48.7 | 40.1 | 34.1 | 29.8 | 21.7 | 20.3 |
TFLOPS FP16 (FP8) | 233 (466) | 390 (780) | 321 (641) | 136 (273) | 119 (238) | 87 (174) | 81 (163) |
Bandwidth (GBps) | 504 | 717 | 504 | 912 | 760 | 608 | 448 |
TGP (Watts) | 200 | 320 | 285 | 350 | 320 | 290 | 220 |
Release Date | April 2023 | November 2022 | January 2023 | June 2021 | September 2020 | June 2021 | October 2020 |
Launch Price | $599 | $1,199 | $799 | $1,199 | $699 | $599 | $499 |
There’s a pretty steep drop from the RTX 4080 to the 4070 Ti, and from there to the RTX 4070. We’re now looking at the same number of GPU shaders (5888) that Nvidia used in the previous generation RTX 3070. Of course, there are plenty of other changes as well.
Chief among them is a significant increase in GPU core clocks. 5888 shaders running at 2.5GHz provide much better performance than the same number of shaders running at 1.7GHz. Do the math with the boost clocks in the table and the theoretical improvement comes to a bit over 40% (29.1 versus 20.3 TFLOPS). Nvidia also tends to be conservative with its ratings: real-world gaming clocks on the RTX 4070 are closer to 2.7GHz, though the RTX 3070 also clocked closer to 1.9GHz in our tests.
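As a rough check on that math, here is a minimal sketch of the usual paper calculation, assuming each shader completes one fused multiply-add (two floating-point operations) per clock. The function name `fp32_tflops` is ours for illustration; the shader counts and clocks come from the spec table and the observed gaming clocks mentioned above.

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical FP32 throughput, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12


# Rated boost clocks from the spec table.
rtx_4070 = fp32_tflops(5888, 2475)
rtx_3070 = fp32_tflops(5888, 1725)
paper_uplift = rtx_4070 / rtx_3070 - 1.0

# Approximate real-world gaming clocks quoted in the text (2.7GHz vs. 1.9GHz).
observed_uplift = fp32_tflops(5888, 2700) / fp32_tflops(5888, 1900) - 1.0

print(f"RTX 4070: {rtx_4070:.1f} TFLOPS, RTX 3070: {rtx_3070:.1f} TFLOPS")
print(f"Uplift at rated boost clocks: {paper_uplift:.0%}")
print(f"Uplift at observed clocks:    {observed_uplift:.0%}")
```

With identical shader counts the ratio reduces to the ratio of the clocks, so the gap is about the same whether you use rated or observed speeds. This is compute throughput only and says nothing about cache, memory, or architectural differences.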
Memory bandwidth also ends up being slightly higher than on the 3070, and with a significantly larger L2 cache the card should perform much better than the raw bandwidth suggests. Moving to a 192-bit interface instead of the RTX 3070’s 256-bit interface does present some interesting compromises, but we’re glad this round offers at least 12GB of VRAM. The 8GB on the 3060 Ti, 3070, and 3070 Ti feels a bit limiting these days. But 12GB is the current maximum for a 192-bit interface, short of running the memory chips in “clamshell” mode (two chips per channel, one on each side of the circuit board).
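The bandwidth and capacity figures follow from simple arithmetic, sketched below. Two assumptions are not stated in the text: each GDDR6/GDDR6X chip sits on a 32-bit channel, and the largest available chip holds 2GB. The helper names are hypothetical.

```python
def bandwidth_gbps(bus_width_bits: int, speed_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width times per-pin rate, over 8 bits per byte."""
    return bus_width_bits * speed_gbps / 8


def vram_capacity_gb(bus_width_bits: int, chip_gb: int = 2, clamshell: bool = False) -> int:
    """Maximum VRAM for a given bus width.

    Assumes one chip per 32-bit channel (two in clamshell mode)
    and 2GB chips by default.
    """
    channels = bus_width_bits // 32
    chips = channels * (2 if clamshell else 1)
    return chips * chip_gb


print(bandwidth_gbps(192, 21))                    # RTX 4070: 192-bit at 21Gbps
print(bandwidth_gbps(256, 14))                    # RTX 3070: 256-bit at 14Gbps
print(vram_capacity_gb(192))                      # 192-bit, one chip per channel
print(vram_capacity_gb(192, clamshell=True))      # 192-bit in clamshell mode
print(vram_capacity_gb(256))                      # 256-bit, one chip per channel
```

Under those assumptions a 192-bit bus has six channels, which is where the 12GB ceiling comes from, and a 256-bit bus has eight, which allows 16GB.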
AMD called out the RTX 4070’s lack of VRAM yesterday, but AMD has yet to reveal its own “mainstream” 7000-series parts, and it will face similar potential compromises. It’s worth noting that 16GB of VRAM is possible with a 256-bit interface, but that also increases board and component costs. We’ll probably see 16GB on the RX 7800 XT, but the RX 7700 XT will likely end up with 12GB of VRAM as well. VRAM capacity is only part of the story, so we’ll have to see how the RTX 4070 stacks up before declaring a winner.
Another notable item is the 200W TGP (Total Graphics Power). Nvidia has been keen to highlight that the RTX 4070 will often use less than its TGP, whereas competing cards (and previous generation products) typically match or exceed theirs. We can confirm that’s true here, and we’ll go into more detail later.
The good news is that we finally have a latest generation graphics card starting at $599. There are naturally third-party overclocked cards with added features like RGB lighting and beefier cooling that drive up the price, but Nvidia has limited this pre-launch review to cards priced at MSRP. There is also a PNY model, which we’ll explore in detail in a separate review, but we’ve included its performance results in the charts. (Spoiler: It’s as fast as the Founders Edition.)
Above is a block diagram of the RTX 4070 alongside the complete AD104, so you can see all the extras that are included but turned off in this lower-tier AD104 implementation. None of the blocks in that image are drawn to scale, and Nvidia doesn’t provide a die shot of AD104, so we can’t determine how much space is allocated to the various bits and pieces, at least until someone else does the dirty work (looking at you, Fritzchens Fritz).
As mentioned, AD104 includes Nvidia’s 4th generation Tensor cores, 3rd generation RT cores, new and improved NVENC/NVDEC units for video encoding and decoding (now with AV1 support), and a significantly more powerful Optical Flow Accelerator (OFA). The latter is used for DLSS 3, and although it is “theoretically” possible to do frame generation using the Ampere OFA (or other alternatives), so far Nvidia has not offered that functionality. Only RTX 40-series cards support it.
Meanwhile, the Tensor cores now support FP8 with sparsity. It’s not clear how useful that will be across all workloads, but at least in some cases AI and deep learning have certainly taken advantage of lower precision number formats to improve performance without significantly altering the quality of the results. It ultimately depends on the work being done, and it can be difficult to figure out whether to use FP8 or FP16, with or without sparsity. Fundamentally, this is a problem for software developers, but tools that take advantage of such features (such as Stable Diffusion or GPT text generation) will likely gain support eventually.
Those interested in AI research may have other reasons to choose the RTX 4070 over its competitors. We’ll take a look at its performance in some of these tasks, as well as in gaming and professional workloads.