Nvidia GeForce RTX 4070 Review: Mainstream Ada Arrives

admin April 12, 2023

Nvidia positions the new GeForce RTX 4070 as a great upgrade for GTX 1070 and RTX 2070 users, but that doesn’t hide the fact that in many cases it’s effectively tied with the previous-generation RTX 3080. With a suggested retail price of $599, it also replaces the RTX 3070 Ti, with 50% more VRAM and dramatically improved efficiency. Is the RTX 4070 one of the best graphics cards? Sure, it’s easy to recommend over the $1,000+ cards, but you’ll inevitably trade performance for the money saved.

The RTX 4070 borrows a lot from the RTX 4070 Ti. Both use the AD104 GPU and feature a 192-bit memory interface with 12GB of 21Gbps GDDR6X VRAM. The main difference, other than the $200 price drop, is that the RTX 4070 has 5,888 CUDA cores compared to 7,680 on the 4070 Ti. Clock speeds are also a bit lower in theory, but we’ll talk more about that in our testing.

We’ve already covered Nvidia’s Ada Lovelace architecture, so start there if you want to learn more about the capabilities of the RTX 40-series GPUs. The main question here is how the RTX 4070 stacks up against its more expensive siblings, not to mention the previous-generation RTX 30-series. Here are the official specifications for the reference card.

Comparison of Nvidia RTX 4070 and other Ada/Ampere GPUs
Graphics Card	RTX 4070	RTX 4080	RTX 4070 Ti	RTX 3080 Ti	RTX 3080	RTX 3070 Ti	RTX 3070
Architecture	AD104	AD103	AD104	GA102	GA102	GA104	GA104
Process Technology	TSMC 4N	TSMC 4N	TSMC 4N	Samsung 8N	Samsung 8N	Samsung 8N	Samsung 8N
Transistors (Billion)	32	45.9	35.8	28.3	28.3	17.4	17.4
Die Size (mm^2)	294.5	378.6	294.5	628.4	628.4	392.5	392.5
SMs	46	76	60	80	68	48	46
GPU Cores (Shaders)	5888	9728	7680	10240	8704	6144	5888
Tensor Cores	184	304	240	320	272	192	184
Ray Tracing Cores	46	76	60	80	68	48	46
Boost Clock (MHz)	2475	2505	2610	1665	1710	1765	1725
VRAM Speed (Gbps)	21	22.4	21	19	19	19	14
VRAM (GB)	12	16	12	12	10	8	8
VRAM Bus Width (bits)	192	256	192	384	320	256	256
L2 Cache (MiB)	36	64	48	6	5	4	4
ROPs	64	112	80	112	96	96	96
TMUs	184	304	240	320	272	192	184
TFLOPS FP32 (Boost)	29.1	48.7	40.1	34.1	29.8	21.7	20.3
TFLOPS FP16 (FP8)	233 (466)	390 (780)	321 (641)	136 (273)	119 (238)	87 (174)	81 (163)
Bandwidth (GBps)	504	717	504	912	760	608	448
TGP (Watts)	200	320	285	350	320	290	220
Release Date	April 2023	November 2022	January 2023	June 2021	September 2020	June 2021	October 2020
Launch Price	$599	$1,199	$799	$1,199	$699	$599	$499

The step down from the RTX 4080 to the 4070 Ti, and from there to the RTX 4070, is pretty steep. We’re now looking at the same number of GPU shaders (5,888) that Nvidia used in the previous-generation RTX 3070. Of course, there are plenty of other changes as well.

Chief among them is a significant increase in GPU core clocks. 5,888 shaders running at 2.5GHz deliver much better performance than the same number of shaders running at 1.7GHz. On paper, that works out to an improvement of a bit more than 40% (29.1 versus 20.3 teraflops). Nvidia also tends to be conservative with its boost clocks: real-world gaming clocks are closer to 2.7GHz, though the RTX 3070 also ran closer to 1.9GHz in our testing.
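The on-paper comparison can be reproduced from the shader counts and boost clocks in the table. What follows is a minimal sketch, assuming the usual convention of two FP32 operations (one fused multiply-add) per shader per clock; the function name is ours, purely for illustration. Both the rated and the observed clocks put the theoretical uplift in the low-to-mid 40% range.

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical FP32 throughput in TFLOPS.

    Assumes each shader retires one fused multiply-add (2 FLOPs) per clock,
    which is the convention behind spec-sheet teraflops figures.
    """
    return shaders * 2 * clock_mhz * 1e6 / 1e12


# Rated boost clocks from the spec table.
rtx_4070 = fp32_tflops(5888, 2475)
rtx_3070 = fp32_tflops(5888, 1725)
rated_uplift = rtx_4070 / rtx_3070 - 1

# Approximate clocks observed while gaming, per the text (2.7GHz vs. 1.9GHz).
real_uplift = fp32_tflops(5888, 2700) / fp32_tflops(5888, 1900) - 1

print(f"RTX 4070: {rtx_4070:.1f} TFLOPS, RTX 3070: {rtx_3070:.1f} TFLOPS")
print(f"Uplift at rated boost clocks: {rated_uplift:.1%}")
print(f"Uplift at observed gaming clocks: {real_uplift:.1%}")
```

Because the shader count is identical, the ratio reduces to the ratio of the clocks. This is peak compute only and says nothing about memory or cache effects.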

Memory bandwidth also ends up slightly higher than on the 3070, and with a significantly larger L2 cache the card should perform much better than the raw bandwidth suggests. Moving to a 192-bit interface instead of the 256-bit interface on the RTX 3070 presents some interesting compromises, but we’re glad to at least get 12GB of VRAM this round. The 3060 Ti, 3070, and 3070 Ti with their 8GB all feel a bit limited these days. But 12GB is the current maximum for a 192-bit interface, short of running the memory chips in “clamshell” mode (two chips per channel, one on each side of the circuit board).
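The bandwidth and capacity figures follow from simple arithmetic on the bus width. Here is a rough sketch under two assumptions that are not stated in the table: each memory chip sits on a 32-bit channel, and the densest chips available hold 2GB (16Gb). The helper names are hypothetical.

```python
CHANNEL_WIDTH_BITS = 32  # assumption: one GDDR6/GDDR6X chip per 32-bit channel
MAX_CHIP_GB = 2          # assumption: 16Gb (2GB) is the largest chip density


def bandwidth_gbps(bus_width_bits: int, speed_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes moved per transfer times pin speed."""
    return bus_width_bits / 8 * speed_gbps


def max_vram_gb(bus_width_bits: int, clamshell: bool = False) -> int:
    """Largest VRAM capacity for a bus width; clamshell mode doubles chips per channel."""
    chips = bus_width_bits // CHANNEL_WIDTH_BITS
    if clamshell:
        chips *= 2
    return chips * MAX_CHIP_GB


print(bandwidth_gbps(192, 21))            # RTX 4070: 192-bit at 21Gbps
print(bandwidth_gbps(256, 14))            # RTX 3070: 256-bit at 14Gbps
print(max_vram_gb(192))                   # six chips on a 192-bit bus
print(max_vram_gb(192, clamshell=True))   # twelve chips in clamshell mode
print(max_vram_gb(256))                   # eight chips on a 256-bit bus
```

Under those assumptions, a 192-bit bus tops out at 12GB unless the board doubles up chips, and a 256-bit bus is what it takes to reach 16GB without clamshell mode.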

AMD called out the limited VRAM on the RTX 4070 yesterday, but AMD has yet to reveal its own “mainstream” 7000-series parts, which will face similar potential compromises. A 256-bit interface would allow for 16GB of VRAM, but it also increases board and component costs. Presumably we’ll get 16GB on the RX 7800 XT, but the RX 7700 XT will probably end up with 12GB of VRAM as well. Either way, that’s only part of the story, so we’ll have to see how the RTX 4070 stacks up before declaring a winner.

Another notable item is the 200W TGP (Total Graphics Power). Nvidia has been keen to highlight that the RTX 4070 will often draw less power than its TGP, whereas competing cards (and previous-generation products) typically match or exceed theirs. We can confirm that’s true here, and we’ll go into more detail later.

The good news is that latest-generation graphics cards now start at $599. There are naturally third-party overclocked cards with added features like RGB lighting and beefier cooling that drive up the price, but Nvidia has limited pre-launch reviews to cards priced at MSRP. We also have a PNY model, which we’ll explore in detail in a separate review, but we’ve included its performance results in our charts. (Spoiler: It’s as fast as the Founders Edition.)

4 GPCs, 1 NVENC, and 1 NVDEC for RTX 4070 (Image credit: Tom’s Hardware)

Above are the block diagrams for the RTX 4070 and the complete AD104, where you can see all the extras that are included but turned off in this lower-tier AD104 implementation. None of the blocks in that image are to scale, and Nvidia doesn’t provide a die shot of AD104, so we can’t determine how much space is allocated to the various bits and pieces. Not until someone else does the dirty work, anyway (looking at you, Fritzchens Fritz).

As mentioned, AD104 includes Nvidia’s 4th-generation Tensor cores, 3rd-generation RT cores, new and improved NVENC/NVDEC units for video encoding and decoding (now with AV1 support), and a significantly more powerful Optical Flow Accelerator (OFA). The latter is used for DLSS 3, and although it’s “theoretically” possible to do Frame Generation with the Ampere OFA (or some other alternative), so far Nvidia hasn’t offered that functionality there; only RTX 40-series cards support it.

Meanwhile, the Tensor cores now support FP8 with sparsity. It’s not clear how useful that will be across all workloads, but AI and deep learning have certainly taken advantage of lower-precision number formats to improve performance without significantly altering the quality of the results, at least in some cases. It ultimately depends on the work being done, and it can be difficult to figure out whether to use FP8 or FP16, with or without sparsity. Fundamentally, this is a problem for software developers, but we’ll likely see more tools that take advantage of such features over time (such as Stable Diffusion or GPT text generation).

Those interested in AI research may have other reasons to choose the RTX 4070 over its competitors. We’ll take a look at some of these tasks, as well as the card’s performance in gaming and professional workloads.