Nvidia Reveals Ada Lovelace GPU Secrets: Extreme Transistor Counts at High Clocks

admin September 23, 2022


When Nvidia unveiled its Ada Lovelace family of graphics processing units earlier this week, it mostly focused on its top-end AD102 GPU and flagship GeForce RTX 4090 graphics card. Few details were released about the AD103 and AD104 graphics chips. Luckily, Nvidia has today published an Ada Lovelace whitepaper that contains a ton of data about its new GPUs and fills in many of the gaps. We've updated our RTX 40-series GPU hub with the new details, but here's a quick rundown of the new and interesting information.

Big GPU for big games

We already know that Nvidia's top-of-the-line AD102 is a 608 mm^2 GPU containing 76.3 billion transistors, 18,432 CUDA cores and 96MB of L2 cache. We also know that the AD103 is a 378.6 mm^2 graphics processor with 45.9 billion transistors, 10,240 CUDA cores and 64MB of L2 cache. As for the AD104, it has a die size of 294.5 mm^2, 35.8 billion transistors, 7,680 CUDA cores and 48MB of L2 cache.

Nvidia Ada and Ampere Specifications
GPU / Graphics Card	Full AD102	RTX 4090	RTX 4080 16GB	RTX 4080 12GB	RTX 3090 Ti
Architecture	AD102	AD102	AD103	AD104	GA102
Process Technology	TSMC 4N	TSMC 4N	TSMC 4N	TSMC 4N	Samsung 8LPP
Transistors (billion)	76.3	76.3	45.9	35.8	28.3
Die Size (mm^2)	608	608	378.6	294.5	628.4
Streaming Multiprocessors	144	128	76	60	84
GPU Cores (Shaders)	18432	16384	9728	7680	10752
Tensor Cores	576	512	304	240	336
Ray Tracing Cores	144	128	76	60	84
TMU	512	512	304?	240	336
ROP	192	192	112	80	112
L2 Cache (MB)	96	72	64	48	6
Boost Clock (MHz)	?	2520	2505	2610	1860
TFLOPS FP32 (Boost)	?	82.6	48.7	40.1	40.0
TFLOPS FP16 (FP8)	?	661 (1321)	390 (780)	319 (639)	320 (none)
TFLOPS Ray Tracing	?	191	113	82	78.1
Memory Interface (bits)	384	384	256	192	384
Memory Speed (GT/s)	?	21	22.4	21	21
Bandwidth (GBps)	?	1008	716.8	504	1008
TDP (Watts)	?	450	320	285	450
Release Date	?	October 12, 2022	November 2022?	November 2022?	March 2022
Launch Price	?	$1,599	$1,199	$899	$1,999

One of the interesting things Nvidia mentions in its whitepaper is that Ada Lovelace GPUs use high-speed transistors in critical paths to boost maximum clock speeds. As a result, a fully enabled AD102 GPU with 18,432 CUDA cores "can be clocked above 2.5 GHz while maintaining the same 450W TGP." With this in mind, it should come as no surprise that the company is talking about 3.0 GHz clocks reached in its labs by the GeForce RTX 4090 (which has 16,384 CUDA cores). The card should definitely top the list of graphics cards.


In addition to high clock speeds, Nvidia's Ada Lovelace GPUs also feature a large L2 cache to boost performance in computationally intensive workloads (ray tracing, path tracing, simulations, etc.) and reduce memory bandwidth requirements. Essentially, Nvidia's Ada GPUs appear to take a page from AMD's RDNA 2 Infinity Cache book here, although we believe the general targets for the new architecture were set well before AMD's Radeon RX 6000-series products debuted in 2020.

Speaking of workloads like simulation, it should be noted that in the world of supercomputers they are run on numbers in double-precision floating-point format (FP64) to improve the accuracy of results. FP64 is more expensive than FP32, both in terms of performance and hardware complexity. This is why computer graphics use the FP32 format, and many simulations of non-critical assets are also done in FP32 precision. The AD102 GPU, for its part, contains just 288 FP64 cores (2 per streaming multiprocessor), which are there so that programs containing FP64 code, including FP64 Tensor Core code, work correctly.

Still, the AD102's FP64 rate is 1/64th the TFLOPS rate of FP32 operations (which is consistent with the Ampere architecture). Nvidia does not show FP64 cores in its streaming multiprocessor (SM) module diagram, nor does it disclose the number of such cores within the AD103 and AD104 GPUs. The low FP64 rate of Ada graphics processors highlights that these parts are primarily geared towards gaming.
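
The 1/64 figure can be reproduced from the unit counts quoted above. The short sketch below assumes (the whitepaper details covered here do not spell this out) that an FP64 core and an FP32 CUDA core each retire the same number of operations per clock, so the throughput ratio equals the ratio of unit counts. The constant names are illustrative only.

```python
from fractions import Fraction

# Figures quoted in the article for a full AD102 GPU.
SMS_FULL_AD102 = 144            # streaming multiprocessors
FP64_CORES_PER_SM = 2           # FP64 cores per SM
FP32_CORES_FULL_AD102 = 18432   # CUDA (FP32) cores

# Total FP64 cores across the chip.
fp64_cores = SMS_FULL_AD102 * FP64_CORES_PER_SM

# Assumption: equal per-clock throughput per core, so the FP64:FP32
# rate is simply the ratio of the two unit counts.
fp64_to_fp32 = Fraction(fp64_cores, FP32_CORES_FULL_AD102)

print(f"FP64 cores: {fp64_cores}")
print(f"FP64 rate relative to FP32: {fp64_to_fp32}")
```

Using `Fraction` keeps the ratio exact rather than printing a rounded decimal.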

More Transistors = More Performance

The complexity and die sizes of Nvidia's Ada Lovelace graphics processors compared to the company's Ampere GPUs shouldn't come as a surprise. The new Ada GPUs are made using TSMC's 4N (5nm-class) manufacturing technology, while Ampere is manufactured on Samsung Foundry's 8LPP process (a 10nm-class node with a 10% optical shrink). The added complexity (transistor count) enables significant performance gains in areas such as ray tracing, as well as quality improvements with DLSS 3.

Nvidia Ada and Ampere Specifications
GPU / Graphics Card	Full AD102	RTX 4090	RTX 4080 16GB	RTX 4080 12GB	RTX 3090 Ti
Architecture	AD102	AD102	AD103	AD104	GA102
TFLOPS FP32 (Boost)	?	82.6	48.7	40.1	40.0
TFLOPS FP16 (FP8)	?	661 (1321)	390 (780)	319 (639)	320 (none)
TFLOPS Ray Tracing	?	191	113	82	78.1

Another thing to note is that Nvidia's AD102 GPU has a higher transistor density than its smaller siblings. On the one hand, a roughly 3.6% increase in transistor density allows the AD102 to pack significantly more execution units than the smaller chips. On the other hand, the more relaxed transistor density of AD103 and AD104 often allows for better yields (assuming the node's defect density is not unusually high) and higher clocks.
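
The density comparison can be checked against the transistor counts and die sizes quoted earlier in the article. The sketch below is only a back-of-the-envelope calculation: it divides the rounded published figures, so its result may differ slightly from the percentage quoted above. The function and dictionary names are hypothetical.

```python
# (transistors in billions, die area in mm^2), as quoted in the article.
DIES = {
    "AD102": (76.3, 608.0),
    "AD103": (45.9, 378.6),
    "AD104": (35.8, 294.5),
}


def density_mtr_per_mm2(transistors_bn: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_bn * 1000.0 / area_mm2


densities = {name: density_mtr_per_mm2(*spec) for name, spec in DIES.items()}

for name, density in densities.items():
    # How much denser AD102 is than this die, in percent (0% for AD102 itself).
    advantage = (densities["AD102"] / density - 1.0) * 100.0
    print(f"{name}: {density:.1f} MTr/mm^2 (AD102 is {advantage:.1f}% denser)")
```

With these inputs, AD102 comes out a few percent denser than both AD103 and AD104, which is in line with the article's claim.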

It is difficult to make predictions about the frequency potential of the AD103 and AD104 without access to actual hardware and knowledge of actual yield rates. However, if AD102 can operate at 2.50 GHz to 3.0 GHz, AD103 and AD104 may have even higher potential. We also know that the RTX 4080 12GB uses a fully enabled AD104 chip running at 2610 MHz, the RTX 4080 16GB uses 95% of the AD103 chip (76 out of 80 SMs) running at 2505 MHz, and the RTX 4090 uses only 89% of the AD102 chip (128 out of 144 SMs) running at 2520 MHz, with 25% of the L2 cache disabled.
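
The enablement percentages follow from the SM counts given in the text and the specification table. A minimal sketch of that arithmetic is shown here; the dictionary layout and function name are illustrative, not anything Nvidia publishes.

```python
# (enabled SMs on the card, total SMs on the full die), from the article.
CONFIGS = {
    "RTX 4090 (AD102)": (128, 144),
    "RTX 4080 16GB (AD103)": (76, 80),
    "RTX 4080 12GB (AD104)": (60, 60),
}


def enabled_percent(enabled: int, total: int) -> float:
    """Return the share of the die's SMs that are enabled, in percent."""
    return 100.0 * enabled / total


for card, (enabled, total) in CONFIGS.items():
    share = enabled_percent(enabled, total)
    print(f"{card}: {enabled}/{total} SMs enabled ({share:.0f}%)")
```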

Having so many execution units running at high clocks should result in significant performance gains. Nvidia's GeForce RTX 4090 has more than double the peak FP32 compute rate (~82.6 TFLOPS) of the GeForce RTX 3090 Ti (~40 TFLOPS).
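
These peak figures can be approximated from the CUDA core counts and boost clocks in the specification table. The sketch below uses the common convention of counting two floating-point operations per core per clock (one fused multiply-add), which is an assumption here rather than something the article states. The helper name `fp32_tflops` is hypothetical.

```python
def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak FP32 throughput in TFLOPS.

    Assumes 2 FLOPs per CUDA core per clock (one fused multiply-add).
    """
    flops_per_second = cuda_cores * 2 * boost_mhz * 1e6
    return flops_per_second / 1e12


# (CUDA cores, boost clock in MHz), from the article's table.
CARDS = {
    "RTX 4090": (16384, 2520),
    "RTX 3090 Ti": (10752, 1860),
}

results = {name: fp32_tflops(*spec) for name, spec in CARDS.items()}
for name, tflops in results.items():
    print(f"{name}: {tflops:.1f} TFLOPS")

print(f"Ratio: {results['RTX 4090'] / results['RTX 3090 Ti']:.2f}x")
```

These are theoretical peaks at the rated boost clock; sustained performance in games depends on many other factors.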

Meanwhile, Nvidia's current lineup of Ada GPUs for demanding gamers shows that the company is back on track with its three-chip approach to the high-end gaming market. Usually, Nvidia releases a flagship gaming GPU, follows it with a chip that has around 66% to 75% of the flagship's resources (e.g. CUDA cores), and then introduces a graphics processor with around 50% of the flagship's units. With the Ampere family, that strategy was adjusted somewhat, since Nvidia's GA103 chip was designed primarily with laptops in mind and was rarely used for desktops (and was late to the party, too). But in the Ada generation, Nvidia has three chips.

More SKUs to come

One interesting point is the difference between the maximum configuration offered by the AD102 GPU and that of the GeForce RTX 4090 graphics card. The AD102 has 18,432 CUDA cores, while the GeForce RTX 4090 has 16,384 CUDA cores enabled. Such an approach gives Nvidia some flexibility in terms of yields and future graphics card introductions, so there is plenty of room to fit in an RTX 4090 Ti, an RTX 4080 Ti, RTX 5500/5000 Ada Generation cards for the ProViz market, and more.

The GeForce RTX 4080 16GB and RTX 4080 12GB, on the other hand, use a nearly complete AD103 and a fully enabled AD104 GPU, respectively. We don't know what the future holds, but we expect to see cut-down versions of the AD103 and AD104 GPUs eventually. We can speculate about a GeForce RTX 4070 Ti and/or RTX 4070 based on cut-down bins of the AD104 chip. We can also speculate on the possibility of an ultra-high-end graphics solution for laptops based on the AD103 graphics processor, but we won't speculate on the specs of those parts.

Some thoughts

Nvidia's Ada Lovelace architecture is a qualitative and quantitative leap over the Ampere architecture. Nvidia has not only significantly improved the performance of its ray tracing cores, tensor cores and other units at the architectural level, but has also increased their numbers and raised clocks. A major enhancement here is the significant increase in L2 cache for Ada GPUs compared to Ampere GPUs.

These leaps are made possible in large part by TSMC's 4N process technology, which is optimized for Nvidia GPUs. Additionally, the company used high-speed transistors to increase the frequency of its new graphics processors, further boosting performance.

However, the cutting-edge production node and the large die sizes of Nvidia's new GPUs make the parts significantly more expensive to manufacture. As such, the GeForce RTX 4080 and 4090 graphics cards are priced significantly higher than their direct predecessors.

Nvidia has introduced only five Ada Lovelace-based products so far: the GeForce RTX 4080 12GB, RTX 4080 16GB, and RTX 4090 graphics cards for desktops, plus the RTX 6000 Ada Generation for workstations and data centers, and the L40 (Lovelace 40) board for high-end workstations and virtualized workstation environments.

Given that the company can offer the full-fat AD102 as well as cut-down versions of the AD102, AD103, and AD104 GPUs, we can expect a number of new GeForce RTX 40-series cards for client machines and Ada-based RTX-series solutions for data centers. Meanwhile, Nvidia is likely preparing some smaller GPUs (AD106, AD107), so it looks like the Ada Lovelace product family will be at least as broad as Ampere's lineup.