
The graphics cards changed thanks to the cache and we didn’t notice it

July 21, 2023

The world of graphics cards has undergone numerous evolutions over the years that have completely or partially transformed the most important aspects of this component. Think, for example, about what the integration of hardware-accelerated lighting and transform engines meant, the move to programmable shaders, or the arrival of unified shaders, which marked such a giant leap that it defined the standard still used today.

Another important change took place in 2018, marking the beginning of the commitment to specialization with the introduction of tensor cores and RT cores in the GeForce RTX 20, a move that ended up setting a trend and defined the latest generation of graphics cards to hit the market, since both the Radeon RX 7000 and Intel Arc include specialized hardware for AI and for accelerating ray tracing.

However, there was also another very interesting change with the introduction of the Radeon RX 7000 and GeForce RTX 40, one that few people have really noticed and which is so important that it could end up defining the future of graphics cards even in the long term: the specialization of each range for specific resolutions, determined by the architecture of each card. Are you lost? Don't worry, read on and I'll explain.

Graphics cards and their dependence on cache and bus

The truth is that AMD was the first to follow this approach, that is, to use cache and bus configurations specialized for specific resolutions, starting with the Radeon RX 6000 series, an approach that culminated in the Radeon RX 7000 and which NVIDIA eventually adopted in its own way.

For example, the Radeon RX 6600 and Radeon RX 6600 XT are two graphics cards that do their best at 1080p but lose performance at 1440p, while the Radeon RX 6700 XT is in its element at 1440p but loses a lot of performance at 4K.

With their closest rivals the picture is different: the GeForce RTX 3060 does not suffer such a big performance loss at 1440p, and the GeForce RTX 3060 Ti holds up much better at 4K; in fact, it loses to the Radeon RX 6700 XT at 1440p but beats it at 4K.

Infinity Cache on AMD graphics cards

All of this has an explanation, and it lies in the memory buses and caches. The Radeon RX 6600 and 6600 XT have 128-bit buses and rely on a 32 MB block of L3 cache to compensate for the limited bandwidth that bus offers. At 1080p, that L3 cache is enough to achieve a good hit rate when the GPU looks up data in it, minimizing accesses to the graphics memory, but at 1440p the L3 cache is no longer large enough to maintain a good hit rate, so the card has to fall back on graphics memory more often and the bandwidth limitations start to show.

NVIDIA shared a very interesting chart on this topic. Having a larger amount of L2 cache can reduce graphics memory accesses by up to 60%, as long as we stay at the optimal resolution for each type of configuration, which in this example would be 1080p.
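To put numbers on that idea, here is a minimal sketch in Python; the request volume and hit rates are illustrative assumptions of mine, not figures from NVIDIA's chart:

  # Minimal sketch: how the cache hit rate determines how much traffic
  # actually has to be served by the graphics memory (VRAM).
  # The request volume and hit rates below are illustrative assumptions.

  def vram_traffic(total_requests_gbs: float, hit_rate: float) -> float:
      """Traffic that misses the cache and must go out to VRAM, in GB/s."""
      return total_requests_gbs * (1.0 - hit_rate)

  total = 100.0  # GB/s of memory requests issued by the GPU (assumed)

  small_cache = vram_traffic(total, hit_rate=0.50)  # cache too small for the resolution
  large_cache = vram_traffic(total, hit_rate=0.80)  # cache sized for the resolution

  reduction = 1.0 - large_cache / small_cache
  print(f"VRAM traffic: {small_cache:.0f} -> {large_cache:.0f} GB/s "
        f"({reduction:.0%} fewer accesses)")
  # VRAM traffic: 50 -> 20 GB/s (60% fewer accesses)

With those assumed hit rates, raising the hit rate from 50% to 80% cuts the accesses that reach the graphics memory by exactly 60%, which is the kind of reduction the chart describes.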

The GeForce RTX 3060 has a 192-bit bus, which gives it more bandwidth that does not "run out" as the resolution increases, allowing it to hold up much better at higher resolutions. In fact, against the Radeon RX 6600 XT it loses at 1080p, comes close at 1440p, and beats it at 4K.

The same thing happens with the Radeon RX 6700 XT: its 96 MB of L3 cache and 192-bit bus make a significant difference compared to the previous cards and make it completely optimal at 1440p, but not ideal for 4K. This results in a larger-than-normal performance drop compared to, for example, a graphics card without L3 cache but with a 256-bit bus, such as the GeForce RTX 3060 Ti.

Having L3 cache on the graphics card gives us a block of memory very close to the GPU, with faster access and lower latency than the graphics memory, which translates into much more bandwidth. Its capacity, however, determines how many instructions and data we can store: the more we can keep cached, the more hits the GPU will score and the fewer calls it will have to make to the graphics memory, which improves performance.

When we increase the resolution, the data and elements we need to cache weigh more, i.e. they take up more space, which reduces the cache's effectiveness and increases the capacity needed to keep it effective. If the capacity does not grow proportionally, the hit rate will drop and the GPU will have to make more graphics memory lookups over the memory bus. If that bus is 128-bit, the bandwidth will be lower than with a 256-bit bus, and this will hurt performance.
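As a rough illustration of why the working set grows, the following Python sketch estimates how much space the per-frame render targets take up at each resolution; the number of targets and the pixel format are assumptions chosen only to show the trend, not measured data:

  # Rough sketch: the per-frame working set grows with the pixel count,
  # while the cache capacity stays fixed. Target count and pixel format
  # are illustrative assumptions, not measured data.

  RESOLUTIONS = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
  BYTES_PER_PIXEL = 4   # e.g. one RGBA8 surface (assumption)
  RENDER_TARGETS = 4    # color + depth + a couple of G-buffer targets (assumption)
  CACHE_MB = 32         # L3 capacity of a Radeon RX 6600 XT

  for name, (w, h) in RESOLUTIONS.items():
      footprint_mb = w * h * BYTES_PER_PIXEL * RENDER_TARGETS / (1024 ** 2)
      fits = "fits" if footprint_mb <= CACHE_MB else "does not fit"
      print(f"{name}: ~{footprint_mb:.0f} MB of render targets, {fits} in {CACHE_MB} MB of cache")

Under those assumptions the 1080p working set is roughly 32 MB and sits comfortably in the cache, while at 1440p and 4K it grows to roughly 56 MB and 127 MB, so an ever larger share of accesses spills over to the graphics memory.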

As I said earlier, NVIDIA followed the same path with the GeForce RTX 40, although in its own way. Team green kept the 384-bit bus on the GeForce RTX 4090 and used a 256-bit bus on the GeForce RTX 4080, two configurations that are optimal for 4K, both accompanied by a massive amount of L2 cache.

This type of cache is faster than L3 cache and has lower latency because it sits closer to the GPU, so it offers higher peak performance. Its only drawback is that it comes in smaller capacities than L3 cache: AMD was able to fit 96 MB of L3 on the Radeon RX 7900 XTX, while NVIDIA only managed 72 MB of L2 on the GeForce RTX 4090.

Both types of cache work the same way: they store data and elements the GPU can access faster, and both are equally affected by the increase in resolution. In this case, though, NVIDIA played its cards better, because it used a faster cache and kept the larger bus on the GeForce RTX 4090.

The path to specialization through resolutions

This wholesale shift in favor of larger graphics card caches has led us to a very specific scenario that revolves around greater specialization by target resolution. The narrowing of memory buses and the increase in cache were particularly noticeable in the mid-range and in what we can consider the upper mid-range.

Let's look at a concrete example. The Radeon RX 7600 is designed to do its best at 1080p. That doesn't mean it can't run games at 1440p, but it does lose a lot of performance there because it isn't built to run optimally at that resolution. The same goes for the GeForce RTX 4060, which is likewise an ideal solution for 1080p.

The reference GeForce RTX 4060 offers virtually identical performance to the GeForce RTX 2080 at 1080p, but loses to it at 1440p and 4K, as the latter handles the jump in resolution better thanks to its 256-bit bus. The former has 272 GB/s of bandwidth, while the other reaches 448 GB/s.
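Those bandwidth figures come straight from the bus width and the effective memory speed; here is a minimal sketch, assuming the spec-sheet speeds of 17 Gbps (GDDR6 on the RTX 4060) and 14 Gbps (GDDR6 on the RTX 2080):

  # Sketch: theoretical peak bandwidth = bus width in bytes x effective memory speed.
  # The memory speeds below are spec-sheet values; treat them as assumptions
  # if your exact model differs.

  def peak_bandwidth_gbs(bus_width_bits: int, memory_gbps: float) -> float:
      """Theoretical peak memory bandwidth in GB/s."""
      return bus_width_bits / 8 * memory_gbps

  print(f"GeForce RTX 4060: {peak_bandwidth_gbs(128, 17):.0f} GB/s")  # 272 GB/s
  print(f"GeForce RTX 2080: {peak_bandwidth_gbs(256, 14):.0f} GB/s")  # 448 GB/s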

The GeForce RTX 4060 Ti is in a similar situation, showing its best face at 1080p, while the GeForce RTX 4070 and GeForce RTX 4070 Ti reach their full potential at 1440p. Again, this is not to say they can't run games at 4K; in fact they can without problems, but that is not the optimal resolution for their memory bus and cache configuration.

Let's make another simple comparison: the GeForce RTX 4070 Ti beats the GeForce RTX 3090 Ti at 1080p, but loses to it at 1440p, and the gap between the two widens at 4K. Again, this is determined by the cache and memory bus configurations. The former has a 192-bit bus and 48 MB of L2 to compensate for its 504 GB/s of bandwidth, while the GeForce RTX 3090 Ti has only 6 MB of L2 cache but a 384-bit bus that gives it 1,008 GB/s of bandwidth.
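The same formula as in the earlier sketch reproduces these figures too, assuming the spec-sheet memory speed of 21 Gbps for both cards: 192 bits / 8 × 21 Gbps gives the 504 GB/s of the GeForce RTX 4070 Ti, and 384 bits / 8 × 21 Gbps gives the 1,008 GB/s of the GeForce RTX 3090 Ti.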

This has both positive and negative sides. On the plus side, this change makes it possible to create graphics cards that offer higher performance at their target resolution at a lower cost, and that can even match or beat more expensive graphics cards at that resolution. Without going any further, the Radeon RX 6600 is only slightly below the GeForce RTX 2070 at 1080p.

The disadvantage is that such graphics cards adapt less well to changes in resolution, though thankfully this is something we can easily compensate for thanks to advances in image scaling and reconstruction. With GeForce RTX we can activate DLSS 2, and GeForce RTX 40 owners also have the option of DLSS 3 in compatible games, while with AMD graphics cards we can use FSR 2 and with Intel Arc we can use XeSS.

Source: Muy Computer
