In four years, we’ve gone from PC owners being able to play a path-traced version of Quake 2 to the same techniques being applied to Cyberpunk 2077 – one of the most demanding triple-A games around. Even today, Quake 2 RTX remains a challenging piece of software to run – but anything from an RTX 4070 to the top-end RTX 4090 can deliver those cutting-edge visuals at 60 frames per second or higher. The question is, how?
There’s no simple answer here, as we’re looking at a range of technological innovations in both software and hardware, and it’s with the hardware that we can begin to get to the bottom of the issue. Quake 2 RTX launched in June 2019, when the most powerful GPU on the market was the Nvidia GeForce RTX 2080 Ti. It ran the game rather well at around 1080p, capable of 60fps with room to spare. However, turn the resolution up to 4K and frame-rates drop to somewhere between the upper teens and the low 20s.
Four years later, the latest GPUs power through those RT calculations – an RTX 4090 runs the same workload around four times faster. Before we go into how ray tracing hardware performance has improved so dramatically, though, I want to stress that this is only part of the story: developers are working hard to increase efficiency on the software side too.
A video deep dive into how path tracing has evolved over four years from delivering Quake 2 RTX to the phenomenal visuals found in Cyberpunk 2077 RT Overdrive.
The story has to begin somewhere, though, and the advances in the latest graphics hardware are impressive. The RTX 4090’s 4x performance gain over the RTX 2080 Ti comes from a number of sources: more shader cores and a higher operating frequency, of course, but some of it is architectural as well. For example, each new Nvidia architecture has doubled the triangle intersection testing throughput of the RT core, so the Ampere architecture in the 30-series cards can test twice as many triangles in the same time as the 20-series Turing offerings.
You can see this in practice when you compare an RTX 3070 running Quake 2 RTX against the RTX 2080 Ti. In most rasterised titles, or even hybrid ray-traced workloads, the two cards run neck and neck, or the RTX 2080 Ti pulls ahead. In pure ray tracing workloads, however, the RTX 3070 performs a great deal better. The new Ada Lovelace architecture in the RTX 4090 doubles triangle intersection throughput again over Ampere.
Another architectural advantage of more recent GPUs is a larger L2 cache. This is a general trend across all GPU vendors, but Intel’s Arc and Nvidia’s Ada Lovelace architectures in particular have disproportionately more L2 cache than similarly powered GPUs or their immediate predecessors. For example, an RTX 3090 has 6MB of L2 cache, while an RTX 4090 has 72MB. Intel’s Arc A770 has 16MB, while its primary competitors from the Ampere and RDNA 2 families have a great deal less.
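To give a sense of what the RT core is accelerating, here is the classic Möller–Trumbore ray-triangle test in Python. Nvidia doesn’t document the exact algorithm its fixed-function hardware uses, so treat this as a standard software stand-in rather than what the RT core actually runs. The point is simply that doubling triangle intersection throughput means completing twice as many tests like this one in the same time.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_triangle(origin: Vec3, direction: Vec3,
                 v0: Vec3, v1: Vec3, v2: Vec3,
                 eps: float = 1e-9) -> Optional[float]:
    """Möller–Trumbore test: return the hit distance t along the ray, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:                 # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det            # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, edge1)
    v = dot(direction, q) * inv_det    # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, q) * inv_det        # distance along the ray to the hit point
    return t if t > eps else None      # reject hits behind the ray origin

# A ray fired straight at a unit right triangle lying in the z = 0 plane
print(ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0),
                   (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # 1.0
```

A scene like Cyberpunk 2077 involves enormous numbers of these tests every frame, even with a BVH culling most triangles, which is why raw intersection throughput matters so much for path tracing.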
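Why would a bigger cache help ray tracing in particular? Incoherent rays touch BVH nodes and triangles scattered all over memory, so hit rates depend heavily on how much of that working set fits on-chip. The toy LRU simulation below is an illustrative assumption, not a model of any real GPU’s cache: it replays the same hypothetical random trace over 4,096 cache lines through a small cache and through one 12 times larger, echoing the jump from 6MB to 72MB.

```python
import random
from collections import OrderedDict

def lru_hit_rate(accesses, capacity):
    """Replay an address trace through an LRU cache that holds `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)

# Hypothetical trace: incoherent rays touching 4,096 distinct cache lines at random
rng = random.Random(42)
trace = [rng.randrange(4096) for _ in range(50_000)]

small = lru_hit_rate(trace, capacity=512)
large = lru_hit_rate(trace, capacity=512 * 12)  # 12x larger, echoing 6MB vs 72MB
print(f"small cache hit rate: {small:.2f}, large cache hit rate: {large:.2f}")
```

Once the whole working set fits, almost every access after the first touch is a hit; real ray tracing workloads are far messier, which is partly why the benefit is so hard to isolate in benchmarks.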
Quake 2 RTX at native 4K resolution really stressed the RTX 2080 Ti back in 2019, but through sheer hardware advances alone, RTX 4090 delivers a 4x performance increase. This is just one part of the reason why RT Overdrive is viable today.
The new RTX 40-series cards also include Shader Execution Reordering (SER). In a game with high-quality ray tracing or path tracing, there is potentially a great variety of shaders for all of the materials in the game world. In Cyberpunk 2077, for example, a single car is made up of many materials: metal, glass, clear-coat paint, the dashboard, the leather seats and much more. For realistic rendering, it is very important that all of these materials are respected and shaded correctly when light bounces.
The problem is that rays bounce around the scene rather randomly and invoke these shaders in a haphazard order, which reduces the utilisation of the GPU’s execution units and can severely decrease performance. SER counters this by reordering the work so that rays needing the same shader are grouped and executed together, cutting down on scattered, incoherent work and data access. Cyberpunk 2077 reportedly uses SER, although, as with the L2 cache differences, it is hard to gauge the real-world performance impact because neither feature can be disabled and re-enabled for benchmarking.
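SER itself happens in hardware and the driver, but the underlying idea can be sketched in software. In this hypothetical Python model, 4,096 ray hits each need one of eight material shaders, and “divergence” counts the distinct shaders each 32-thread warp has to run, a simplification of how much serialised work a warp does. Sorting the hits by shader ID before shading stands in for the reordering SER performs.

```python
import random

WARP_SIZE = 32  # threads scheduled together as a warp on Nvidia GPUs

def divergence(shader_ids, warp_size=WARP_SIZE):
    """Sum over warps of the number of distinct shaders each warp must run.
    Simplified model: each extra shader in a warp adds another serialised pass."""
    total = 0
    for start in range(0, len(shader_ids), warp_size):
        total += len(set(shader_ids[start:start + warp_size]))
    return total

# Hypothetical: 4,096 ray hits, each needing one of 8 material shaders
rng = random.Random(7)
hits = [rng.randrange(8) for _ in range(4096)]

before = divergence(hits)           # hits shaded in the order the rays happened to land
after = divergence(sorted(hits))    # hits regrouped by shader ID, SER-style
print(f"shader passes before reordering: {before}, after: {after}")
```

Unsorted, nearly every warp ends up running all eight shaders; sorted, almost every warp runs just one. The real feature also has to weigh the cost of the reordering itself, which this sketch ignores.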
The past four years have also brought dramatic efficiency gains on the software side. In short, developers have found ways to shoot a similar number of rays, at a similar performance cost, while getting a much better visual return. One of the key advancements is ReSTIR (spatiotemporal reservoir resampling), which tackles the question of how to sample lighting from a multitude of different light sources – crucial when Cyberpunk 2077 can have so many in its neon-soaked world. Ray-traced lighting is estimated by random sampling: you send rays out into the scene to figure out where light is and where it is not, and with only a few rays the result is incredibly noisy, so you need many, many rays for it to look decent.
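The noise problem is easy to demonstrate with a toy Monte Carlo estimate. In this hypothetical scene, exactly 30 percent of ray directions reach a light, and each “pixel” estimates that fraction from a handful of random rays. The spread between pixels falls only with the square root of the ray count, so quadrupling the rays merely halves the noise.

```python
import math
import random

def estimate_visibility(n_rays, rng):
    """Fraction of n_rays that reach the light. In this toy scene exactly 30% of
    directions are unoccluded, so the true answer is 0.3."""
    hits = sum(1 for _ in range(n_rays) if rng.random() < 0.3)
    return hits / n_rays

def noise(n_rays, trials=2000, seed=1):
    """Standard deviation of the estimate across many independent 'pixels'."""
    rng = random.Random(seed)
    estimates = [estimate_visibility(n_rays, rng) for _ in range(trials)]
    mean = sum(estimates) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / trials)

for n in (1, 4, 16, 64, 256):
    print(f"{n:>3} rays per pixel -> noise {noise(n):.3f}")
```

Going from 4 to 256 rays per pixel, a 64x increase in work, only cuts the noise by about 8x, which is why brute force alone cannot make many-light path tracing real-time.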
How path-traced 4K is possible at over 100fps on RTX 4090. High performance at 1080p is here and thanks to DLSS performance mode, the same internal resolution upscales nicely to 4K with only a seven percent hit to frame-rate. Meanwhile, DLSS 3 frame generation increases frame-rate by over 57 percent.
That is why tracing the lighting in Cyberpunk is such a challenge: there are lights everywhere, and you can round a corner and suddenly see a whole host of completely unpredictable lights. A traditional path tracer would need so many rays to look good that it could not run in real time. Quake 2, by comparison, is easy to trace the lighting in: there are few light sources in any given scene, and the levels are small and pre-packaged in a way that makes it easy to find and trace lights before new ones come on screen and start affecting the pixels the player can see.
The way Quake 2 RTX traces rays to lights is completely untenable for good visuals and performance in something like Cyberpunk 2077. That method is outmoded for a modern game, so new methods that did not exist in 2018 or 2019 had to be invented to make Cyberpunk 2077’s path tracing possible. That is where ReSTIR comes in, in its Nvidia-branded form as RTXDI: the algorithm smartly reuses light samples from previous frames and from neighbouring pixels in the current frame to accurately fill the gaps in the noise for important local light sources. ReSTIR allows a minimal number of rays to be traced while returning comparatively noise-free lighting and shadows from many, many light sources. Without this way of handling many lights, Cyberpunk 2077 RT Overdrive could not exist in its current form.
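ReSTIR builds on weighted reservoir sampling: each pixel streams through candidate lights, keeping just one (chosen in proportion to its importance) plus a running weight total, and reservoirs from neighbouring pixels or the previous frame can then be merged cheaply. The Python sketch below shows only that core primitive. The light weights are hypothetical, and it omits the per-pixel target-function reweighting, visibility testing and bias correction that the full algorithm (and RTXDI) uses; `merge` assumes both reservoirs share the same target function.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reservoir:
    """Single-sample weighted reservoir: the streaming primitive ReSTIR builds on."""
    sample: Optional[int] = None  # index of the light currently kept
    w_sum: float = 0.0            # running sum of candidate weights
    count: int = 0                # number of candidates seen

    def update(self, candidate: Optional[int], weight: float, rng: random.Random) -> None:
        # Keep the newcomer with probability weight / w_sum; over a whole stream,
        # each candidate ends up kept in proportion to its weight.
        self.w_sum += weight
        self.count += 1
        if self.w_sum > 0 and rng.random() < weight / self.w_sum:
            self.sample = candidate

    def merge(self, other: "Reservoir", rng: random.Random) -> None:
        # Reuse a neighbour's (or last frame's) reservoir as a single candidate
        # carrying its whole weight total. Assumes the same target function.
        seen = self.count + other.count
        self.update(other.sample, other.w_sum, rng)
        self.count = seen

def pick_light(weights, rng):
    """Stream every light through a fresh reservoir and return it."""
    reservoir = Reservoir()
    for index, weight in enumerate(weights):
        reservoir.update(index, weight, rng)
    return reservoir

rng = random.Random(3)
weights = [1.0, 1.0, 8.0]  # hypothetical per-light importance, e.g. brightness / distance²
picks = [pick_light(weights, rng).sample for _ in range(20_000)]
print(picks.count(2) / len(picks))  # close to 8 / 10
```

The appeal is that a reservoir is tiny (one sample and a couple of numbers), so sharing it across pixels and frames costs almost nothing compared with tracing extra shadow rays.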
Beyond optimisations to ray tracing algorithms, we also have big advances in image reconstruction. Back in 2019 when Quake 2 RTX launched, DLSS 2 did not even exist and the first iteration of the tech was not up to the task. These days, reconstruction techniques are an established pillar of modern PC gaming, and they change the way we can experience GPU-heavy games like path-traced ones. For example, an RTX 4090 maxing out Cyberpunk 2077 would be limited to around 1080p if you wanted a healthy 60fps baseline. With advances in machine learning, though, we can now take that 1080p image and, for a modest performance cost, reconstruct up to four times as many pixels in real time, greatly improving image quality.
On top of this, we now have machine learning-assisted frame generation to further enhance the presentation. In 2019, frame generation was essentially unheard of in games, the last compelling demo of the technology having come nearly 10 years earlier, when LucasArts showed a version of it in The Force Unleashed 2. In 2023, we actually have viable frame generation technology, so we can take that 4K DLSS Performance mode image and amplify its frame-rate to increase perceptual fluidity. With frame generation and image reconstruction, heavy ray-traced experiences are much more fluid and detailed than they could possibly have been just four years ago.
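The “up to 4x” figure falls straight out of the resolutions involved. DLSS Performance mode renders at half the output resolution on each axis, so a 4K output is built from a 1080p internal image:

```python
def pixel_count(width, height):
    return width * height

output_4k = pixel_count(3840, 2160)           # pixels presented on screen
# DLSS Performance mode renders at half the output resolution on each axis
internal = pixel_count(3840 // 2, 2160 // 2)  # 1920x1080 pixels actually path traced
ratio = output_4k / internal
print(f"{output_4k:,} output pixels from {internal:,} rendered: {ratio:.0f}x")
```

In other words, the upscaler outputs four pixels for every one that was actually path traced.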
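The figures in the caption on path-traced 4K at over 100fps compose multiplicatively. Treating the roughly seven percent DLSS cost and the “over 57 percent” frame generation gain as fixed multipliers is a simplification (both vary by scene), but it shows what native 1080p frame-rate would be needed for 4K output to clear 100fps:

```python
UPSCALE_HIT = 0.07     # ~7% frame-rate cost of 4K DLSS Performance vs native 1080p
FRAME_GEN_GAIN = 0.57  # frame generation adds "over 57 percent"

def presented_fps(native_1080p_fps):
    """Chain the two quoted effects as simple multipliers (a simplification)."""
    upscaled = native_1080p_fps * (1 - UPSCALE_HIT)
    return upscaled * (1 + FRAME_GEN_GAIN)

# Native 1080p frame-rate needed for the final 4K output to reach 100fps
needed = 100 / ((1 - UPSCALE_HIT) * (1 + FRAME_GEN_GAIN))
print(f"{needed:.1f}fps native -> {presented_fps(needed):.0f}fps presented")
print(f"60fps native -> {presented_fps(60):.0f}fps presented")
```

Under those assumptions, a native 1080p frame-rate of roughly 68.5fps or more is enough; from a 60fps baseline, the same chain lands at about 88fps.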
So, Cyberpunk 2077 RT Overdrive is the coming together of a lot of separate elements: hardware advancements, ray tracing algorithmic advancements, and image processing advancements – all of which have occurred astonishingly quickly. However, this isn’t the end-point of the journey. RT Overdrive still has visual and performance limitations to address.
One of the first limitations is how it handles forward-rendered elements – transparent things like glass, for example. These elements are why I would describe Cyberpunk 2077 as ‘near path-traced’ rather than ‘completely path-traced’, as these materials are still not fully handled via ray tracing the way the opaque world geometry is. This is still an open area of research, but other titles have already hinted at what could happen here.
Our first look at RT Overdrive focused on how Cyberpunk 2077 presents with no RT at all – and how RT Overdrive improves over the prior ‘psycho’ RT setting.
Quake 2 RTX had some very compelling thick glass rendering, as did the Unreal Engine 4 title Chernobylite, with its great-looking ray-traced glass refraction and shading. I imagine this is one of the areas that may improve with future patches to Cyberpunk’s RT Overdrive mode, since it remains a ‘technology preview’ in full development.
Another limitation is the number of path-traced bounces on opaque geometry. It’s currently two, which is generally enough from a perceptual realism perspective, but you can imagine lighting conditions where more bounces would help. Based on what I understand from an engineer on the project, I think Cyberpunk 2077 may be updated to use neural radiance caching, and going by the 2021 presentation covering the technique, using it in lieu of other methods would bring some interesting benefits.
For one, it would include specular information in the cache, so multi-bounce reflections could be handled by the cache, which other techniques do not do – and given the amount of synthetic materials and metals in Cyberpunk’s world, this could be a big benefit. It would also provide diffuse lighting information, so more extreme shadowing conditions could be represented accurately with less noise and lag. Right now, areas that are mainly lit indirectly can suffer from noise, and caching lighting information in a neural radiance cache could potentially alleviate this issue completely.
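Why can two bounces fall short? A toy “furnace” model makes it concrete: if every surface has the same emission and albedo, each extra bounce contributes another factor of the albedo, and the infinite-bounce answer is emission / (1 − albedo). Both the model and the convention that “two bounces” means terms up to albedo² are assumptions for illustration, not how Cyberpunk 2077 counts or shades its bounces.

```python
def radiance(emission, albedo, bounces):
    """Toy 'furnace' scene: uniform emission and albedo everywhere, so each
    extra bounce adds another factor of albedo to what the camera sees."""
    return sum(emission * albedo ** k for k in range(bounces + 1))

def captured(albedo, bounces):
    """Fraction of the infinite-bounce answer, emission / (1 - albedo), that is kept."""
    return radiance(1.0, albedo, bounces) * (1 - albedo)

for albedo in (0.2, 0.5, 0.8):
    print(f"albedo {albedo}: two bounces keep {captured(albedo, 2):.1%} of the light")
```

In this model, two bounces keep about 99 percent of the energy when albedo is 0.2 but under half of it when albedo is 0.8, which is why bright, highly reflective surroundings benefit most from extra bounces.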
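A radiance cache changes where a path stops. Rather than simply cutting the path off, the renderer traces a couple of bounces and then asks the cache for an estimate of everything beyond. Neural radiance caching uses a small neural network trained on the fly to supply that estimate; in this Python sketch, a hypothetical `toy_cache` function stands in for it, returning the exact infinite-bounce answer for a toy scene with uniform emission 1.0 and albedo 0.8. It shows the structure of cache termination, not the neural network itself.

```python
def path_radiance(emission, albedo, bounces, cache=None):
    """Trace `bounces` bounces explicitly in a toy uniform scene, then optionally
    ask a radiance cache for the light carried by all remaining bounces."""
    total, throughput = 0.0, 1.0
    for _ in range(bounces + 1):
        total += throughput * emission  # light picked up at this path vertex
        throughput *= albedo            # fraction of energy surviving the next bounce
    if cache is not None:
        total += throughput * cache()   # cached estimate of everything beyond
    return total

def toy_cache():
    """Hypothetical stand-in for a neural radiance cache: returns the exact
    infinite-bounce radiance for emission 1.0 and albedo 0.8, i.e. 1 / (1 - 0.8)."""
    return 1.0 / (1 - 0.8)

print(path_radiance(1.0, 0.8, 2))             # truncated after two bounces
print(path_radiance(1.0, 0.8, 2, toy_cache))  # two bounces plus a cache lookup
```

With a perfect cache, the two-bounce path recovers the full infinite-bounce result of 5.0 instead of 2.44; a real cache is only approximate, but it supplies that missing light without tracing the extra bounces.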
It turns out that path traced Cyberpunk is more scalable than you might think. In this video, Rich shows how RTX 20 series and 30 series cards can join the party, while the cloud via GeForce Now’s RTX 4080 tier means you can get a good 4K experience without owning a discrete GPU at all.
The last limitation in Cyberpunk 2077’s RT Overdrive mode is performance – which makes sense, as this is nearly fully path-traced lighting in a triple-A game – yet I still think there is room for improvement on the software side. For example, right now the game does not take advantage of OMM, or opacity micromaps. This is an asset format that Ada Lovelace GPUs can read to reduce the cost of tracing alpha-tested geometry, like vegetation.
If you go to areas of the game with a lot of foliage right now, you can see how much heavier they are than areas with standard opaque geometry, so I’d expect OMM support to improve performance there. Beyond that it is hard to say, but one thing I found interesting is that the game at native 4K on an RTX 4090 runs at a frame-rate similar to that of an RTX 2080 Ti running Quake 2 RTX.
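An opacity micromap pre-classifies small regions of an alpha-tested triangle so that traversal only needs to run the alpha-test (any-hit) shader where a region is genuinely mixed. The sketch below simplifies heavily: real OMMs subdivide each triangle into micro-triangles in barycentric space and are baked from the full alpha texture, whereas here each hypothetical region is just a list of four alpha samples and the 0.5 cutoff is an assumption.

```python
from enum import Enum

class Opacity(Enum):
    OPAQUE = 0       # hit can be accepted without running the any-hit shader
    TRANSPARENT = 1  # hit can be ignored without running the any-hit shader
    UNKNOWN = 2      # mixed region: fall back to the alpha-test shader

def classify(alpha_samples, cutoff=0.5):
    """Classify one micro-region of a triangle from its alpha-texture samples."""
    if all(a >= cutoff for a in alpha_samples):
        return Opacity.OPAQUE
    if all(a < cutoff for a in alpha_samples):
        return Opacity.TRANSPARENT
    return Opacity.UNKNOWN

# Hypothetical leaf-card triangle split into 8 micro-regions, 4 alpha samples each
leaf = [
    [1.0, 1.0, 0.9, 1.0], [1.0, 0.8, 1.0, 1.0], [0.0, 0.0, 0.1, 0.0], [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.2, 0.7, 0.0], [0.9, 1.0, 1.0, 0.95], [0.0, 0.05, 0.0, 0.0], [0.3, 0.6, 0.0, 1.0],
]
states = [classify(region) for region in leaf]
shader_calls_needed = sum(state is Opacity.UNKNOWN for state in states)
print(f"{shader_calls_needed} of {len(leaf)} regions still need the alpha-test shader")
```

Six of the eight regions in this made-up leaf card can be resolved without running the shader at all, and skipping that work is where the savings in foliage-heavy scenes would come from.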
If the trend in hardware and software evolution continues, the possibilities over the next few years are intriguing – all before the arrival of the next generation of console hardware. We should expect the performance of today’s RTX 4090 to filter down into lower-end parts, while higher-end GPUs should push further still. We are already seeing a clear divide between console and PC capabilities and, depending on the will of developers and Nvidia’s continued RT evangelism, we should start to see visuals of this quality deployed in more games. In the meantime, it goes without saying that RT Overdrive is worth checking out, along with our recommendations on getting the tech working on today’s 20-series and 30-series graphics hardware.
Article source https://www.eurogamer.net/digitalfoundry-2023-cyberpunk-2077-rt-overdrive-how-is-path-tracing-possible-on-a-triple-a-game