Deep learning super sampling (DLSS) is a family of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available in a number of video games. The goal of these technologies is to allow the majority of the graphics pipeline to run at a lower resolution for increased performance, and then infer a higher-resolution image that approximates the level of detail of rendering natively at that resolution. This allows for higher graphical settings and/or frame rates at a given output resolution, depending on user preference. [1]
As of September 2022, the first and second generations of DLSS are available on all RTX-branded cards from Nvidia in supported titles, while the third generation unveiled at Nvidia's GTC 2022 event is exclusive to the Ada Lovelace generation RTX 40 series graphics cards. [2] Nvidia has also introduced Deep learning dynamic super resolution (DLDSR), a related and opposite technology where the graphics are rendered at a higher resolution, then downsampled to the native display resolution using an artificial intelligence-assisted downsampling algorithm to achieve higher image quality than rendering at native resolution. [3]
Nvidia advertised DLSS as a key feature of the GeForce 20 series cards when they launched in September 2018. [4] At that time, the results were limited to a few video games (namely Battlefield V [5] and Metro Exodus ) because the algorithm had to be trained specifically on each game on which it was applied and the results were usually not as good as simple resolution upscaling. [6] [7]
In 2019, the video game Control shipped with real-time ray tracing and an improved version of DLSS, which did not use the Tensor Cores. [8] [9]
In April 2020, Nvidia advertised and shipped an improved version of DLSS named DLSS 2.0 with driver version 445.75. DLSS 2.0 was available for a few existing games including Control and Wolfenstein: Youngblood , and would later be added to many newly released games and game engines such as Unreal Engine [10] and Unity. [11] This time Nvidia said that it used the Tensor Cores again, and that the AI did not need to be trained specifically on each game. [4] [12] Despite sharing the DLSS branding, the two iterations of DLSS differ significantly and are not backwards-compatible. [13] [14]
Release | Release date | Highlights |
---|---|---|
1.0 | February 2019 | Predominantly spatial image upscaler; required training specifically for each game integration; included in Battlefield V and Metro Exodus, among others [5] |
"1.9" (unofficial name) | August 2019 | DLSS 1.0 adapted to run on CUDA shader cores instead of Tensor Cores; used for Control [8] [4] [15] |
2.0 | April 2020 | An AI-accelerated form of TAAU using Tensor Cores, trained generically [16] |
3.0 | September 2022 | Augments DLSS 2.0 with an optical flow frame generation algorithm (only available on RTX 40-series GPUs) that generates frames in between rendered frames [2] [17] |
3.5 | September 2023 | Adds ray reconstruction, replacing multiple denoising algorithms with a single AI model trained on five times more data than DLSS 3. [18] [17] |
When using DLSS, depending on the game, users have access to various quality presets in addition to the option to set the internally rendered, upscaled resolution manually:
Quality preset [a] | Scale factor [b] | Render scale [c] |
---|---|---|
DLAA | 1x | 100% |
Ultra Quality [19] (unused) | 1.32x | 76.0% |
Quality | 1.50x | 66.7% |
Balanced | 1.72x | 58.0% |
Performance | 2.00x | 50.0% |
Ultra Performance (since v2.1; recommended only for 8K output resolutions [19] ) | 3.00x | 33.3% |
Auto | Rendered resolution dynamically adjusts in real time to achieve user-defined fps targets (e.g., 144 fps on a 144 Hz monitor). [20] |
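For illustration, the internal render resolution implied by a preset can be derived by dividing the output resolution by the scale factor along each axis. The following minimal Python sketch uses the values from the table above; the helper function and its name are illustrative, not part of any Nvidia API.

```python
# Minimal sketch: derive the internal render resolution from a DLSS quality
# preset's scale factor (per axis), using the values from the table above.

PRESET_SCALE = {
    "DLAA": 1.00,
    "Ultra Quality": 1.32,   # unused in shipped titles
    "Quality": 1.50,
    "Balanced": 1.72,
    "Performance": 2.00,
    "Ultra Performance": 3.00,
}

def render_resolution(output_w: int, output_h: int, preset: str) -> tuple[int, int]:
    """Return the internal render resolution for a given output resolution."""
    scale = PRESET_SCALE[preset]
    return round(output_w / scale), round(output_h / scale)

if __name__ == "__main__":
    # Example: 4K output with the Performance preset renders internally at 1080p.
    print(render_resolution(3840, 2160, "Performance"))  # (1920, 1080)
    print(render_resolution(3840, 2160, "Quality"))      # (2560, 1440)
```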
The first iteration of DLSS is a predominantly spatial image upscaler with two stages, both relying on convolutional auto-encoder neural networks. [21] The first stage is an image enhancement network which uses the current frame and motion vectors to perform edge enhancement and spatial anti-aliasing. The second stage is an image upscaling step which uses the single raw, low-resolution frame to upscale the image to the desired output resolution. Because only a single frame is used for upscaling, the neural network itself must generate a large amount of new information to produce the high-resolution output; this can result in slight hallucinations, such as leaves that differ in style from the source content. [13]
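As a rough illustration of this two-stage structure (and not Nvidia's actual network, whose architecture and weights are not public), a toy PyTorch sketch might chain a residual enhancement stage at the input resolution with a sub-pixel upscaling stage:

```python
# Toy sketch of a two-stage "enhance then upscale" pipeline in PyTorch.
# Purely illustrative; DLSS 1.0's real networks and weights are proprietary.
import torch
import torch.nn as nn

class EnhanceNet(nn.Module):
    """Stage 1: edge enhancement / anti-aliasing at the input resolution."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)  # residual enhancement

class UpscaleNet(nn.Module):
    """Stage 2: single-frame 2x upscaling via sub-pixel convolution."""
    def __init__(self, channels: int = 3, factor: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels * factor * factor, 3, padding=1),
            nn.PixelShuffle(factor),
        )
    def forward(self, x):
        return self.body(x)

low_res = torch.randn(1, 3, 270, 480)           # one raw low-resolution frame
high_res = UpscaleNet()(EnhanceNet()(low_res))  # -> (1, 3, 540, 960)
print(high_res.shape)
```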
The neural networks are trained on a per-game basis by generating a "perfect frame" using traditional supersampling to 64 samples per pixel, as well as the motion vectors for each frame. The data collected must be as comprehensive as possible, covering as many levels, times of day, graphical settings, resolutions, and other variations as possible. This data is also augmented using common augmentations such as rotations, colour changes, and random noise to help the network generalize. Training is performed on Nvidia's Saturn V supercomputer. [14] [22]
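A heavily simplified sketch of how such supervised training could be structured is shown below; the stand-in model, the use of bilinear downsampling to produce the low-resolution input, and the L1 loss are illustrative assumptions rather than Nvidia's documented training setup.

```python
# Simplified sketch of per-game supervised training: the 64-spp "perfect frame"
# is the target, a downsampled copy stands in for the low-resolution input,
# and random flips provide the kind of augmentation described above.
# Illustrative only; Nvidia's real data pipeline and model are not public.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(                      # stand-in 2x upscaling network
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3 * 4, 3, padding=1), nn.PixelShuffle(2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(perfect_frame: torch.Tensor) -> float:
    """perfect_frame: (N, 3, H, W) reference rendered at 64 samples per pixel."""
    if torch.rand(()) < 0.5:                               # random augmentation
        perfect_frame = torch.flip(perfect_frame, dims=[-1])
    low_res = F.interpolate(perfect_frame, scale_factor=0.5,
                            mode="bilinear", align_corners=False)
    prediction = model(low_res)
    loss = F.l1_loss(prediction, perfect_frame)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(training_step(torch.rand(2, 3, 128, 128)))
```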
This first iteration received a mixed response, with many criticizing the often soft appearance and artifacts in certain situations, [23] [6] [5] likely a side effect of the limited data from using only a single frame as input to neural networks that could not be trained to perform optimally in all scenarios and edge cases. [13] [14]
Nvidia also demonstrated that the auto-encoder networks can learn to recreate depth of field and motion blur, [14] although this functionality has never been included in a publicly released product.[ citation needed ]
DLSS 2.0 is a temporal anti-aliasing upsampling (TAAU) implementation, using data from previous frames extensively through sub-pixel jittering to resolve fine detail and reduce aliasing. The data DLSS 2.0 collects includes the raw low-resolution input, motion vectors, depth buffers, and exposure/brightness information. [13] It can also be used as a simpler TAA implementation in which the image is rendered at 100% resolution rather than being upsampled by DLSS; Nvidia brands this as DLAA (deep learning anti-aliasing). [24]
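The sub-pixel jittering can be illustrated with a small sketch: each frame the camera projection is offset by a fraction of a pixel taken from a low-discrepancy sequence, so successive frames sample different positions within each pixel. The Halton(2, 3) sequence and the eight-phase cycle below are common choices in TAAU implementations, not necessarily DLSS's documented internals.

```python
# Illustrative sketch of the per-frame sub-pixel jitter used by TAAU-style
# techniques: each frame the camera is offset by a fraction of a pixel, so
# successive frames sample different positions inside each pixel.

def halton(index: int, base: int) -> float:
    """Radical-inverse (Halton) value in [0, 1) for a 1-based index."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result

def jitter_offset(frame_index: int, phase_count: int = 8) -> tuple[float, float]:
    """Sub-pixel offset in [-0.5, 0.5) for the given frame."""
    i = frame_index % phase_count + 1
    return halton(i, 2) - 0.5, halton(i, 3) - 0.5

for f in range(4):
    print(f, jitter_offset(f))
```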
TAA(U) is used in many modern video games and game engines; [25] however, all previous implementations have used some form of manually written heuristics to prevent temporal artifacts such as ghosting and flickering. One example is neighborhood clamping, which forcefully prevents samples collected in previous frames from deviating too much from nearby pixels in newer frames. This helps to identify and fix many temporal artifacts, but deliberately removing fine details in this way is analogous to applying a blur filter, so the final image can appear blurry when using this method. [13]
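A minimal sketch of the neighborhood-clamping heuristic, assuming a simple 3×3 neighborhood and an exponential blend (details vary between engines):

```python
# Minimal sketch of the neighborhood-clamping heuristic described above:
# the reprojected history color is clamped to the min/max of the 3x3
# neighborhood of the current frame before blending, which suppresses
# ghosting at the cost of some fine detail.
import numpy as np

def neighborhood_clamp_blend(current: np.ndarray, history: np.ndarray,
                             alpha: float = 0.1) -> np.ndarray:
    """current, history: (H, W, C) float images; returns the blended frame."""
    h, w, _ = current.shape
    padded = np.pad(current, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Gather the 3x3 neighborhood of every pixel.
    neighbors = np.stack([padded[dy:dy + h, dx:dx + w]
                          for dy in range(3) for dx in range(3)])
    lo, hi = neighbors.min(axis=0), neighbors.max(axis=0)
    clamped_history = np.clip(history, lo, hi)
    # Exponential blend of the clamped history with the new sample.
    return alpha * current + (1.0 - alpha) * clamped_history

frame = np.random.rand(4, 4, 3).astype(np.float32)
print(neighborhood_clamp_blend(frame, np.roll(frame, 1, axis=1)).shape)
```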
DLSS 2.0 uses a convolutional auto-encoder neural network [23] trained to identify and fix temporal artifacts, instead of manually programmed heuristics as mentioned above. Because of this, DLSS 2.0 can generally resolve detail better than other TAA and TAAU implementations, while also removing most temporal artifacts. This is why DLSS 2.0 can sometimes produce a sharper image than rendering at higher, or even native resolutions using traditional TAA. However, no temporal solution is perfect, and artifacts (ghosting in particular) are still visible in some scenarios when using DLSS 2.0.
Because temporal artifacts occur in most art styles and environments in broadly the same way, the neural network that powers DLSS 2.0 does not need to be retrained for each game. Nvidia does, however, frequently ship new minor revisions of DLSS 2.0 with new titles, [26] which could suggest that minor training optimizations are performed as games are released, although Nvidia does not provide changelogs for these minor revisions to confirm this.
The main advancements compared to DLSS 1.0 include significantly improved detail retention, a generalized neural network that does not need to be re-trained per game, and roughly half the overhead (~1–2 ms vs. ~2–4 ms). [13]
Forms of TAAU such as DLSS 2.0 are not upscalers in the same sense as techniques such as ESRGAN or DLSS 1.0, which attempt to create new information from a low-resolution source; instead, TAAU works to recover data from previous frames rather than creating new data. In practice, this means low-resolution textures in games will still appear low-resolution when using current TAAU techniques. For this reason, Nvidia recommends that game developers use higher-resolution textures than they normally would for a given rendering resolution by applying a mip-map bias when DLSS 2.0 is enabled. [13]
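A common way to apply such a mip-map bias is to offset texture LOD selection by the base-2 logarithm of the ratio between render and output width, so the sampler chooses mip levels as if rendering at the output resolution; the sketch below illustrates the arithmetic (the exact bias used by a given integration may include an additional offset):

```python
# Sketch of the texture mip-map (LOD) bias commonly applied when upscaling:
# biasing by log2(render_width / output_width) makes the texture sampler pick
# mip levels as if the scene were rendered at the output resolution.
import math

def mip_bias(render_width: int, output_width: int) -> float:
    return math.log2(render_width / output_width)

print(mip_bias(1920, 3840))  # Performance preset at 4K -> -1.0
print(mip_bias(2560, 3840))  # Quality preset at 4K -> about -0.58
```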
DLSS 3.0 augments DLSS 2.0 with motion interpolation. The DLSS frame generation algorithm takes two rendered frames from the rendering pipeline and generates a new frame that smoothly transitions between them, so for every frame rendered, one additional frame is generated. [2]
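The concept can be sketched very roughly as interpolating a midpoint frame from two rendered frames and an optical-flow field, as below; DLSS frame generation additionally uses game-supplied motion vectors, depth, and a neural network, so this is only a conceptual illustration.

```python
# Drastically simplified sketch of optical-flow frame interpolation: a midpoint
# frame is synthesized by warping each rendered frame halfway along the flow
# and averaging. Conceptual illustration only, not DLSS's actual algorithm.
import numpy as np
from scipy.ndimage import map_coordinates

def warp(image: np.ndarray, flow: np.ndarray, t: float) -> np.ndarray:
    """Backward-warp a grayscale image by t * flow (flow: (2, H, W), x then y, in pixels)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([ys - t * flow[1], xs - t * flow[0]])
    return map_coordinates(image, coords, order=1, mode="nearest")

def interpolate_midpoint(frame_a: np.ndarray, frame_b: np.ndarray,
                         flow_ab: np.ndarray) -> np.ndarray:
    """Generate the frame halfway between two rendered frames."""
    return 0.5 * warp(frame_a, flow_ab, 0.5) + 0.5 * warp(frame_b, -flow_ab, 0.5)

a = np.random.rand(8, 8)
b = np.roll(a, 1, axis=1)                      # b is a shifted by one pixel in x
flow = np.zeros((2, 8, 8)); flow[0] = 1.0      # flow from a to b: +1 px in x
print(interpolate_midpoint(a, b, flow).shape)  # (8, 8)
```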
DLSS 3.0 makes use of a new-generation Optical Flow Accelerator (OFA) included in Ada Lovelace generation RTX GPUs. The new OFA is faster and more accurate than the OFA available in the previous Turing and Ampere RTX GPUs. [27] This results in DLSS 3.0 being exclusive to the RTX 40 series.
At release, DLSS 3.0 does not work for VR displays.[ citation needed ]
DLSS 3.5 adds ray reconstruction, replacing multiple denoising algorithms with a single AI model trained on five times more data than DLSS 3. Ray reconstruction is available on all RTX GPUs and first targeted games with path tracing (aka "full ray tracing"), including Cyberpunk 2077's Phantom Liberty DLC, Portal with RTX , and Alan Wake 2 . [18] [17]
DLSS requires and applies its own anti-aliasing method. Thus, depending on the game and quality setting used, using DLSS may improve image quality even over native resolution rendering. [28]
It operates on similar principles to TAA. Like TAA, it uses information from past frames to produce the current frame. Unlike TAA, DLSS does not sample every pixel in every frame. Instead, it samples different pixels in different frames and uses pixels sampled in past frames to fill in the unsampled pixels in the current frame. DLSS uses machine learning to combine samples in the current frame and past frames, and it can be thought of as an advanced and superior TAA implementation made possible by the available tensor cores. [13]
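The accumulation idea can be demonstrated with a toy sketch: sampling a different sub-pixel position each frame and blending it into a running history converges toward the supersampled (area-averaged) value of the pixel. The blend weight and sample count below are arbitrary illustrative choices.

```python
# Small sketch of temporal accumulation: each frame samples a different
# sub-pixel position and is blended into a running history, so the pixel
# converges toward the supersampled (area-averaged) value over time.
import random

def shade(x: float) -> float:
    """Toy scene: a hard edge crossing the pixel at x = 0.3."""
    return 1.0 if x > 0.3 else 0.0

history = shade(0.5)                 # first frame, center sample
alpha = 0.1                          # blend weight of the newest sample
for frame in range(200):
    jitter = random.random()         # new sub-pixel sample position each frame
    history = alpha * shade(jitter) + (1.0 - alpha) * history

print(round(history, 2))             # settles close to 0.7, the pixel's true edge coverage
```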
Nvidia also offers deep learning anti-aliasing (DLAA). DLAA provides the same AI-driven anti-aliasing DLSS uses, but without any upscaling or downscaling functionality. [24]
With the exception of the shader-core version implemented in Control, DLSS is only available on GeForce RTX 20, GeForce RTX 30, GeForce RTX 40, and Quadro RTX series video cards, using dedicated AI accelerators called Tensor Cores. [23] [ failed verification ] Tensor Cores have been available since the Nvidia Volta GPU microarchitecture, which was first used in the Tesla V100 line of products. [29] They perform fused multiply-add (FMA) operations, which are used extensively in neural network calculations to apply a large series of multiplications on weights, followed by the addition of a bias. Tensor Cores can operate on FP16, INT8, INT4, and INT1 data types. Each core can perform 1024 bits of FMA operations per clock, i.e. 1024 INT1, 256 INT4, 128 INT8, or 64 FP16 operations per clock per Tensor Core, and most Turing GPUs have a few hundred Tensor Cores. [30]
The Tensor Cores use CUDA warp-level primitives on 32 parallel threads to take advantage of their parallel architecture. [31] A warp is a set of 32 threads configured to execute the same instruction.
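The per-clock throughput figures above follow directly from dividing the 1024-bit FMA datapath by the width of each data type, as the short sketch below reproduces:

```python
# Sketch reproducing the per-clock FMA throughput arithmetic above: with a
# 1024-bit-wide FMA datapath per Tensor Core, narrower data types yield
# proportionally more operations per clock.
DATAPATH_BITS = 1024
TYPE_BITS = {"FP16": 16, "INT8": 8, "INT4": 4, "INT1": 1}

for dtype, bits in TYPE_BITS.items():
    print(f"{dtype}: {DATAPATH_BITS // bits} FMA ops per clock per Tensor Core")
# FP16: 64, INT8: 128, INT4: 256, INT1: 1024 -- matching the figures above.
```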
Since Windows 10 version 1903, Microsoft Windows has provided DirectML as part of DirectX to support Tensor Cores.
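For example, DirectML can be reached from Python through ONNX Runtime's DirectML execution provider; the sketch below assumes the onnxruntime-directml package is installed and uses a placeholder model file name ("upscaler.onnx"), so it is illustrative rather than a recipe tied to DLSS itself.

```python
# Hypothetical sketch of running an upscaling model through DirectML via
# ONNX Runtime's DirectML execution provider. "upscaler.onnx" is a placeholder
# for any existing ONNX model; requires the onnxruntime-directml package.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("upscaler.onnx",
                               providers=["DmlExecutionProvider"])
input_name = session.get_inputs()[0].name
low_res = np.random.rand(1, 3, 540, 960).astype(np.float32)
outputs = session.run(None, {input_name: low_res})
print(outputs[0].shape)
```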
Especially with early versions of DLSS, users reported blurry frames. In 2019, Andrew Edelsten, an Nvidia employee, addressed the problem in a blog post, promising that the technology was being improved, and clarified that the DLSS AI algorithm was mainly trained on 4K image material. The particularly blurry images DLSS produces at lower resolutions such as Full HD are due to the algorithm having far less image information available to calculate an appropriate image compared to higher resolutions like 4K. [32]
Furthermore, the use of DLSS frame generation may lead to increased input latency [33] and visual artifacts. [34]
Critics have also argued that by implementing DLSS in their games, game developers lose the incentive to optimize them to run smoothly at native resolution on modern PC hardware. For example, for the game Alan Wake 2 in 4K resolution at the highest graphics settings with ray tracing enabled, the use of DLSS in Performance mode is recommended even with current-generation high-end graphics cards such as the Nvidia GeForce RTX 4080 in order to achieve 60 fps. [35]