Deep learning anti-aliasing (DLAA) is a form of spatial anti-aliasing created by Nvidia. [1] DLAA depends on and requires Tensor Cores available in Nvidia RTX cards. [1]
DLAA is similar to deep learning super sampling (DLSS) in its anti-aliasing method, [2] with one important differentiation being that the goal of DLSS is to increase performance at the cost of image quality, [3] whereas the main priority of DLAA is improving image quality at the cost of performance (irrelevant of resolution upscaling or downscaling). [4] DLAA is similar to temporal anti-aliasing (TAA) in that they are both spatial anti-aliasing solutions relying on past frame data. [3] [5] Compared to TAA, DLAA is substantially better when it comes to shimmering, flickering, and handling small meshes like wires. [6]
DLAA collects game rendering data including raw low-resolution input, motion vectors, depth buffers, and exposure information. This information feeds into a convolutional neural network that processes the image to reduce aliasing while preserving fine detail. [3]
The neural network architecture employs an auto-encoder design trained on high-quality reference images. The training dataset includes diverse scenarios focusing on challenging cases like sub-pixel details, high-contrast edges, and transparent surfaces. The network then processes frames in real-time. [3]
Unlike traditional anti-aliasing solutions that rely on manually written heuristics, such as TAA, DLAA uses its neural network to preserve fine details while eliminating unwanted visual artifacts. [3]
TAA is used in many modern video games and game engines; [7] however, all previous implementations have used some form of manually written heuristics to prevent temporal artifacts such as ghosting and flickering. One example of this is neighborhood clamping which forcefully prevents samples collected in previous frames from deviating too much compared to nearby pixels in newer frames. This helps to identify and fix many temporal artifacts, but deliberately removing fine details in this way is analogous to applying a blur filter, and thus the final image can appear blurry when using this method. [8]
DLAA uses an auto-encoder convolutional neural network [9] trained to identify and fix temporal artifacts, instead of manually programmed heuristics as mentioned above. Because of this, DLAA can generally resolve detail better than other TAA and TAAU implementations, while also removing most temporal artifacts.
While DLSS handles upscaling with a focus on performance, DLAA handles anti-aliasing with a focus on visual quality. DLAA runs at the given screen resolution with no upscaling or downscaling functionality provided by DLAA. [10]
DLSS and DLAA share the same AI-driven anti-aliasing method. [11] As such, DLAA functions like DLSS without the upscaling part. Both are made by Nvidia and require Tensor Cores. However, DLSS and DLAA cannot be enabled at the same time, only one can be selected depending on whether performance or image quality is prioritized.
Rendering is the process of generating a photorealistic or non-photorealistic image from input data such as 3D models. The word "rendering" originally meant the task performed by an artist when depicting a real or imaginary thing. Today, to "render" commonly means to generate an image or video from a precise description using a computer program.
Jaggies are artifacts in raster images, most frequently from aliasing, which in turn is often caused by non-linear mixing effects producing high-frequency components, or missing or poor anti-aliasing filtering prior to sampling.
Frame rate, most commonly expressed in frame/s, frames per second or FPS, is typically the frequency (rate) at which consecutive images (frames) are captured or displayed. This definition applies to film and video cameras, computer animation, and motion capture systems. In these contexts, frame rate may be used interchangeably with frame frequency and refresh rate, which are expressed in hertz. Additionally, in the context of computer graphics performance, FPS is the rate at which a system, particularly a GPU, is able to generate frames, and refresh rate is the frequency at which a display shows completed frames. In electronic camera specifications frame rate refers to the maximum possible rate frames could be captured, but in practice, other settings may reduce the actual frequency to a lower number than the frame rate.
In digital signal processing, spatial anti-aliasing is a technique for minimizing the distortion artifacts (aliasing) when representing a high-resolution image at a lower resolution. Anti-aliasing is used in digital photography, computer graphics, digital audio, and many other applications.
In computer graphics, mipmaps or pyramids are pre-calculated, optimized sequences of images, each of which is a progressively lower resolution representation of the previous. The height and width of each image, or level, in the mipmap is a factor of two smaller than the previous level. Mipmaps do not have to be square. They are intended to increase rendering speed and reduce aliasing artifacts. A high-resolution mipmap image is used for high-density samples, such as for objects close to the camera; lower-resolution images are used as the object appears farther away. This is a more efficient way of downscaling a texture than sampling all texels in the original texture that would contribute to a screen pixel; it is faster to take a constant number of samples from the appropriately downfiltered textures. Mipmaps are widely used in 3D computer games, flight simulators, other 3D imaging systems for texture filtering, and 2D and 3D GIS software. Their use is known as mipmapping. The letters MIP in the name are an acronym of the Latin phrase multum in parvo, meaning "much in little".
In 3D computer graphics, anisotropic filtering is a method of enhancing the image quality of textures. It only applies on surfaces at oblique viewing angles to the camera and where the projection of the texture appears to be non-orthogonal. As per its etymology, anisotropic filtering does not filter the same in every direction.
1080i is a term used in high-definition television (HDTV) and video display technology. It means a video mode with 1080 lines of vertical resolution. The "i" stands for interlaced scanning method. This format was once a standard in HDTV. It was particularly used for broadcast television. This is because it can deliver high-resolution images without needing excessive bandwidth. This format is used in the SMPTE 292M standard.
Pixel art scaling algorithms are graphical filters that attempt to enhance the appearance of hand-drawn 2D pixel art graphics. These algorithms are a form of automatic image enhancement. Pixel art scaling algorithms employ methods significantly different than the common methods of image rescaling, which have the goal of preserving the appearance of images.
In computer graphics and digital imaging, imagescaling refers to the resizing of a digital image. In video technology, the magnification of digital material is known as upscaling or resolution enhancement.
Multisample anti-aliasing (MSAA) is a type of spatial anti-aliasing, a technique used in computer graphics to remove jaggies.
Temporal anti-aliasing (TAA) is a spatial anti-aliasing technique for computer-generated video that combines information from past frames and the current frame to remove jaggies in the current frame. In TAA, each pixel is sampled once per frame but in each frame the sample is at a different location within the frame. Pixels sampled in past frames are blended with pixels sampled in the current frame to produce an anti-aliased image. Although this method makes TAA achieve a result comparable to supersampling, the technique inevitably causes ghosting and blurriness to the image.
Anti-aliasing may refer to any of a number of techniques to combat the problems of aliasing in a sampled signal such as a digital image or digital audio recording.
Fast approximate anti-aliasing (FXAA) is a screen-space anti-aliasing algorithm created by Timothy Lottes at Nvidia.
The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns features by itself via filter optimization. This type of deep learning network has been applied to process and make predictions from many different types of data including text, images and audio. Convolution-based networks are the de-facto standard in deep learning-based approaches to computer vision and image processing, and have only recently been replaced -- in some cases -- by newer deep learning architectures such as the transformer. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, for each neuron in the fully-connected layer, 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded convolution kernels, only 25 neurons are required to process 5x5-sized tiles. Higher-layer features are extracted from wider context windows, compared to lower-layer features.
GPUOpen is a middleware software suite originally developed by AMD's Radeon Technologies Group that offers advanced visual effects for computer games. It was released in 2016. GPUOpen serves as an alternative to, and a direct competitor of Nvidia GameWorks. GPUOpen is similar to GameWorks in that it encompasses several different graphics technologies as its main components that were previously independent and separate from one another. However, GPUOpen is partially open source software, unlike GameWorks which is proprietary and closed.
The Style Generative Adversarial Network, or StyleGAN for short, is an extension to the GAN architecture introduced by Nvidia researchers in December 2018, and made source available in February 2019.
Deep learning super sampling (DLSS) is a family of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available in a number of video games. The goal of these technologies is to allow the majority of the graphics pipeline to run at a lower resolution for increased performance, and then infer a higher resolution image from this that approximates the same level of detail as if the image had been rendered at this higher resolution. This allows for higher graphical settings and/or frame rates for a given output resolution, depending on user preference.
Deep learning in photoacoustic imaging combines the hybrid imaging modality of photoacoustic imaging (PA) with the rapidly evolving field of deep learning. Photoacoustic imaging is based on the photoacoustic effect, in which optical absorption causes a rise in temperature, which causes a subsequent rise in pressure via thermo-elastic expansion. This pressure rise propagates through the tissue and is sensed via ultrasonic transducers. Due to the proportionality between the optical absorption, the rise in temperature, and the rise in pressure, the ultrasound pressure wave signal can be used to quantify the original optical energy deposition within the tissue.
Video super-resolution (VSR) is the process of generating high-resolution video frames from the given low-resolution video frames. Unlike single-image super-resolution (SISR), the main goal is not only to restore more fine details while saving coarse ones, but also to preserve motion consistency.
{{cite web}}
: CS1 maint: numeric names: authors list (link)