Deep image compositing

Deep image compositing is a way of compositing and rendering digital images that emerged in the mid-2010s. In addition to the usual color and opacity channels, it introduces a notion of spatial depth, so that multiple samples distributed through the depth of the image combine to make up the final color of each pixel. The technique produces high-quality results and removes artifacts around edges that could not be dealt with otherwise.

Deep data

Deep data is encoded by advanced 3D renderers into an image that samples, for each rendered pixel, the path taken along the z axis extending outward from the virtual camera through space, recording the color and opacity of every non-opaque surface or volume passed through along the way. It might be considered somewhat analogous to the way ray tracing generates simulated photon paths through such media; however, ray tracing and other traditional rendering techniques generally produce images that contain only three or four channels of color and opacity values per pixel, flattened into a two-dimensional frame.

Depth maps, on the other hand, encode z-axis information in a grayscale image, where each level of gray represents a different slice of z space. The "thickness" of each slice is determined at render time, allowing more or less depth fidelity depending on how deep the scene is. Depth maps have been a boon to compositors blending 3D renders with live-action and practical elements. To be useful, a map must have high enough bit depth to encode separation between objects close to the camera and objects near infinity; most 3D software packages can now generate 16-bit and 32-bit depth maps, providing up to around two billion depth levels. Depth maps do not, however, include transparency information about non-opaque surfaces or volumes, so objects beyond and viewed through semi- or fully-transparent objects have no depth information of their own and may not be composited or blurred correctly.

Even cryptomattes, a popular addition to many post-production and VFX studios' pipelines that provide separate color-coded ID mattes for the individual elements of a rendered scene and further bridge the gap between CGI and compositing, do not allow the nearly automated, fully non-linear workflows that deep data does. Deep images encapsulate enough 3D information that normally time-intensive tasks, such as rotoscoping numerous holdout mattes for complex interactions between moving characters and semi-transparent environmental volumes like smoke or water, become essentially trivial: multiple mattes can be generated from a single set of deep images with no need to re-render every matte element and background for each case. In addition to that efficiency and flexibility, deep images inherently provide much higher visual quality in areas that have been difficult for traditional renders, such as the motion-blurred edges of characters with semi-transparent elements like hair.
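
To make the depth-map workflow concrete, the sketch below builds a simple holdout matte by keeping only the pixels whose depth-map value lies in front of a chosen split distance. It is a minimal Python/NumPy illustration; the function name, array shapes and split value are assumptions, not part of any particular package.

```python
import numpy as np

def holdout_from_depth_map(rgb, depth, split_z):
    """rgb: (H, W, 3) float image; depth: (H, W) distances from the camera."""
    matte = (depth < split_z).astype(rgb.dtype)   # 1.0 in front of the split, 0.0 behind
    return rgb * matte[..., None], matte
```

Because the map stores only a single depth per pixel, anything seen through glass, smoke or hair inherits the depth of the nearest surface, which is exactly the limitation deep data removes.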

One downside to the use of deep images is their substantial file size, since they encode a relatively enormous amount of data per frame compared with even conventional (flat) multichannel formats such as OpenEXR.

Function-based (integrated)

The data is stored as a function of depth, producing a function curve that can be used to look up the accumulated data at any arbitrary depth. Manipulating the data in this form is harder.
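
A minimal sketch of this idea, assuming each pixel stores a list of (depth, accumulated value) knots sorted by strictly increasing depth; the knot layout and the linear interpolation are illustrative assumptions rather than any particular file format.

```python
import bisect

def lookup(curve, z):
    """curve: list of (depth, value) knots sorted by depth; returns the value at depth z."""
    depths = [d for d, _ in curve]
    i = bisect.bisect_right(depths, z)
    if i == 0:
        return curve[0][1]          # in front of the first knot
    if i == len(curve):
        return curve[-1][1]         # behind the last knot
    (z0, v0), (z1, v1) = curve[i - 1], curve[i]
    t = (z - z0) / (z1 - z0)
    return v0 + t * (v1 - v0)       # linear interpolation between knots
```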

Sample-based (deintegrated)

Each sample is treated as an independent piece of data and can therefore be manipulated easily. To make sure the data still represents the right detail, an additional extent value describing each sample's depth range needs to be introduced.
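
The sketch below shows one way such samples might be represented; the field names follow the front/back depth idea used by deep OpenEXR files, but the class itself is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class DeepSample:
    z_front: float   # where the sample starts along the camera ray
    z_back: float    # where it ends (equal to z_front for hard surfaces)
    rgba: tuple      # premultiplied color and alpha of this sample alone

# A deep pixel is simply a list of such samples; because each sample is
# self-contained it can be moved, recolored or culled without touching the others.
deep_pixel = [
    DeepSample(2.0, 2.0, (0.8, 0.1, 0.1, 1.0)),   # opaque surface
    DeepSample(1.0, 1.8, (0.2, 0.2, 0.2, 0.35)),  # smoke volume in front of it
]
```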

Generating deep data

3D renderers produce the necessary data as part of the normal rendering pipeline: samples are gathered in depth and then combined, and the deep data can simply be written out before that combination happens, so nothing new is added to the process. Generating deep data from camera footage instead requires a proper depth map. This is done in some cases but is still not accurate enough for detailed representation; for basic holdout tasks, however, it can be sufficient.
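
As a rough sketch of that camera-plus-depth-map case, each pixel of the footage can be promoted to a single zero-thickness deep sample placed at the depth the map reports. It reuses the hypothetical DeepSample class from the sketch above; all other names are assumptions.

```python
import numpy as np

def deep_from_depth_map(rgb, alpha, depth):
    """Return an (H, W) object array whose entries are one-sample deep pixels."""
    h, w = depth.shape
    deep = np.empty((h, w), dtype=object)
    for y in range(h):
        for x in range(w):
            r, g, b = rgb[y, x]
            z = depth[y, x]
            deep[y, x] = [DeepSample(z, z, (r, g, b, alpha[y, x]))]
    return deep
```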

Compositing deep data images

Deep images can be composited like regular images, but the depth component makes it easier to determine the layering order. Traditionally this order had to be specified by the user; deep images carry that information themselves and need no user input. Edge artifacts are also reduced, because semi-transparent pixels have more data to work with.
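
A minimal sketch of why this works, again using the hypothetical DeepSample class from above: merging two deep pixels is just pooling their samples, and flattening sorts the pooled samples by depth and accumulates them front to back with the premultiplied "over" operation. Real implementations also split and blend overlapping volume samples, which this sketch ignores.

```python
def flatten(deep_pixel):
    """Collapse a list of DeepSample objects into one premultiplied RGBA tuple."""
    out_r = out_g = out_b = out_a = 0.0
    for s in sorted(deep_pixel, key=lambda s: s.z_front):   # nearest sample first
        r, g, b, a = s.rgba
        k = 1.0 - out_a              # how much of this sample is still visible
        out_r += r * k
        out_g += g * k
        out_b += b * k
        out_a += a * k
    return (out_r, out_g, out_b, out_a)

def deep_merge(pixel_a, pixel_b):
    """Merging deep pixels is just pooling their samples; ordering happens at flatten time."""
    return pixel_a + pixel_b
```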

History

Deep images have been available in 3D rendering packages for quite a while. They were first used for holdouts at several VFX houses, where shaders generated holdout mattes at render time. Using them in a more interactive manner started more recently: SideFX integrated deep compositing into its Houdini software, and facilities such as Industrial Light & Magic, DreamWorks Animation, Weta, Animal Logic and DRD Studios have implemented interactive solutions.

In 2014 the Academy of Motion Picture Arts and Sciences honored the technology at its annual SciTech Awards: Dr. Peter Hillman was recognized for the long-term development and continued advancement of innovative, robust and complete toolsets for deep compositing, and Colin Doncaster, Johannes Saam, Areito Echevarria, Janne Kontkanen and Chris Cooper for the development, prototyping and promotion of technologies and workflows for deep compositing.

Related Research Articles

Alpha compositing: Operation in computer graphics

In computer graphics, alpha compositing or alpha blending is the process of combining one image with a background to create the appearance of partial or full transparency. It is often useful to render picture elements (pixels) in separate passes or layers and then combine the resulting 2D images into a single, final image called the composite. Compositing is used extensively in film when combining computer-rendered image elements with live footage. Alpha blending is also used in 2D computer graphics to put rasterized foreground elements over a background.
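
For reference, a minimal sketch of the standard "over" operator for straight (unpremultiplied) RGBA values, which the deep flattening described earlier applies repeatedly; the tuple layout is an assumption.

```python
def over(fg, bg):
    """Composite a straight-alpha (r, g, b, a) foreground over a background."""
    fa, ba = fg[3], bg[3]
    out_a = fa + ba * (1.0 - fa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    out_rgb = tuple((fg[i] * fa + bg[i] * ba * (1.0 - fa)) / out_a for i in range(3))
    return out_rgb + (out_a,)
```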

Rendering (computer graphics): Process of generating an image from a model

Rendering is the process of generating a photorealistic or non-photorealistic image from input data such as 3D models. The word "rendering" originally meant the task performed by an artist when depicting a real or imaginary thing. Today, to "render" commonly means to generate an image or video from a precise description using a computer program.

Digital compositing

Digital compositing is the process of digitally assembling multiple images to make a final image, typically for print, motion pictures or screen display. It is the digital analogue of optical film compositing and forms part of the visual effects (VFX) process.

PNG: Family of lossless-compression image file formats

Portable Network Graphics is a raster-graphics file format that supports lossless data compression. PNG was developed as an improved, non-patented replacement for Graphics Interchange Format (GIF)—unofficially, the initials PNG stood for the recursive acronym "PNG's not GIF".

Z-buffering: Type of data buffer in computer graphics

A depth buffer, also known as a z-buffer, is a type of data buffer used in computer graphics to represent depth information of objects in 3D space from a particular perspective. The depth is stored as a height map of the scene, the values representing a distance to camera, with 0 being the closest. The encoding scheme may be flipped with the highest number being the value closest to camera. Depth buffers are an aid to rendering a scene to ensure that the correct polygons properly occlude other polygons. Z-buffering was first described in 1974 by Wolfgang Straßer in his PhD thesis on fast algorithms for rendering occluded objects. A similar solution to determining overlapping polygons is the painter's algorithm, which can handle non-opaque scene elements, though it is less efficient and can produce incorrect results.
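
A minimal sketch of the depth test described above, assuming NumPy buffers and the "smaller value is closer" convention; the function and buffer names are illustrative.

```python
import numpy as np

def write_fragment(color_buf, depth_buf, x, y, z, color):
    """Write a fragment only if it is closer than what the depth buffer holds."""
    if z < depth_buf[y, x]:          # smaller value = closer
        depth_buf[y, x] = z
        color_buf[y, x] = color

# Buffers start "empty": the depth buffer at infinity, the color buffer at black.
depth_buf = np.full((480, 640), np.inf)
color_buf = np.zeros((480, 640, 3))
```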

Voxel: Element representing a value on a grid in three dimensional space

A voxel is a three-dimensional counterpart to a pixel. It represents a value on a regular grid in a three-dimensional space. Voxels are frequently used in the visualization and analysis of medical and scientific data. They are also commonly used in video games, both as a technological feature, as in Outcast, and a graphical style, which was popularised by Minecraft.

Ray casting: Methodological basis for 3D CAD/CAM solid modeling and image rendering

Ray casting is the methodological basis for 3D CAD/CAM solid modeling and image rendering. It is essentially the same as ray tracing for computer graphics where virtual light rays are "cast" or "traced" on their path from the focal point of a camera through each pixel in the camera sensor to determine what is visible along the ray in the 3D scene.

Hidden-surface determination: Visibility in 3D computer graphics

In 3D computer graphics, hidden-surface determination is the process of identifying what surfaces and parts of surfaces can be seen from a particular viewing angle. A hidden-surface determination algorithm is a solution to the visibility problem, which was one of the first major problems in the field of 3D computer graphics. The process of hidden-surface determination is sometimes called hiding, and such an algorithm is sometimes called a hider. When referring to line rendering it is known as hidden-line removal. Hidden-surface determination is necessary to render a scene correctly, so that one may not view features hidden behind the model itself, allowing only the naturally viewable portion of the graphic to be visible.

Transparency (graphic): Capability of a computer graphic to allow whatever is "behind" it to be visible

Transparency in computer graphics is possible in a number of file formats. The term "transparency" is used in various ways by different people, but at its simplest there is "full transparency" i.e. something that is completely invisible. Only part of a graphic should be fully transparent, or there would be nothing to see. More complex is "partial transparency" or "translucency" where the effect is achieved that a graphic is partially transparent in the same way as colored glass. Since ultimately a printed page or computer or television screen can only be one color at a point, partial transparency is always simulated at some level by mixing colors. There are many different ways to mix colors, so in some cases transparency is ambiguous.

Volume rendering: Representing a 3D-modeled object or dataset as a 2D projection

In scientific visualization and computer graphics, volume rendering is a set of techniques used to display a 2D projection of a 3D discretely sampled data set, typically a 3D scalar field.

Reyes rendering: Computer software architecture in 3D computer graphics

Reyes rendering is a computer software architecture used in 3D computer graphics to render photo-realistic images. It was developed in the mid-1980s by Loren Carpenter and Robert L. Cook at Lucasfilm's Computer Graphics Research Group, which is now Pixar. It was first used in 1982 to render images for the Genesis effect sequence in the movie Star Trek II: The Wrath of Khan. Pixar's RenderMan was an implementation of the Reyes algorithm; Reyes was deprecated as of 2016 and removed as of RenderMan 21. According to the original paper describing the algorithm, the Reyes image rendering system is "An architecture for fast high-quality rendering of complex images." Reyes was proposed as a collection of algorithms and data processing systems. However, the terms "algorithm" and "architecture" have come to be used synonymously in this context and are used interchangeably in this article.

Color digital images are made of pixels, and pixels are made of combinations of primary colors represented by numeric code values. A channel in this context is the grayscale image of the same size as a color image, made of just one of these primary colors. For instance, an image from a standard digital camera will have a red, green and blue channel. A grayscale image has just one channel.

3D computer graphics: Graphics that use a three-dimensional representation of geometric data

3D computer graphics, sometimes called CGI, 3-D-CGI or three-dimensional computer graphics, are graphics that use a three-dimensional representation of geometric data that is stored in the computer for the purposes of performing calculations and rendering digital images, usually 2D images but sometimes 3D images. The resulting images may be stored for viewing later or displayed in real time.

Depth map: Image also containing data on distances of objects from the camera

In 3D computer graphics and computer vision, a depth map is an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint. The term is related to depth buffer, Z-buffer, Z-buffering, and Z-depth. The "Z" in these latter terms relates to a convention that the central axis of view of a camera is in the direction of the camera's Z axis, and not to the absolute Z axis of a scene.

2D to 3D video conversion is the process of transforming 2D ("flat") film to 3D form, which in almost all cases is stereo, so it is the process of creating imagery for each eye from one 2D image.

Natron (software): Open source compositing software

Natron is a free and open-source node-based compositing application. It has been influenced by digital compositing software such as Avid Media Illusion, Apple Shake, Blackmagic Fusion, Autodesk Flame and Nuke, from which its user interface and many of its concepts are derived.

This is a glossary of terms relating to computer graphics.

Volumetric capture or volumetric video is a technique that captures a three-dimensional space, such as a location or performance. This type of volumography acquires data that can be viewed on flat screens as well as on 3D displays and VR headsets. Consumer-facing formats are numerous, and the required motion capture techniques lean on computer graphics, photogrammetry, and other computation-based methods. The viewer generally experiences the result in a real-time engine and has direct input in exploring the generated volume.

Cryptomatte: Open-source software by Psyop

Cryptomatte is a piece of open-source software created by Jonah Friedman and Andy Jones at Psyop. The name is also used for the specific style of image created by the software, or by other software that works in the same way.

A neural radiance field (NeRF) is a method based on deep learning for reconstructing a three-dimensional representation of a scene from two-dimensional images. The NeRF model enables downstream applications of novel view synthesis, scene geometry reconstruction, and obtaining the reflectance properties of the scene. Additional scene properties such as camera poses may also be jointly learned. First introduced in 2020, it has since gained significant attention for its potential applications in computer graphics and content creation.