In computer graphics, alpha compositing or alpha blending is the process of combining one image with a background to create the appearance of partial or full transparency. [1] It is often useful to render picture elements (pixels) in separate passes or layers and then combine the resulting 2D images into a single, final image called the composite. Compositing is used extensively in film when combining computer-rendered image elements with live footage. Alpha blending is also used in 2D computer graphics to put rasterized foreground elements over a background.
In order to combine the picture elements of the images correctly, it is necessary to keep an associated matte for each element in addition to its color. This matte layer contains the coverage information—the shape of the geometry being drawn—making it possible to distinguish between parts of the image where something was drawn and parts that are empty.
Although the most basic operation of combining two images is to put one over the other, there are many operations, or blend modes, that are used.
The concept of an alpha channel was introduced by Alvy Ray Smith and Ed Catmull in the late 1970s at the New York Institute of Technology Computer Graphics Lab. Bruce A. Wallace derived the same straight over operator based on a physical reflectance/transmittance model in 1981. [2] A 1984 paper by Thomas Porter and Tom Duff introduced premultiplied alpha using a geometrical approach. [3]
The use of the term alpha is explained by Smith as follows: "We called it that because of the classic linear interpolation formula that uses the Greek letter α (alpha) to control the amount of interpolation between, in this case, two images A and B". [4] That is, when compositing image A atop image B, the value of α in the formula is taken directly from A's alpha channel.
In a 2D image a color combination is stored for each picture element (pixel), often a combination of red, green and blue (RGB). When alpha compositing is in use, each pixel has an additional numeric value stored in its alpha channel, with a value ranging from 0 to 1. A value of 0 means that the pixel is fully transparent and the color in the pixel beneath will show through. A value of 1 means that the pixel is fully opaque.
With the existence of an alpha channel, it is possible to express compositing image operations using a compositing algebra. For example, given two images A and B, the most common compositing operation is to combine the images so that A appears in the foreground and B appears in the background. This can be expressed as A over B. In addition to over, Porter and Duff [3] defined the compositing operators in, held out by (the phrase refers to holdout matting and is usually abbreviated out), atop, and xor (and the reverse operators rover, rin, rout, and ratop) from a consideration of choices in blending the colors of two pixels when their coverage is, conceptually, overlaid orthogonally:
As an example, the over operator can be accomplished by applying the following formula to each pixel: [2]

$$\alpha_o = \alpha_a + \alpha_b (1 - \alpha_a)$$

$$C_o = \frac{C_a \alpha_a + C_b \alpha_b (1 - \alpha_a)}{\alpha_o}$$

Here $C_o$, $C_a$ and $C_b$ stand for the color components of the pixels in the result of the "over", image A, and image B respectively, applied to each color channel (red/green/blue) individually, whereas $\alpha_o$, $\alpha_a$ and $\alpha_b$ are the alpha values of the respective pixels.
The over operator is, in effect, the normal painting operation (see Painter's algorithm). The in and out operators are the alpha compositing equivalent of clipping. The two use only the alpha channel of the second image and ignore the color components. In addition, plus defines additive blending. [3]
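As an illustrative sketch only (not taken from any particular graphics library), the per-pixel over computation above can be written as follows; the Pixel type and the function name are made up for this example, and colors are assumed to be straight (non-premultiplied) floats in [0, 1]:

```python
from dataclasses import dataclass

@dataclass
class Pixel:
    r: float  # color components in [0, 1], straight (non-premultiplied)
    g: float
    b: float
    a: float  # coverage/opacity in [0, 1]

def over(fg: Pixel, bg: Pixel) -> Pixel:
    """Straight-alpha 'A over B': fg composited on top of bg."""
    a_o = fg.a + bg.a * (1.0 - fg.a)
    if a_o == 0.0:
        return Pixel(0.0, 0.0, 0.0, 0.0)  # fully transparent result

    def blend(ca: float, cb: float) -> float:
        # Weight each color by its coverage, normalize by the combined coverage.
        return (ca * fg.a + cb * bg.a * (1.0 - fg.a)) / a_o

    return Pixel(blend(fg.r, bg.r), blend(fg.g, bg.g), blend(fg.b, bg.b), a_o)

# Example: 50%-opaque red over opaque green
print(over(Pixel(1.0, 0.0, 0.0, 0.5), Pixel(0.0, 1.0, 0.0, 1.0)))
# -> Pixel(r=0.5, g=0.5, b=0.0, a=1.0)
```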
If an alpha channel is used in an image, there are two common representations that are available: straight (unassociated) alpha and premultiplied (associated) alpha.
The most significant advantage of premultiplied alpha is that it allows for correct blending, interpolation, and filtering. Ordinary interpolation without premultiplied alpha leads to RGB information leaking out of fully transparent (A=0) regions, even though this RGB information is ideally invisible. When interpolating or filtering images with abrupt borders between transparent and opaque regions, this can result in borders of colors that were not visible in the original image. Errors also occur in areas of semitransparency because the RGB components are not correctly weighted, giving incorrectly high weighting to the color of the more transparent (lower alpha) pixels. [5]
Premultiplied alpha may also be used to allow regions of regular alpha blending (e.g. smoke) and regions with additive blending mode (e.g. flame and glitter effects) to be encoded within the same image. [6] [7] This is represented by an RGBA quadruple that expresses emission with no occlusion, such as (0.4, 0.3, 0.2, 0.0).
Another advantage of premultiplied alpha is performance; in certain situations, it can reduce the number of multiplication operations (e.g. if the image is used many times during later compositing). The Porter–Duff operations have a simple form only in premultiplied alpha. [3] Some rendering pipelines expose a "straight alpha" API surface but convert it to premultiplied alpha internally for performance. [8]
One disadvantage of premultiplied alpha is that it can reduce the available relative precision in the RGB values when using integer or fixed-point representation for the color components. This may cause a noticeable loss of quality if the color information is later brightened or if the alpha channel is removed. In practice, this is not usually noticeable because during typical composition operations, such as OVER, the influence of the low-precision color information in low-alpha areas on the final output image (after composition) is correspondingly reduced. This loss of precision also makes premultiplied images easier to compress using certain compression schemes, as they do not record the color variations hidden inside transparent regions, and can allocate fewer bits to encode low-alpha areas. The same quantisation limitations of lower bit depths, such as 8 bits per channel, are also present in imagery without alpha, which makes this argument problematic.
Assuming that the pixel color is expressed using straight (non-premultiplied) RGBA tuples, a pixel value of (0, 0.7, 0, 0.5) implies a pixel that has 70% of the maximum green intensity and 50% opacity. If the color were fully green, its RGBA would be (0, 1, 0, 0.5). However, if this pixel uses premultiplied alpha, all of the RGB values (0, 0.7, 0) are multiplied, or scaled for occlusion, by the alpha value 0.5, which is appended to yield (0, 0.35, 0, 0.5). In this case, the 0.35 value for the G channel actually indicates 70% green emission intensity (with 50% occlusion). A pure green emission would be encoded as (0, 0.5, 0, 0.5). Knowing whether a file uses straight or premultiplied alpha is essential to correctly process or composite it, as a different calculation is required.
Emission with no occlusion cannot be represented in straight alpha. No conversion is available in this case.
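A small sketch of the two conversions described above, assuming float values in [0, 1]; the function names are illustrative rather than a standard API:

```python
def straight_to_premultiplied(r, g, b, a):
    """Scale color by coverage; matches the (0, 0.7, 0, 0.5) -> (0, 0.35, 0, 0.5) example."""
    return (r * a, g * a, b * a, a)

def premultiplied_to_straight(r, g, b, a):
    """Undo the scaling. Undefined when a == 0: pure emission has no straight-alpha form."""
    if a == 0.0:
        raise ValueError("no straight-alpha equivalent for a == 0")
    return (r / a, g / a, b / a, a)

print(straight_to_premultiplied(0.0, 0.7, 0.0, 0.5))   # (0.0, 0.35, 0.0, 0.5)
print(premultiplied_to_straight(0.0, 0.35, 0.0, 0.5))  # (0.0, 0.7, 0.0, 0.5)
```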
The most popular image formats that support the alpha channel are PNG and TIFF. GIF supports alpha channels, but is considered to be inefficient when it comes to file size. Support for alpha channels is present in some video codecs, such as the Animation and Apple ProRes 4444 codecs of the QuickTime format, or in the TechSmith multi-format codec.
The BMP file format generally does not support this channel; however, in formats such as 32-bit (888-8) or 16-bit (444-4) it is possible to save the alpha channel, although not all systems or programs are able to read it. It is exploited mainly in some video games [9] or particular applications; [10] specific programs have also been created for creating these BMPs.
File/Codec format [11] | Maximum Depth | Type | Browser support | Media type | Notes
---|---|---|---|---|---
Apple ProRes 4444 | 16-bit | | None | Video (.mov) | ProRes is the successor of the Apple Intermediate Codec [12]
HEVC / H.265 | 10-bit | | Limited to Safari | Video (.hevc) | Intended successor to H.264 [13] [14] [15]
WebM (VP8, VP9, or AV1 video codecs) | 12-bit | | All modern browsers | Video (.webm) | While VP8/VP9 is widely supported in modern browsers, AV1 still has limited support. [16] Only Chromium-based browsers will display alpha layers.
OpenEXR | 32-bit | | None | Image (.exr) | Has the largest HDR spread.
PNG | 16-bit | straight | All modern browsers | Image (.png) |
APNG | 24-bit | straight | Moderate support | Image (.apng) | Supports animation. [17]
TIFF | 32-bit | both | None | Image (.tiff) |
GIF | 8-bit | | All modern browsers | Image (.gif) | Browsers generally do not support GIF alpha layers.
SVG | 32-bit | straight | All modern browsers | Image (.svg) | Based on CSS color. [18]
JPEG XL | 32-bit | both | Moderate support | Image (.jxl) | Allows lossy and HDR. [19]
The RGB values of typical digital images do not directly correspond to the physical light intensities, but are rather compressed by a gamma correction function:

$$C_{\text{encoded}} = \Gamma(C_{\text{linear}}), \qquad \text{e.g. } \Gamma(u) = u^{1/\gamma} \text{ with } \gamma \approx 2.2$$

This transformation better utilizes the limited number of bits in the encoded image by choosing $\Gamma$ to better match the non-linear human perception of luminance.

Accordingly, computer programs that deal with such images must decode the RGB values into a linear space (by undoing the gamma-compression), blend the linear light intensities, and re-apply the gamma compression to the result: [20] [21] [failed verification]

$$C_o = \Gamma\!\left(\frac{\Gamma^{-1}(C_a)\,\alpha_a + \Gamma^{-1}(C_b)\,\alpha_b\,(1 - \alpha_a)}{\alpha_o}\right)$$

When combined with premultiplied alpha, pre-multiplication is done in linear space, prior to gamma compression. [22] This results in the following formula:

$$C_o = \Gamma\!\left(\Gamma^{-1}(C_a) + \Gamma^{-1}(C_b)\,(1 - \alpha_a)\right)$$

where $C_a$ and $C_b$ here denote the premultiplied, gamma-compressed color values.
Note that the alpha channel may or may not undergo gamma-correction, even when the color channels do.
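A single-channel sketch of the straight-alpha formula above, assuming a simple power-law gamma of 2.2 rather than the exact piecewise sRGB transfer function; the names are illustrative only:

```python
GAMMA = 2.2  # illustrative; real pipelines use the piecewise sRGB transfer function

def decode(c: float) -> float:
    """Gamma-compressed value -> linear light intensity."""
    return c ** GAMMA

def encode(c: float) -> float:
    """Linear light intensity -> gamma-compressed value."""
    return c ** (1.0 / GAMMA)

def over_gamma_correct(Ca: float, aa: float, Cb: float, ab: float):
    """Straight-alpha 'over' for one color channel, blended in linear light."""
    ao = aa + ab * (1.0 - aa)
    if ao == 0.0:
        return 0.0, 0.0
    linear = (decode(Ca) * aa + decode(Cb) * ab * (1.0 - aa)) / ao
    return encode(linear), ao

# 50% grey over white at 50% opacity: blending the encoded values directly would
# give 0.75, whereas linear-light blending gives a lighter result (about 0.80).
print(over_gamma_correct(0.5, 0.5, 1.0, 1.0))
```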
Although used for similar purposes, transparent colors and image masks do not permit the smooth blending of the superimposed image pixels with those of the background (only whole image pixels or whole background pixels are allowed).
A similar effect can be achieved with a 1-bit alpha channel, as found in the 16-bit RGBA high color mode of the Truevision TGA image file format and related TARGA and AT-Vista/NU-Vista display adapters' high color graphic mode. This mode devotes 5 bits for every primary RGB color (15-bit RGB) plus a remaining bit as the "alpha channel".
Dithering can be used to simulate partial occlusion where only 1-bit alpha is available.
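One common way to do this is ordered (Bayer) dithering of the alpha value: a "screen-door" pattern in which the fraction of opaque pixels approximates the intended coverage. A minimal sketch, with illustrative names:

```python
# 4x4 Bayer matrix; thresholds are normalized to the range (0, 1)
BAYER_4X4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def dither_alpha_to_1bit(alpha: float, x: int, y: int) -> int:
    """Approximate fractional coverage with a screen-door pattern of on/off pixels."""
    threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0
    return 1 if alpha > threshold else 0

# A 40%-opaque region becomes a pattern in which roughly 40% of pixels are opaque.
row = [dither_alpha_to_1bit(0.4, x, 0) for x in range(8)]
print(row)
```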
For some applications, a single alpha channel is not sufficient: a stained-glass window, for instance, requires a separate transparency channel for each RGB channel to model the red, green and blue transparency separately. More alpha channels can be added for accurate spectral color filtration applications.
Some order-independent transparency methods replace the over operator with a commutative approximation. [23]
Digital compositing is the process of digitally assembling multiple images to make a final image, typically for print, motion pictures or screen display. It is the digital analogue of optical film compositing and forms part of the visual effects (VFX) pipeline.
Portable Network Graphics is a raster-graphics file format that supports lossless data compression. PNG was developed as an improved, non-patented replacement for Graphics Interchange Format (GIF)—unofficially, the initials PNG stood for the recursive acronym "PNG's not GIF".
The RGB color model is an additive color model in which the red, green and blue primary colors of light are added together in various ways to reproduce a broad array of colors. The name of the model comes from the initials of the three additive primary colors, red, green, and blue.
Gamma correction or gamma is a nonlinear operation used to encode and decode luminance or tristimulus values in video or still image systems. Gamma correction is, in the simplest cases, defined by the following power-law expression:

$$V_{\text{out}} = A V_{\text{in}}^{\gamma}$$

where the non-negative real input value $V_{\text{in}}$ is raised to the power $\gamma$ and multiplied by the constant $A$ to get the output value $V_{\text{out}}$.
Chroma subsampling is the practice of encoding images by implementing less resolution for chroma information than for luma information, taking advantage of the human visual system's lower acuity for color differences than for luminance.
RGBA stands for red green blue alpha. While it is sometimes described as a color space, it is actually a three-channel RGB color model supplemented with a fourth alpha channel. Alpha indicates how opaque each pixel is and allows an image to be combined over others using alpha compositing, with transparent areas and anti-aliasing of the edges of opaque regions. Each pixel is a 4D vector.
In digital photography, computer-generated imagery, and colorimetry, a greyscale or grayscale image is one in which the value of each pixel is a single sample representing only an amount of light; that is, it carries only intensity information. Grayscale images, a kind of black-and-white or gray monochrome, are composed exclusively of shades of gray. The contrast ranges from black at the weakest intensity to white at the strongest.
Transparency in computer graphics is possible in a number of file formats. The term "transparency" is used in various ways by different people, but at its simplest there is "full transparency" i.e. something that is completely invisible. Only part of a graphic should be fully transparent, or there would be nothing to see. More complex is "partial transparency" or "translucency" where the effect is achieved that a graphic is partially transparent in the same way as colored glass. Since ultimately a printed page or computer or television screen can only be one color at a point, partial transparency is always simulated at some level by mixing colors. There are many different ways to mix colors, so in some cases transparency is ambiguous.
YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y′ is the luma component and CB and CR are the blue-difference and red-difference chroma components. Y′ is distinguished from Y, which is luminance, meaning that light intensity is nonlinearly encoded based on gamma corrected RGB primaries.
sRGB is a standard RGB color space that HP and Microsoft created cooperatively in 1996 to use on monitors, printers, and the World Wide Web. It was subsequently standardized by the International Electrotechnical Commission (IEC) as IEC 61966-2-1:1999. sRGB is the current defined standard colorspace for the web, and it is usually the assumed colorspace for images that are neither tagged for a colorspace nor have an embedded color profile.
S3 Texture Compression (S3TC) is a group of related lossy texture compression algorithms originally developed by Iourcha et al. of S3 Graphics, Ltd. for use in their Savage 3D computer graphics accelerator. The method of compression is strikingly similar to the previously published Color Cell Compression, which is in turn an adaptation of Block Truncation Coding published in the late 1970s. Unlike some image compression algorithms, S3TC's fixed-rate data compression coupled with the single memory access made it well-suited for use in compressing textures in hardware-accelerated 3D computer graphics. Its subsequent inclusion in Microsoft's DirectX 6.0 and OpenGL 1.3 led to widespread adoption of the technology among hardware and software makers. While S3 Graphics is no longer a competitor in the graphics accelerator market, license fees were levied and collected for the use of S3TC technology until October 2017, for example in game consoles and graphics cards. The wide use of S3TC has led to a de facto requirement for OpenGL drivers to support it, but the patent-encumbered status of S3TC presented a major obstacle to open source implementations, although implementation approaches that tried to avoid the patented parts existed.
Netpbm is an open-source package of graphics programs and a programming library. It is used mainly in the Unix world, where one can find it included in all major open-source operating system distributions, but also works on Microsoft Windows, macOS, and other operating systems.
Color digital images are made of pixels, and pixels are made of combinations of primary colors represented by a series of code. A channel in this context is the grayscale image of the same size as a color image, made of just one of these primary colors. For instance, an image from a standard digital camera will have a red, green and blue channel. A grayscale image has just one channel.
Silicon Graphics Image (SGI) or the RGB file format is the native raster graphics file format for Silicon Graphics workstations. The format was invented by Paul Haeberli. It can be run-length encoded (RLE). FFmpeg and ImageMagick, among others, support this format.
RGBE or Radiance HDR is an image format invented by Gregory Ward Larson for the Radiance rendering system. It stores pixels as one byte each for RGB values with a one byte shared exponent. Thus it stores four bytes per pixel.
Rec. 709, also known as Rec.709, BT.709, and ITU 709, is a standard developed by ITU-R for image encoding and signal characteristics of high-definition television.
Blend modes in digital image editing and computer graphics are used to determine how two layers are blended with each other. The default blend mode in most applications is simply to obscure the lower layer by covering it with whatever is present in the top layer; because each pixel has numerical values, there are also many other ways to blend two layers.
Ericsson Texture Compression (ETC) is a lossy texture compression technique developed in collaboration with Ericsson Research in early 2005. It was originally developed under the name iPACKMAN and is based on an earlier compression scheme called PACKMAN.
This is a glossary of terms relating to computer graphics.
The Quite OK Image Format (QOI) is a specification for lossless image compression of 24-bit or 32-bit color raster (bitmapped) images, invented by Dominic Szablewski and first announced on 24 November 2021.