Chroma subsampling is the practice of encoding images by implementing less resolution for chroma information than for luma information, taking advantage of the human visual system's lower acuity for color differences than for luminance. [1]
It is used in many video and still-image encoding schemes, both analog and digital, including JPEG encoding.
Digital signals are often compressed to reduce file size and save transmission time. Since the human visual system is much more sensitive to variations in brightness than color, a video system can be optimized by devoting more bandwidth to the luma component (usually denoted Y'), than to the color difference components Cb and Cr. In compressed images, for example, the 4:2:2 Y'CbCr scheme requires two-thirds the bandwidth of non-subsampled "4:4:4" R'G'B'. [a] This reduction results in almost no visual difference as perceived by the viewer.
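The two-thirds figure can be checked with a quick per-pixel sample count (a sketch that ignores compression and bit depth, which cancel out of the ratio):

```python
# Per pixel on average, 4:4:4 carries 3 samples (Y', Cb, Cr), while 4:2:2
# halves the horizontal chroma rate, leaving 1 Y' + 0.5 Cb + 0.5 Cr = 2.
samples_per_pixel_444 = 3.0
samples_per_pixel_422 = 1.0 + 0.5 + 0.5
ratio = samples_per_pixel_422 / samples_per_pixel_444
print(ratio)  # 0.666..., i.e. two-thirds the bandwidth of 4:4:4
```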
The human visual system (HVS) processes color information (hue and colorfulness) at about a third of the resolution of luminance (lightness/darkness information in an image). Therefore, it is possible to sample color information at a lower resolution while maintaining good image quality.
This is achieved by encoding RGB image data into a composite black-and-white image, with separated color difference data (chroma). For example, with Y'CbCr, gamma-encoded components are weighted and then summed together to create the luma component. The color difference components are created by subtracting two of the weighted components from the third. A variety of filtering methods can be used to limit the resolution.
Gamma-encoded luma (Y′) should not be confused with linear luminance (Y). The presence of gamma encoding is denoted with the prime symbol (′).
Gamma-correcting electro-optical transfer functions (EOTF) are used due to the nonlinear response of human vision. The use of gamma improves perceived signal-to-noise in analogue systems, and allows for more efficient data encoding in digital systems. This encoding uses more levels for darker colors than for lighter ones, accommodating human vision sensitivity. [2]
The subsampling scheme is commonly expressed as a three-part ratio J:a:b (e.g. 4:2:2), or four parts if an alpha channel is present (e.g. 4:2:2:4), describing the number of luma and chroma samples in a conceptual region that is J pixels wide and 2 pixels high. The parts are (in their respective order):
J: horizontal sampling reference (width of the conceptual region), usually 4.
a: number of chroma samples (Cr, Cb) in the first row of J pixels.
b: number of changes of chroma samples (Cr, Cb) between the first and second row of J pixels; b is either zero or equal to a.
Alpha: horizontal factor of the alpha component, equal to J when present.
This notation is not valid for all combinations and has exceptions, e.g. 4:1:0 (where the region is 4 pixels high rather than 2, so with 8 bits per component the media carries 9 bits per pixel) and 4:2:1.
The common schemes, each sampled over a conceptual Y'CrCb region 4 pixels wide and 2 pixels high (J = 4):

Scheme | J | a | b | Chroma resolution
4:1:1 | 4 | 1 | 1 | ¼ horizontal, full vertical
4:2:0 | 4 | 2 | 0 | ½ horizontal, ½ vertical
4:2:2 | 4 | 2 | 2 | ½ horizontal, full vertical
4:4:0 | 4 | 4 | 0 | full horizontal, ½ vertical
4:4:4 | 4 | 4 | 4 | full horizontal, full vertical
The mapping examples given are only theoretical and for illustration. The diagram also does not indicate any chroma filtering, which should be applied to avoid aliasing. To calculate the required bandwidth factor relative to 4:4:4 (or 4:4:4:4), sum all the factors and divide the result by 12 (or by 16, if alpha is present).
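The rule above can be sketched as a small helper, assuming a scheme for which the J:a:b notation is valid (so not 4:1:0 or 4:2:1):

```python
# Bandwidth factor of a J:a:b(:alpha) scheme relative to 4:4:4 (or 4:4:4:4):
# sum the parts and divide by 12 (or by 16 when alpha is present).
def bandwidth_factor(j, a, b, alpha=None):
    parts = j + a + b
    total = 12
    if alpha is not None:
        parts += alpha
        total = 16
    return parts / total

print(bandwidth_factor(4, 2, 2))  # 8/12: two-thirds of 4:4:4
print(bandwidth_factor(4, 2, 0))  # 6/12: half of 4:4:4
print(bandwidth_factor(4, 1, 1))  # 6/12: also half of 4:4:4
```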
Each of the three Y'CbCr components has the same sample rate, thus there is no chroma subsampling. This scheme is sometimes used in high-end film scanners and cinematic post-production.
"4:4:4" may instead wrongly refer to the R'G'B' color space, which implicitly also has no chroma subsampling (although in JPEG, R'G'B' can be subsampled). Formats such as HDCAM SR can record 4:4:4 R'G'B' over dual-link HD-SDI.
The two chroma components are sampled at half the horizontal sample rate of luma: the horizontal chroma resolution is halved. This reduces the bandwidth of an uncompressed video signal by one-third: with 8 bits per component and no alpha, 16 bits per pixel suffice instead of 24, as in NV16.
Many high-end digital video formats and interfaces use this scheme:
In 4:1:1 chroma subsampling, the horizontal color resolution is quartered, and the bandwidth is halved compared to no chroma subsampling. Initially, 4:1:1 chroma subsampling of the DV format was not considered to be broadcast quality and was only acceptable for low-end and consumer applications. [3] [4] However, DV-based formats (some of which use 4:1:1 chroma subsampling) have been used professionally in electronic news gathering and in playout servers. DV has also been sporadically used in feature films and in digital cinematography.
In the 480i "NTSC" system, if the luma is sampled at 13.5 MHz, the Cr and Cb signals will each be sampled at 3.375 MHz, which corresponds to a maximum Nyquist bandwidth of 1.6875 MHz, whereas a traditional "high-end broadcast analog NTSC encoder" would have Nyquist bandwidths of 1.5 MHz and 0.5 MHz for the I/Q channels. However, in most equipment, especially cheap TV sets and VHS/Betamax VCRs, the chroma channels have only 0.5 MHz of bandwidth for both Cr and Cb (or equivalently for I/Q). Thus the DV system actually provides superior color bandwidth compared to the best composite analog specifications for NTSC, despite having only one quarter of the chroma bandwidth of a "full" digital signal.
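The sample-rate arithmetic behind these figures is straightforward: 4:1:1 samples chroma at a quarter of the luma rate, and the Nyquist limit is half the sample rate.

```python
# 4:1:1 chroma rates derived from a 13.5 MHz luma sampling rate.
luma_rate_mhz = 13.5
chroma_rate_mhz = luma_rate_mhz / 4   # 3.375 MHz for each of Cr, Cb
nyquist_mhz = chroma_rate_mhz / 2     # 1.6875 MHz maximum chroma bandwidth
print(chroma_rate_mhz, nyquist_mhz)
```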
Formats that use 4:1:1 chroma subsampling include:
In 4:2:0, the horizontal sampling is doubled compared to 4:1:1, but as the Cb and Cr channels are only sampled on each alternate line in this scheme, the vertical resolution is halved. The data rate is thus the same. This fits reasonably well with the PAL color encoding system, since this has only half the vertical chrominance resolution of NTSC. It would also fit extremely well with the SECAM color encoding system, since like that format, 4:2:0 only stores and transmits one color channel per line (the other channel being recovered from the previous line). However, little equipment has actually been produced that outputs a SECAM analogue video signal. In general, SECAM territories either have to use a PAL-capable display or a transcoder to convert the PAL signal to SECAM for display.
Different variants of 4:2:0 chroma configurations are found in:
Cb and Cr are each subsampled at a factor of 2 both horizontally and vertically. Most digital video formats corresponding to 576i "PAL" use 4:2:0 chroma subsampling.
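As an illustration, here is a minimal sketch of 4:2:0 chroma downsampling using a 2×2 box average (one possible filter; real encoders differ in filter choice and chroma siting, as the variants below show):

```python
def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane (box filter).
    Assumes even dimensions; filter and siting vary in practice."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

cb = [[100, 100, 200, 200],
      [100, 100, 200, 200]]
print(subsample_420(cb))  # [[100.0, 200.0]]: half resolution in each axis
```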
There are four main variants of 4:2:0 schemes, having different horizontal and vertical sampling siting relative to the 2×2 "square" of the original input size. [15]
With interlaced material, 4:2:0 chroma subsampling can result in motion artifacts if it is implemented the same way as for progressive material: the luma samples are derived from separate time intervals, while the chroma samples would be derived from both time intervals, and it is this difference that can result in motion artifacts. The MPEG-2 standard allows an alternate interlaced sampling scheme, where 4:2:0 is applied to each field separately (not to both fields at once). This solves the problem of motion artifacts, but it halves the vertical chroma resolution and can introduce comb-like artifacts in the image.
Original. This image shows a single field. The moving text has some motion blur applied to it.
4:2:0 progressive sampling applied to moving interlaced material. The chroma leads and trails the moving text. This image shows a single field.
4:2:0 interlaced sampling applied to moving interlaced material. This image shows a single field.
In the 4:2:0 interlaced scheme, however, vertical resolution of the chroma is roughly halved, since the chroma samples effectively describe an area 2 samples wide by 4 samples tall instead of 2×2. As well, the spatial displacement between both fields can result in the appearance of comb-like chroma artifacts.
4:2:0 progressive sampling applied to a still image. Both fields are shown.
4:2:0 interlaced sampling applied to a still image. Both fields are shown.
If the interlaced material is to be de-interlaced, the comb-like chroma artifacts (from 4:2:0 interlaced sampling) can be removed by blurring the chroma vertically. [18]
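A minimal sketch of such a vertical chroma blur, using a simple two-tap average (deinterlacers in practice may use longer filters):

```python
def blur_chroma_vertically(plane):
    """Average each chroma row with the one below it, softening the
    comb-like row-to-row offsets left by 4:2:0 interlaced sampling."""
    h = len(plane)
    return [[(plane[y][x] + plane[min(y + 1, h - 1)][x]) / 2
             for x in range(len(plane[0]))]
            for y in range(h)]

comb = [[10, 10], [90, 90], [10, 10], [90, 90]]  # alternating field offsets
print(blur_chroma_vertically(comb))  # interior rows are pulled toward 50
```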
This ratio is possible, and some codecs support it, but it is not widely used. It uses half the vertical and one-fourth the horizontal color resolution, for only one-eighth the bandwidth of the full color resolution. Uncompressed video in this format with 8-bit quantization uses 10 bytes for every macropixel (4×2 pixels), or 10 bits per pixel. It has chrominance bandwidth equivalent to a PAL-I or PAL-M signal decoded with a delay-line decoder, and is still much superior to NTSC.
Used by Sony in their HDCAM High Definition recorders (not HDCAM SR). In the horizontal dimension, luma is sampled horizontally at three quarters of the full HD sampling rate – 1440 samples per row instead of 1920. Chroma is sampled at 480 samples per row, a third of the luma sampling rate. In the vertical dimension, both luma and chroma are sampled at the full HD sampling rate (1080 samples vertically).
A number of legacy schemes allow different subsampling factors in Cb and Cr, similar to how a different amount of bandwidth is allocated to the two chroma values in broadcast systems such as CCIR System M. These schemes are not expressible in J:a:b notation. Instead, they adopt a Y:Cb:Cr notation, with each part describing the amount of resolution for the corresponding component. It is unspecified whether the resolution reduction happens in the horizontal or vertical direction.
Chroma subsampling causes two main types of artifacts, both most noticeable where colors change abruptly.
Gamma-corrected signals like Y'CbCr have an issue where chroma errors "bleed" into luma. In these signals, lower chroma actually makes a color appear less bright than a full-chroma color with the same luma. As a result, when a saturated color blends with an unsaturated or complementary color, a loss of luminance occurs at the border. This can be seen in the example of magenta next to green. [20] The issue persists in HDR video, where gamma is generalized into a transfer function (EOTF); a steeper EOTF shows a stronger luminance loss. [21]
Some proposed corrections of this issue are:
Rec. 2020 defines a "constant luminance" Yc'CbcCrc, which is calculated from linear RGB components and then gamma-encoded. This version does not suffer from the luminance loss by design. [25]
Another artifact that can occur with chroma subsampling is that out-of-gamut colors can appear upon chroma reconstruction. Suppose the image consisted of alternating 1-pixel red and black lines and the subsampling omitted the chroma for the black pixels. Chroma from the red pixels will be reconstructed onto the black pixels, causing the new pixels to have positive red but negative green and blue values. As displays cannot output negative light, these negative values will effectively be clipped, and the resulting luma value will be too high. Other subsampling filters (especially the averaging "box" filter) have a similar issue that is harder to illustrate with a simple example. Similar artifacts arise in the less artificial case of gradation near a fairly sharp red/black boundary. [20]
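The red/black example can be worked through numerically. This sketch uses full-range BT.601-style conversion coefficients (an assumption for illustration; the exact matrix depends on the standard in use):

```python
# Red pixel (255,0,0) and black pixel (0,0,0). Subsampling keeps only the
# red pixel's chroma; reconstruction pairs that chroma with black's Y' = 0.
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601-style matrix (illustrative coefficients)."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344 * (cb - 128) - 0.714 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

_, cb_red, cr_red = rgb_to_ycbcr(255, 0, 0)  # chroma taken from a red pixel
r, g, b = ycbcr_to_rgb(0, cb_red, cr_red)    # applied to the black pixel's luma
print(r, g, b)  # r is positive, g and b are negative: "negative light"
```

Clipping g and b to zero leaves a pixel that is brighter than the original black, which is exactly the luma error described above.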
It is possible for the decoder to deal with out-of-gamut colors by considering how much chroma a given luma value can hold and distribute it into the 4:4:4 intermediate accordingly, termed "in-range chroma reconstruction" by Glenn Chan. The "proportion" method is in spirit similar to Kornelski's luma-weighted average, while the "spill" method resembles error diffusion. [20] Improving chroma reconstruction remains an active field of research. [26]
The term Y'UV refers to an analog TV encoding scheme (ITU-R Rec. BT.470) while Y'CbCr refers to a digital encoding scheme. [2] One difference between the two is that the scale factors on the chroma components (U, V, Cb, and Cr) are different. However, the term YUV is often used erroneously to refer to Y'CbCr encoding. Hence, expressions like "4:2:2 YUV" always refer to 4:2:2 Y'CbCr, since there simply is no such thing as 4:x:x in analog encoding (such as YUV). Pixel formats used in Y'CbCr can be referred to as YUV too, for example yuv420p, yuvj420p and many others.
In a similar vein, the term luminance and the symbol Y are often used erroneously to refer to luma, which is denoted with the symbol Y'. The luma (Y') of video engineering deviates from the luminance (Y) of color science (as defined by the CIE). Luma is formed as a weighted sum of gamma-corrected (tristimulus) RGB components; luminance is formed as a weighted sum of linear (tristimulus) RGB components. In practice, the CIE symbol Y is often incorrectly used to denote luma. In 1993, SMPTE adopted Engineering Guideline EG 28, clarifying the two terms. The prime symbol ' is used to indicate gamma correction. [27]
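The distinction can be made concrete with Rec. 601 weights and a pure 2.2 power law (a simplification; real transfer functions include a linear segment near black, and Rec. 709 uses different weights):

```python
# Luma Y' weights gamma-corrected components; relative luminance Y weights
# linear-light components. Same weights, different domains.
WEIGHTS = (0.299, 0.587, 0.114)  # Rec. 601 coefficients

def weighted_sum(c1, c2, c3):
    w1, w2, w3 = WEIGHTS
    return w1 * c1 + w2 * c2 + w3 * c3

gamma = 2.2
r, g, b = 0.5, 0.25, 0.75  # linear-light example color, components in 0..1

y_luma = weighted_sum(r ** (1 / gamma), g ** (1 / gamma), b ** (1 / gamma))
y_luminance = weighted_sum(r, g, b)
print(y_luma, y_luminance)  # the two quantities differ
```

Because gamma encoding raises component values below 1.0, luma computed this way always exceeds the luminance of the same color, which is one reason conflating the two terms causes errors.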
Similarly, the chroma of video engineering differs from the chrominance of color science. The chroma of video engineering is formed from weighted tristimulus components (gamma corrected, OETF), not linear components. In video engineering practice, the terms chroma, chrominance, and saturation are often used interchangeably to refer to chroma, but it is not a good practice, as ITU-T Rec H.273 says. [28]
Chroma subsampling was developed in the 1950s by Alda Bedford for the development of color television by RCA, which developed into the NTSC standard; luma–chroma separation had been developed earlier, in 1938, by Georges Valensi. Through his studies, Bedford showed that the human eye has high resolution only for black and white, somewhat less for "mid-range" colors like yellows and greens, and much less for colors at the ends of the spectrum, reds and blues. This knowledge allowed RCA to design a system that discarded most of the blue signal after it came from the camera, keeping most of the green and only some of the red; this is chroma subsampling in the YIQ color space, and is roughly analogous to 4:2:1 subsampling in that it has decreasing resolution for luma, yellow/green, and red/blue.
Chrominance is the signal used in video systems to convey the color information of the picture, separately from the accompanying luma signal. Chrominance is usually represented as two color-difference components: U = B′ − Y′ (blue − luma) and V = R′ − Y′ (red − luma). Each of these different components may have scale factors and offsets applied to it, as specified by the applicable video standard.
SECAM, also written SÉCAM, is an analog color television system that was used in France, Russia and some other countries or territories of Europe and Africa. It was one of three major analog color television standards, the others being PAL and NTSC. Like PAL, a SECAM picture is also made up of 625 interlaced lines and is displayed at a rate of 25 frames per second. However, due to the way SECAM processes color information, it is not compatible with the PAL video format standard. SECAM video is composite video; the luminance and chrominance are transmitted together as one signal.
Video is an electronic medium for the recording, copying, playback, broadcasting, and display of moving visual media. Video was first developed for mechanical television systems, which were quickly replaced by cathode-ray tube (CRT) systems, which, in turn, were replaced by flat-panel displays of several types.
A video codec is software or hardware that compresses and decompresses digital video. In the context of video compression, codec is a portmanteau of encoder and decoder, while a device that only compresses is typically called an encoder, and one that only decompresses is a decoder.
Gamma correction or gamma is a nonlinear operation used to encode and decode luminance or tristimulus values in video or still image systems. Gamma correction is, in the simplest cases, defined by the power-law expression V_out = A·V_in^γ, where the non-negative real input value V_in is raised to the power γ and multiplied by the constant A to get the output value.
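A sketch of the simple power-law form with A = 1 and γ = 2.2 (illustrative values; actual systems use standard-specific curves):

```python
# Power-law gamma as an encode/decode round trip, with A = 1.
def gamma_encode(v, gamma=2.2):
    """Compress linear light into gamma-encoded code values."""
    return v ** (1 / gamma)

def gamma_decode(v, gamma=2.2):
    """Expand gamma-encoded values back to linear light."""
    return v ** gamma

linear = 0.18                   # linear-light mid-grey
encoded = gamma_encode(linear)  # about 0.46: darks get more code values
print(encoded, gamma_decode(encoded))
```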
Y′UV, also written YUV, is the color model found in the PAL analogue color TV standard. A color is described as a Y′ component (luma) and two chroma components U and V. The prime symbol (') denotes that the luma is calculated from gamma-corrected RGB input and that it is different from true luminance. Today, the term YUV is commonly used in the computer industry to describe colorspaces that are encoded using YCbCr.
Composite video is a baseband analog video format that typically carries a 405-, 525- or 625-line interlaced black-and-white or color signal on a single channel, unlike the higher-quality S-Video and the even higher-quality YPbPr.
S-Video is an analog video signal format that carries standard-definition video, typically at 525 lines or 625 lines. It encodes video luma and chrominance on two separate channels, achieving higher image quality than composite video which encodes all video information on one channel. It also eliminates several types of visual defects such as dot crawl which commonly occur with composite video. Although it is improved over composite video, S-Video has lower color resolution than component video, which is encoded over three channels.
ITU-R Recommendation BT.601, more commonly known by the abbreviations Rec. 601 or BT.601, is a standard originally issued in 1982 by the CCIR for encoding interlaced analog video signals in digital video form. It includes methods of encoding 525-line 60 Hz and 625-line 50 Hz signals, both with an active region covering 720 luminance samples and 360 chrominance samples per line. The color encoding system is known as YCbCr 4:2:2.
Component video is an analog video signal that has been split into two or more component channels. In popular use, it refers to a type of component analog video (CAV) information that is transmitted or stored as three separate signals. Component video can be contrasted with composite video in which all the video information is combined into a single signal that is used in analog television. Like composite, component cables do not carry audio and are often paired with audio cables.
D-1 or 4:2:2 Component Digital is an SMPTE digital recording video standard, introduced in 1986 through efforts by SMPTE engineering committees. It started as a Sony and Bosch – BTS product and was the first major professional digital video format. SMPTE standardized the format within ITU-R 601, also known as Rec. 601, which was derived from SMPTE 125M and EBU 3246-E standards.
Serial digital interface (SDI) is a family of digital video interfaces first standardized by SMPTE in 1989. For example, ITU-R BT.656 and SMPTE 259M define digital video interfaces used for broadcast-grade video. A related standard, known as high-definition serial digital interface (HD-SDI), is standardized in SMPTE 292M; this provides a nominal data rate of 1.485 Gbit/s.
CIF, also known as FCIF, is a standardized format for the picture resolution, frame rate, color space, and color subsampling of digital video sequences used in video teleconferencing systems. It was first defined in the H.261 standard in 1988.
YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y′ is the luma component and CB and CR are the blue-difference and red-difference chroma components. Y′ is distinguished from Y, which is luminance, meaning that light intensity is nonlinearly encoded based on gamma corrected RGB primaries.
H.262 or MPEG-2 Part 2 is a video coding format standardised and jointly maintained by ITU-T Study Group 16 Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG), and developed with the involvement of many companies. It is the second part of the ISO/IEC MPEG-2 standard. The ITU-T Recommendation H.262 and ISO/IEC 13818-2 documents are identical.
Hold-And-Modify, usually abbreviated as HAM, is a display mode of the Commodore Amiga computer. It uses a highly unusual technique to express the color of pixels, allowing many more colors to appear on screen than would otherwise be possible. HAM mode was commonly used to display digitized photographs or video frames, bitmap art and occasionally animation. At the time of the Amiga's launch in 1985, this near-photorealistic display was unprecedented for a home computer and it was widely used to demonstrate the Amiga's graphical capability. However, HAM has significant technical limitations which prevent it from being used as a general purpose display mode.
In video, luma represents the brightness in an image. Luma is typically paired with chrominance. Luma represents the achromatic image, while the chroma components represent the color information. Converting R′G′B′ sources into luma and chroma allows for chroma subsampling: because human vision has finer spatial sensitivity to luminance differences than chromatic differences, video systems can store and transmit chromatic information at lower resolution, optimizing perceived detail at a particular bandwidth.
MUSE, commercially known as Hi-Vision, was a Japanese analog high-definition television system, with design efforts going back to 1979.
This glossary defines terms that are used in the document "Defining Video Quality Requirements: A Guide for Public Safety", developed by the Video Quality in Public Safety (VQIPS) Working Group. It contains terminology and explanations of concepts relevant to the video industry. The purpose of the glossary is to inform the reader of commonly used vocabulary terms in the video domain. This glossary was compiled from various industry sources.
ITU-R Recommendation BT.2100, more commonly known by the abbreviations Rec. 2100 or BT.2100, introduced high-dynamic-range television (HDR-TV) by recommending the use of the perceptual quantizer or hybrid log–gamma (HLG) transfer functions instead of the traditional "gamma" previously used for SDR-TV.
Assumes that the image will be upsampled using a bilinear filter. If nearest neighbor is used instead, the upsampled image might look worse than with standard downsampling.
luma: To avoid the interdisciplinary confusion resulting from the two distinct definitions of luminance, it has been proposed that video documents use luma for luminance (television), i.e., the luminance signal, and chroma for chrominance (television), i.e., the chrominance signal.
NOTE – The term chroma is used rather than the term chrominance in order to avoid the implication of the use of linear light transfer characteristics that is often associated with the term chrominance. [...] NOTE – The term luma is used rather than the term luminance in order to avoid the implication of the use of linear light transfer characteristics that is often associated with the term luminance. The symbol L is sometimes used instead of the symbol Y to avoid confusion with the symbol y as used for vertical location.