|Information technology – Generic coding of moving pictures and associated audio information: Video|
|Latest version||March 2013|
|Organization||ITU-T, ISO/IEC JTC 1|
|Committee||ITU-T Study Group 16 VCEG, MPEG|
|Base standards||H.261, MPEG-1|
|Related standards||H.222.0, H.263, H.264, H.265|
H.262 or MPEG-2 Part 2 (formally known as ITU-T Recommendation H.262 and ISO/IEC 13818-2, also known as MPEG-2 Video) is a video coding format standardised and jointly maintained by ITU-T Study Group 16 Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG), and developed with the involvement of many companies. It is the second part of the ISO/IEC MPEG-2 standard. The ITU-T Recommendation H.262 and ISO/IEC 13818-2 documents are identical.
The standard is available for a fee from the ITU-T and ISO. MPEG-2 Video is very similar to MPEG-1, but also provides support for interlaced video (an encoding technique used in analog NTSC, PAL and SECAM television systems). MPEG-2 video is not optimized for low bit rates (e.g., less than 1 Mbit/s), but somewhat outperforms MPEG-1 at higher bit rates (e.g., 3 Mbit/s and above), although not by a large margin unless the video is interlaced. All standards-conforming MPEG-2 Video decoders are also fully capable of playing back MPEG-1 Video streams.
The ISO/IEC approval process was completed in November 1994. The first edition was approved in July 1995 and published by ITU-T and ISO/IEC in 1996. Didier LeGall of Bellcore chaired the development of the standard and Sakae Okubo of NTT was the ITU-T coordinator and chaired the agreements on its requirements.
The technology was developed with contributions from a number of companies. Hyundai Electronics (now SK Hynix) developed the first MPEG-2 SAVI (System/Audio/Video) decoder in 1995.
The majority of patents that were later asserted in a patent pool to be essential for implementing the standard came from three companies: Sony (311 patents), Thomson (198 patents) and Mitsubishi Electric (119 patents).
In 1996, it was extended by two amendments to include the registration of copyright identifiers and the 4:2:2 Profile. ITU-T published these amendments in 1996 and ISO in 1997.
There are also other amendments published later by ITU-T and ISO. The most recent edition of the standard was published in 2013 and incorporates all prior amendments.
|Edition||Release date||Latest amendment||ISO/IEC standard||ITU-T Recommendation|
|First edition||1995||2000||ISO/IEC 13818-2:1996||H.262 (07/95)|
|Second edition||2000||2010||ISO/IEC 13818-2:2000||H.262 (02/00)|
|Third edition||2013||||ISO/IEC 13818-2:2013||H.262 (02/12), incorporating Amendment 1 (03/13)|
An HDTV camera with 8-bit sampling generates a raw video stream of 25 × 1920 × 1080 × 3 = 155,520,000 bytes per second for 25 frame-per-second video (using the 4:4:4 sampling format). This stream of data must be compressed if digital TV is to fit in the bandwidth of available TV channels and if movies are to fit on DVDs. Video compression is practical because the data in pictures is often redundant in space and time. For example, the sky can be blue across the top of a picture and that blue sky can persist for frame after frame. Also, because of the way the eye works, it is possible to delete or approximate some data from video pictures with little or no noticeable degradation in image quality.
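The raw data rate quoted above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch; the function name and its parameters are chosen here for the example and do not come from the standard.

```python
def raw_bytes_per_second(width, height, frames_per_second,
                         samples_per_pixel=3, bytes_per_sample=1):
    """Uncompressed data rate of a video stream in bytes per second.

    With 4:4:4 sampling every pixel carries three samples (one luma,
    two chroma); 8-bit sampling means one byte per sample.
    """
    return (width * height * samples_per_pixel
            * bytes_per_sample * frames_per_second)


hd_rate = raw_bytes_per_second(1920, 1080, 25)
print(hd_rate)            # 155520000 bytes per second
print(hd_rate * 8 / 1e6)  # 1244.16 Mbit/s
```

At roughly 1.2 Gbit/s, the uncompressed stream is far larger than the bit rates of a few Mbit/s to a few tens of Mbit/s that appear in the profile and level tables later in the article.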
A common (and old) trick to reduce the amount of data is to separate each complete "frame" of video into two "fields" upon broadcast/encoding: the "top field", which is the odd numbered horizontal lines, and the "bottom field", which is the even numbered lines. Upon reception/decoding, the two fields are displayed alternately with the lines of one field interleaving between the lines of the previous field; this format is called interlaced video. The typical field rate is 50 (Europe/PAL) or 59.94 (US/NTSC) fields per second, corresponding to 25 (Europe/PAL) or 29.97 (North America/NTSC) whole frames per second. If the video is not interlaced, then it is called progressive scan video and each picture is a complete frame. MPEG-2 supports both options.
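As a rough illustration of the frame and field relationship described above, the sketch below splits a frame (represented simply as a list of lines) into its two fields and weaves them back together. The function names are hypothetical and the representation is deliberately simplified; it says nothing about how MPEG-2 actually codes field pictures.

```python
def split_fields(frame):
    """Split a frame (a list of lines) into top and bottom fields.

    Lines are numbered from 1 in the text, so the odd-numbered lines
    (top field) sit at list indices 0, 2, 4, ...
    """
    top_field = frame[0::2]
    bottom_field = frame[1::2]
    return top_field, bottom_field


def weave(top_field, bottom_field):
    """Interleave two fields back into one complete frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    if len(top_field) > len(bottom_field):  # odd number of lines
        frame.append(top_field[-1])
    return frame


frame = [f"line {n}" for n in range(1, 7)]
top, bottom = split_fields(frame)
print(top)     # ['line 1', 'line 3', 'line 5']
print(bottom)  # ['line 2', 'line 4', 'line 6']
print(weave(top, bottom) == frame)  # True
```

In real interlaced video the two fields are captured at different instants, so weaving them back together only reproduces the original picture exactly when nothing in the scene has moved between them.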
Digital television requires that these pictures be digitized so that they can be processed by computer hardware. Each picture element (a pixel) is then represented by one luma number and two chroma numbers. These describe the brightness and the color of the pixel (see YCbCr). Thus, each digitized picture is initially represented by three rectangular arrays of numbers.
Another common practice to reduce the amount of data to be processed is to subsample the two chroma planes (after low-pass filtering to avoid aliasing). This works because the human visual system better resolves details of brightness than details in the hue and saturation of colors. The term 4:2:2 is used for video with the chroma subsampled by a ratio of 2:1 horizontally, and 4:2:0 is used for video with the chroma subsampled by 2:1 both vertically and horizontally. Video that has luma and chroma at the same resolution is called 4:4:4. The MPEG-2 Video document considers all three sampling types, although 4:2:0 is by far the most common for consumer video, and there are no defined "profiles" of MPEG-2 for 4:4:4 video (see below for further discussion of profiles).
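The effect of the three sampling formats on the amount of data can be sketched as follows. The table of divisors and the function names are illustrative assumptions, and the example ignores details such as chroma sample siting.

```python
# Horizontal and vertical subsampling factors of the chroma planes.
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),  # chroma at full resolution
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}


def plane_sizes(width, height, chroma_format):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one picture."""
    h_div, v_div = CHROMA_DIVISORS[chroma_format]
    return (width, height), (width // h_div, height // v_div)


def samples_per_picture(width, height, chroma_format):
    """Total samples in one picture: one luma plane plus two chroma planes."""
    (luma_w, luma_h), (chroma_w, chroma_h) = plane_sizes(
        width, height, chroma_format)
    return luma_w * luma_h + 2 * chroma_w * chroma_h


for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, samples_per_picture(720, 576, fmt))
# 4:4:4 1244160
# 4:2:2 829440
# 4:2:0 622080
```

For a 720 × 576 picture, 4:2:0 sampling therefore halves the number of samples relative to 4:4:4 before any other compression is applied.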
While the discussion below in this section generally describes MPEG-2 video compression, there are many details that are not discussed, including details involving fields, chrominance formats, responses to scene changes, special codes that label the parts of the bitstream, and other pieces of information. Aside from features for handling fields for interlaced coding, MPEG-2 Video is very similar to MPEG-1 Video (and even quite similar to the earlier H.261 standard), so the entire description below applies equally well to MPEG-1.
MPEG-2 includes three basic types of coded frames: intra-coded frames (I-frames), predictive-coded frames (P-frames), and bidirectionally-predictive-coded frames (B-frames).
An I-frame is a separately-compressed version of a single uncompressed (raw) frame. The coding of an I-frame takes advantage of spatial redundancy and of the inability of the eye to detect certain changes in the image. Unlike P-frames and B-frames, I-frames do not depend on data in the preceding or the following frames, and so their coding is very similar to how a still photograph would be coded (roughly similar to JPEG picture coding).
Briefly, the raw frame is divided into 8 pixel by 8 pixel blocks. The data in each block is transformed by the discrete cosine transform (DCT). The result is an 8×8 matrix of coefficients that have real number values. The transform converts spatial variations into frequency variations, but it does not change the information in the block; if the transform is computed with perfect precision, the original block can be recreated exactly by applying the inverse cosine transform (also with perfect precision). The conversion from 8-bit integers to real-valued transform coefficients actually expands the amount of data used at this stage of the processing, but the advantage of the transformation is that the image data can then be approximated by quantizing the coefficients. Many of the transform coefficients, usually the higher frequency components, will be zero after the quantization, which is basically a rounding operation. The penalty of this step is the loss of some subtle distinctions in brightness and color. The quantization may either be coarse or fine, as selected by the encoder. If the quantization is not too coarse and one applies the inverse transform to the matrix after it is quantized, one gets an image that looks very similar to the original image but is not quite the same.
Next, the quantized coefficient matrix is itself compressed. Typically, one corner of the 8×8 array of coefficients contains only zeros after quantization is applied. By starting in the opposite corner of the matrix, then zigzagging through the matrix to combine the coefficients into a string, then substituting run-length codes for consecutive zeros in that string, and then applying Huffman coding to that result, one reduces the matrix to a smaller quantity of data. It is this entropy coded data that is broadcast or that is put on DVDs. In the receiver or the player, the whole process is reversed, enabling the receiver to reconstruct, to a close approximation, the original frame.
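The transform, quantization, zigzag scan and run-length steps described for I-frames can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the procedure defined by the standard: it uses a single uniform quantizer step (an assumption; MPEG-2 uses quantization matrices and a scale factor), it stops before the Huffman coding stage, and all function names are made up for the example.

```python
import math

N = 8  # block size used for the transform


def dct_2d(block):
    """Orthonormal 8x8 DCT-II of a block given as a list of rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag_order():
    """(row, col) positions in zigzag order, starting at the DC corner."""
    return sorted(
        ((r, c) for r in range(N) for c in range(N)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_length(values):
    """Encode as (zero_run, value) pairs, ending with an end-of-block mark."""
    pairs, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end marker
    return pairs


flat_block = [[128] * N for _ in range(N)]  # a uniformly grey block
levels = quantize(dct_2d(flat_block), step=16)
scanned = [levels[r][c] for r, c in zigzag_order()]
print(run_length(scanned))  # [(0, 64), 'EOB']
```

A perfectly flat block collapses to a single non-zero coefficient, which is the extreme case of the spatial redundancy that I-frame coding exploits. Blocks with more detail produce more non-zero coefficients, mostly near the start of the zigzag scan.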
The processing of B-frames is similar to that of P-frames except that B-frames use the picture in a subsequent reference frame as well as the picture in a preceding reference frame. As a result, B-frames usually provide more compression than P-frames. B-frames are never reference frames in MPEG-2 Video.
Typically, every 15th frame or so is made into an I-frame. P-frames and B-frames might follow an I-frame like this, IBBPBBPBBPBB(I), to form a Group Of Pictures (GOP); however, the standard is flexible about this. The encoder selects which pictures are coded as I-, P-, and B-frames.
P-frames provide more compression than I-frames because they take advantage of the data in a previous I-frame or P-frame – a reference frame. To generate a P-frame, the previous reference frame is reconstructed, just as it would be in a TV receiver or DVD player. The frame being compressed is divided into 16 pixel by 16 pixel macroblocks. Then, for each of those macroblocks, the reconstructed reference frame is searched to find a 16 by 16 area that closely matches the content of the macroblock being compressed. The offset is encoded as a "motion vector". Frequently, the offset is zero, but if something in the picture is moving, the offset might be something like 23 pixels to the right and 4-and-a-half pixels up. In MPEG-1 and MPEG-2, motion vector values can either represent integer offsets or half-integer offsets. The match between the two regions will often not be perfect. To correct for this, the encoder takes the difference of all corresponding pixels of the two regions, and on that macroblock difference then computes the DCT and strings of coefficient values for the four 8×8 areas in the 16×16 macroblock as described above. This "residual" is appended to the motion vector and the result sent to the receiver or stored on the DVD for each macroblock being compressed. Sometimes no suitable match is found. Then, the macroblock is treated like an I-frame macroblock.
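The search for a matching area in the reference frame, as described for P-frames, can be illustrated with an exhaustive block-matching search. The standard does not prescribe how an encoder finds motion vectors, so the search strategy, the sum-of-absolute-differences cost and the function names below are all assumptions made for this sketch. It also handles only integer offsets, not the half-pixel offsets mentioned above.

```python
def sad(current, reference, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size areas."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(current[cy + dy][cx + dx]
                         - reference[ry + dy][rx + dx])
    return total


def find_motion_vector(current, reference, cx, cy, size=16, search=4):
    """Exhaustively test every offset within +/- search pixels.

    Returns (mx, my, cost): the offset from the block position to the
    best-matching area of the reference frame, and its SAD cost.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = cx + mx, cy + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate area falls outside the frame
            cost = sad(current, reference, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (mx, my, cost)
    return best


def residual(current, reference, cx, cy, mx, my, size):
    """Pixel-wise difference between the block and its prediction."""
    return [[current[cy + dy][cx + dx]
             - reference[cy + my + dy][cx + mx + dx]
             for dx in range(size)]
            for dy in range(size)]


# Synthetic test data: the current frame is the reference frame with its
# content moved 2 pixels to the right and 1 pixel down.
reference = [[(x * x + 3 * y * y + x * y) % 256 for x in range(32)]
             for y in range(32)]
current = [[reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0
            for x in range(32)]
           for y in range(32)]

mx, my, cost = find_motion_vector(current, reference, 8, 8, size=8, search=4)
print(mx, my, cost)  # -2 -1 0
```

Because the match in this synthetic example is exact, the residual is all zeros and would cost almost nothing to code. In real video the residual is usually non-zero and is passed through the DCT and quantization steps.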
MPEG-2 video supports a wide range of applications from mobile to high quality HD editing. For many applications, it is unrealistic and too expensive to support the entire standard. To allow such applications to support only subsets of it, the standard defines profiles and levels.
A profile defines sets of features such as B-pictures, 3D video, chroma format, etc. The level limits the memory and processing power needed, defining maximum bit rates, frame sizes, and frame rates.
An MPEG application then specifies its capabilities in terms of profile and level. For example, a DVD player may state that it supports up to Main profile and Main level (often written as MP@ML). This means the player can play back any MPEG stream encoded as MP@ML or less.
The tables below summarize the limitations of each profile and level, though there are constraints not listed here (see Annex E of the standard). Note that not all profile and level combinations are permissible, and scalable modes modify the level restrictions.
|Abbr.||Name||Picture Coding Types||Chroma Format||Scalable modes||Intra DC Precision|
|SP||Simple profile||I, P||4:2:0||none||8, 9, 10|
|MP||Main profile||I, P, B||4:2:0||none||8, 9, 10|
|SNR||SNR Scalable profile||I, P, B||4:2:0||SNR||8, 9, 10|
|Spatial||Spatially Scalable profile||I, P, B||4:2:0||SNR, spatial||8, 9, 10|
|HP||High profile||I, P, B||4:2:2 or 4:2:0||SNR, spatial||8, 9, 10, 11|
|422||4:2:2 profile||I, P, B||4:2:2 or 4:2:0||none||8, 9, 10, 11|
|MVP||Multi-view profile||I, P, B||4:2:0||Temporal||8, 9, 10|
|Abbr.||Name||Frame rates (Hz)||Max horizontal resolution||Max vertical resolution||Max luminance samples per second (approximately height × width × frame rate)||Max bit rate in Main profile (Mbit/s)|
|LL||Low Level||23.976, 24, 25, 29.97, 30||352||288||3,041,280||4|
|ML||Main Level||23.976, 24, 25, 29.97, 30||720||576||10,368,000, except in High profile: constraint is 14,475,600 for 4:2:0 and 11,059,200 for 4:2:2||15|
|H-14||High 1440||23.976, 24, 25, 29.97, 30, 50, 59.94, 60||1440||1152||47,001,600, except in High profile: constraint is 62,668,800 for 4:2:0||60|
|HL||High Level||23.976, 24, 25, 29.97, 30, 50, 59.94, 60||1920||1152||62,668,800, except in High profile: constraint is 83,558,400 for 4:2:0||80|
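As a hedged illustration of how the level limits in the table above might be applied, the sketch below looks up the lowest level whose limits a given video format satisfies. It uses only the general limits from the table, taking the last column as a bit rate in Mbit/s (an assumption), and ignores the High profile exceptions, the exact list of permitted frame rates, and the constraints the table does not list. The data structure and function name are invented for the example.

```python
# (abbreviation, max width, max height, max frame rate,
#  max luminance samples per second, max bit rate in Mbit/s)
LEVELS = [
    ("LL", 352, 288, 30, 3_041_280, 4),
    ("ML", 720, 576, 30, 10_368_000, 15),
    ("H-14", 1440, 1152, 60, 47_001_600, 60),
    ("HL", 1920, 1152, 60, 62_668_800, 80),
]


def lowest_level(width, height, frame_rate, bit_rate_mbit):
    """Return the abbreviation of the lowest level that fits, or None."""
    for name, max_w, max_h, max_fps, max_samples, max_rate in LEVELS:
        if (width <= max_w
                and height <= max_h
                and frame_rate <= max_fps
                and width * height * frame_rate <= max_samples
                and bit_rate_mbit <= max_rate):
            return name
    return None


print(lowest_level(352, 288, 30, 4))       # LL
print(lowest_level(720, 576, 25, 9.8))     # ML
print(lowest_level(1280, 720, 60, 19))     # HL
print(lowest_level(1920, 1080, 30, 18.3))  # HL
print(lowest_level(1920, 1080, 60, 40))    # None
```

The 1280 × 720 case at 60 frames per second shows why the sample-rate limit matters: the picture fits within the High 1440 resolution limits, but its 55,296,000 luminance samples per second exceed that level's limit, so High Level is required.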
A few common MPEG-2 Profile/Level combinations are presented below, with particular maximum limits noted:
|Profile @ Level||Resolution (px)||Framerate max. (Hz)||Sampling||Bitrate (Mbit/s)||Example Application|
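|SP@LL||176 × 144||15||4:2:0||0.096||Wireless handsets|
|SP@ML||352 × 288||15||4:2:0||0.384||PDAs|
|SP@ML||320 × 240||24||4:2:0||0.384||PDAs|
|MP@LL||352 × 288||30||4:2:0||4||Set-top boxes (STB)|
|MP@ML||720 × 480||30||4:2:0||15||DVD (9.8 Mbps), SD DVB (15 Mbps)|
|MP@ML||720 × 576||25||4:2:0||15||DVD (9.8 Mbps), SD DVB (15 Mbps)|
|MP@H-14||1440 × 1080||30||4:2:0||60||HDV (25 Mbps)|
|MP@H-14||1280 × 720||30||4:2:0||60||HDV (25 Mbps)|
|MP@HL||1920 × 1080||30||4:2:0||80||ATSC (18.3 Mbps), SD DVB (31 Mbps), HD DVB (50.3 Mbps)|
|MP@HL||1280 × 720||60||4:2:0||80||ATSC (18.3 Mbps), SD DVB (31 Mbps), HD DVB (50.3 Mbps)|
|422P@ML||720 × 480||30||4:2:2||50||Sony IMX (I only), Broadcast Contribution (I&P only)|
|422P@ML||720 × 576||25||4:2:2||50||Sony IMX (I only), Broadcast Contribution (I&P only)|
|422P@H-14||1440 × 1080||30||4:2:2||80|
|422P@HL||1920 × 1080||30||4:2:2||300||Sony MPEG HD422 (50 Mbps), Canon XF Codec (50 Mbps), Convergent Design Nanoflash recorder (up to 160 Mbps)|
|422P@HL||1280 × 720||60||4:2:2||300||Sony MPEG HD422 (50 Mbps), Canon XF Codec (50 Mbps), Convergent Design Nanoflash recorder (up to 160 Mbps)|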
The following organizations have held patents for MPEG-2 video technology, as listed at MPEG LA. All of these patents are now expired.
|Organization||Patents|
|GE Technology Development, Inc.||75|
|CIF Licensing, LLC||44|
|Alcatel Lucent (including Multimedia Patent Trust)||33|
|Cisco Technology, Inc.||13|
|Robert Bosch GmbH||5|
|Nippon Telegraph and Telephone (NTT)||2|
|ARRIS Technology, Inc.||2|
|Hewlett Packard Enterprise Company||1|