Advanced video coding for generic audiovisual services | |
---|---|
Status | In force |
Year started | 2003 |
First published | 17 August 2004 |
Latest version | 14.0 (22 August 2021) |
Organization | ITU-T, ISO, IEC |
Committee | SG16 (VCEG), MPEG |
Base standards | H.261, H.262 (aka MPEG-2 Video), H.263, ISO/IEC 14496-2 (aka MPEG-4 Part 2) |
Related standards | H.265 (aka HEVC), H.266 (aka VVC) |
Domain | Video compression |
License | MPEG LA [1] |
Website | www |
Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. [2] It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. [3] [4] It supports a maximum resolution of 8K UHD. [5] [6]
The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (i.e., half or less the bit rate of MPEG-2, H.263, or MPEG-4 Part 2), without increasing the complexity of design so much that it would be impractical or excessively expensive to implement. This was achieved with features such as a reduced-complexity integer discrete cosine transform (integer DCT), [7] variable block-size segmentation, and multi-picture inter-picture prediction. An additional goal was to provide enough flexibility to allow the standard to be applied to a wide variety of applications on a wide variety of networks and systems, including low and high bit rates, low and high resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems. The H.264 standard can be viewed as a "family of standards" composed of a number of different profiles, although its "High profile" is by far the most commonly used format. A specific decoder decodes at least one, but not necessarily all profiles. The standard describes the format of the encoded data and how the data is decoded, but it does not specify algorithms for encoding video – that is left open as a matter for encoder designers to select for themselves, and a wide variety of encoding schemes have been developed. H.264 is typically used for lossy compression, although it is also possible to create truly lossless-coded regions within lossy-coded pictures or to support rare use cases for which the entire encoding is lossless.
H.264 was standardized by the ITU-T Video Coding Experts Group (VCEG) of Study Group 16 together with the ISO/IEC JTC 1 Moving Picture Experts Group (MPEG). The project partnership effort is known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10 – MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content. The final drafting work on the first version of the standard was completed in May 2003, and various extensions of its capabilities have been added in subsequent editions. High Efficiency Video Coding (HEVC), a.k.a. H.265 and MPEG-H Part 2, is a successor to H.264/MPEG-4 AVC developed by the same organizations, while earlier standards are still in common use.
H.264 is perhaps best known as being the most commonly used video encoding format on Blu-ray Discs. It is also widely used by streaming Internet sources, such as videos from Netflix, Hulu, Amazon Prime Video, Vimeo, YouTube, and the iTunes Store, Web software such as the Adobe Flash Player and Microsoft Silverlight, and also various HDTV broadcasts over terrestrial (ATSC, ISDB-T, DVB-T or DVB-T2), cable (DVB-C), and satellite (DVB-S and DVB-S2) systems.
H.264 is restricted by patents owned by various parties. A license covering most (but not all) patents essential to H.264 is offered through a patent pool formerly administered by MPEG LA. Via Licensing Corp acquired MPEG LA in April 2023 and formed a new patent pool administration company called Via Licensing Alliance. [8] The commercial use of patented H.264 technologies requires the payment of royalties to Via and other patent owners. MPEG LA has allowed the free use of H.264 technologies for streaming Internet video that is free to end users, and Cisco paid royalties to MPEG LA on behalf of the users of binaries for its open source H.264 encoder OpenH264.
The H.264 name follows the ITU-T naming convention, where Recommendations are given a letter corresponding to their series and a recommendation number within the series. H.264 is part of "H-Series Recommendations: Audiovisual and multimedia systems". H.264 is further categorized into "H.200-H.499: Infrastructure of audiovisual services" and "H.260-H.279: Coding of moving video". [9] The MPEG-4 AVC name relates to the naming convention in ISO/IEC MPEG, where the standard is part 10 of ISO/IEC 14496, which is the suite of standards known as MPEG-4. The standard was developed jointly in a partnership of VCEG and MPEG, after earlier development work in the ITU-T as a VCEG project called H.26L. It is thus common to refer to the standard with names such as H.264/AVC, AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC, to emphasize the common heritage. Occasionally, it is also referred to as "the JVT codec", in reference to the Joint Video Team (JVT) organization that developed it. (Such partnership and multiple naming is not uncommon. For example, the video compression standard known as MPEG-2 also arose from the partnership between MPEG and the ITU-T, where MPEG-2 video is known to the ITU-T community as H.262. [10] ) Some software programs (such as VLC media player) internally identify this standard as AVC1.
In early 1998, the Video Coding Experts Group (VCEG – ITU-T SG16 Q.6) issued a call for proposals on a project called H.26L, with the target to double the coding efficiency (which means halving the bit rate necessary for a given level of fidelity) in comparison to any other existing video coding standards for a broad variety of applications. VCEG was chaired by Gary Sullivan (Microsoft, formerly PictureTel, U.S.). The first draft design for that new standard was adopted in August 1999. In 2000, Thomas Wiegand (Heinrich Hertz Institute, Germany) became VCEG co-chair.
In December 2001, VCEG and the Moving Picture Experts Group (MPEG – ISO/IEC JTC 1/SC 29/WG 11) formed a Joint Video Team (JVT), with the charter to finalize the video coding standard. [11] Formal approval of the specification came in March 2003. The JVT was chaired by Gary Sullivan, Thomas Wiegand, and Ajay Luthra (Motorola, U.S.; later Arris, U.S.). In July 2004, the Fidelity Range Extensions (FRExt) project was finalized. From January 2005 to November 2007, the JVT worked on an extension of H.264/AVC towards scalability, specified in an annex (Annex G) called Scalable Video Coding (SVC). The JVT management team was extended by Jens-Rainer Ohm (RWTH Aachen University, Germany). From July 2006 to November 2009, the JVT worked on Multiview Video Coding (MVC), an extension of H.264/AVC towards 3D television and limited-range free-viewpoint television. That work included the development of two new profiles of the standard: the Multiview High Profile and the Stereo High Profile.
Throughout the development of the standard, additional messages for containing supplemental enhancement information (SEI) have been developed. SEI messages can contain various types of data that indicate the timing of the video pictures or describe various properties of the coded video or how it can be used or enhanced. SEI messages are also defined that can contain arbitrary user-defined data. SEI messages do not affect the core decoding process, but can indicate how the video is recommended to be post-processed or displayed. Some other high-level properties of the video content are conveyed in video usability information (VUI), such as the indication of the color space for interpretation of the video content. As new color spaces have been developed, such as for high dynamic range and wide color gamut video, additional VUI identifiers have been added to indicate them.
The standardization of the first version of H.264/AVC was completed in May 2003. In the first project to extend the original standard, the JVT then developed what was called the Fidelity Range Extensions (FRExt). These extensions enabled higher quality video coding by supporting increased sample bit depth precision and higher-resolution color information, including the sampling structures known as Y′CBCR 4:2:2 (a.k.a. YUV 4:2:2) and 4:4:4. Several other features were also included in the FRExt project, such as adding an 8×8 integer discrete cosine transform (integer DCT) with adaptive switching between the 4×4 and 8×8 transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support of additional color spaces. The design work on the FRExt project was completed in July 2004, and the drafting work on them was completed in September 2004.
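The 4×4 integer transform mentioned above can be illustrated with a small sketch. It uses the forward core transform matrix that is widely described in the literature on H.264; the standard itself specifies only the decoding side, so this forward form is an assumption about a typical encoder, and the scaling and quantization stages that normally follow are omitted. The function names are illustrative only.

```python
# Forward 4x4 core transform matrix commonly cited for H.264 encoders.
# Post-scaling and quantization are deliberately left out of this sketch.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute Y = CF * X * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


# A flat residual block: all energy should land in the DC coefficient.
flat = [[10] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)
```

For the flat block, only `coeffs[0][0]` is non-zero, which is the behaviour expected of a DCT-like transform. Because every matrix entry is a small integer, the computation needs only additions and shifts in practice.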
Five other new profiles (see version 7 below) intended primarily for professional applications were then developed, adding extended-gamut color space support, defining additional aspect ratio indicators, defining two additional types of "supplemental enhancement information" (post-filter hint and tone mapping), and deprecating one of the prior FRExt profiles (the High 4:4:4 profile) that industry feedback indicated should have been designed differently.
The next major feature added to the standard was Scalable Video Coding (SVC). Specified in Annex G of H.264/AVC, SVC allows the construction of bitstreams that contain layers of sub-bitstreams that also conform to the standard, including one such bitstream known as the "base layer" that can be decoded by an H.264/AVC codec that does not support SVC. For temporal bitstream scalability (i.e., the presence of a sub-bitstream with a smaller temporal sampling rate than the main bitstream), complete access units are removed from the bitstream when deriving the sub-bitstream. In this case, high-level syntax and inter-prediction reference pictures in the bitstream are constructed accordingly. On the other hand, for spatial and quality bitstream scalability (i.e., the presence of a sub-bitstream with lower spatial resolution/quality than the main bitstream), NAL (Network Abstraction Layer) units are removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction (i.e., the prediction of the higher spatial resolution/quality signal from the data of the lower spatial resolution/quality signal) is typically used for efficient coding. The Scalable Video Coding extensions were completed in November 2007.
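As a toy illustration of temporal scalability, the sketch below derives a lower-frame-rate sub-bitstream by dropping whole access units above a chosen temporal layer. The `AccessUnit` type, the `temporal_id` field as modelled here, and the dyadic layer assignment are hypothetical stand-ins chosen for illustration; a real extractor would read the layer identifier from the coded bitstream.

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    """Hypothetical stand-in for one coded picture; not a real parser type."""
    index: int
    temporal_id: int


def extract_temporal_substream(units, max_temporal_id):
    """Keep only access units at or below the requested temporal layer."""
    return [u for u in units if u.temporal_id <= max_temporal_id]


# Assumed dyadic hierarchy over 8 pictures:
# layer 0 = every 4th picture, layer 1 = remaining even pictures, layer 2 = odd pictures.
stream = [
    AccessUnit(i, 0 if i % 4 == 0 else 1 if i % 2 == 0 else 2)
    for i in range(8)
]

half_rate = extract_temporal_substream(stream, 1)     # drops layer 2
quarter_rate = extract_temporal_substream(stream, 0)  # keeps only layer 0
```

The sketch only works as a model because pictures in lower layers are assumed never to reference pictures in higher layers, which is the property the paragraph above describes as constructing the reference pictures "accordingly".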
The next major feature added to the standard was Multiview Video Coding (MVC). Specified in Annex H of H.264/AVC, MVC enables the construction of bitstreams that represent more than one view of a video scene. An important example of this functionality is stereoscopic 3D video coding. Two profiles were developed in the MVC work: Multiview High profile supports an arbitrary number of views, and Stereo High profile is designed specifically for two-view stereoscopic video. The Multiview Video Coding extensions were completed in November 2009.
Additional extensions were later developed that included 3D video coding with joint coding of depth maps and texture (termed 3D-AVC), multi-resolution frame-compatible (MFC) stereoscopic and 3D-MFC coding, various additional combinations of features, and higher frame sizes and frame rates.
Versions of the H.264/AVC standard include the following completed revisions, corrigenda, and amendments (dates are final approval dates in ITU-T, while final "International Standard" approval dates in ISO/IEC are somewhat different and slightly later in most cases). Each version represents changes relative to the next lower version that is integrated into the text.
The following organizations hold one or more patents in MPEG LA's H.264/AVC patent pool.
Organization [32] | Active patents | Expired patents | Total patents [31] |
---|---|---|---|
Panasonic Corporation | 1,054 | 416 | 1,470 |
Godo Kaisha IP Bridge | 1,033 | 267 | 1,300 |
LG Electronics | 871 | 130 | 1,001 |
Dolby Laboratories | 1,014 | 414 | 1,428 |
Toshiba | 59 | 336 | 395 |
Microsoft | 95 | 145 | 240 |
Nippon Telegraph and Telephone (including NTT Docomo) | 234 | 4 | 238 |
Sony | 77 | 77 | 154 |
Fraunhofer Society | 208 | 16 | 224 |
(organization not given) | 5 | 134 | 139 |
GE Video Compression | 136 | 0 | 136 |
Fujitsu | 92 | 14 | 106 |
Mitsubishi Electric | 44 | 56 | 100 |
Tagivan II LLC | 82 | 0 | 82 |
Samsung Electronics | 17 | 46 | 63 |
Maxell | 54 | 2 | 56 |
Philips | 6 | 41 | 47 |
Vidyo | 41 | 2 | 43 |
Ericsson | 1 | 33 | 34 |
Electronics and Telecommunications Research Institute (ETRI) of Korea | 10 | 25 | 35 |
The H.264 video format has a very broad application range that covers all forms of digital compressed video from low bit-rate Internet streaming applications to HDTV broadcast and Digital Cinema applications with nearly lossless coding. With the use of H.264, bit rate savings of 50% or more compared to MPEG-2 Part 2 are reported. For example, H.264 has been reported to give the same Digital Satellite TV quality as current MPEG-2 implementations with less than half the bitrate, with current MPEG-2 implementations working at around 3.5 Mbit/s and H.264 at only 1.5 Mbit/s. [33] Sony claims that 9 Mbit/s AVC recording mode is equivalent to the image quality of the HDV format, which uses approximately 18–25 Mbit/s. [34]
To ensure compatibility and problem-free adoption of H.264/AVC, many standards bodies have amended or added to their video-related standards so that users of these standards can employ H.264/AVC. Both the Blu-ray Disc format and the now-discontinued HD DVD format include the H.264/AVC High Profile as one of three mandatory video compression formats. The Digital Video Broadcast project (DVB) approved the use of H.264/AVC for broadcast television in late 2004.
The Advanced Television Systems Committee (ATSC) standards body in the United States approved the use of H.264/AVC for broadcast television in July 2008, although the standard is not yet used for fixed ATSC broadcasts within the United States. [35] [36] It has also been approved for use with the more recent ATSC-M/H (Mobile/Handheld) standard, using the AVC and SVC portions of H.264. [37]
The CCTV (Closed Circuit TV) and Video Surveillance markets have included the technology in many products.
Many common DSLRs use H.264 video wrapped in QuickTime MOV containers as the native recording format.
AVCHD is a high-definition recording format designed by Sony and Panasonic that uses H.264 (conforming to H.264 while adding additional application-specific features and constraints).
AVC-Intra is an intraframe-only compression format, developed by Panasonic.
XAVC is a recording format designed by Sony that uses level 5.2 of H.264/MPEG-4 AVC, which is the highest level supported by that video standard. [38] [39] XAVC can support 4K resolution (4096 × 2160 and 3840 × 2160) at up to 60 frames per second (fps). [38] [39] Sony has announced that cameras that support XAVC include two CineAlta cameras—the Sony PMW-F55 and Sony PMW-F5. [40] The Sony PMW-F55 can record XAVC with 4K resolution at 30 fps at 300 Mbit/s and 2K resolution at 30 fps at 100 Mbit/s. [41] XAVC can record 4K resolution at 60 fps with 4:2:2 chroma sampling at 600 Mbit/s. [42] [43]
H.264/AVC/MPEG-4 Part 10 contains a number of new features that allow it to compress video much more efficiently than older standards and to provide more flexibility for application to a wide variety of network environments. In particular, some such key features include:
These techniques, along with several others, help H.264 to perform significantly better than any prior standard under a wide variety of circumstances in a wide variety of application environments. H.264 can often perform radically better than MPEG-2 video—typically obtaining the same quality at half of the bit rate or less, especially on high bit rate and high resolution video content. [49]
Like other ISO/IEC MPEG video standards, H.264/AVC has a reference software implementation that can be freely downloaded. [50] Its main purpose is to give examples of H.264/AVC features, rather than being a useful application per se. Some reference hardware design work has also been conducted in the Moving Picture Experts Group. The above-mentioned aspects include features in all profiles of H.264. A profile for a codec is a set of features of that codec identified to meet a certain set of specifications of intended applications. This means that many of the features listed are not supported in some profiles. Various profiles of H.264/AVC are discussed in the next section.
The standard defines several sets of capabilities, which are referred to as profiles, targeting specific classes of applications. These are declared using a profile code (profile_idc) and sometimes a set of additional constraints applied in the encoder. The profile code and indicated constraints allow a decoder to recognize the requirements for decoding that specific bitstream. (And in many system environments, only one or two profiles are allowed to be used, so decoders in those environments do not need to be concerned with recognizing the less commonly used profiles.) By far the most commonly used profile is the High Profile.
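The following sketch shows how a decoder-side tool might read the profile code and level from the start of a sequence parameter set (SPS). It assumes the input begins at the one-byte NAL unit header with no start code, that the three bytes after it carry `profile_idc`, the constraint flags, and `level_idc` in that order, and that `level_idc` is ten times the level number. The special signalling of Level 1b is not handled, the profile table is a partial list, and the function name is illustrative.

```python
# Partial mapping of profile_idc values to profile names (not exhaustive).
PROFILE_NAMES = {
    66: "Baseline",
    77: "Main",
    88: "Extended",
    100: "High",
    110: "High 10",
    122: "High 4:2:2",
    244: "High 4:4:4 Predictive",
}


def describe_sps_header(sps: bytes) -> dict:
    """Read profile and level from the first four bytes of an SPS NAL unit."""
    if len(sps) < 4:
        raise ValueError("need at least 4 bytes")
    nal_unit_type = sps[0] & 0x1F  # low 5 bits of the NAL unit header
    if nal_unit_type != 7:         # 7 identifies a sequence parameter set
        raise ValueError("not an SPS NAL unit")

    profile_idc, flags, level_idc = sps[1], sps[2], sps[3]
    name = PROFILE_NAMES.get(profile_idc, f"unknown ({profile_idc})")

    # Baseline plus constraint_set1_flag is treated as Constrained Baseline.
    if profile_idc == 66 and flags & 0x40:
        name = "Constrained Baseline"

    # Assumption: level_idc is 10x the level number (Level 1b not handled).
    return {"profile": name, "level": f"{level_idc // 10}.{level_idc % 10}"}


high_l4 = describe_sps_header(bytes([0x67, 0x64, 0x00, 0x28]))
cbp_l3 = describe_sps_header(bytes([0x67, 0x42, 0xE0, 0x1E]))
```

The two sample inputs correspond to byte patterns often seen in practice for High Profile at Level 4 and Constrained Baseline at Level 3.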
Profiles for non-scalable 2D video applications include the following:
For camcorders, editing, and professional applications, the standard contains four additional Intra-frame-only profiles, which are defined as simple subsets of other corresponding profiles. These are mostly for professional (e.g., camera and editing system) applications:
As a result of the Scalable Video Coding (SVC) extension, the standard contains five additional scalable profiles, which are defined as a combination of a H.264/AVC profile for the base layer (identified by the second word in the scalable profile name) and tools that achieve the scalable extension:
As a result of the Multiview Video Coding (MVC) extension, the standard contains two multiview profiles:
The Multi-resolution Frame-Compatible (MFC) extension added two more profiles:
The 3D-AVC extension added two more profiles:
Feature | CBP | BP | XP | MP | ProHiP | HiP | Hi10P | Hi422P | Hi444PP |
---|---|---|---|---|---|---|---|---|---|
Bit depth (per sample) | 8 | 8 | 8 | 8 | 8 | 8 | 8 to 10 | 8 to 10 | 8 to 14 |
Chroma formats | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0/ 4:2:2 | 4:2:0/ 4:2:2/ 4:4:4 |
Flexible macroblock ordering (FMO) | No | Yes | Yes | No | No | No | No | No | No |
Arbitrary slice ordering (ASO) | No | Yes | Yes | No | No | No | No | No | No |
Redundant slices (RS) | No | Yes | Yes | No | No | No | No | No | No |
Data Partitioning | No | No | Yes | No | No | No | No | No | No |
SI and SP slices | No | No | Yes | No | No | No | No | No | No |
Interlaced coding (PicAFF, MBAFF) | No | No | Yes | Yes | No | Yes | Yes | Yes | Yes |
B slices | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
CABAC entropy coding | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
4:0:0 (Monochrome) | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
8×8 vs. 4×4 transform adaptivity | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
Quantization scaling matrices | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
Separate CB and CR QP control | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
Separate color plane coding | No | No | No | No | No | No | No | No | Yes |
Predictive lossless coding | No | No | No | No | No | No | No | No | Yes |
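To make the bit depth and chroma format rows of the table more concrete, the sketch below estimates the uncompressed size of one frame. It assumes the usual interpretation of the sampling structures (4:2:0 carries half a chroma sample per luma sample in total, 4:2:2 carries one, 4:4:4 carries two, and 4:0:0 carries none) and ignores any padding or packing a real system might add. The function name is illustrative.

```python
# Total samples per luma sample, expressed in halves to stay in integers:
# 4:0:0 -> 1.0, 4:2:0 -> 1.5, 4:2:2 -> 2.0, 4:4:4 -> 3.0
SAMPLES_PER_LUMA_X2 = {"4:0:0": 2, "4:2:0": 3, "4:2:2": 4, "4:4:4": 6}


def raw_frame_bits(width, height, chroma_format, bit_depth):
    """Approximate uncompressed size of one frame in bits."""
    total_samples = width * height * SAMPLES_PER_LUMA_X2[chroma_format] // 2
    return total_samples * bit_depth


hd_420_8bit = raw_frame_bits(1920, 1080, "4:2:0", 8)
hd_422_10bit = raw_frame_bits(1920, 1080, "4:2:2", 10)
hd_444_14bit = raw_frame_bits(1920, 1080, "4:4:4", 14)
```

Under these assumptions, a 1080p frame at the top of the Hi444PP range (4:4:4, 14 bits) holds 3.5 times as many raw bits as the same frame in 8-bit 4:2:0.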
As the term is used in the standard, a "level" is a specified set of constraints that indicate a degree of required decoder performance for a profile. For example, a level of support within a profile specifies the maximum picture resolution, frame rate, and bit rate that a decoder may use. A decoder that conforms to a given level must be able to decode all bitstreams encoded for that level and all lower levels.
Level | Maximum decoding speed (macroblocks/s) | Maximum frame size (macroblocks) | Maximum video bit rate for video coding layer (VCL) (Constrained Baseline, Baseline, Extended and Main Profiles) (kbits/s) | Examples for high resolution @ highest frame rate (maximum stored frames) |
---|---|---|---|---|
1 | 1,485 | 99 | 64 | 128×96@30.9 (8) 176×144@15.0 (4) |
1b | 1,485 | 99 | 128 | 128×96@30.9 (8) 176×144@15.0 (4) |
1.1 | 3,000 | 396 | 192 | 176×144@30.3 (9) 352×288@7.5 (2) 320×240@10.0 (3) |
1.2 | 6,000 | 396 | 384 | 320×240@20.0 (7) 352×288@15.2 (6) |
1.3 | 11,880 | 396 | 768 | 320×240@36.0 (7) 352×288@30.0 (6) |
2 | 11,880 | 396 | 2,000 | 320×240@36.0 (7) 352×288@30.0 (6) |
2.1 | 19,800 | 792 | 4,000 | 352×480@30.0 (7) 352×576@25.0 (6) |
2.2 | 20,250 | 1,620 | 4,000 | 352×480@30.7 (12) 720×576@12.5 (5) 352×576@25.6 (10) 720×480@15.0 (6) |
3 | 40,500 | 1,620 | 10,000 | 352×480@61.4 (12) 720×576@25.0 (5) 352×576@51.1 (10) 720×480@30.0 (6) |
3.1 | 108,000 | 3,600 | 14,000 | 720×480@80.0 (13) 1,280×720@30.0 (5) 720×576@66.7 (11) |
3.2 | 216,000 | 5,120 | 20,000 | 1,280×720@60.0 (5) 1,280×1,024@42.2 (4) |
4 | 245,760 | 8,192 | 20,000 | 1,280×720@68.3 (9) 2,048×1,024@30.0 (4) 1,920×1,080@30.1 (4) |
4.1 | 245,760 | 8,192 | 50,000 | 1,280×720@68.3 (9) 2,048×1,024@30.0 (4) 1,920×1,080@30.1 (4) |
4.2 | 522,240 | 8,704 | 50,000 | 1,280×720@145.1 (9) 2,048×1,080@60.0 (4) 1,920×1,080@64.0 (4) |
5 | 589,824 | 22,080 | 135,000 | 1,920×1,080@72.3 (13) 3,672×1,536@26.7 (5) 2,048×1,024@72.0 (13) 2,048×1,080@67.8 (12) 2,560×1,920@30.7 (5) |
5.1 | 983,040 | 36,864 | 240,000 | 1,920×1,080@120.5 (16) 4,096×2,304@26.7 (5) 2,560×1,920@51.2 (9) 3,840×2,160@31.7 (5) 4,096×2,048@30.0 (5) 4,096×2,160@28.5 (5) |
5.2 | 2,073,600 | 36,864 | 240,000 | 1,920×1,080@172.0 (16) 4,096×2,304@56.3 (5) 2,560×1,920@108.0 (9) 3,840×2,160@66.8 (5) 4,096×2,048@63.3 (5) 4,096×2,160@60.0 (5) |
6 | 4,177,920 | 139,264 | 240,000 | 3,840×2,160@128.9 (16) 8,192×4,320@30.2 (5) 7,680×4,320@32.2 (5) |
6.1 | 8,355,840 | 139,264 | 480,000 | 3,840×2,160@257.9 (16) 8,192×4,320@60.4 (5) 7,680×4,320@64.5 (5) |
6.2 | 16,711,680 | 139,264 | 800,000 | 3,840×2,160@300.0 (16) 8,192×4,320@120.9 (5) 7,680×4,320@128.9 (5) |
The maximum bit rate for the High Profile is 1.25 times that of the Constrained Baseline, Baseline, Extended and Main Profiles; 3 times for Hi10P, and 4 times for Hi422P/Hi444PP.
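The multipliers above can be applied mechanically to the base figures in the level table. The sketch below does so for a subset of levels; the dictionary and function names are illustrative, the base values are copied from the table, and the Progressive High profile is left out because the text does not state its multiplier explicitly.

```python
# Base VCL bit-rate limits in kbit/s for the Constrained Baseline, Baseline,
# Extended and Main profiles, copied from the level table (subset only).
BASE_MAX_KBPS = {"3": 10_000, "3.1": 14_000, "4": 20_000,
                 "4.1": 50_000, "5.1": 240_000}

# Multipliers from the text, stored as quarters to keep integer arithmetic:
# 4/4 = 1x, 5/4 = 1.25x, 12/4 = 3x, 16/4 = 4x.
PROFILE_FACTOR_X4 = {
    "CBP": 4, "BP": 4, "XP": 4, "MP": 4,
    "HiP": 5, "Hi10P": 12, "Hi422P": 16, "Hi444PP": 16,
}


def max_vcl_kbps(level, profile):
    """Maximum VCL bit rate in kbit/s for a level and profile."""
    return BASE_MAX_KBPS[level] * PROFILE_FACTOR_X4[profile] // 4


high_l41 = max_vcl_kbps("4.1", "HiP")
hi10_l41 = max_vcl_kbps("4.1", "Hi10P")
```

For example, Level 4.1 allows 50,000 kbit/s in the Main Profile, which becomes 62,500 kbit/s in the High Profile under the 1.25 multiplier.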
The number of luma samples is 16×16=256 times the number of macroblocks (and the number of luma samples per second is 256 times the number of macroblocks per second).
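The frame rates in the level table's examples column can be approximated by dividing the maximum decoding speed by the frame size in macroblocks. The sketch below does this for a few levels, with limits copied from the table. It is a simplification: it ignores other constraints in the standard, which is why the table lists lower figures than this quotient for some entries at the highest levels (those levels are omitted here). Function and dictionary names are illustrative.

```python
# (max macroblocks per second, max frame size in macroblocks), from the table.
LEVEL_LIMITS = {
    "3": (40_500, 1_620),
    "3.1": (108_000, 3_600),
    "4": (245_760, 8_192),
    "4.2": (522_240, 8_704),
    "5.1": (983_040, 36_864),
}


def frame_size_in_mbs(width, height):
    """Number of 16x16 macroblocks needed to cover the picture (rounded up)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def luma_samples(macroblocks):
    """Each macroblock covers 16 x 16 = 256 luma samples."""
    return 256 * macroblocks


def max_frame_rate(level, width, height):
    """Approximate highest frame rate, or None if the frame is too large."""
    max_mbps, max_fs = LEVEL_LIMITS[level]
    mbs = frame_size_in_mbs(width, height)
    if mbs > max_fs:
        return None
    return max_mbps / mbs


fps_1080p_l4 = max_frame_rate("4", 1920, 1080)
```

A 1,920×1,080 picture needs 120 × 68 = 8,160 macroblocks, so Level 4 yields roughly 30.1 frames per second, matching the corresponding table entry, while Level 3 rejects that frame size outright.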
Previously encoded pictures are used by H.264/AVC encoders to provide predictions of the values of samples in other pictures. This allows the encoder to make efficient decisions on the best way to encode a given picture. At the decoder, such pictures are stored in a virtual decoded picture buffer (DPB). The maximum capacity of the DPB, in units of frames (or pairs of fields), as shown in parentheses in the right column of the table above, can be computed as follows:
DpbCapacity = min(floor(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs)), 16)
Where MaxDpbMbs is a constant value provided in the table below as a function of level number, and PicWidthInMbs and FrameHeightInMbs are the picture width and frame height for the coded video data, expressed in units of macroblocks (rounded up to integer values and accounting for cropping and macroblock pairing when applicable). This formula is specified in sections A.3.1.h and A.3.2.f of the 2017 edition of the standard. [28]
Level | 1 | 1b | 1.1 | 1.2 | 1.3 | 2 | 2.1 | 2.2 | 3 | 3.1 | 3.2 | 4 | 4.1 | 4.2 | 5 | 5.1 | 5.2 | 6 | 6.1 | 6.2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MaxDpbMbs | 396 | 396 | 900 | 2,376 | 2,376 | 2,376 | 4,752 | 8,100 | 8,100 | 18,000 | 20,480 | 32,768 | 32,768 | 34,816 | 110,400 | 184,320 | 184,320 | 696,320 | 696,320 | 696,320 |
For example, for an HDTV picture that is 1,920 samples wide (PicWidthInMbs = 120) and 1,080 samples high (FrameHeightInMbs = 68), a Level 4 decoder has a maximum DPB storage capacity of floor(32768/(120*68)) = 4 frames (or 8 fields). Thus, the value 4 is shown in parentheses in the table above in the right column of the row for Level 4 with the frame size 1920×1080.
The current picture being decoded is not included in the computation of DPB fullness (unless the encoder has indicated for it to be stored for use as a reference for decoding other pictures or for delayed output timing). Thus, a decoder needs to actually have sufficient memory to handle (at least) one frame more than the maximum capacity of the DPB as calculated above.
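The DPB calculation can be sketched in a few lines. The code below assumes the capacity is the floor of MaxDpbMbs divided by the frame size in macroblocks, capped at 16 frames (the cap is consistent with the largest value shown in parentheses in the level table). MaxDpbMbs values are copied from the table above; macroblock pairing for interlaced content and cropping are not modelled, and the function name is illustrative.

```python
# MaxDpbMbs per level, copied from the table above.
MAX_DPB_MBS = {
    "1": 396, "1b": 396, "1.1": 900, "1.2": 2_376, "1.3": 2_376,
    "2": 2_376, "2.1": 4_752, "2.2": 8_100, "3": 8_100, "3.1": 18_000,
    "3.2": 20_480, "4": 32_768, "4.1": 32_768, "4.2": 34_816,
    "5": 110_400, "5.1": 184_320, "5.2": 184_320,
    "6": 696_320, "6.1": 696_320, "6.2": 696_320,
}


def dpb_capacity(level, width, height):
    """Maximum number of frames the DPB can hold at this level and size."""
    pic_width_in_mbs = (width + 15) // 16      # rounded up to whole macroblocks
    frame_height_in_mbs = (height + 15) // 16
    frame_mbs = pic_width_in_mbs * frame_height_in_mbs
    return min(MAX_DPB_MBS[level] // frame_mbs, 16)


frames_1080p_l4 = dpb_capacity("4", 1920, 1080)
```

This reproduces the worked example: 32,768 // (120 × 68) gives 4 frames for 1080p at Level 4. As the preceding paragraph notes, a practical decoder would budget memory for at least one frame beyond this figure to hold the picture currently being decoded.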
In 2009, the HTML5 working group was split between supporters of Ogg Theora, a free video format which is thought to be unencumbered by patents, and H.264, which contains patented technology. As late as July 2009, Google and Apple were said to support H.264, while Mozilla and Opera support Ogg Theora (now Google, Mozilla and Opera all support Theora and WebM with VP8). [52] Microsoft, with the release of Internet Explorer 9, has added support for HTML 5 video encoded using H.264. At the Gartner Symposium/ITXpo in November 2010, Microsoft CEO Steve Ballmer answered the question "HTML 5 or Silverlight?" by saying "If you want to do something that is universal, there is no question the world is going HTML5." [53] In January 2011, Google announced that they were pulling support for H.264 from their Chrome browser and supporting both Theora and WebM/VP8 to use only open formats. [54]
On March 18, 2012, Mozilla announced support for H.264 in Firefox on mobile devices, due to prevalence of H.264-encoded video and the increased power-efficiency of using dedicated H.264 decoder hardware common on such devices. [55] On February 20, 2013, Mozilla implemented support in Firefox for decoding H.264 on Windows 7 and above. This feature relies on Windows' built in decoding libraries. [56] Firefox 35.0, released on January 13, 2015, supports H.264 on OS X 10.6 and higher. [57]
On October 30, 2013, Rowan Trollope from Cisco Systems announced that Cisco would release both binaries and source code of an H.264 video codec called OpenH264 under the Simplified BSD license, and pay all royalties for its use to MPEG LA for any software projects that use Cisco's precompiled binaries, thus making Cisco's OpenH264 binaries free to use. However, any software projects that use Cisco's source code instead of its binaries would be legally responsible for paying all royalties to MPEG LA. Target CPU architectures include x86 and ARM, and target operating systems include Linux, Windows XP and later, Mac OS X, and Android; iOS was notably absent from this list, because it doesn't allow applications to fetch and install binary modules from the Internet. [58] [59] [60] Also on October 30, 2013, Brendan Eich from Mozilla wrote that it would use Cisco's binaries in future versions of Firefox to add support for H.264 to Firefox where platform codecs are not available. [61] Cisco published the source code to OpenH264 on December 9, 2013. [62]
Although iOS was not supported by the 2013 Cisco software release, Apple updated its Video Toolbox Framework with iOS 8 (released in September 2014) to provide direct access to hardware-based H.264/AVC video encoding and decoding. [59]
Feature | QuickTime | Nero | OpenH264 | x264 | Main- Concept | Elecard | TSE | Pro- Coder | Avivo | Elemental | IPP |
---|---|---|---|---|---|---|---|---|---|---|---|
B slices | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
Multiple reference frames | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
Interlaced coding (PicAFF, MBAFF) | No | MBAFF | MBAFF | MBAFF | Yes | Yes | No | Yes | MBAFF | Yes | No |
CABAC entropy coding | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
8×8 vs. 4×4 transform adaptivity | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
Quantization scaling matrices | No | No | Yes | Yes | Yes | No | No | No | No | No | No |
Separate CB and CR QP control | No | No | Yes | Yes | Yes | Yes | No | No | No | No | No |
Extended chroma formats | No | No | No | 4:0:0 [63] 4:2:0 4:2:2 [64] 4:4:4 [65] | 4:2:2 | 4:2:2 | 4:2:2 | No | No | 4:2:0 4:2:2 | No |
Largest sample depth (bit) | 8 | 8 | 8 | 10 [66] | 10 | 8 | 8 | 8 | 8 | 10 | 12 |
Predictive lossless coding | No | No | No | Yes [67] | No | No | No | No | No | No | No |
Because H.264 encoding and decoding require significant computing power in specific types of arithmetic operations, software implementations that run on general-purpose CPUs are typically less power efficient. However, quad-core general-purpose x86 CPUs have sufficient computation power to perform real-time SD and HD encoding. Compression efficiency depends on video algorithmic implementations, not on whether hardware or software implementation is used. Therefore, the difference between hardware-based and software-based implementations lies mainly in power efficiency, flexibility and cost. To improve power efficiency and reduce hardware form factor, special-purpose hardware may be employed, either for the complete encoding or decoding process, or for acceleration assistance within a CPU-controlled environment.
CPU-based solutions are known to be much more flexible, particularly when encoding must be done concurrently in multiple formats, multiple bit rates and resolutions (multi-screen video), and possibly with additional features on container format support, advanced integrated advertising features, etc. CPU-based software solutions generally make it much easier to load balance multiple concurrent encoding sessions within the same CPU.
The 2nd generation Intel "Sandy Bridge" Core i3/i5/i7 processors introduced at the January 2011 CES (Consumer Electronics Show) offer an on-chip hardware full HD H.264 encoder, known as Intel Quick Sync Video. [68] [69]
A hardware H.264 encoder can be an ASIC or an FPGA.
ASIC encoders with H.264 encoder functionality are available from many different semiconductor companies, but the core design used in the ASIC is typically licensed from one of a few companies such as Chips&Media, Allegro DVT, On2 (formerly Hantro, acquired by Google), Imagination Technologies, and NGCodec. Some companies have both FPGA and ASIC product offerings. [70]
Texas Instruments manufactures a line of ARM + DSP cores that perform H.264 BP encoding of 1080p video at 30 fps on the DSP. [71] This permits flexibility with respect to codecs (which are implemented as highly optimized DSP code) while being more efficient than software on a generic CPU.
In countries where patents on software algorithms are upheld, vendors and commercial users of products that use H.264/AVC are expected to pay patent licensing royalties for the patented technology that their products use. [72] This applies to the Baseline Profile as well. [73]
A private organization known as MPEG LA, which is not affiliated in any way with the MPEG standardization organization, administers the licenses for patents applying to this standard, as well as other patent pools, such as for MPEG-4 Part 2 Video, HEVC and MPEG-DASH. The patent holders include Fujitsu, Panasonic, Sony, Mitsubishi, Apple, Columbia University, KAIST, Dolby, Google, JVC Kenwood, LG Electronics, Microsoft, NTT Docomo, Philips, Samsung, Sharp, Toshiba and ZTE, [74] although the majority of patents in the pool are held by Panasonic (1,197 patents), Godo Kaisha IP Bridge (1,130 patents) and LG Electronics (990 patents). [75]
On August 26, 2010, MPEG LA announced that royalties would not be charged for H.264-encoded Internet video that is free to end users. [76] All other royalties remain in place, such as royalties for products that decode and encode H.264 video and royalties charged to operators of free television and subscription channels. [77] The license terms are updated in 5-year blocks. [78]
Since the first version of the standard was completed in May 2003 and the most commonly used profile (the High profile) was completed in June 2004, [citation needed] some of the relevant patents have expired, [75] while others are still in force in jurisdictions around the world; one of the US patents in the MPEG LA H.264 pool (granted in 2016, priority from 2001) lasts at least until November 2030. [79]
In 2005, Qualcomm sued Broadcom in US District Court, alleging that Broadcom infringed on two of its patents by making products that were compliant with the H.264 video compression standard. [80] In 2007, the District Court found that the patents were unenforceable because Qualcomm had failed to disclose them to the JVT prior to the release of the H.264 standard in May 2003. [80] In December 2008, the US Court of Appeals for the Federal Circuit affirmed the District Court's order that the patents be unenforceable but remanded to the District Court with instructions to limit the scope of unenforceability to H.264 compliant products. [80]
In October 2023, Nokia sued HP and Amazon for H.264/H.265 patent infringement in the US, the UK and other jurisdictions. [81]
H.263 is a video compression standard originally designed as a low-bit-rate compressed format for videotelephony. It was standardized by the ITU-T Video Coding Experts Group (VCEG) in a project ending in 1995/1996. It is a member of the H.26x family of video coding standards in the domain of the ITU-T.
MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.
A video codec is software or hardware that compresses and decompresses digital video. In the context of video compression, codec is a portmanteau of encoder and decoder, while a device that only compresses is typically called an encoder, and one that only decompresses is a decoder.
A bitstream format is the format of the data found in a stream of bits used in a digital communication or data storage application. The term typically refers to the data format of the output of an encoder, or the data format of the input to a decoder when using data compression.
H.261 is an ITU-T video compression standard, first ratified in November 1988. It is the first member of the H.26x family of video coding standards in the domain of the ITU-T Study Group 16 Video Coding Experts Group. It was the first video coding standard that was useful in practical terms.
In the field of video compression, a video frame is compressed using different algorithms with different advantages and disadvantages, centered mainly on the amount of data compression. These different algorithms for video frames are called picture types or frame types. The three major picture types used in the different video algorithms are I, P and B. They are different in the following characteristics:
SMPTE 421, informally known as VC-1, is a video coding format. Most of it was initially developed as Microsoft's proprietary video format Windows Media Video 9 in 2003. With some enhancements including the development of a new Advanced Profile, it was officially approved as a SMPTE standard on April 3, 2006. It was primarily marketed as a lower-complexity competitor to the H.264/MPEG-4 AVC standard. After its development, several companies other than Microsoft asserted that they held patents that applied to the technology, including Panasonic, LG Electronics and Samsung Electronics.
H.262 or MPEG-2 Part 2 is a video coding format standardised and jointly maintained by ITU-T Study Group 16 Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG), and developed with the involvement of many companies. It is the second part of the ISO/IEC MPEG-2 standard. The ITU-T Recommendation H.262 and ISO/IEC 13818-2 documents are identical.
x264 is a free and open-source software library and a command-line utility developed by VideoLAN for encoding video streams into the H.264/MPEG-4 AVC video coding format. It is released under the terms of the GNU General Public License.
MPEG-4 Part 2, also known as MPEG-4 Visual, is a video compression format developed by the Moving Picture Experts Group (MPEG). It belongs to the MPEG-4 ISO/IEC standards. It uses block-wise motion compensation and a discrete cosine transform (DCT), similar to previous standards such as MPEG-1 Part 2 and H.262/MPEG-2 Part 2.
A video codec is software or a device that provides encoding and decoding for digital video, and which may or may not include the use of video compression and/or decompression. Most codecs are typically implementations of video coding formats.
The macroblock is a processing unit in image and video compression formats based on linear block transforms, typically the discrete cosine transform (DCT). A macroblock typically consists of 16×16 samples, and is further subdivided into transform blocks, and may be further subdivided into prediction blocks. Formats which are based on macroblocks include JPEG, where they are called MCU blocks, H.261, MPEG-1 Part 2, H.262/MPEG-2 Part 2, H.263, MPEG-4 Part 2, and H.264/MPEG-4 AVC. In H.265/HEVC, the macroblock as a basic processing unit has been replaced by the coding tree unit.
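As a rough illustration of the 16×16 partitioning described above, the following Python sketch counts how many macroblocks are needed to cover a frame. The function names `macroblock_grid` and `macroblock_origins` are hypothetical and chosen only for this example. The sketch assumes that a frame whose dimensions are not multiples of 16 is padded up to the next multiple, and it does not model transform or prediction sub-blocks.

```python
MB_SIZE = 16  # macroblock edge length in samples, as stated in the text


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows, total) of macroblocks covering a frame.

    Assumption: partial blocks at the right and bottom edges are padded
    to a full macroblock, so both counts are rounded up.
    """
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols, rows, cols * rows


def macroblock_origins(width, height, mb_size=MB_SIZE):
    """Yield the top-left (x, y) sample position of each macroblock
    in raster order (left to right, top to bottom)."""
    cols, rows, _ = macroblock_grid(width, height, mb_size)
    for r in range(rows):
        for c in range(cols):
            yield c * mb_size, r * mb_size


if __name__ == "__main__":
    print(macroblock_grid(1920, 1080))
    print(list(macroblock_origins(48, 32)))
```

Under these assumptions, a 1920×1080 frame is covered by a grid 120 macroblocks wide and 68 high, because 1080 is not a multiple of 16 and the last row is only partly filled with picture content.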
Gary Joseph Sullivan is an American electrical engineer who led the development of the AVC, HEVC, and VVC video coding standards and created the DirectX Video Acceleration (DXVA) API/DDI video decoding feature of the Microsoft Windows operating system. He is currently Director of Video Research and Standards at Dolby Laboratories and is the chair of ISO/IEC JTC 1/SC 29 and of the ITU-T Video Coding Experts Group (VCEG).
A deblocking filter is a video filter applied to decoded compressed video to improve visual quality and prediction performance by smoothing the sharp edges which can form between macroblocks when block coding techniques are used. The filter aims to improve the appearance of decoded pictures. It is a part of the specification for both the SMPTE VC-1 codec and the ITU H.264 codec.
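The idea of smoothing block edges can be sketched in a few lines of Python. This is a deliberately simplified one-dimensional illustration and not the filter specified in H.264 or VC-1: the function name `deblock_row`, the block size of 8, the threshold of 10 and the quarter-step adjustment are all assumptions made for the example. It softens small steps at block boundaries, which are likely coding artifacts, and leaves large steps alone, since those are more likely to be real edges in the picture.

```python
def deblock_row(row, block_size=8, threshold=10):
    """Smooth small discontinuities at block boundaries in one row of samples.

    Simplified illustration only: each boundary is handled independently,
    and only the two samples adjacent to the boundary are changed.
    """
    out = list(row)
    for edge in range(block_size, len(row), block_size):
        p = row[edge - 1]  # last sample of the block on the left
        q = row[edge]      # first sample of the block on the right
        step = q - p
        # Small steps are treated as blocking artifacts and softened;
        # large steps are assumed to be genuine image edges and kept.
        if 0 < abs(step) < threshold:
            adjust = step // 4
            out[edge - 1] = p + adjust
            out[edge] = q - adjust
    return out


if __name__ == "__main__":
    artifact = [10] * 8 + [18] * 8   # small step at the block boundary
    real_edge = [10] * 8 + [200] * 8  # large step, left untouched
    print(deblock_row(artifact))
    print(deblock_row(real_edge))
```

Real deblocking filters typically adapt their strength to the coding parameters of the neighbouring blocks and modify more than one sample on each side of the edge.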
Flexible Macroblock Ordering or FMO is one of several error resilience tools defined in the Baseline profile of the H.264/MPEG-4 AVC video compression standard.
Reference frames are frames of a compressed video that are used to define future frames. As such, they are only used in inter-frame compression techniques. In older video encoding standards, such as MPEG-2, only one reference frame – the previous frame – was used for P-frames. Two reference frames were used for B-frames.
Multi View Video Coding is a stereoscopic video coding standard for video compression that allows for encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It uses the 2D plus Delta method and is an amendment to the H.264 video compression standard, developed jointly by MPEG and VCEG, with contributions from a number of companies, primarily Panasonic and LG Electronics.
High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as part of the MPEG-H project as a successor to the widely used Advanced Video Coding. In comparison to AVC, HEVC offers from 25% to 50% better data compression at the same level of video quality, or substantially improved video quality at the same bit rate. It supports resolutions up to 8192×4320, including 8K UHD, and unlike the primarily 8-bit AVC, HEVC's higher fidelity Main 10 profile has been incorporated into nearly all supporting hardware.
A video coding format is a content representation format of digital video content, such as in a data file or bitstream. It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A specific software, firmware, or hardware implementation capable of compression or decompression in a specific video coding format is called a video codec.
High Efficiency Video Coding tiers and levels are constraints that define a High Efficiency Video Coding (HEVC) bitstream in terms of maximum bit rate, maximum luma sample rate, maximum luma picture size, minimum compression ratio, maximum number of slices allowed, and maximum number of tiles allowed. Lower tiers are more constrained than higher tiers and lower levels are more constrained than higher levels.