Group of pictures

Last updated April 05, 2024

In video coding, a group of pictures, or GOP structure, specifies the order in which intra- and inter-frames are arranged. The GOP is a collection of successive pictures within a coded video stream. Each coded video stream consists of successive GOPs, from which the visible frames are generated. Encountering a new GOP in a compressed video stream means that the decoder does not need any previous frames in order to decode the frames that follow, which allows fast seeking through the video.

Elements

A GOP can contain the following picture types:

I frame (intra coded picture, also called keyframe^[1]) – a picture that is coded independently of all other pictures. Each GOP begins (in decoding order) with this type of frame.
- IDR frame (Instantaneous Decoder Refresh): an I frame carrying a marking which indicates that no subsequent frame references any picture that precedes this I frame.
P frame (predictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1, H.262/MPEG-2 and H.263, each P frame can only reference one picture, and that picture must precede the P frame in display order as well as in decoding order, and the reference must be an I or P frame. These constraints do not apply in the newer standards H.264/MPEG-4 AVC and HEVC.
B frame (bipredictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1 and H.262/MPEG-2, each B frame can only reference two frames, the one which precedes the B frame in display order and the one which follows it, and all referenced pictures must be I or P frames. These constraints do not apply in the newer standards H.264/MPEG-4 AVC and HEVC.
D frame (DC direct coded picture) – serves as a fast-access representation of a frame for loss robustness or fast-forward. D frames are only used in MPEG-1 video.

An I frame indicates the beginning of a GOP. Afterwards, several P and B frames follow. In older designs, the allowed ordering and referencing structure is relatively constrained.^[2]
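Because a B frame in the older designs depends on the anchor frame that follows it in display order, that anchor has to be decoded first, so decoding order differs from display order. The following is a minimal sketch of that reordering, assuming the MPEG-2 style rule described above; the function name `display_to_decode_order` is hypothetical and not taken from any codec library.

```python
def display_to_decode_order(display_types: str) -> list[tuple[int, str]]:
    """Return (display_index, frame_type) pairs in decoding order.

    Assumes the classic rule: each B frame depends on the nearest anchor
    (I or P) before and after it in display order, so the following anchor
    must be decoded before the B frames that precede it.
    """
    decode_order = []
    pending_b = []  # B frames still waiting for their forward anchor
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            # Decode the anchor first, then the B frames displayed before it.
            decode_order.append((index, frame_type))
            decode_order.extend(pending_b)
            pending_b = []
    # Trailing B frames have no later anchor in this sequence; emit them last.
    decode_order.extend(pending_b)
    return decode_order


order = display_to_decode_order("IBBPBBP")
print("".join(frame_type for _, frame_type in order))  # IPBBPBB
print([index for index, _ in order])                   # [0, 3, 1, 2, 6, 4, 5]
```

In this sketch each P frame jumps ahead of the two B frames that are displayed before it, which is why decoders for such streams must buffer frames before presenting them.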

The I frames contain the full image and do not require any additional information to reconstruct them. Typically, encoders use GOP structures that cause each I frame to be a "clean random access point," such that decoding can start cleanly on an I frame and any errors within the GOP structure are corrected after processing a correct I frame.

In the newer designs found in H.264/MPEG-4 AVC and HEVC, encoders have much more flexibility in their choice of referencing structures. They can use the same referencing structures as the older designs, or they can use more pictures as references and order the pictures more flexibly in coding order relative to display order. They are also allowed to use B frames as references when coding other (B or P) frames. This extra flexibility can improve compression efficiency, but it can cause errors to propagate if some data is lost or corrupted. One popular structure for use with the newer designs is a hierarchy of B frames. Hierarchical B frames can provide very good compression efficiency and can also limit the propagation of errors, since the hierarchy can ensure that the number of pictures affected by any data corruption problem is strictly limited.^[3]
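To illustrate how a hierarchy can bound error propagation, the sketch below builds one possible dyadic B-frame pyramid between two anchor frames and lists which frames are affected when one of them is lost. The dyadic layout, the power-of-two mini-GOP length, and the names `hierarchical_b_layout` and `affected_by` are assumptions made for illustration; real encoders may choose different hierarchies.

```python
def hierarchical_b_layout(mini_gop: int) -> dict[int, tuple[int, tuple[int, int]]]:
    """Map each B-frame position to (level, (backward_ref, forward_ref)).

    Positions 0 and mini_gop are the anchor frames (level 0). mini_gop is
    assumed to be a power of two, which yields a dyadic pyramid.
    """
    if mini_gop < 2 or mini_gop & (mini_gop - 1):
        raise ValueError("mini_gop must be a power of two >= 2")
    depth = mini_gop.bit_length() - 1          # log2(mini_gop)
    layout = {}
    for pos in range(1, mini_gop):
        step = pos & -pos                      # largest power of two dividing pos
        level = depth - (step.bit_length() - 1)
        layout[pos] = (level, (pos - step, pos + step))
    return layout


def affected_by(layout: dict[int, tuple[int, tuple[int, int]]], lost: int) -> list[int]:
    """Positions that depend, directly or transitively, on the lost frame."""
    affected = {lost}
    changed = True
    while changed:
        changed = False
        for pos, (_, refs) in layout.items():
            if pos not in affected and affected.intersection(refs):
                affected.add(pos)
                changed = True
    return sorted(affected)


layout = hierarchical_b_layout(8)
print(layout[4])               # (1, (0, 8))
print(affected_by(layout, 1))  # [1]
print(affected_by(layout, 2))  # [1, 2, 3]
```

Under these assumptions, losing a frame at the deepest level affects only that frame, while losing a frame higher in the pyramid affects only the B frames beneath it.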

Generally, the more I frames the video stream has, the more editable it is. However, having more I frames substantially increases the bit rate needed to code the video.

Structure

The GOP structure is often referred to by two numbers, for example, $M = 3, N = 12$. The first number gives the distance between two anchor frames (I or P), also known as the length of a "mini-GOP".^[4] The second gives the distance between two full images (I-frames): it is the GOP size.^[5] Instead of the M parameter, the maximal count of B-frames between two consecutive anchor frames can be used; this is the approach used by ffmpeg.^[6]

Examples:

For $M = 3, N = 12$, the GOP structure is IBBPBBPBBPBB. There are 2 B-frames between two consecutive anchor frames.
For the sequence IBBBBPBBBBPBBBB, the GOP size is $N = 15$ and the anchor distance is $M = 5$. There are 4 B-frames between two consecutive anchor frames.
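The examples above can be reproduced with a short sketch that expands $M$ and $N$ into a display-order frame pattern and converts them to the B-frame count convention. The function names are hypothetical; `-bf` is the ffmpeg option cited above, and `-g` is assumed here to be the ffmpeg option that sets the GOP size.

```python
def gop_pattern(m: int, n: int) -> str:
    """Frame types of one GOP in display order for anchor distance m and GOP size n."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    return "".join(
        "I" if i == 0 else ("P" if i % m == 0 else "B")
        for i in range(n)
    )


def ffmpeg_gop_args(m: int, n: int) -> list[str]:
    """Express (M, N) as a GOP size plus the maximum number of consecutive B-frames."""
    return ["-g", str(n), "-bf", str(m - 1)]


print(gop_pattern(3, 12))      # IBBPBBPBBPBB
print(gop_pattern(5, 15))      # IBBBBPBBBBPBBBB
print(ffmpeg_gop_args(3, 12))  # ['-g', '12', '-bf', '2']
```

The conversion relies only on the relation stated above: with anchor distance $M$ there are $M - 1$ B-frames between two consecutive anchor frames.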

The GOP structure does not need to stay fixed throughout encoding. Varying $N$ to insert an I-frame on scene change is a well-known technique.^[7] Newer techniques also vary $M$ based on the amount of motion in the video.^[8]
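A minimal sketch of varying $N$ is shown below. It assumes a scene-change detector has already produced one boolean flag per frame, and it starts a new GOP either on a flagged frame or when a maximum GOP length is reached. The function name, the `max_gop` parameter, and the use of P frames only are simplifications made for illustration and do not describe any particular encoder.

```python
def assign_frame_types(scene_cuts: list[bool], max_gop: int) -> str:
    """Place an I frame at every scene cut and at least every max_gop frames."""
    if max_gop < 1:
        raise ValueError("max_gop must be positive")
    types = []
    frames_in_gop = 0  # number of frames in the current GOP so far
    for index, is_cut in enumerate(scene_cuts):
        if index == 0 or is_cut or frames_in_gop >= max_gop:
            types.append("I")   # start a new GOP
            frames_in_gop = 1
        else:
            types.append("P")   # B frames are omitted to keep the sketch short
            frames_in_gop += 1
    return "".join(types)


cuts = [False, False, True, False, False, False, False, False]
print(assign_frame_types(cuts, max_gop=4))          # IPIPPPIP
print(assign_frame_types([False] * 9, max_gop=4))   # IPPPIPPPI
```

In the first call the scene cut at frame 2 shortens the first GOP to two frames, after which the regular four-frame limit applies again.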

Additional concepts

With H.264 and later designs which allow highly flexible reference structures, a B frame in one GOP is able to reference a frame in a different GOP. A GOP that contains any such outward-referencing frame is known as an "open GOP". The opposite is a self-contained GOP, known as a "closed GOP".^[4]
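The distinction can be expressed as a simple check over a reference table: a GOP is closed when every reference made by its frames points to a frame inside the same GOP. The sketch below assumes frames are identified by display index and that the reference lists are known; the function name and the sample data are hypothetical.

```python
def is_closed_gop(gop_frames: set[int], references: dict[int, list[int]]) -> bool:
    """True if every reference made by a frame of the GOP stays inside the GOP."""
    return all(
        ref in gop_frames
        for frame in gop_frames
        for ref in references.get(frame, [])
    )


# Hypothetical GOP covering display indices 12..17 (I B B P B B).
gop = set(range(12, 18))
closed_refs = {13: [12, 15], 14: [12, 15], 15: [12], 16: [15], 17: [15]}
# Same GOP, but the last two B frames also reference frame 18,
# which belongs to the next GOP.
open_refs = {**closed_refs, 16: [15, 18], 17: [15, 18]}

print(is_closed_gop(gop, closed_refs))  # True
print(is_closed_gop(gop, open_refs))    # False
```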

References

[1] "Keyframes, InterFrame & Video Compression". 13 April 2021.
[2] "B-Frames".
[3] "Hierarchical B-Frames or B-Pyramid - Video Compression". www.ramugedia.com.
[4] Vijayanagar, Krishna Rao (17 December 2020). "Closed GOP and Open GOP - Simplified Explanation - OTTVerse". ottverse.com.
[5] "Compressor 4 User Manual".
[6] "FFmpeg Codecs Documentation". ffmpeg.org. bf integer (encoding,video) Set max number of B frames between non-B-frames.
[7] "Adaptive Intra-Frame Assignment and Bit-Rate Estimation for Variable GOP Length in H.264". IEEE Transactions on Circuits and Systems for Video Technology. 16 (10): 1271–1279. October 2006. doi:10.1109/TCSVT.2006.881856.
[8] "Docs/Appendix-Adaptive-Prediction-Structure.md · master · Alliance for Open Media / SVT-AV1 · GitLab". GitLab. 23 August 2023.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.