Video compression picture types

Last updated January 28, 2025

In the field of video compression, a video frame is compressed using different algorithms with different advantages and disadvantages, centered mainly around amount of data compression. These different algorithms for video frames are called picture types or frame types. The three major picture types used in the different video algorithms are I, P and B.^[1] They are different in the following characteristics:

I‑frames are the least compressible but don't require other video frames to decode.
P‑frames can use data from previous frames to decompress and are more compressible than I‑frames.
B‑frames can use both previous and forward frames for data reference to get the highest amount of data compression.

Summary

Three types of pictures (or frames) are used in video compression: I, P, and B frames.

An I‑frame (intra-coded picture) is a complete image, like a JPG or BMP image file.

A P‑frame (Predicted picture) holds only the changes in the image from a previous frame. For example, in a scene where a car moves across a stationary background, only the car's movements need to be encoded. The encoder does not need to store the unchanging background pixels in the P‑frame, thus saving space. P‑frames are also known as delta‑frames.

A B‑frame (Bidirectional predicted picture) saves even more space by using differences between the current frame and both the preceding and following frames to specify its content.

P and B frames are also called inter frames. The order in which the I, P and B frames are arranged is called the group of pictures. Video frames contain presentation timestamp (PTS) and decoding timestamp (DTS) values to keep the frames in correct order for decoding and displaying.

Pictures/frames

While the terms "frame" and "picture" are often used interchangeably, the term picture is a more general notion, as a picture can be either a frame or a field. A frame is a complete image, and a field is the set of odd-numbered or even-numbered scan lines composing a partial image. For example, an HD 1080 picture has 1080 lines (rows) of pixels. An odd field consists of pixel information for lines 1, 3, 5...1079. An even field has pixel information for lines 2, 4, 6...1080. When video is sent in interlaced-scan format, each frame is sent in two fields, the field of odd-numbered lines followed by the field of even-numbered lines.

A frame used as a reference for predicting other frames is called a reference frame.

Frames encoded without information from other frames are called I-frames. Frames that use prediction from a single preceding reference frame (or a single frame for prediction of each region) are called P-frames. B-frames use prediction from a (possibly weighted) average of two reference frames, one preceding and one succeeding.

Slices

In the H.264/MPEG-4 AVC standard, the granularity of prediction types is brought down to the "slice level." A slice is a spatially distinct region of a frame that is encoded separately from any other region in the same frame. I-slices, P-slices, and B-slices take the place of I, P, and B frames.

Macroblocks

Typically, pictures (frames) are segmented into macroblocks , and individual prediction types can be selected on a macroblock basis rather than being the same for the entire picture, as follows:

I-frames can contain only intra macroblocks
P-frames can contain both intra macroblocks and predicted macroblocks
B-frames can contain intra, predicted, and bi-predicted macroblocks

Furthermore, in the H.264 video coding standard, the frame can be segmented into sequences of macroblocks called slices, and instead of using I, B and P-frame type selections, the encoder can choose the prediction style distinctly on each individual slice. Also in H.264 are found several additional types of frames/slices:

SI‑frames/slices (Switching I): Facilitates switching between coded streams; contains SI-macroblocks (a special type of intra coded macroblock).
SP‑frames/slices (Switching P): Facilitates switching between coded streams; contains P and/or I-macroblocks
Multi‑frame motion estimation (up to 16 reference frames or 32 reference fields)

Multi‑frame motion estimation increases the quality of the video, while allowing the same compression ratio. SI and SP frames (defined for the Extended Profile) improve error correction. When such frames are used along with a smart decoder, it is possible to recover the broadcast streams of damaged DVDs.

Intra-coded (I) frames/slices (key frames)

I-frames contain an entire image. They are coded without reference to any other frame except (parts of) themselves.
May be generated by an encoder to create a random access point (to allow a decoder to start decoding properly from scratch at that picture location).
May also be generated when differentiating image details prohibit generation of effective P or B-frames.
Typically require more bits to encode than other frame types.

Often, I‑frames are used for random access and are used as references for the decoding of other pictures. Intra refresh periods of a half-second are common on such applications as digital television broadcast and DVD storage. Longer refresh periods may be used in some environments. For example, in videoconferencing systems it is common to send I-frames very infrequently.

Predicted (P) frames/slices

Require the prior decoding of some other picture(s) in order to be decoded.
May contain both image data and motion vector displacements and combinations of the two.
Can reference previous pictures in decoding order.
Older standard designs (such as MPEG-2) use only one previously decoded picture as a reference during decoding, and require that picture to also precede the P picture in display order.
In H.264, can use multiple previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for its prediction.
Typically require fewer bits for encoding compared to I-frames.

Bi-directional predicted (B) frames/slices (macroblocks)

Require the prior decoding of subsequent frame(s) to be displayed.
May contain image data and/or motion vector displacements. Older standards allow only a single global motion compensation vector for the entire frame or a single motion compensation vector per macroblock.
Include some prediction modes that form a prediction of a motion region (e.g., a macroblock or a smaller area) by averaging the predictions obtained using two different previously decoded reference regions. Some standards allow two motion compensation vectors per macroblock (biprediction).
In older standards (such as MPEG-2), B-frames are never used as references for the prediction of other pictures. As a result, a lower quality encoding (requiring less space) can be used for such B-frames because the loss of detail will not harm the prediction quality for subsequent pictures.
H.264 relaxes this restriction, and allows B-frames to be used as references for the decoding of other frames at the encoder's discretion.
Older standards (such as MPEG-2), use exactly two previously decoded pictures as references during decoding, and require one of those pictures to precede the B-frame in display order and the other one to follow it.
H.264 allows for one, two, or more than two previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for its prediction.
The heightened flexibility of information retrieval means that B-frames typically require fewer bits for encoding than either I or P-frames.

Related Research Articles

MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.

Motion compensation in computing is an algorithmic technique used to predict a frame in a video given the previous and/or future frames by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video data for video compression, for example in the generation of MPEG-2 files. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. The reference picture may be previous in time or even from the future. When images can be accurately synthesized from previously transmitted/stored images, the compression efficiency can be improved.

Motion JPEG is a video compression format in which each video frame or interlaced field of a digital video sequence is compressed separately as a JPEG image.

<span class="mw-page-title-main">Compression artifact</span> Distortion of media caused by lossy data compression

A compression artifact is a noticeable distortion of media caused by the application of lossy compression. Lossy data compression involves discarding some of the media's data so that it becomes small enough to be stored within the desired disk space or transmitted (streamed) within the available bandwidth. If the compressor cannot store enough data in the compressed version, the result is a loss of quality, or introduction of artifacts. The compression algorithm may not be intelligent enough to discriminate between distortions of little subjective importance and those objectionable to the user.

Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. It supports a maximum resolution of 8K UHD.

H.261 is an ITU-T video compression standard, first ratified in November 1988. It is the first member of the H.26x family of video coding standards in the domain of the ITU-T Study Group 16 Video Coding Experts Group. It was the first video coding standard that was useful in practical terms.

Audio Video Coding Standard (AVS) refers to the digital audio and digital video series compression standard formulated by the Audio and Video coding standard workgroup of China. Work began in 2002, and three generations of standards were published.

An inter frame is a frame in a video compression stream which is expressed in terms of one or more neighboring frames. The "inter" part of the term refers to the use of Inter frame prediction. This kind of prediction tries to take advantage from temporal redundancy between neighboring frames enabling higher compression rates.

H.262 or MPEG-2 Part 2 is a video coding format standardised and jointly maintained by ITU-T Study Group 16 Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG), and developed with the involvement of many companies. It is the second part of the ISO/IEC MPEG-2 standard. The ITU-T Recommendation H.262 and ISO/IEC 13818-2 documents are identical.

Global motion compensation(GMC) is a motion compensation technique used in video compression to reduce the bitrate required to encode video. It is most commonly used in MPEG-4 ASP, such as with the DivX and Xvid codecs.

Α video codec is software or a device that provides encoding and decoding for digital video, and which may or may not include the use of video compression and/or decompression. Most codecs are typically implementations of video coding formats.

The macroblock is a processing unit in image and video compression formats based on linear block transforms, typically the discrete cosine transform (DCT). A macroblock typically consists of 16×16 samples, and is further subdivided into transform blocks, and may be further subdivided into prediction blocks. Formats which are based on macroblocks include JPEG, where they are called MCU blocks, H.261, MPEG-1 Part 2, H.262/MPEG-2 Part 2, H.263, MPEG-4 Part 2, and H.264/MPEG-4 AVC. In H.265/HEVC, the macroblock as a basic processing unit has been replaced by the coding tree unit.

In video coding, a group of pictures, or GOP structure, specifies the order in which intra- and inter-frames are arranged. The GOP is a collection of successive pictures within a coded video stream. Each coded video stream consists of successive GOPs, from which the visible frames are generated. Encountering a new GOP in a compressed video stream means that the decoder doesn't need any previous frames in order to decode the next ones, and allows fast seeking through the video.

<span class="mw-page-title-main">Intra-frame coding</span> Data compression technique

Intra-frame coding is a data compression technique used within a video frame, enabling smaller file sizes and lower bitrates, with little or no loss in quality. Since neighboring pixels within an image are often very similar, rather than storing each pixel independently, the frame image is divided into blocks and the typically minor difference between each pixel can be encoded using fewer bits.

A deblocking filter is a video filter applied to decoded compressed video to improve visual quality and prediction performance by smoothing the sharp edges which can form between macroblocks when block coding techniques are used. The filter aims to improve the appearance of decoded pictures. It is a part of the specification for both the SMPTE VC-1 codec and the ITU H.264 codec.

Flexible Macroblock Ordering or FMO is one of several error resilience tools defined in the Baseline profile of the H.264/MPEG-4 AVC video compression standard.

Reference frames are frames of a compressed video that are used to define future frames. As such, they are only used in inter-frame compression techniques. In older video encoding standards, such as MPEG-2, only one reference frame – the previous frame – was used for P-frames. Two reference frames were used for B-frames.

VP8 is an open and royalty-free video compression format released by On2 Technologies in 2008.

High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as part of the MPEG-H project as a successor to the widely used Advanced Video Coding. In comparison to AVC, HEVC offers from 25% to 50% better data compression at the same level of video quality, or substantially improved video quality at the same bit rate. It supports resolutions up to 8192×4320, including 8K UHD, and unlike the primarily 8-bit AVC, HEVC's higher fidelity Main 10 profile has been incorporated into nearly all supporting hardware.

A video coding format is a content representation format of digital video content, such as in a data file or bitstream. It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A computer software or hardware component that compresses or decompresses a specific video coding format is a video codec.

References

↑ Beach, Andy; Owen, Aaron (2019). Video compression handbook (2nd ed.). Place of publication not identified: Peachpit Press. ISBN 978-0-13-486621-5. OCLC 1006298938.

External links

Video streaming with SP and SI frames

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Beach, Andy; Owen, Aaron (2019). Video compression handbook (2nd ed.). Place of publication not identified: Peachpit Press. ISBN 978-0-13-486621-5. OCLC 1006298938.

[1]