Rate–distortion optimization

Last updated October 09, 2023

Rate-distortion optimization (RDO) is a method of improving video quality in video compression. The name refers to the optimization of the amount of distortion (loss of video quality) against the amount of data required to encode the video, the rate. While it is primarily used by video encoders, rate-distortion optimization can be used to improve quality in any encoding situation (image, video, audio, or otherwise) where decisions have to be made that affect both file size and quality simultaneously.

Background

The classical method of making encoding decisions is for the video encoder to choose the result that yields the highest-quality output image. However, this has the disadvantage that the choice it makes might require more bits while giving comparatively little quality benefit. One common example of this problem is in motion estimation,^[1] in particular regarding the use of quarter-pixel-precision motion estimation. Adding the extra precision to the motion of a block during motion estimation might increase quality, but in some cases that extra quality isn't worth the extra bits necessary to encode the motion vector to a higher precision.
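To make the bit cost of extra precision concrete, here is a minimal Python sketch under assumed conditions: a hypothetical codec that codes each motion-vector component directly (ignoring the motion-vector prediction real codecs apply) with the signed Exp-Golomb code se(v) that H.264 uses for many syntax elements, in units of either whole pixels or quarter pixels. The function names and vector values are illustrative only.

```python
def signed_exp_golomb_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword se(v) for integer v."""
    # Map 0, 1, -1, 2, -2, ... to code numbers 0, 1, 2, 3, 4, ...
    code_num = 2 * v - 1 if v > 0 else -2 * v
    # An Exp-Golomb codeword for code_num has 2 * floor(log2(code_num + 1)) + 1 bits.
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def mv_bits(mv_x: float, mv_y: float, units_per_pixel: int) -> int:
    """Bits to code a motion vector (given in pixels) at 1/units_per_pixel precision.

    Hypothetical scheme: each component is coded on its own, with no prediction.
    """
    return sum(signed_exp_golomb_bits(round(c * units_per_pixel)) for c in (mv_x, mv_y))


print(mv_bits(3, -1, units_per_pixel=1))     # full-pel vector: 8 bits
print(mv_bits(3.25, -1, units_per_pixel=4))  # quarter-pel vector: 16 bits
```

Under these assumptions the quarter-pel vector costs twice as many bits as the nearby full-pel one. Whether that is worthwhile depends on how much the finer vector improves the prediction, which is exactly the trade-off rate-distortion optimization weighs.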

How it works

Rate-distortion optimization solves the aforementioned problem by acting as a video quality metric, measuring both the deviation from the source material and the bit cost for each possible decision outcome. The bit cost is weighted mathematically by multiplying it by a Lagrange multiplier (λ), a value representing the relationship between bit cost and quality for a particular quality level. The weighted rate is added to the distortion to give a single cost, J = D + λR, and the encoder picks the outcome with the lowest cost. The deviation from the source is usually measured as the mean squared error, in order to maximize the PSNR video quality metric.
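The decision rule can be sketched in a few lines of Python using the Lagrangian cost J = D + λR. This is a minimal illustration, assuming distortion has already been measured (for example as a sum of squared errors) and rate as a bit count; the Candidate class, the numbers, and the λ values are hypothetical and not taken from any particular encoder.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible encoding decision, e.g. a motion-vector choice (hypothetical)."""
    name: str
    distortion: float  # D: e.g. sum of squared errors against the source block
    bits: int          # R: bits needed to code this choice


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.bits


def choose(candidates: list[Candidate], lam: float) -> Candidate:
    """Return the candidate with the lowest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Hypothetical numbers: the quarter-pel vector lowers distortion slightly
# but needs more bits to code the motion vector.
full_pel = Candidate("full-pel MV", distortion=1200.0, bits=8)
quarter_pel = Candidate("quarter-pel MV", distortion=1150.0, bits=16)

print(choose([full_pel, quarter_pel], lam=0.0).name)   # quality only: quarter-pel MV
print(choose([full_pel, quarter_pel], lam=10.0).name)  # rate counts too: full-pel MV
```

With λ = 0 only distortion matters and the quarter-pel vector wins; with λ = 10 its 8 extra bits add 80 to the cost, outweighing its distortion saving of 50, so the full-pel vector is chosen. Because λ sets the exchange rate between bits and quality, it has to be chosen to match the target quality level.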

Calculating the bit cost is made more difficult by the entropy encoders in modern video codecs, which require the rate-distortion optimization algorithm to pass each block of video to be tested through the entropy coder to measure its actual bit cost. In MPEG codecs, the full process consists of a discrete cosine transform, followed by quantization and entropy encoding. Because of this, rate-distortion optimization is much slower than most other block-matching metrics, such as the simple sum of absolute differences (SAD) and the sum of absolute transformed differences (SATD). As such, it is usually used only for the final steps of the motion estimation process, such as deciding between different partition types in H.264/AVC.
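For contrast, SAD and SATD never quantize or entropy-code anything, which is why they are so much cheaper. The following Python sketch computes both for a 4x4 block, assuming SATD is taken over an unnormalized 4x4 Hadamard transform of the residual with no further scaling (encoders differ in how they scale SATD); it is an illustration, not code from any of the encoders listed below.

```python
# 4x4 Hadamard matrix (unnormalized; symmetric, so its transpose equals itself).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def satd4x4(block_a, block_b):
    """Sum of absolute Hadamard-transformed differences for 4x4 blocks (unscaled)."""
    residual = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(block_a, block_b)]
    transformed = matmul(matmul(H4, residual), H4)  # H * R * H^T, with H^T == H
    return sum(abs(v) for row in transformed for v in row)


pred = [[100] * 4 for _ in range(4)]
flat = [[101] * 4 for _ in range(4)]  # residual of +1 at every pixel
spike = [row[:] for row in pred]
spike[0][0] = 116                     # a single pixel off by 16

print(sad(flat, pred), satd4x4(flat, pred))    # 16 16
print(sad(spike, pred), satd4x4(spike, pred))  # 16 256
```

Both residuals have the same SAD, but the flat one collapses into a single transform coefficient, so its SATD is far lower. This is why SATD is often treated as a closer proxy for coding cost than SAD, while still only estimating what rate-distortion optimization measures directly by running the full transform, quantization, and entropy-coding chain.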

List of encoders that support RDO

Ateme H.264 encoder
Grass Valley ViBE encoders (SD & HD MPEG-2/MPEG-4)
Harmonic Electra 8000 encoder (SD & HD MPEG-2/MPEG-4)
libavcodec
MainConcept H.264 encoder
Microsoft VC-1 encoder
TANDBERG Television SD MPEG-2 EN8100
TANDBERG Television HD MPEG-4 EN8190
TANDBERG Television SD & HD MPEG-4 iPlex
Theora 1.1-alpha1 and later (the "Thusnelda" branch)
x264 H.264 encoder
x265 H.265 encoder
Xvid MPEG-4 ASP encoder
H.264/AVC reference software JM (Joint Model)
HEVC reference software HM (HEVC Test Model)
Kvazaar (partial)^[2]

References

[1] Hoang, D.T.; Long, P.M.; Vitter, J.S. (August 1998). "Rate-Distortion Optimizations for Motion Estimation in Low-Bitrate Video Coding" (PDF). IEEE Transactions on Circuits and Systems for Video Technology. 8 (4): 488–500. doi:10.1109/76.709413. A shorter version appears in Hoang, D.T.; Long, P.M.; Vitter, J.S. (March 1996). "Rate-distortion optimizations for motion estimation in low-bit-rate video coding". Digital Video Compression: Algorithms and Technologies 1996. Vol. 2668. SPIE. pp. 18–27. doi:10.1117/12.235433.
[2] "Ultra Video Group".

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
