Context-adaptive binary arithmetic coding

Last updated

Context-adaptive binary arithmetic coding (CABAC) is a form of entropy encoding used in the H.264/MPEG-4 AVC [1] [2] and High Efficiency Video Coding (HEVC) standards. It is a lossless compression technique, although the video coding standards in which it is used are typically for lossy compression applications. CABAC is notable for providing much better compression than most other entropy encoding algorithms used in video encoding, and it is one of the key elements that provides the H.264/AVC encoding scheme with better compression capability than its predecessors. [3]

Contents

In H.264/MPEG-4 AVC, CABAC is only supported in the Main and higher profiles (but not the extended profile) of the standard, as it requires a larger amount of processing to decode than the simpler scheme known as context-adaptive variable-length coding (CAVLC) that is used in the standard's Baseline profile. CABAC is also difficult to parallelize and vectorize, so other forms of parallelism (such as spatial region parallelism) may be coupled with its use. In HEVC, CABAC is used in all profiles of the standard.

Algorithm

CABAC is based on arithmetic coding, with a few innovations and changes to adapt it to the needs of video encoding standards: [4]

CABAC has multiple probability modes for different contexts. It first converts all non-binary symbols to binary. Then, for each bit, the coder selects which probability model to use, then uses information from nearby elements to optimize the probability estimate. Arithmetic coding is finally applied to compress the data.

CABAC method of entropy encoding used within H264 video compression standard in English CABAC Encoder Workflow Diagram.png
CABAC method of entropy encoding used within H264 video compression standard in English

The context modeling provides estimates of conditional probabilities of the coding symbols. Utilizing suitable context models, a given inter-symbol redundancy can be exploited by switching between different probability models according to already-coded symbols in the neighborhood of the current symbol to encode. The context modeling is responsible for most of CABAC's roughly 10% savings in bit rate over the CAVLC entropy coding method.

Coding a data symbol involves the following stages.

Example

MVDxBinarization
00
110
2110
31110
411110
5111110
61111110
711111110
8111111110

1. Binarize the value MVDx, the motion vector difference in the x direction.

The first bit of the binarized codeword is bin 1; the second bit is bin 2; and so on.

ekContext model for bin 1
0 ≤ ek < 3Model 0
3 ≤ ek < 33Model 1
33 ≤ ekModel 2

2. Choose a context model for each bin. One of 3 models is selected for bin 1, based on previous coded MVD values. The L1 norm of two previously-coded values, ek, is calculated:

BinContext model
10, 1 or 2 depending on ek
23
34
45
5 and higher6

If ek is small, then there is a high probability that the current MVD will have a small magnitude; conversely, if ek is large then it is more likely that the current MVD will have a large magnitude. We select a probability table (context model) accordingly. The remaining bins are coded using one of 4 further context models:

3. Encode each bin. The selected context model supplies two probability estimates: the probability that the bin contains “1” and the probability that the bin contains “0”. These estimates determine the two sub-ranges that the arithmetic coder uses to encode the bin.

4. Update the context models. For example, if context model 2 was selected for bin 1 and the value of bin 1 was “0”, the frequency count of “0”s is incremented. This means that the next time this model is selected, the probability of a “0” will be slightly higher. When the total number of occurrences of a model exceeds a threshold value, the frequency counts for “0” and “1” will be scaled down, which in effect gives higher priority to recent observations.

The arithmetic decoding engine

The arithmetic decoder is described in some detail in the Standard. It has three distinct properties:

  1. Probability estimation is performed by a transition process between 64 separate probability states for "Least Probable Symbol" (LPS, the least probable of the two binary decisions "0" or "1").
  2. The range R representing the current state of the arithmetic coder is quantized to a small range of pre-set values before calculating the new range at each step, making it possible to calculate the new range using a look-up table (i.e. multiplication-free).
  3. A simplified encoding and decoding process is defined for data symbols with a near uniform probability distribution.

The definition of the decoding process is designed to facilitate low-complexity implementations of arithmetic encoding and decoding. Overall, CABAC provides improved coding efficiency compared with CAVLC-based coding, at the expense of greater computational complexity.

History

In 1986, IBM researchers Kottappuram M. A. Mohiuddin and Jorma Johannen Rissanen filed a patent for a multiplication-free binary arithmetic coding algorithm. [5] [6] In 1988, an IBM research team including R.B. Arps, T.K. Truong, D.J. Lu, W. B. Pennebaker, L. Mitchell and G. G. Langdon presented an adaptive binary arithmetic coding (ABAC) algorithm called Q-Coder. [7] [8]

The above patents and research papers, along several others from IBM and Mitsubishi Electric, were later cited by the CCITT and Joint Photographic Experts Group as the basis for the JPEG image compression format's adaptive binary arithmetic coding algorithm in 1992. [5] However, encoders and decoders of the JPEG file format, which has options for both Huffman encoding and arithmetic coding, typically only support the Huffman encoding option, which was originally because of patent concerns, although JPEG's arithmetic coding patents [9] have since expired due to the age of the JPEG standard. [10] The first reported use of adaptive binary arithmetic coding in motion video compression was in a proposal by IBM researchers to the MPEG group in 1989. [11] [12] This proposal extended the use of arithmetic coding from intraframe JPEG to interframe video coding.

In 1999, Youngjun Yoo (Texas Instruments), Young Gap Kwon and Antonio Ortega (University of Southern California) presented a context-adaptive form of binary arithmetic coding. [13] The modern context-adaptive binary arithmetic coding (CABAC) algorithm was commercially introduced with the H.264/MPEG-4 AVC format in 2003. [14] The majority of patents for the AVC format are held by Panasonic, Godo Kaisha IP Bridge and LG Electronics. [15]

See also

Related Research Articles

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.

<span class="mw-page-title-main">Huffman coding</span> Technique to compress data

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

<span class="mw-page-title-main">Lossy compression</span> Data compression approach that reduces data size while discarding or changing some of it

In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content. The different versions of the photo of the cat on this page show how higher degrees of approximation create coarser images as more details are removed. This is opposed to lossless data compression which does not degrade the data. The amount of data reduction possible using lossy compression is much higher than using lossless techniques.

<span class="mw-page-title-main">Video codec</span> Digital video processing

A video codec is software or hardware that compresses and decompresses digital video. In the context of video compression, codec is a portmanteau of encoder and decoder, while a device that only compresses is typically called an encoder, and one that only decompresses is a decoder.

<span class="mw-page-title-main">Arithmetic coding</span> Form of entropy encoding used in data compression

Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding, frequently used characters will be stored with fewer bits and not-so-frequently occurring characters will be stored with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, an arbitrary-precision fraction q, where 0.0 ≤ q < 1.0. It represents the current information as a range, defined by two numbers. A recent family of entropy coders called asymmetric numeral systems allows for faster implementations thanks to directly operating on a single natural number representing the current information.

<span class="mw-page-title-main">JPEG 2000</span> Image compression standard and coding system

JPEG 2000 (JP2) is an image compression standard and coding system. It was developed from 1997 to 2000 by a Joint Photographic Experts Group committee chaired by Touradj Ebrahimi, with the intention of superseding their original JPEG standard, which is based on a discrete cosine transform (DCT), with a newly designed, wavelet-based method. The standardized filename extension is .jp2 for ISO/IEC 15444-1 conforming files and .jpx for the extended part-2 specifications, published as ISO/IEC 15444-2. The registered MIME types are defined in RFC 3745. For ISO/IEC 15444-1 it is image/jp2.

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations. It is the algorithm of the Unix file compression utility compress and is used in the GIF image format.

<span class="mw-page-title-main">Compression artifact</span> Distortion of media caused by lossy data compression

A compression artifact is a noticeable distortion of media caused by the application of lossy compression. Lossy data compression involves discarding some of the media's data so that it becomes small enough to be stored within the desired disk space or transmitted (streamed) within the available bandwidth. If the compressor cannot store enough data in the compressed version, the result is a loss of quality, or introduction of artifacts. The compression algorithm may not be intelligent enough to discriminate between distortions of little subjective importance and those objectionable to the user.

<span class="mw-page-title-main">Advanced Video Coding</span> Most widely used standard for video compression

Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. It supports a maximum resolution of 8K UHD.

MPEG-4 Part 2, MPEG-4 Visual is a video compression format developed by the Moving Picture Experts Group (MPEG). It belongs to the MPEG-4 ISO/IEC standards. It uses block-wise motion compensation and a discrete cosine transform (DCT), similar to previous standards such as MPEG-1 Part 2 and H.262/MPEG-2 Part 2.

JBIG2 is an image compression standard for bi-level images, developed by the Joint Bi-level Image Experts Group. It is suitable for both lossless and lossy compression. According to a press release from the Group, in its lossless mode JBIG2 typically generates files 3–5 times smaller than Fax Group 4 and 2–4 times smaller than JBIG, the previous bi-level compression standard released by the Group. JBIG2 was published in 2000 as the international standard ITU T.88, and in 2001 as ISO/IEC 14492.

Α video codec is software or a device that provides encoding and decoding for digital video, and which may or may not include the use of video compression and/or decompression. Most codecs are typically implementations of video coding formats.

Lossless JPEG is a 1993 addition to JPEG standard by the Joint Photographic Experts Group to enable lossless compression. However, the term may also be used to refer to all lossless compression schemes developed by the group, including JPEG 2000 and JPEG-LS.

Context-adaptive variable-length coding (CAVLC) is a form of entropy coding used in H.264/MPEG-4 AVC video encoding. It is an inherently lossless compression technique, like almost all entropy-coders. In H.264/MPEG-4 AVC, it is used to encode residual, zig-zag order, blocks of transform coefficients. It is an alternative to context-based adaptive binary arithmetic coding (CABAC). CAVLC requires considerably less processing to decode than CABAC, although it does not compress the data quite as effectively. CAVLC is supported in all H.264 profiles, unlike CABAC which is not supported in Baseline and Extended profiles.

Video Acceleration API (VA-API) is an open source application programming interface that allows applications such as VLC media player or GStreamer to use hardware video acceleration capabilities, usually provided by the graphics processing unit (GPU). It is implemented by the free and open-source library libva, combined with a hardware-specific driver, usually provided together with the GPU driver.

High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as part of the MPEG-H project as a successor to the widely used Advanced Video Coding. In comparison to AVC, HEVC offers from 25% to 50% better data compression at the same level of video quality, or substantially improved video quality at the same bit rate. It supports resolutions up to 8192×4320, including 8K UHD, and unlike the primarily 8-bit AVC, HEVC's higher fidelity Main 10 profile has been incorporated into nearly all supporting hardware.

A video coding format is a content representation format of digital video content, such as in a data file or bitstream. It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A specific software, firmware, or hardware implementation capable of compression or decompression in a specific video coding format is called a video codec.

Asymmetric numeral systems (ANS) is a family of entropy encoding methods introduced by Jarosław (Jarek) Duda from Jagiellonian University, used in data compression since 2014 due to improved performance compared to previous methods. ANS combines the compression ratio of arithmetic coding, with a processing cost similar to that of Huffman coding. In the tabled ANS (tANS) variant, this is achieved by constructing a finite-state machine to operate on a large alphabet without using multiplication.

JPEG XS is an interoperable, visually lossless, low-latency and lightweight image and video coding system used in professional applications. Applications of the standard include streaming high quality content for virtual reality, drones, autonomous vehicles using cameras, gaming, and broadcasting. In this respect, JPEG XS is unique, being the first ISO codec ever designed for this specific purpose. JPEG XS, built on core technology from both intoPIX and Fraunhofer IIS, is formally standardized as ISO/IEC 21122 by the Joint Photographic Experts Group with the first edition published in 2019. Although not official, the XS acronym was chosen to highlight the eXtra Small and eXtra Speed characteristics of the codec. Today, the JPEG committee is still actively working on further improvements to XS, with the second edition scheduled for publication and initial efforts being launched towards a third edition.

References

  1. Richardson, Iain E. G., H.264 / MPEG-4 Part 10 White Paper, 17 October 2002.
  2. Richardson, Iain E. G. (2003). H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Chichester: John Wiley & Sons Ltd.
  3. Ze-Nian Li; Mark S. Drew; Jiangchuan Liu (9 April 2014). Fundamentals of Multimedia. Springer Science & Business Media. ISBN   978-3-319-05290-8.
  4. Marpe, D., Schwarz, H., and Wiegand, T., Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard, IEEE Trans. Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 620–636, July, 2003.
  5. 1 2 "T.81 – DIGITAL COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES – REQUIREMENTS AND GUIDELINES" (PDF). CCITT. September 1992. Retrieved 12 July 2019.
  6. U.S. Patent 4,652,856
  7. Arps, R. B.; Truong, T. K.; Lu, D. J.; Pasco, R. C.; Friedman, T. D. (November 1988). "A multi-purpose VLSI chip for adaptive data compression of bilevel images". IBM Journal of Research and Development. 32 (6): 775–795. doi:10.1147/rd.326.0775. ISSN   0018-8646.
  8. Pennebaker, W. B.; Mitchell, J. L.; Langdon, G. G.; Arps, R. B. (November 1988). "An overview of the basic principles of the Q-Coder adaptive binary arithmetic coder". IBM Journal of Research and Development. 32 (6): 717–726. doi:10.1147/rd.326.0717. ISSN   0018-8646.
  9. "Recommendation T.81 (1992) Corrigendum 1 (01/04)". Recommendation T.81 (1992). International Telecommunication Union. 9 November 2004. Retrieved 3 February 2011.
  10. JPEG Still Image Data Compression Standard, W. B. Pennebaker and J. L. Mitchell, Kluwer Academic Press, 1992. ISBN   0-442-01272-1
  11. DCT Coding for Motion Video Storage using Adaptive Arithmetic Coding, C. A. Gonzales. L. Allman, T. McCarthy, P. Wendt and A. N. Akansu, Signal Processing: Image Communication, 2, 145, 1990.
  12. Encoding of motion video sequences for the MPEG environment using arithmetic coding, E. Viscito and C. Gonzales, SPIE Visual Communications and Image Processing '90, October 2–4, 1990.
  13. Ortega, A. (October 1999). "Embedded image-domain compression using context models". Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348). Vol. 1. pp. 477–481 vol.1. doi:10.1109/ICIP.1999.821655. ISBN   0-7803-5467-2. S2CID   27303716.
  14. "Context-Based Adaptive Binary Arithmetic Coding (CABAC)". Fraunhofer Heinrich Hertz Institute. Retrieved 13 July 2019.
  15. "AVC/H.264 – Patent List" (PDF). MPEG LA. Retrieved 6 July 2019.