Coding tree unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to macroblock units that were used in several previous video standards. [1] [2] CTU is also referred to as largest coding unit (LCU). [3]
A CTU can be between 16×16 pixels and 64×64 pixels in size with a larger size usually increasing coding efficiency. [4] [2] The first video standard that uses CTUs is HEVC/H.265 which became an ITU-T standard on April 13, 2013. [5] [6] [7]
Macroblock encoding methods have been used in digital video coding standards since H.261 which was first released in 1988. However, for error correction and signal-to-noise ratio the standard 16x16 macroblock size is not capable of getting the kind of bit reductions that information theory and coding theory suggest are theoretically and practically possible. [8]
HEVC replaces macroblocks, which were used with previous video standards, with CTUs which can use larger block structures of up to 64×64 pixels and can better sub-partition the picture into variable sized structures. [4] [9]
HEVC initially divides the picture into CTUs which are then divided for each luma/chroma component into coding tree blocks (CTBs). [4] [9]
A CTB can be 64×64, 32×32, or 16×16 with a larger pixel block size usually increasing the coding efficiency. [4] CTBs are then divided into one or more coding units (CUs), so that the CTU size is also the largest coding unit size. [4]
At the July 2012 HEVC meeting it was decided, based on proposal JCTVC-J0334, that HEVC level 5 and higher would be required to use CTB sizes of either 32×32 or 64×64. [3] [12] This was added to HEVC in the Draft International Standard as a level limit for the Log2MaxCtbSize variable. [13]
Log2MaxCtbSize was renamed CtbSizeY in the October 2012 HEVC draft and then renamed CtbLog2SizeY in the January 2013 HEVC draft. [10] [14]
The design of most video coding standards is primarily aimed at having the highest coding efficiency. [2] Coding efficiency is the ability to encode video at the lowest possible bit rate while maintaining a certain level of video quality. [2] HEVC benefits from the use of larger CTB sizes. [2]
This has been shown in peak signal-to-noise ratio (PSNR) tests with a HM-8.0 HEVC encoder where it was forced to use progressively smaller CTU sizes. [2] For all test sequences when compared to a 64×64 CTU size it was shown that the HEVC bit rate increased by 2.2% when forced to use a 32×32 CTU size and increased by 11.0% when forced to use a 16×16 CTU size. [2]
In the Class A test sequences, where the resolution of the video was 2560×1600, when compared to a 64×64 CTU size it was shown that the HEVC bit rate increased by 5.7% when forced to use a 32×32 CTU size and increased by 28.2% when forced to use a 16×16 CTU size. [2]
The tests showed that large CTU sizes become even more important for coding efficiency with higher resolution video. [2] The tests also showed that it took 60% longer to decode HEVC video encoded at 16×16 CTU size than at 64×64 CTU size. [2] The tests showed that large CTU sizes increase coding efficiency while also reducing decoding time. [2] The tests were conducted with the Main profile of HEVC based on equal PSNR. [2]
Video test sequences | Maximum CTU size used in video encoding in comparison to 64×64 CTUs | ||
---|---|---|---|
64×64 CTUs | 32×32 CTUs | 16×16 CTUs | |
Class A (2560×1600 pixels) | 0% | 5.7% | 28.2% |
Class B (1920×1080 pixels) | 0% | 3.7% | 18.4% |
Class C (832×480 pixels) | 0% | 1.8% | 8.5% |
Class D (416×240 pixels) | 0% | 0.8% | 4.2% |
Overall | 0% | 2.2% | 11.0% |
Encoding time | 100% | 82% | 58% |
Decoding time | 100% | 111% | 160% |
The Moving Picture Experts Group (MPEG) is an alliance of working groups established jointly by ISO and IEC that sets standards for media coding, including compression coding of audio, video, graphics, and genomic data; and transmission and file formats for various applications. Together with JPEG, MPEG is organized under ISO/IEC JTC 1/SC 29 – Coding of audio, picture, multimedia and hypermedia information.
A compression artifact is a noticeable distortion of media caused by the application of lossy compression. Lossy data compression involves discarding some of the media's data so that it becomes small enough to be stored within the desired disk space or transmitted (streamed) within the available bandwidth. If the compressor cannot store enough data in the compressed version, the result is a loss of quality, or introduction of artifacts. The compression algorithm may not be intelligent enough to discriminate between distortions of little subjective importance and those objectionable to the user.
Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. It supports a maximum resolution of 8K UHD.
H.261 is an ITU-T video compression standard, first ratified in November 1988. It is the first member of the H.26x family of video coding standards in the domain of the ITU-T Study Group 16 Video Coding Experts Group. It was the first video coding standard that was useful in practical terms.
Audio Video Coding Standard (AVS) refers to the digital audio and digital video series compression standard formulated by the Audio and Video coding standard workgroup of China. Work began in 2002, and three generations of standards were published.
Color depth or colour depth, also known as bit depth, is either the number of bits used to indicate the color of a single pixel, or the number of bits used for each color component of a single pixel. When referring to a pixel, the concept can be defined as bits per pixel (bpp). When referring to a color component, the concept can be defined as bits per component, bits per channel, bits per color, and also bits per pixel component, bits per color channel or bits per sample (bps). Modern standards tend to use bits per component, but historical lower-depth systems used bits per pixel more often.
An inter frame is a frame in a video compression stream which is expressed in terms of one or more neighboring frames. The "inter" part of the term refers to the use of Inter frame prediction. This kind of prediction tries to take advantage from temporal redundancy between neighboring frames enabling higher compression rates.
JPEG XR is an image compression standard for continuous tone photographic images, based on the HD Photo specifications that Microsoft originally developed and patented. It supports both lossy and lossless compression, and is the preferred image format for Ecma-388 Open XML Paper Specification documents.
The Video Coding Experts Group or Visual Coding Experts Group is a working group of the ITU Telecommunication Standardization Sector (ITU-T) concerned with standards for compression coding of video, images, audio signals, biomedical waveforms, and other signals. It is responsible for standardization of the "H.26x" line of video coding standards, the "T.8xx" line of image coding standards, and related technologies.
The macroblock is a processing unit in image and video compression formats based on linear block transforms, typically the discrete cosine transform (DCT). A macroblock typically consists of 16×16 samples, and is further subdivided into transform blocks, and may be further subdivided into prediction blocks. Formats which are based on macroblocks include JPEG, where they are called MCU blocks, H.261, MPEG-1 Part 2, H.262/MPEG-2 Part 2, H.263, MPEG-4 Part 2, and H.264/MPEG-4 AVC. In H.265/HEVC, the macroblock as a basic processing unit has been replaced by the coding tree unit.
Gary Joseph Sullivan is an American electrical engineer who led the development of the AVC, HEVC, and VVC video coding standards and created the DirectX Video Acceleration (DXVA) API/DDI video decoding feature of the Microsoft Windows operating system. He is currently Director of Video Research and Standards at Dolby Laboratories and is the chair of ISO/IEC JTC 1/SC 29 and of the ITU-T Video Coding Experts Group (VCEG).
A deblocking filter is a video filter applied to decoded compressed video to improve visual quality and prediction performance by smoothing the sharp edges which can form between macroblocks when block coding techniques are used. The filter aims to improve the appearance of decoded pictures. It is a part of the specification for both the SMPTE VC-1 codec and the ITU H.264 codec.
High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as part of the MPEG-H project as a successor to the widely used Advanced Video Coding. In comparison to AVC, HEVC offers from 25% to 50% better data compression at the same level of video quality, or substantially improved video quality at the same bit rate. It supports resolutions up to 8192×4320, including 8K UHD, and unlike the primarily 8-bit AVC, HEVC's higher fidelity Main 10 profile has been incorporated into nearly all supporting hardware.
Michael J. Horowitz is an American electrical engineer who actively participated in the creation of the H.264/MPEG-4 AVC and H.265/HEVC video coding standards. He is co-inventor of flexible macroblock ordering (FMO) and tiles, essential features in H.264/MPEG-4 AVC and H.265/HEVC, respectively. He is managing partner of Applied Video Compression and has served on the technical advisory boards of Vivox, Inc., Vidyo, Inc., and RipCode, Inc.
x265 is an encoder for creating digital video streams in the High Efficiency Video Coding (HEVC/H.265) video compression format developed by the Joint Collaborative Team on Video Coding (JCT-VC). It is available as a command-line app or a software library, under the terms of GNU General Public License (GPL) version 2 or later; however, customers may request a commercial license.
ITU-R Recommendation BT.2020, more commonly known by the abbreviations Rec. 2020 or BT.2020, defines various aspects of ultra-high-definition television (UHDTV) with standard dynamic range (SDR) and wide color gamut (WCG), including picture resolutions, frame rates with progressive scan, bit depths, color primaries, RGB and luma-chroma color representations, chroma subsamplings, and an opto-electronic transfer function. The first version of Rec. 2020 was posted on the International Telecommunication Union (ITU) website on August 23, 2012, and two further editions have been published since then.
High Efficiency Video Coding tiers and levels are constraints that define a High Efficiency Video Coding (HEVC) bitstream in terms of maximum bit rate, maximum luma sample rate, maximum luma picture size, minimum compression ratio, maximum number of slices allowed, and maximum number of tiles allowed. Lower tiers are more constrained than higher tiers and lower levels are more constrained than higher levels.
VP9 is an open and royalty-free video coding format developed by Google.
MPEG-H is a group of international standards under development by the ISO/IEC Moving Picture Experts Group (MPEG). It has various "parts" – each of which can be considered a separate standard. These include a media transport protocol standard, a video compression standard, an audio compression standard, a digital file format container standard, three reference software packages, three conformance testing standards, and related technologies and technical reports. The group of standards is formally known as ISO/IEC 23008 – High efficiency coding and media delivery in heterogeneous environments. Development of the standards began around 2010, and the first fully approved standard in the group was published in 2013. Most of the standards in the group have been revised or amended several times to add additional extended features since their first edition.
High Efficiency Video Coding implementations and products covers the implementations and products of High Efficiency Video Coding (HEVC).