Scalable Video Coding

Last updated July 05, 2021

Scalable Video Coding (SVC) is the name for the Annex G extension of the H.264/MPEG-4 AVC video compression standard. SVC standardizes the encoding of a high-quality video bitstream that also contains one or more subset bitstreams (a form of layered coding). A subset video bitstream is derived by dropping packets from the larger bitstream, which reduces the bandwidth required for the subset bitstream. The subset bitstream can represent a lower spatial resolution (smaller screen), lower temporal resolution (lower frame rate), or lower quality video signal. H.264/MPEG-4 AVC was developed jointly by ITU-T and ISO/IEC JTC 1. These two groups created the Joint Video Team (JVT) to develop the H.264/MPEG-4 AVC standard.

Overview

The objective of the SVC standardization has been to enable the encoding of a high-quality video bitstream that contains one or more subset bitstreams that can themselves be decoded with a complexity and reconstruction quality similar to that achieved using the existing H.264/MPEG-4 AVC design with the same quantity of data as in the subset bitstream. The subset bitstream is derived by dropping packets from the larger bitstream.

A subset bitstream can represent a lower spatial resolution (smaller screen), or a lower temporal resolution (lower frame rate), or a lower quality video signal (each separately or in combination) compared to the bitstream it is derived from. The following modalities are possible:

Temporal (frame rate) scalability: the motion compensation dependencies are structured so that complete pictures (i.e. their associated packets) can be dropped from the bitstream. Temporal scalability is already enabled by H.264/MPEG-4 AVC (it is also available in some other formats, such as VP8 [1]). SVC only adds supplemental enhancement information to improve its usage.
Spatial (picture size) scalability: video is coded at multiple spatial resolutions. The data and decoded samples of lower resolutions can be used to predict data or samples of higher resolutions in order to reduce the bit rate to code the higher resolutions.
SNR/Quality/Fidelity scalability: video is coded at a single spatial resolution but at different qualities. The data and decoded samples of lower qualities can be used to predict data or samples of higher qualities in order to reduce the bit rate to code the higher qualities.
Combined scalability: a combination of the 3 scalability modalities described above.
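
The packet-dropping idea behind the modalities above can be illustrated with a small sketch. It is a simplified model and not the extraction process defined by the standard: the Packet class and the functions temporal_id_for, build_demo_stream and extract_substream are hypothetical names, the dyadic mapping from frame index to temporal layer is an assumed example structure, and the real extraction rules carry additional conditions that are ignored here. The three fields are named after the layer identifiers that SVC signals per packet (dependency_id for spatial, quality_id for quality, temporal_id for temporal layers).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for one coded packet, tagged with its layer indices."""
    frame_index: int
    dependency_id: int  # spatial layer, 0 = base resolution
    quality_id: int     # quality (SNR) layer, 0 = base quality
    temporal_id: int    # temporal layer, 0 = lowest frame rate


def temporal_id_for(frame_index: int, num_temporal_layers: int) -> int:
    """Assumed dyadic hierarchy: every 2^(n-1)-th frame is in layer 0,
    frames halfway between them are in layer 1, and so on."""
    gop = 1 << (num_temporal_layers - 1)
    pos = frame_index % gop
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return (num_temporal_layers - 1) - trailing_zeros


def build_demo_stream(num_frames: int, num_spatial: int, num_temporal: int) -> List[Packet]:
    """Create one packet per frame and spatial layer (single quality layer)."""
    return [
        Packet(i, d, 0, temporal_id_for(i, num_temporal))
        for i in range(num_frames)
        for d in range(num_spatial)
    ]


def extract_substream(packets: Iterable[Packet], max_d: int, max_q: int, max_t: int) -> List[Packet]:
    """Keep only packets at or below the requested spatial, quality and temporal layers."""
    return [
        p for p in packets
        if p.dependency_id <= max_d and p.quality_id <= max_q and p.temporal_id <= max_t
    ]


if __name__ == "__main__":
    stream = build_demo_stream(num_frames=8, num_spatial=2, num_temporal=3)
    half_rate_base = extract_substream(stream, max_d=0, max_q=0, max_t=1)
    print(len(stream), len(half_rate_base))
```

In this toy stream, keeping only temporal layers 0 and 1 of the base spatial layer retains every second frame, which corresponds to halving the frame rate without re-encoding.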

SVC enables forward compatibility for older hardware: the same bitstream can be consumed by basic hardware that can only decode a low-resolution subset (e.g. 720p or 1080i), while more advanced hardware will be able to decode the high-quality video stream (1080p).
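
As a rough illustration of how one bitstream can serve devices of different capability, the following sketch picks the richest subset a receiver can handle. Everything in it is an assumption made for the example: the OperatingPoint class, the pick_operating_point function, and the resolutions, frame rates and bit rates listed are illustrative and are not taken from the standard.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical description of one decodable subset of a scalable bitstream."""
    name: str
    height: int   # vertical resolution in lines
    fps: float    # frame rate
    kbps: int     # bit rate needed to receive this subset


def pick_operating_point(
    points: Sequence[OperatingPoint], max_height: int, max_kbps: int
) -> Optional[OperatingPoint]:
    """Return the highest-bit-rate subset that fits the device and the link, or None."""
    feasible = [p for p in points if p.height <= max_height and p.kbps <= max_kbps]
    return max(feasible, key=lambda p: p.kbps, default=None)


if __name__ == "__main__":
    # Illustrative values only.
    points = [
        OperatingPoint("base", height=720, fps=30.0, kbps=1500),
        OperatingPoint("enhanced", height=1080, fps=30.0, kbps=4000),
    ]
    print(pick_operating_point(points, max_height=720, max_kbps=5000))
    print(pick_operating_point(points, max_height=1080, max_kbps=5000))
```

A device limited to 720p ends up with the base subset even on a fast link, while a 1080p-capable device on the same link receives the enhanced subset.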

Background and applications

Bit-stream scalability for video is a desirable feature for many multimedia applications. The need for scalability arises from requirements for graceful degradation during transmission, or from the need to adapt to different spatial formats, bit rates or power budgets. To fulfill these requirements, it is beneficial for video to be simultaneously transmitted or stored at a variety of spatial or temporal resolutions or qualities, which is the purpose of video bit-stream scalability.

Traditional digital video transmission and storage systems are based on H.222.0/MPEG-2 TS systems for broadcasting services over satellite, cable, and terrestrial transmission channels, and for DVD storage, or on H.320 for conversational video conferencing services. These channels are typically characterized by a fixed spatio-temporal format of the video signal (SDTV or HDTV or CIF for H.320 video telephone). The application behavior in such systems typically falls into one of the two categories: it works or it doesn't work.

Modern video transmission and storage systems using the Internet and mobile networks are typically based on RTP/IP for real-time services (conversational and streaming) and on computer file formats like mp4 or 3gp. Most RTP/IP access networks are typically characterized by a wide range of connection qualities and receiving devices. The varying connection quality results from adaptive resource sharing mechanisms of these networks addressing the time varying data throughput requirements of a varying number of users. The variety of devices with different capabilities ranging from cell phones with small screens and restricted processing power to high-end PCs with high-definition displays results from the continuous evolution of these endpoints.

Scalable video coding (SVC) is one solution to the problems posed by the characteristics of modern video transmission systems. The following video applications can benefit from SVC:

Streaming
Conferencing
Surveillance
Broadcast
Storage

History and timeline

October 2003: The Moving Picture Experts Group (MPEG) issued a call for proposals on SVC Technology.
April 2004: Fourteen proposals were submitted; twelve were based on compression by wavelets, and two were extensions of H.264/MPEG-4 AVC.
October 2004: The proposal made by the image communication group of the Heinrich-Hertz-Institute (HHI) was chosen by MPEG as the starting point of its SVC standardization project.
January 2005: MPEG and the Video Coding Experts Group (VCEG) agreed to standardize the SVC project as an amendment of the H.264/MPEG-4 AVC standard.
July 2007: The SVC project received final approval.

Profiles and levels

As a result of the Scalable Video Coding extension, the standard contains five additional scalable profiles: Scalable Baseline, Scalable High, Scalable High Intra, Scalable Constrained Baseline and Scalable Constrained High Profile. These profiles are defined as a combination of the H.264/MPEG-4 AVC profile for the base layer (2nd word in scalable profile name) and tools that achieve the scalable extension:

Scalable Baseline Profile: Mainly targeted for conversational, mobile, and surveillance applications.
- A bitstream conforming to Scalable Baseline profile contains a base layer bitstream that conforms to a restricted version of Baseline profile of H.264/MPEG-4 AVC.
- Supports B slices, weighted prediction, CABAC entropy coding, and 8×8 luma transform in enhancement layers (CABAC and the 8×8 transform are only supported for certain levels), although the base layer has to conform to the restricted Baseline profile, which does not support these tools. Coding tools for interlaced sources are not included.
- Spatial scalable coding is restricted to resolution ratios of 1.5 and 2 between successive spatial layers in both horizontal and vertical direction and to macroblock-aligned cropping.
- Quality and temporal scalable coding are supported without any restriction.
Scalable High Profile: Primarily designed for broadcast, streaming, storage and videoconferencing applications.
- A bitstream conforming to Scalable High profile contains a base layer bitstream that conforms to High profile of H.264/MPEG-4 AVC.
- Supports all tools specified in the Scalable Video Coding extension.
- Spatial scalable coding is supported without any restriction, i.e., with arbitrary resolution ratios and cropping parameters.
- Quality and temporal scalable coding are supported without any restriction.
Scalable High Intra Profile: Mainly designed for professional applications.
- Uses Instantaneous Decoder Refresh (IDR) pictures only. IDR pictures can be decoded without reference to previous frames.
- A bitstream conforming to Scalable High Intra profile contains a base layer bitstream that conforms to High profile of H.264/MPEG-4 AVC with only IDR pictures allowed.
- All scalability tools are allowed as in Scalable High profile but only IDR pictures are permitted in any layer.
Scalable Constrained Baseline Profile
Scalable Constrained High Profile
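
The resolution-ratio restriction stated for the Scalable Baseline profile above can be expressed as a small check. This is a sketch under stated assumptions: the function name ratios_allowed is hypothetical, the horizontal and vertical ratios are checked independently (one possible reading of the restriction), and the macroblock-aligned cropping requirement is not modelled.

```python
from fractions import Fraction
from typing import Sequence, Tuple

# Ratios named in the Scalable Baseline description: 1.5 and 2.
ALLOWED_RATIOS = {Fraction(3, 2), Fraction(2, 1)}


def ratios_allowed(layers: Sequence[Tuple[int, int]]) -> bool:
    """Check successive spatial layers, given as (width, height) from lowest to highest.

    Returns True if every step between neighbouring layers scales both
    width and height by an allowed ratio. Cropping is ignored in this sketch.
    """
    for (w0, h0), (w1, h1) in zip(layers, layers[1:]):
        if Fraction(w1, w0) not in ALLOWED_RATIOS:
            return False
        if Fraction(h1, h0) not in ALLOWED_RATIOS:
            return False
    return True


if __name__ == "__main__":
    print(ratios_allowed([(640, 360), (1280, 720)]))   # one step of ratio 2
    print(ratios_allowed([(640, 360), (1920, 1080)]))  # one step of ratio 3
```

Exact fractions are used instead of floating-point division so that a ratio such as 1920/1280 compares equal to 3/2 without rounding concerns.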

References

[1] Draft IETF, P. Westin, H. Lundin, M. Glover, J. Uberti, F. Galligan, "RTP Payload Format for VP8 Video"

External links

Introduction and overview

MPEG - Technologies - Overview of Scalable Video Coding (chiariglione.org)
