VC-6

SMPTE ST 2117-1, [1] informally known as VC-6, is a video coding format. [2]

Overview

The VC-6 codec is optimized for intermediate, mezzanine or contribution coding applications. [2] Typically, these applications involve compressing finished compositions for editing, contribution, primary distribution, archiving and other uses where image quality must be preserved as close to the original as possible while reducing bitrates and optimizing processing, power and storage requirements. VC-6, like other codecs in this category, [3] [4] uses only intra-frame compression, where each frame is stored independently and can be decoded with no dependencies on any other frame. [5] The codec implements lossless and lossy compression, depending on the encoding parameters selected. It was standardized in 2020. Earlier variants of the codec have been deployed by V-Nova since 2015 under the trade name Perseus. The codec is based on hierarchical data structures called s-trees, and does not involve DCT or wavelet transform compression. The compression mechanism is independent of the data being compressed, and can be applied to pixels as well as to non-image data. [6]

Unlike DCT-based codecs, VC-6 is based on hierarchical, repeatable s-tree structures that are similar to modified quadtrees. These simple structures provide intrinsic capabilities, such as massive parallelism [7] and the ability to choose the type of filtering used to reconstruct higher-resolution images from lower-resolution images. [8] The VC-6 standard [2] provides an up-sampler developed with an in-loop convolutional neural network to optimize the detail in the reconstructed image without requiring a large computational overhead. The ability to navigate spatially within the VC-6 bitstream at multiple levels [2] also lets decoding devices apply more resources to different regions of the image, allowing region-of-interest applications to operate on compressed bitstreams without decoding the full-resolution image. [9]

History

At the NAB Show in 2015, V-Nova claimed "2x–3x average compression gains, at all quality levels, under practical real-time operating scenarios versus H.264, HEVC and JPEG2000". [10] Making this announcement on 1 April, before a major trade show, attracted the attention of many compression experts. [11] Since then, V-Nova have deployed and licensed the technology, known at the time as Perseus, [10] in both contribution and distribution applications around the world, with customers and partners including Sky Italia, [12] Fast Filmz, [13] [14] Harmonic Inc, and others. A variant of the technology optimized for enhancing distribution codecs was subsequently standardized as MPEG-5 Part 2 LCEVC. [15] [16] [17]

Core concepts

Planes

The standard [2] describes a compression algorithm that is applied to independent planes of data. These planes might be RGB or RGBA pixels originating in a camera, YCbCr pixels from a conventional TV-centric video source, or some other planes of data. There may be up to 255 independent planes of data, and each plane can have a grid of data values of dimensions up to 65535 x 65535. [18] The SMPTE ST 2117-1 standard focuses on compressing planes of data values, typically pixels. To compress and decompress the data in each plane, VC-6 uses hierarchical representations of small tree-like structures that carry metadata used to predict other trees. There are three fundamental structures repeated in each plane. [2]
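
The limits quoted above can be captured in a small validation sketch. The names `PlaneSpec` and `validate_planes` are hypothetical and do not come from the standard, and the lower bound of one plane is an assumption; only the two upper limits (255 planes, 65535 values per axis) are taken from the text.

```python
from dataclasses import dataclass

MAX_PLANES = 255       # upper limit quoted in the text
MAX_DIMENSION = 65535  # upper limit per axis quoted in the text


@dataclass(frozen=True)
class PlaneSpec:
    """Hypothetical description of one independent plane of data values."""
    name: str
    width: int
    height: int


def validate_planes(planes):
    """Return a list of problems found; an empty list means the set is acceptable."""
    problems = []
    # Assumption: at least one plane is required.
    if not 1 <= len(planes) <= MAX_PLANES:
        problems.append(f"expected 1..{MAX_PLANES} planes, got {len(planes)}")
    for p in planes:
        if not (1 <= p.width <= MAX_DIMENSION and 1 <= p.height <= MAX_DIMENSION):
            problems.append(f"plane {p.name!r}: {p.width}x{p.height} is out of range")
    return problems


# Four RGBA planes of an HD picture pass the check.
rgba = [PlaneSpec(name, 1920, 1080) for name in ("R", "G", "B", "A")]
print(validate_planes(rgba))
```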

S-tree

The core compression structure in VC-6 is the s-tree. It is similar to the quadtree structure common in other schemes. An s-tree comprises nodes arranged in a tree structure, where each node links to 4 nodes in the next layer. The total number of layers above the root node is known as the rise of the s-tree. Compression is achieved in an s-tree by using metadata to signal whether levels can be predicted, with enhancement data carried selectively in the bitstream. The more data that can be predicted, the less information is sent, and the better the compression ratio. [6] [2]
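
To make the relationship between rise and tree size concrete, the following sketch counts nodes in a complete four-way tree. It assumes every node has exactly four children and, as in a conventional quadtree, that the four children cover a 2 x 2 block of their parent's area. The function names are illustrative and are not taken from the standard.

```python
def nodes_in_layer(layer: int) -> int:
    """Number of nodes in a layer, counting the root as layer 0."""
    if layer < 0:
        raise ValueError("layer must be non-negative")
    return 4 ** layer


def total_nodes(rise: int) -> int:
    """Total nodes in a complete tree with `rise` layers above the root."""
    return sum(nodes_in_layer(k) for k in range(rise + 1))


def top_layer_grid_side(rise: int) -> int:
    """Side length of the square grid covered by the top layer.

    Assumption: each node's four children form a 2 x 2 block.
    """
    return 2 ** rise


for rise in range(4):
    side = top_layer_grid_side(rise)
    print(f"rise={rise}: {total_nodes(rise)} nodes, top layer covers {side}x{side}")
```

Under these assumptions a tree of rise 2 has 1 + 4 + 16 = 21 nodes and its top layer covers a 4 x 4 grid. Any branch that the metadata marks as predicted would not need its nodes transmitted, which is where the saving comes from.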

Tableau

The standard [2] defines a tableau as the root node, or the highest layer of an s-tree, that contains nodes for another s-tree. Like the generic s-trees from which they are constructed, tableaux are arranged in layers with metadata in the nodes indicating whether or not higher layers are predicted or transmitted in the bitstream. [6]

Echelon

The hierarchical s-tree and tableau structures in the standard [2] are used to carry enhancements (called resid-vals) and other metadata to reduce the amount of raw data that needs to be carried in the bitstream payload. The final hierarchical tool is the ability to arrange the tableaux so that data from each plane (for example, pixels) can be dequantized at different resolutions and used as predictors for higher resolutions. Each of these resolutions is defined by the standard [2] as an echelon. Each echelon within a plane is identified by an index, where a more negative index indicates a lower resolution and a larger, more positive index indicates a higher resolution.
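
The index convention can be illustrated with a short sketch that lists the grid size of each echelon. It assumes that the full-resolution echelon is at index 0 and that each step to a more negative index halves both dimensions, rounding up; the factor of two and the rounding rule are assumptions made for illustration, and the function names are hypothetical.

```python
def echelon_size(width: int, height: int, index: int) -> tuple:
    """Grid size of the echelon at `index`, with index 0 as full resolution.

    Assumption: each more negative index halves width and height (rounded up).
    """
    if index > 0:
        raise ValueError("this sketch only models indices <= 0")
    scale = 2 ** (-index)
    # -(-a // b) is ceiling division for positive integers.
    return (-(-width // scale), -(-height // scale))


def echelon_ladder(width: int, height: int, lowest_index: int) -> dict:
    """Map every index from `lowest_index` up to 0 to its grid size."""
    return {i: echelon_size(width, height, i) for i in range(lowest_index, 1)}


for index, size in echelon_ladder(1920, 1080, -3).items():
    print(index, size)
```

With these assumptions a 1920 x 1080 plane and a lowest index of -3 gives grids of 240 x 135, 480 x 270, 960 x 540 and 1920 x 1080.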

Bitstream overview

VC-6 is an example of intra-frame coding, where each picture is coded without referencing other pictures. It is also intra-plane, where no information from one plane is used to predict another plane. As a result, the VC-6 bitstream contains all of the information for all of the planes of a single image. [2] An image sequence is created by concatenating the bitstreams for multiple images, or by packaging them in a container such as MXF, QuickTime or Matroska.

The VC-6 bitstream is defined in the standard [2] by pseudocode, and a reference decoder has been demonstrated based on that definition. The primary header is the only fixed structure defined by the standard. [2] The secondary header contains marker and sizing information that depends on the values in the primary header. The tertiary header is entirely calculated, and the payload structure is then derived from the parameters calculated during header decoding. [2]

Decoding overview

The standard [2] defines a process called plane reconstruction for decoding images from a bitstream. The process starts with the echelon having the lowest index. No predictions are used for this echelon. First, the bitstream rules are used to reconstruct residuals. Next, desparsification and entropy decoding processes are performed to fill the grid with data values at each coordinate. These values are then dequantised to create full-range values that can be used as predictions for the echelon with the next highest index. Each subsequent echelon uses the upsampler specified in the header to create a predicted plane from the echelon below. This prediction is added to the residual grid of the current echelon, and the result can in turn be upsampled as a prediction for the next echelon. [19]

The standard defines the final, full-resolution echelon to be at index 0. Its results are displayed rather than used as a prediction for another echelon. [2]
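
The reconstruction loop described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the decoder defined by the standard: the residual grids are assumed to have already been through entropy decoding and desparsification, dequantisation is modelled as multiplication by a single step size, each echelon is assumed to be twice the width and height of the one below, and a nearest-neighbour upsampler stands in for whichever upsampler the header selects. All function names are hypothetical.

```python
def dequantise(grid, step):
    """Assumption: uniform scalar dequantisation with one step size."""
    return [[value * step for value in row] for row in grid]


def upsample_nearest(grid):
    """Nearest-neighbour 2x upsampling: each value becomes a 2 x 2 block."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_grids(a, b):
    """Element-wise sum of two grids of equal size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def reconstruct_plane(residuals_by_echelon, step=1):
    """Rebuild a plane from a dict mapping echelon index to residual grid."""
    indices = sorted(residuals_by_echelon)
    # The lowest echelon has no prediction: its residuals are the values.
    plane = dequantise(residuals_by_echelon[indices[0]], step)
    for index in indices[1:]:
        prediction = upsample_nearest(plane)
        residuals = dequantise(residuals_by_echelon[index], step)
        plane = add_grids(prediction, residuals)
    return plane


# Two echelons: a 1 x 1 base at index -1 and 2 x 2 residuals at index 0.
print(reconstruct_plane({-1: [[10]], 0: [[1, -1], [0, 2]]}))
```

In the example the base value 10 is upsampled to a 2 x 2 block of tens, and the index 0 residuals then correct each position, giving `[[11, 9], [10, 12]]`.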

Upsampler options

Basic options

The standard [2] defines a number of basic upsamplers [20] to create higher-resolution predictions from lower-resolution echelons. There are two linear upsamplers, bicubic and sharp, and a nearest-neighbour upsampler.

Convolutional neural network upsampler

Six different non-linear upsamplers are defined [2] by a set of processes and coefficients that are provided in JSON format. [20] These coefficients were generated using convolutional neural network [21] techniques.

References

  1. "IEEE Xplore Search Results". ieeexplore.ieee.org. Retrieved 17 September 2020.
  2. "ST 2117-1:2020 - SMPTE Standard - VC-6 Multiplanar Picture Format — Part 1. Elementary Bitstream". St 2117-1:2020: 1–156. July 2020. doi:10.5594/SMPTE.ST2117-1.2020. ISBN 978-1-68303-219-9.
  3. "ST 2042-1:2012 - SMPTE Standard - VC-2 Video Compression". St 2042-1:2012: 1–137. August 2012. doi:10.5594/SMPTE.ST2042-1.2012. ISBN 978-1-61482-890-7.
  4. "ST 2019-1:2016 - SMPTE Standard - VC-3 Picture Compression and Data Stream Format". St 2019-1:2016: 1–108. June 2016. doi:10.5594/SMPTE.ST2019-1.2016. ISBN 978-1-68303-020-1.
  5. "ST 2073-1:2014 - SMPTE Standard - VC-5 Video Essence - Part 1: Elementary Bitstream". St 2073-1:2014: 1–50. March 2014. doi:10.5594/SMPTE.ST2073-1.2014. ISBN 978-1-61482-797-9.
  6. "SMPTE Ratifies V-Nova's AI-Powered VC-6 Video Codec". Digital Media World. 7 October 2020.
  7. Hung, Yubin; Rosenfeld, Azriel (1 August 1989). "Parallel processing of linear quadtrees on a mesh-connected computer". Journal of Parallel and Distributed Computing. 7 (1): 1–27. doi:10.1016/0743-7315(89)90049-X. ISSN 0743-7315.
  8. Samet, Hanan (1988), "An Overview of Quadtrees, Octrees, and Related Hierarchical Data Structures", Theoretical Foundations of Computer Graphics and CAD, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 51–68, doi:10.1007/978-3-642-83539-1_2, ISBN 978-3-642-83541-4, retrieved 9 September 2020
  9. S., V. G. (5 October 2020). "SMPTE Publishes New VC-6 Video Production Codec Standard". Sports Video Group.
  10. "Review: V-Nova Perseus: Does its Compression Live Up to the Hype?". Streaming Media Magazine. 17 June 2016. Retrieved 4 September 2020.
  11. "Historical timeline of video coding standards and formats". Vcodex. Retrieved 30 July 2021.
  12. "Sky Italia chooses V-Nova to extend IPTV reach". Digital TV Europe.
  13. "India's FastFilmz taps V-Nova to deliver OTT to 2G phones". Digital TV Europe. 7 April 2016. Retrieved 9 September 2020.
  14. "SHAREit Acquires Fastfilmz To Increase Video Content, Regional Users". Inc42 Media. 8 May 2018. Retrieved 17 September 2020.
  15. "Low Complexity Enhancement Video Codec". LCEVC - A New Approach to Video Compression.
  16. "V-Nova announces MPEG-5 Part 2 LCEVC". TVB Europe.
  17. "Perseus politics leak out at NAB in wake of MPEG-5 revelation". Rethink Research. 11 April 2019.
  18. "VC-6 Overview". mrmxf.com.
  19. ST 2117-1:2020 - SMPTE Standard - VC-6 Multiplanar Picture Format — Part 1. Elementary Bitstream. St 2117-1:2020. July 2020. pp. 1–156. doi:10.5594/SMPTE.ST2117-1.2020. ISBN 978-1-68303-219-9.
  20. ST 2117-1 upsampler media element. St 2117-1:2020. 21 July 2020. pp. 1–156. doi:10.5594/SMPTE.ST2117-1.2020. ISBN 978-1-68303-219-9.
  21. Arabshahi, P. (May 1996). "Fundamentals of Artificial Neural Networks [Book Reviews]". IEEE Transactions on Neural Networks. 7 (3): 793. doi:10.1109/tnn.1996.501738. ISSN 1045-9227. S2CID 6576607.