Arbitrary slice ordering

Arbitrary slice ordering (ASO) in digital video is an algorithm for loss prevention that restructures the ordering of the representation of the fundamental regions (macroblocks) in pictures. Because slices may arrive and be decoded in any order, the decoder does not need to wait for a complete, in-sequence set of slices before beginning its work. ASO is typically considered an error/loss robustness feature.

ASO is included as a tool in the Baseline profile of the H.264/MPEG-4 AVC standard, alongside I slices, P slices, context-adaptive variable-length coding (CAVLC), slice groups, and redundant slices.

Applications

This profile targets lower-cost applications with limited computing resources, and is widely used in videoconferencing, mobile applications, and security applications.

Arbitrary slice ordering relaxes the constraint that all macroblocks must be sequenced in decoding order, and thus enhances flexibility for the low-delay performance that is important in teleconferencing and interactive Internet applications.

Problems

If ASO across pictures is supported in AVC, serious issues arise because slices from different pictures can be interleaved. One way to avoid these issues is to restrict ASO to within a picture, i.e. slices from different pictures are not interleaved.

However, even if ASO is restricted to within a picture, decoder complexity increases significantly. Because Flexible Macroblock Ordering (FMO) extends the concept of slices by allowing non-consecutive macroblocks to belong to the same slice, this section also addresses the decoder complexity introduced by FMO.

Types of ASO decoding

Association of macroblocks to slices

An example of how macroblocks can be associated with different slices is shown in Figure 1. When ASO is supported, the four slices of this example can be received by the decoder in a random order. Figure 2 shows the following receiving order: slice #4, slice #3, slice #1, and slice #2. The same figure presents the AVC decoder blocks required to support ASO decoding.

Figure 1: An example of macroblock assignment to four slices. Each slice is represented by a different texture.

Figure 2: The AVC decoder blocks needed to support ASO decoding.

For each slice, the slice length and the macroblock address (i.e., the index with respect to the raster scan order) of the first macroblock (MB) of the slice are extracted by the slice parser (Figure 2). This information, together with the slice itself, is stored in memory (shown as DRAM). In addition, a list of pointers (Figure 2), one pointer for each slice, each pointing to the memory location where that slice is stored, must be generated. The list of pointers, together with the address of the first macroblock of each slice, is used to navigate through the out-of-order slices; the slice length is used to transfer the slice data from the DRAM to the decoder's internal memory.
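
The bookkeeping described above can be pictured with a short sketch. The structure and function names below are illustrative, not taken from the AVC specification: each record keeps the first-MB address, the slice length, and a pointer into DRAM, and sorting the pointer list by first-MB address recovers the raster-scan decoding order.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative slice record: the fields the slice parser extracts,
 * plus a pointer to where the slice payload sits in DRAM. */
typedef struct {
    uint32_t first_mb;  /* raster-scan address of the slice's first MB */
    uint32_t length;    /* slice length in bytes, used for the DRAM transfer */
    uint8_t *dram_ptr;  /* location of the slice data in DRAM */
} SliceRecord;

static int by_first_mb(const void *a, const void *b)
{
    const SliceRecord *sa = a, *sb = b;
    return (sa->first_mb > sb->first_mb) - (sa->first_mb < sb->first_mb);
}

/* Sort the per-slice pointer list so the decoder can walk the slices
 * of a picture in macroblock (raster-scan) order, regardless of the
 * order in which they were received. */
void order_slices(SliceRecord *list, size_t num_slices)
{
    qsort(list, num_slices, sizeof list[0], by_first_mb);
}
```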

Faced with the necessity of decoding out-of-order slices, a decoder may:

- buffer the slices until the whole picture has been received, and then decode the macroblocks in raster scan order; or
- decode each slice as soon as it is received, and perform the de-blocking in a second pass once the whole picture has been decoded.

The first method increases latency, but allows decoding and de-blocking to be performed in parallel. However, managing a large number of pointers (in the worst case, one pointer per MB) and the added intelligence of the DRAM access unit increase the decoder complexity.

The second method significantly hurts decoder performance. In addition, performing the de-blocking in a second pass increases the DRAM-to-processor memory bandwidth.

Decoding slices in the order they are received can also result in additional memory consumption, or require the decoder and its local memory to run at a higher clock speed to meet the higher throughput requirements. Consider, for example, an application in which the display operation reads the pictures to be displayed directly from the section of memory where the decoder stored them.
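
The two methods can be contrasted with a rough sketch; decode_slice and deblock_picture are hypothetical stubs standing in for the real decoder stages, not functions from any actual decoder.

```c
#include <stdio.h>
#include <stddef.h>

typedef struct { unsigned first_mb; } Slice;

/* Stubs standing in for the real decoder stages. */
static void decode_slice(const Slice *s) { printf("decode from MB %u\n", s->first_mb); }
static void deblock_picture(void)        { printf("de-blocking pass over the picture\n"); }

/* Method 1: buffer until the picture is complete, then decode the
 * slices in macroblock order; the de-blocking filter can follow the
 * decoder slice by slice, so the two run in parallel. */
void decode_buffered(Slice *sorted, size_t n)
{
    for (size_t i = 0; i < n; i++)
        decode_slice(&sorted[i]);  /* filter trails right behind */
}

/* Method 2: decode slices in arrival order, then de-block the whole
 * picture in a second pass, which re-reads it from DRAM. */
void decode_on_arrival(Slice *received, size_t n)
{
    for (size_t i = 0; i < n; i++)
        decode_slice(&received[i]);
    deblock_picture();  /* extra pass, extra memory traffic */
}
```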

Association of macroblocks to slices and slices to slice groups

An example of how slices can be associated with different slice groups is shown in Figure 3. When ASO and FMO are supported, the four slices of this example can be received by the decoder in a random order. Figure 4 shows the following receiving order: slice #4, slice #2, slice #1, and slice #3. The same figure presents the AVC decoder blocks required to support ASO and FMO decoding.

Figure 3: An example of macroblock assignment to four slices and to two slice groups (SG in the figure). Each slice is represented by a different texture, and each slice group is represented by a different color.

Figure 4: The AVC decoder blocks needed to support ASO and FMO decoding.

In addition to the slice length and the address of the first macroblock (MB) of the slice, the slice parser (Figure 4) needs to extract the slice group (SG) of each slice. This information, together with the slice itself, is stored in DRAM. As in the ASO case, a list of pointers (Figure 4) must be generated.

The list of pointers, together with the address of the first MB of each slice, the SG, and the mb_allocation_map (stored in the processor's local memory), is used to navigate through the slices. The slice length is used to transfer the slice data from the DRAM to the processor's local memory.
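
Navigation through a slice group can be sketched as a scan over the macroblock allocation map. The array below is a stand-in for the mb_allocation_map mentioned above, and the picture size and function name are illustrative assumptions; in a real decoder the map is built from parameters signalled in the bitstream.

```c
#include <stdint.h>

#define NUM_MBS 396  /* e.g. a CIF picture: 22 x 18 macroblocks */

/* One entry per macroblock, giving the slice group it belongs to. */
static uint8_t mb_allocation_map[NUM_MBS];

/* Return the raster-scan address of the next macroblock belonging to
 * the same slice group as curr_mb, or -1 if the group is exhausted. */
int next_mb_in_group(int curr_mb)
{
    uint8_t group = mb_allocation_map[curr_mb];
    for (int mb = curr_mb + 1; mb < NUM_MBS; mb++)
        if (mb_allocation_map[mb] == group)
            return mb;
    return -1;
}
```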

Similarly to the ASO case, in the combined ASO and FMO case the decoder may:

- buffer the slices until the whole picture has been received, and then decode the macroblocks in raster scan order; or
- decode each slice as soon as it is received, and perform the de-blocking in a second pass.

The first approach is still the preferred one. Because of FMO, decoding macroblocks in raster scan order may require switching between different slices and/or slice groups. To speed up the DRAM access, one buffer for each slice group must be used (Figure 4). This additional intelligence of the DRAM access unit further increases the decoder complexity. Moreover, switching between different slices and/or slice groups requires swapping the entropy decoder (ED) status information. In the worst case, the swap occurs after decoding each macroblock. If the entire ED status information is too large to be stored in the processor's local memory, each ED status needs to be loaded from and stored into DRAM, further increasing the DRAM-to-processor memory bandwidth (Figure 4).
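
The ED state swap can be pictured as follows. The names and the contents of the state structure are illustrative only; a real CAVLC decoder keeps considerably more context, and, as noted above, the saved states may have to live in DRAM when local memory is too small.

```c
#include <stdint.h>

#define NUM_SLICE_GROUPS 2

/* Illustrative entropy-decoder (ED) state; a real decoder would also
 * track neighbouring blocks' non-zero coefficient counts, etc. */
typedef struct {
    uint32_t bit_pos;  /* read position within the slice data */
} EntropyState;

static EntropyState saved[NUM_SLICE_GROUPS];  /* one per slice-group buffer */
static EntropyState active;
static int active_group = 0;

/* Swap the ED state when the raster scan crosses into a macroblock of
 * a different slice group; in the worst case this happens after every
 * decoded macroblock. */
void switch_slice_group(int new_group)
{
    if (new_group == active_group)
        return;
    saved[active_group] = active;  /* save the outgoing state  */
    active = saved[new_group];     /* restore the incoming one */
    active_group = new_group;
}
```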
