Spectral band replication

Spectrogram of a recording of a violin playing. Note the harmonics occurring at whole-number multiples of the fundamental frequency; SBR exploits this redundancy.

Spectral band replication (SBR) is a technology that enhances audio and speech codecs, especially at low bit rates, by exploiting harmonic redundancy in the frequency domain.


It can be combined with any audio compression codec: the codec itself transmits the lower and mid frequencies of the spectrum, while SBR replicates the higher-frequency content at the decoder by transposing harmonics up from the lower and mid frequencies. [1] Some guidance information for reconstructing the high-frequency spectral envelope is transmitted as side information.

When needed, it also reconstructs or adaptively mixes in noise-like information in selected frequency bands, in order to faithfully replicate signals that originally contained few or no tonal components.

The SBR idea is based on the principle that the human auditory system analyses higher frequencies with less accuracy; thus the harmonics produced by the spectral band replication process need only be accurate in a perceptual sense, not technically or mathematically exact.
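The replication idea can be sketched in a few lines. The following toy example is illustrative only, not the SBR standard: it keeps the low half of one frame's spectrum (the part a core codec would carry), copies it up into the high band, and rescales the copy per sub-band to a coarse energy envelope standing in for the transmitted side information. All names and parameters are invented for this sketch.

```python
import numpy as np

def sbr_demo(frame, n_bands=8, eps=1e-12):
    """Toy spectral-band-replication round trip on a single frame.

    The low half of the spectrum plays the role of the coded baseband;
    a coarse per-band energy envelope of the high half plays the role
    of the guidance side information.  Not the standardized algorithm.
    """
    spec = np.fft.rfft(frame)
    half = len(spec) // 2
    low, high = spec[:half], spec[half:2 * half]

    # "Encoder": coarse envelope of the high band (the side information).
    bands = np.array_split(np.abs(high) ** 2, n_bands)
    envelope = np.array([b.mean() for b in bands])

    # "Decoder": transpose the low band up, then shape each sub-band
    # so its energy matches the transmitted envelope.
    patched = low.copy()
    for i, idx in enumerate(np.array_split(np.arange(half), n_bands)):
        e = patched[idx]
        gain = np.sqrt(envelope[i] / (np.mean(np.abs(e) ** 2) + eps))
        patched[idx] = e * gain

    rebuilt = np.concatenate([low, patched, spec[2 * half:]])
    return np.fft.irfft(rebuilt, n=len(frame))
```

Because only band energies are matched, the reconstructed high band is perceptually plausible rather than sample-exact, which is precisely the trade-off the paragraph above describes.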

History and use

The Swedish company Coding Technologies (acquired by Dolby in 2007) developed and pioneered the use of SBR in its MPEG-2 AAC-derived codec called aacPlus, which first appeared in 2001. This codec was submitted to MPEG and formed the basis of MPEG-4 High-Efficiency AAC (HE-AAC), standardized in 2003. [2] Lars Liljeryd, Kristofer Kjörling, and Martin Dietz received the IEEE Masaru Ibuka Consumer Electronics Award in 2013 for their work developing and marketing HE-AAC. [3] [4] Coding Technologies' SBR method has also been used with WMA 10 Professional to create WMA 10 Pro LBR, and with MP3 to create mp3PRO.

HE-AAC, which uses SBR, is used in broadcast systems such as DAB+, Digital Radio Mondiale (including xHE-AAC), HD Radio, and XM Satellite Radio. [5]

If the player is not capable of using the side information transmitted alongside the "normal" compressed audio data, it may still be able to play the "baseband" data (e.g. sampled at 22.05 kHz instead of 44.1 kHz) as usual, resulting in dull (since the high frequencies are missing) but otherwise mostly acceptable sound. This is, for example, the case when an mp3PRO file is played back with MP3 software incapable of utilizing the SBR information.

Opus's CELT layer performs spectral folding at the MDCT-bin level, a far less advanced but lower-delay technique than SBR. [6]
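Spectral folding of this kind can be sketched as copying lower-frequency transform coefficients into a band that was quantized to zero and renormalizing the copy to a separately coded band energy. The function below is a minimal illustration of that idea on a plain coefficient vector; the band layout, names, and energy value are invented for the example and do not come from the Opus bitstream format.

```python
import numpy as np

def fold_band(bins, band, source, energy):
    """Fill an empty band by folding lower-frequency coefficients up.

    `bins` is a vector of transform (e.g. MDCT) coefficients, `band`
    and `source` are (start, end) index ranges, and `energy` is the
    target band energy assumed to be transmitted separately.
    Illustrative sketch only.
    """
    lo, hi = band
    folded = bins[source[0]:source[0] + (hi - lo)].copy()
    norm = np.sqrt(np.sum(folded ** 2))
    if norm > 0:
        folded *= np.sqrt(energy) / norm  # renormalize to target energy
    out = bins.copy()
    out[lo:hi] = folded
    return out
```

Because the fold works directly on transform bins of the current frame, it adds no extra algorithmic delay, which is the property the text attributes to CELT's approach.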

Dolby Digital Plus (E-AC-3) performs Spectral Extension (SPX). SPX reduces high-frequency components to metadata and is similar to E-AC-3's multichannel coupling calculation. [7] Dolby AC-4 expands the technique into Advanced Spectral Extension (A-SPX), with the option of interleaving with regular, non-extended data in the time or frequency domain. As a result, spectral extension can be selectively disabled for difficult portions. [8]

Methods

Encoding with SBR produces a downsampled (usually 2:1) audio signal and guidance information. In an early publication, the guidance data is described as being produced by quadrature mirror filter (QMF) analysis and an envelope estimator. [9]
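The encoder side can be sketched as two steps: resample the signal 2:1 to form the baseband that the core codec compresses, and summarize the discarded high band as a coarse energy envelope. The sketch below does both in the FFT domain for simplicity (a real encoder uses a QMF filter bank); function and parameter names are invented for the example.

```python
import numpy as np

def sbr_encode(signal, n_bands=8):
    """Encoder side of a toy SBR scheme (a sketch, not the standard).

    Returns the 2:1 downsampled baseband that a core codec would
    compress, plus a coarse per-band energy envelope of the discarded
    high band, serving as guidance information for the decoder.
    """
    spec = np.fft.rfft(signal)
    half = len(spec) // 2
    # Baseband: keep the low half of the spectrum, resampled 2:1.
    # (numpy's irfft normalizes by 1/n, so halving n doubles the
    # amplitude; divide by 2 to compensate.)
    baseband = np.fft.irfft(spec[:half + 1], n=len(signal) // 2) / 2
    # Guidance: per-band energies of the high band that was dropped.
    bands = np.array_split(np.abs(spec[half:2 * half]) ** 2, n_bands)
    envelope = np.array([b.mean() for b in bands])
    return baseband, envelope
```

A pure low-frequency tone survives the downsampling unchanged and produces an all-zero envelope, since there is no high-band content for the decoder to reconstruct.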

Decoding with SBR requires transposing harmonics upward, a case of audio time stretching and pitch scaling. [10]


Related Research Articles

MP3

MP3 is a coding format for digital audio developed largely by the Fraunhofer Society in Germany under the lead of Karlheinz Brandenburg, with support from other digital scientists in other countries. Originally defined as the third audio format of the MPEG-1 standard, it was retained and further extended—defining additional bit rates and support for more audio channels—as the third audio format of the subsequent MPEG-2 standard. A third version, known as MPEG-2.5—extended to better support lower bit rates—is commonly implemented but is not a recognized standard.

MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

MPEG-1 Audio Layer II or MPEG-2 Audio Layer II is a lossy audio compression format defined by ISO/IEC 11172-3 alongside MPEG-1 Audio Layer I and MPEG-1 Audio Layer III (MP3). While MP3 is much more popular for PC and Internet applications, MP2 remains a dominant standard for audio broadcasting.

Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was designed to be the successor of the MP3 format and generally achieves higher sound quality than MP3 at the same bit rate.

The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT, and Opus.

A polyphase quadrature filter, or PQF, is a filter bank which splits an input signal into a given number N of equidistant sub-bands. These sub-bands are subsampled by a factor of N, so they are critically sampled. An important application of polyphase filters concerns the filtering and decimation of large-band input signals, e.g. coming from a high-rate ADC, which cannot be directly processed by an FPGA or, in some cases, by an ASIC either. Suppose the ADC plus FPGA/ASIC interface implements a demultiplexer of the ADC samples into N internal FPGA/ASIC registers. In that case, the polyphase filter transforms the decimator FIR filter's canonic structure into N parallel branches clocked at 1/N of the ADC clock, allowing digital processing when N=Clock(ADC)/Clock(FPGA).

MPEG-4 Part 3 or MPEG-4 Audio is the third part of the ISO/IEC MPEG-4 international standard developed by Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.

mp3PRO is an unmaintained proprietary audio compression codec that combines the MP3 audio format with the spectral band replication (SBR) compression method. At the time it was developed it could reduce the size of a stereo MP3 by as much as 50% while maintaining the same relative quality. This works, fundamentally, by discarding the higher half of the frequency range and algorithmically replicating that information while decoding.

Harmonic Vector Excitation Coding, abbreviated HVXC, is a speech coding algorithm specified in the MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in fixed and variable bit rate modes and a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2–1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.

High-Efficiency Advanced Audio Coding

High-Efficiency Advanced Audio Coding (HE-AAC) is an audio coding format for lossy data compression of digital audio defined as an MPEG-4 Audio profile in ISO/IEC 14496–3. It is an extension of Low Complexity AAC (AAC-LC) optimized for low-bitrate applications such as streaming audio. The usage profile HE-AAC v1 uses spectral band replication (SBR) to enhance the modified discrete cosine transform (MDCT) compression efficiency in the frequency domain. The usage profile HE-AAC v2 couples SBR with Parametric Stereo (PS) to further enhance the compression efficiency of stereo signals.

Bandwidth extension of a signal is the deliberate process of expanding the frequency range (bandwidth) over which the signal contains appreciable and useful content, or over which its effects are perceptible. Its significant advancement in recent years has led to the technology being adopted commercially in several areas, including psychoacoustic bass enhancement of small loudspeakers and high-frequency enhancement of coded speech and audio.

The MPEG-4 Low Delay Audio Coder is an audio compression standard designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. It is closely derived from the MPEG-2 Advanced Audio Coding (AAC) standard. It was published in MPEG-4 Audio Version 2 and in its later revisions.

Constrained Energy Lapped Transform (CELT) is an open, royalty-free lossy audio compression format and a free software codec with especially low algorithmic delay for use in low-latency audio communication. The algorithms are openly documented and may be used free of software patent restrictions. Development of the format was maintained by the Xiph.Org Foundation and later coordinated by the Opus working group of the Internet Engineering Task Force (IETF).

In audio engineering, joint encoding refers to a joining of several channels of similar information during encoding in order to obtain higher quality, a smaller file size, or both.

Unified Speech and Audio Coding (USAC) is an audio compression format and codec for both music and speech or any mix of speech and audio using very low bit rates between 12 and 64 kbit/s. It was developed by Moving Picture Experts Group (MPEG) and was published as an international standard ISO/IEC 23003-3 and also as an MPEG-4 Audio Object Type in ISO/IEC 14496-3:2009/Amd 3 in 2012.

Audio coding format

An audio coding format is a content representation format for storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to/from a specific audio coding format is called an audio codec; an example of an audio codec is LAME, one of several different codecs which implement encoding and decoding audio in the MP3 audio coding format in software.

Coding Technologies AB was a Swedish technology company that pioneered the use of spectral band replication in Advanced Audio Coding. It was a major provider of audio compression technologies for digital broadcasting.

Lars "Stockis" Liljeryd (1951–2020) was a Swedish audio and medical engineer, inventor, and entrepreneur. He is noted for his pioneering work on the audio compression technique spectral band replication (SBR), used in HE-AAC, which revolutionized audio processing in portable music, video, and streaming media devices. He was one of the founders of the startup company Coding Technologies.

References

  1. Novak, Clark. "Spectral Band Replication and aacPlus Coding - An Overview" (PDF). Archived from the original (PDF) on November 30, 2010. Retrieved February 8, 2010.
  2. ISO (2003). "Bandwidth extension, ISO/IEC 14496-3:2001/Amd 1:2003". ISO. Retrieved 2009-10-13.
  3. "IEEE Masaru Ibuka Consumer Electronics Award". IEEE.org. Retrieved 7 July 2015.
  4. "Interview with Martin Dietz, Kristofer Kjörling, and Lars Liljeryd". YouTube. Retrieved 7 July 2015.
  5. "XM Radio – Fast Facts". Archived from the original on November 15, 2006. Retrieved February 8, 2010.
  6. Jean-Marc Valin; Gregory Maxwell; Timothy B. Terriberry; Koen Vos (October 17–20, 2013). "High-Quality, Low-Delay Music Coding in the Opus Codec" (PDF). www.xiph.org. New York, NY: Xiph.Org Foundation. p. 2. Archived from the original (PDF) on 14 July 2018. Retrieved 19 August 2014.
  7. Andersen, Robert Loring; Crockett, B.; Davidson, G.; Davis, Mark; Fielder, L.; Turner, Stephen C.; Vinton, M.; Williams, P. (1 October 2004). "Introduction to Dolby Digital Plus, an Enhancement to the Dolby Digital Coding System" (PDF). Journal of The Audio Engineering Society.
  8. "Dolby® AC-4: Audio delivery for next-generation entertainment services" (PDF).
  9. Ekstrand, Per (November 2002). "Bandwidth extension of audio signals by spectral band replication" (PDF). Proc. 1st IEEE Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), Leuven, Belgium.
  10. Zhong, Haishan; Villemoes, Lars; Ekstrand, Per; Disch, Sascha; Nagel, Frederik; Wilde, Stephan; Chong, Kok Seng; Norimatsu, Takeshi (19 October 2011). "QMF Based Harmonic Spectral Band Replication". Audio Engineering Society.