Sub-band coding

Last updated
Sub-band coding and decoding signal flow diagram SubBandCoding.svg
Sub-band coding and decoding signal flow diagram

In signal processing, sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands, typically by using a fast Fourier transform, and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.

Contents

SBC is the core technique used in many popular lossy audio compression algorithms including MP3.

Encoding audio signals

The simplest way to digitally encode audio signals is pulse-code modulation (PCM), which is used on audio CDs, DAT recordings, and so on. Digitization transforms continuous signals into discrete ones by sampling a signal's amplitude at uniform intervals and rounding to the nearest value representable with the available number of bits. This process is fundamentally inexact, and involves two errors: discretization error, from sampling at intervals, and quantization error, from rounding.

The more bits used to represent each sample, the finer the granularity in the digital representation, and thus the smaller the quantization error. Such quantization errors may be thought of as a type of noise, because they are effectively the difference between the original source and its binary representation. With PCM, the audible effects of these errors can be mitigated with dither and by using enough bits to ensure that the noise is low enough to be masked either by the signal itself or by other sources of noise. A high quality signal is possible, but at the cost of a high bitrate (e.g., over 700 kbit/s for one channel of CD audio). In effect, many bits are wasted in encoding masked portions of the signal because PCM makes no assumptions about how the human ear hears.

Coding techniques reduce bitrate by exploiting known characteristics of the auditory system. A classic method is nonlinear PCM, such as the μ-law algorithm. Small signals are digitized with finer granularity than are large ones; the effect is to add noise that is proportional to the signal strength. Sun's Au file format for sound is a popular example of mu-law encoding. Using 8-bit mu-law encoding would cut the per-channel bitrate of CD audio down to about 350 kbit/s, half the standard rate. Because this simple method only minimally exploits masking effects, it produces results that are often audibly inferior compared to the original.

Basic principles

The utility of SBC is perhaps best illustrated with a specific example. When used for audio compression, SBC exploits auditory masking in the auditory system. Human ears are normally sensitive to a wide range of frequencies, but when a sufficiently loud signal is present at one frequency, the ear will not hear weaker signals at nearby frequencies. We say that the louder signal masks the softer ones.

The basic idea of SBC is to enable a data reduction by discarding information about frequencies which are masked. The result differs from the original signal, but if the discarded information is chosen carefully, the difference will not be noticeable, or more importantly, objectionable.

First, a digital filter bank divides the input signal spectrum into some number (e.g., 32) of subbands. The psychoacoustic model looks at the energy in each of these subbands, as well as in the original signal, and computes masking thresholds using psychoacoustic information. Each of the subband samples is quantized and encoded so as to keep the quantization noise below the dynamically computed masking threshold. The final step is to format all these quantized samples into groups of data called frames, to facilitate eventual playback by a decoder.

Decoding is much easier than encoding, since no psychoacoustic model is involved. The frames are unpacked, subband samples are decoded, and a frequency-time mapping reconstructs an output audio signal.

Applications

Beginning in the late 1980s, a standardization body, the Moving Picture Experts Group (MPEG), developed standards for coding of both audio and video. Subband coding resides at the heart of the popular MP3 format (more properly known as MPEG-1 Audio Layer III), for example.

Sub-band coding is used in the G.722 codec which uses sub-band adaptive differential pulse code modulation (SB-ADPCM) within a bit rate of 64 kbit/s. In the SB-ADPCM technique, the frequency band is split into two sub-bands (higher and lower) and the signals in each sub-band are encoded using ADPCM.

Related Research Articles

In signal processing, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.

MP3 Digital audio format

MP3 is a coding format for digital audio developed largely by the Fraunhofer Society in Germany, with support from other digital scientists in the US and elsewhere. Originally defined as the third audio format of the MPEG-1 standard, it was retained and further extended — defining additional bit-rates and support for more audio channels — as the third audio format of the subsequent MPEG-2 standard. A third version, known as MPEG 2.5 — extended to better support lower bit rates — is commonly implemented, but is not a recognized standard.

MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.

Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC in 1992. ATRAC allowed a relatively small disc like MiniDisc to have the same running time as CD while storing audio information with minimal loss in perceptible quality. Improvements to the codec in the form of ATRAC3, ATRAC3plus, and ATRAC Advanced Lossless followed in 1999, 2002, and 2006 respectively.

G.711 ITU-T recommendation

G.711 is a narrowband audio codec originally designed for use in telephony that provides toll-quality audio at 64 kbit/s. G.711 passes audio signals in the range of 300–3400 Hz and samples them at the rate of 8,000 samples per second, with the tolerance on that rate of 50 parts per million (ppm). Non-uniform (logarithmic) quantization with 8 bits is used to represent each sample, resulting in a 64 kbit/s bit rate. There are two slightly different versions: μ-law, which is used primarily in North America and Japan, and A-law, which is in use in most other countries outside North America.

MPEG-1 Audio Layer II or MPEG-2 Audio Layer II is a lossy audio compression format defined by ISO/IEC 11172-3 alongside MPEG-1 Audio Layer I and MPEG-1 Audio Layer III (MP3). While MP3 is much more popular for PC and Internet applications, MP2 remains a dominant standard for audio broadcasting.

Sound quality Assessment of the audio output from an electronic device

Sound quality is typically an assessment of the accuracy, fidelity, or intelligibility of audio output from an electronic device. Quality can be measured objectively, such as when tools are used to gauge the accuracy with which the device reproduces an original sound; or it can be measured subjectively, such as when human listeners respond to the sound or gauge its perceived similarity to another sound.

Continuously variable slope delta modulation is a voice coding method. It is a delta modulation with variable step size, first proposed by Greefkes and Riemens in 1970.

Musepack or MPC is an open source lossy audio codec, specifically optimized for transparent compression of stereo audio at bitrates of 160–180 kbit/s. It was formerly known as MPEGplus, MPEG+ or MP+.

G.722 ITU-T recommendation

G.722 is an ITU-T standard 7 kHz wideband audio codec operating at 48, 56 and 64 kbit/s. It was approved by ITU-T in November 1988. Technology of the codec is based on sub-band ADPCM (SB-ADPCM). The corresponding narrow-band codec based on the same technology is G.726.

Secure voice Encrypted voice communication

Secure voice is a term in cryptography for the encryption of voice communication over a range of communication types such as radio, telephone or IP.

Dolby Digital Plus, also known as Enhanced AC-3 is a digital audio compression scheme developed by Dolby Labs for transport and storage of multi-channel digital audio. It is a successor to Dolby Digital (AC-3), also developed by Dolby, and has a number of improvements including support for a wider range of data rates, increased channel count and multi-program support, and additional tools (algorithms) for representing compressed data and counteracting artifacts. While Dolby Digital (AC-3) supports up to five full-bandwidth audio channels at a maximum bitrate of 640 kbit/s, E-AC-3 supports up to 15 full-bandwidth audio channels at a maximum bitrate of 6.144 Mbit/s.

Audio bit depth Number of bits of information recorded for each digital audio sample

In digital audio using pulse-code modulation (PCM), bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution of each sample. Examples of bit depth include Compact Disc Digital Audio, which uses 16 bits per sample, and DVD-Audio and Blu-ray Disc which can support up to 24 bits per sample.

Perceptual Evaluation of Audio Quality (PEAQ) is a standardized algorithm for objectively measuring perceived audio quality, developed in 1994-1998 by a joint venture of experts within Task Group 6Q of the International Telecommunication Union's Radiocommunication Sector (ITU-R). It was originally released as ITU-R Recommendation BS.1387 in 1998 and last updated in 2001. It utilizes software to simulate perceptual properties of the human ear and then integrates multiple model output variables into a single metric. PEAQ characterizes the perceived audio quality as subjects would do in a listening test according to ITU-R BS.1116. PEAQ results principally model mean opinion scores that cover a scale from 1 (bad) to 5 (excellent).

Adaptive differential pulse-code modulation (ADPCM) is a variant of differential pulse-code modulation (DPCM) that varies the size of the quantization step, to allow further reduction of the required data bandwidth for a given signal-to-noise ratio.

SBC, or low-complexity subband codec, is an audio subband codec specified by the Bluetooth Special Interest Group (SIG) for the Advanced Audio Distribution Profile (A2DP). SBC is a digital audio encoder and decoder used to transfer data to Bluetooth audio output devices like headphones or loudspeakers. It can also be used on the Internet. It was designed with Bluetooth bandwidth limitations and processing power in mind to obtain a reasonably good audio quality at medium bit rates with low computational complexity. As of A2DP version 1.3, the Low Complexity Subband Coding remains the default codec and its implementation is mandatory for devices supporting that profile, but vendors are free to add their own codecs to match their needs.

Constrained Energy Lapped Transform (CELT) is an open, royalty-free lossy audio compression format and a free software codec with especially low algorithmic delay for use in low-latency audio communication. The algorithms are openly documented and may be used free of software patent restrictions. Development of the format was maintained by the Xiph.Org Foundation and later coordinated by the Opus working group of the Internet Engineering Task Force (IETF).

aptX Family of proprietary audio codecs owned by Qualcomm

aptX is a family of proprietary audio codec compression algorithms owned by Qualcomm, with a heavy emphasis on wireless audio applications.

Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.