Adaptive differential pulse-code modulation

Adaptive differential pulse-code modulation (ADPCM) is a variant of differential pulse-code modulation (DPCM) that varies the size of the quantization step to allow further reduction of the required data bandwidth for a given signal-to-noise ratio.

Typically, the adaptation to signal statistics in ADPCM consists simply of an adaptive scale factor applied to the difference signal before quantization in the DPCM encoder. [1]
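As an illustration, the following minimal sketch (the function name and the simple multiplicative adaptation rule are invented for this example; real coders such as IMA ADPCM use table-driven adaptation) quantizes each prediction error with the current step size and then adapts that step:

    def adpcm_step(sample, predicted, step):
        """Encode one sample: quantize the prediction error with the
        current step size, then adapt the step for the next sample."""
        diff = sample - predicted
        code = max(-8, min(7, round(diff / step)))  # 4-bit code in [-8, 7]
        reconstructed = predicted + code * step     # the decoder's estimate
        # Adapt: expand the step after large codes, shrink it after small ones.
        step = step * 1.5 if abs(code) >= 4 else max(1.0, step * 0.9)
        return code, reconstructed, step

    predicted, step = 0.0, 4.0
    for sample in [0, 30, 80, 90, 60, 10]:
        code, predicted, step = adpcm_step(sample, predicted, step)
        print(code, round(predicted, 1), round(step, 2))

Because the decoder applies the same adaptation rule to the codes it receives, the step size itself never needs to be transmitted.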

ADPCM was developed for speech coding by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973. [2]

In telephony

In telephony, a standard audio signal for a single phone call is digitized at 8000 samples per second, of 8 bits each, giving a 64 kbit/s digital signal known as DS0. The default signal compression encoding on a DS0 is either μ-law (mu-law) PCM (North America and Japan) or A-law PCM (Europe and most of the rest of the world). These are logarithmic compression systems in which a 13- or 14-bit linear PCM sample number is mapped into an 8-bit value. This system is described by the international standard G.711. Where circuit costs are high and loss of voice quality is acceptable, it sometimes makes sense to compress the voice signal even further. An ADPCM algorithm is used to map a series of 8-bit μ-law (or A-law) PCM samples into a series of 4-bit ADPCM samples. In this way, the capacity of the line is doubled. The technique is detailed in the G.726 standard.
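The logarithmic companding step can be sketched as follows. This uses the smooth μ-law curve with μ = 255; G.711 itself specifies a piecewise-linear (segmented) approximation of this curve, and the function names here are illustrative:

    import math

    MU = 255  # companding parameter used by mu-law

    def mulaw_compress(x):
        """Map a linear amplitude in [-1, 1] onto the mu-law curve."""
        return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

    def mulaw_expand(y):
        """Invert the companding curve."""
        return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

    for x in (0.01, 0.1, 1.0):
        q = round(mulaw_compress(x) * 127) / 127  # 8 bits: sign + 7 magnitude
        print(x, round(mulaw_expand(q), 4))

Quiet samples land on the steep part of the curve, so they retain proportionally more precision after the 8-bit quantization than a uniform quantizer would give them.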

ADPCM techniques are used in voice over IP communications. In the early 1990s, ADPCM was also used by the Interactive Multimedia Association to develop the legacy audio codecs ADPCM DVI, IMA ADPCM, and DVI4. [3]

Split-band or subband ADPCM

G.722 [4] is an ITU-T standard wideband speech codec operating at 48, 56 and 64 kbit/s, based on subband coding with two channels and ADPCM coding of each. [5] Before digitization, the analog signal is divided into frequency bands with quadrature mirror filters (QMF), yielding two subbands of the signal. Once the ADPCM bitstream of each subband is obtained, the results are multiplexed, and the data is then stored or transmitted. The decoder performs the reverse process: it demultiplexes and decodes each subband of the bitstream, then recombines the subbands.
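A toy sketch of this split-and-recombine structure is shown below, using a two-tap Haar filter pair as a stand-in for the actual QMF bank of G.722 (all names are illustrative, and the per-band ADPCM coding is omitted):

    def qmf_split(x):
        """Analysis bank: average = low band, difference = high band."""
        low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
        high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
        return low, high

    def qmf_merge(low, high):
        """Synthesis bank: perfectly reconstructs the split."""
        out = []
        for l, h in zip(low, high):
            out += [l + h, l - h]
        return out

    signal = [3, 1, 4, 1, 5, 9, 2, 6]
    low, high = qmf_split(signal)           # each band would be ADPCM-coded
    assert qmf_merge(low, high) == signal   # demultiplex, decode, recombine

In the real codec, the two half-rate bands are each ADPCM-coded before the multiplexing step, with more bits spent on the lower band.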

[Figure: ADPCM block diagram (Adpcm en.svg)]

In the coding process, for some applications such as voice coding, the subband that contains the voice is coded with more bits than the others, as a way to reduce the overall data size.

Software

The Windows Sound System supported ADPCM in WAV files. [6]

The FFmpeg audio codecs supporting ADPCM are adpcm_ima_qt, adpcm_ima_wav, adpcm_ms, adpcm_swf and adpcm_yamaha. [7] [8]
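For example (assuming an FFmpeg build with this encoder enabled), a command of the form "ffmpeg -i input.wav -c:a adpcm_ima_wav output.wav" transcodes a WAV file to IMA ADPCM.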

The DSP in the GameCube supports ADPCM encoding on 64 simultaneous audio channels.

Related Research Articles

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals or sound level is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Delta modulation

Delta modulation (DM) is an analog-to-digital and digital-to-analog signal conversion technique used for transmission of voice information where quality is not of primary importance. DM is the simplest form of differential pulse-code modulation (DPCM), in which the difference between successive samples is encoded into an n-bit data stream. In delta modulation, the transmitted data are reduced to a 1-bit data stream representing either up (↗) or down (↘).
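A minimal sketch of the 1-bit scheme (illustrative function name, fixed step size):

    def delta_modulate(samples, step=1.0):
        """Emit 1 when the signal is above the running approximation
        (step up), otherwise 0 (step down)."""
        approx, bits = 0.0, []
        for s in samples:
            bit = 1 if s > approx else 0
            approx += step if bit else -step
            bits.append(bit)
        return bits

    print(delta_modulate([0.5, 1.5, 2.0, 1.0, 0.0]))  # -> [1, 1, 0, 0, 0]

Continuously variable slope delta modulation, described below, improves on this by adapting the step size.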

Digital audio

Digital audio is a representation of sound recorded in, or converted into, digital form. In digital audio, the sound wave of the audio signal is typically encoded as numerical samples in a continuous sequence. For example, in CD audio, samples are taken 44,100 times per second, each with 16-bit sample depth. Digital audio is also the name for the entire technology of sound recording and reproduction using audio signals that have been encoded in digital form. Following significant advances in digital audio technology during the 1970s and 1980s, it gradually replaced analog audio technology in many areas of audio engineering, record production and telecommunications in the 1990s and 2000s.

G.711

G.711 is a narrowband audio codec originally designed for use in telephony that provides toll-quality audio at 64 kbit/s. It is an ITU-T standard (Recommendation) for audio encoding, titled Pulse code modulation (PCM) of voice frequencies, released for use in 1972.

H.261 is an ITU-T video compression standard, first ratified in November 1988. It is the first member of the H.26x family of video coding standards in the domain of the ITU-T Study Group 16 Video Coding Experts Group. It was the first video coding standard that was useful in practical terms.

Smacker video

Smacker video is a video file format developed by RAD Game Tools, and primarily used for full-motion video in video games. Smacker uses an adaptive 8-bit RGB palette. RAD's format for video at higher color depths is Bink Video. The Smacker format specifies a container format, a video compression format, and an audio compression format. Since its release in 1994, Smacker has been used in over 2300 games. Blizzard used this format for the cinematic videos seen in its games Warcraft II, StarCraft and Diablo I.

Continuously variable slope delta modulation (CVSD) is a voice coding method. It is delta modulation with a variable step size, first proposed by Greefkes and Riemens in 1970.

WavPack is a free and open-source lossless audio compression format, and an application implementing the format. It is unique in that it supports hybrid compression (a lossy stream plus a correction stream) alongside normal lossless compression, which is similar to how FLAC works. It can also compress a wide variety of lossless formats, including various variants of PCM as well as DSD as used in SACDs, and it supports surround audio.

G.722

G.722 is an ITU-T standard 7 kHz wideband audio codec operating at 48, 56 and 64 kbit/s. It was approved by ITU-T in November 1988. The technology of the codec is based on sub-band ADPCM (SB-ADPCM). The corresponding narrowband codec based on the same technology is G.726.

G.726

G.726 is an ITU-T ADPCM speech codec standard covering the transmission of voice at rates of 16, 24, 32, and 40 kbit/s. It was introduced to supersede both G.721, which covered ADPCM at 32 kbit/s, and G.723, which described ADPCM for 24 and 40 kbit/s. G.726 also introduced a new 16 kbit/s rate. The four bit rates associated with G.726 are often referred to by the bit size of a sample, which is 2, 3, 4, or 5 bits, respectively. The corresponding wideband codec based on the same technology is G.722.
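At telephony's fixed sampling rate of 8000 samples per second, the bit rates follow directly from the sample sizes; for example, 4 bits per sample × 8000 samples per second = 32 kbit/s.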

Differential pulse-code modulation (DPCM) is a signal encoder that uses the baseline of pulse-code modulation (PCM) but adds some functionalities based on the prediction of the samples of the signal. The input can be an analog signal or a digital signal.
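A minimal DPCM sketch (illustrative names; the predictor is simply the previous decoded sample, and the fixed step size is exactly what ADPCM replaces with an adaptive one):

    def dpcm_encode(samples, step=2):
        """Transmit only the quantized difference from the prediction."""
        pred, codes = 0, []
        for s in samples:
            code = round((s - pred) / step)
            codes.append(code)
            pred += code * step  # track the decoder's reconstruction
        return codes

    def dpcm_decode(codes, step=2):
        pred, out = 0, []
        for c in codes:
            pred += c * step
            out.append(pred)
        return out

    print(dpcm_decode(dpcm_encode([10, 12, 15, 11])))  # -> [10, 12, 16, 12]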

H.120 was the first digital video compression standard. It was developed by COST 211 and published by the CCITT in 1984, with a revision in 1988 that included contributions proposed by other organizations. The video turned out not to be of adequate quality, there were few implementations, and there are no existing codecs for the format, but it provided important knowledge leading directly to its practical successors, such as H.261. The latest revision was published in March 1993.

SBC, or low-complexity subband codec, is an audio subband codec specified by the Bluetooth Special Interest Group (SIG) for the Advanced Audio Distribution Profile (A2DP). SBC is a digital audio encoder and decoder used to transfer data to Bluetooth audio output devices like headphones or loudspeakers. It can also be used on the Internet. It was designed with Bluetooth bandwidth limitations and processing power in mind to obtain a reasonably good audio quality at medium bit rates with low computational complexity. As of A2DP version 1.3, the Low Complexity Subband Coding remains the default codec and its implementation is mandatory for devices supporting that profile, but vendors are free to add their own codecs to match their needs.

Sub-band coding

In signal processing, sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands, typically by using a fast Fourier transform, and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.

Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.
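For example, telephony-style PCM can be sketched as uniform sampling of a waveform followed by rounding each sample to the nearest level (illustrative code, 8-bit signed output):

    import math

    def pcm_quantize(value, bits=8):
        """Round an amplitude in [-1, 1] to a signed integer code."""
        levels = 2 ** (bits - 1) - 1
        return round(value * levels)

    # Sample a 1 kHz tone at 8 kHz, as in telephony, then quantize.
    rate, freq = 8000, 1000
    print([pcm_quantize(math.sin(2 * math.pi * freq * n / rate))
           for n in range(8)])  # one cycle: [0, 90, 127, 90, 0, -90, -127, -90]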

Audio coding format

An audio coding format is a content representation format for storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to/from a specific audio coding format is called an audio codec; an example of an audio codec is LAME, one of several codecs that implement encoding and decoding audio in the MP3 audio coding format in software.

Nikil S. Jayant is an Indian-American communications engineer. He was a researcher at Bell Laboratories and subsequently a professor at Georgia Institute of Technology. He received his Ph.D. in Electrical Communication Engineering from the Indian Institute of Science, Bangalore, India in 1970.

References

  1. Ken C. Pohlmann (2005). Principles of Digital Audio. McGraw-Hill Professional. ISBN 978-0-07-144156-8.
  2. Cummiskey, P.; Jayant, Nikil S.; Flanagan, James L. (September 1973). "Adaptive quantization in differential PCM coding of speech". The Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x.
  3. Recommended Practices for Enhancing Digital Audio Compatibility in Multimedia Systems – legacy IMA ADPCM specification. Retrieved 2009-07-06.
  4. ITU-T Recommendation G.722 (11/88), "7 kHz audio-coding within 64 kbit/s".
  5. Jerry D. Gibson; Toby Berger; Tom Lookabaugh (1998). Digital Compression for Multimedia. Morgan Kaufmann. ISBN 978-1-55860-369-1.
  6. "Differences Between PCM/ADPCM Wave Files Explained". KB 89879, Revision 3.0. Microsoft Knowledge Base. 2011-09-24. Archived from the original on 2013-12-31. Retrieved 2013-12-30.
  7. "FFmpeg General Documentation – Audio Codecs". FFmpeg.org. Retrieved 2013-12-30.
  8. "FFmpeg/adpcmenc.c at ee4aa388b2231e988eccdab652c55df080d6ad45 · FFmpeg/FFmpeg". GitHub. 2017-02-15. Retrieved 2018-02-05.