Algebraic code-excited linear prediction

Last updated

Algebraic code-excited linear prediction (ACELP) is a speech coding algorithm in which a limited set of pulses is distributed as excitation to a linear prediction filter. It is a linear predictive coding (LPC) algorithm that is based on the code-excited linear prediction (CELP) method and has an algebraic structure. ACELP was developed in 1989 by the researchers at the Université de Sherbrooke in Canada. [1]

Contents

The ACELP method is widely employed in current speech coding standards such as AMR, EFR, AMR-WB (G.722.2), VMR-WB, EVRC, EVRC-B, SMV, TETRA, PCS 1900, MPEG-4 CELP and ITU-T G-series standards G.729, G.729.1 (first coding stage) and G.723.1. [2] [3] [4] [5] The ACELP algorithm is also used in the proprietary ACELP.net codec. [6] Audible Inc. use a modified version for their speaking books. It is also used in conference-calling software, speech compression tools and has become one of the 3GPP formats.

The ACELP patent expired in 2018 and is now royalty-free. [7]

Features

The main advantage of ACELP is that the algebraic codebook it uses can be made very large (> 50 bits) without running into storage (RAM/ROM) or complexity (CPU time) problems.

Technology

The ACELP algorithm is based on that used in code-excited linear prediction (CELP), but ACELP codebooks have a specific algebraic structure imposed upon them.

A 16-bit algebraic codebook shall be used in the innovative codebook search, the aim of which is to find the best innovation and gain parameters. The innovation vector contains, at most, four non-zero pulses.

In ACELP, a block of N speech samples is synthesized by filtering an appropriate innovation sequence from a codebook, scaled by a gain factor gc, through two time-varying filters.

The long-term (pitch) synthesis filter is given by:

The short-term synthesis filter is given by:

Related Research Articles

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.

<span class="mw-page-title-main">G.723.1</span> ITU-T Recommendation

G.723.1 is an audio codec for voice that compresses voice audio in 30 ms frames. An algorithmic look-ahead of 7.5 ms duration means that total algorithmic delay is 37.5 ms. Its official name is Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. It is sometimes associated with a Truespeech trademark in coprocessors produced by DSP Group.

Personal Digital Cellular (PDC) was a 2G mobile telecommunications standard used exclusively in Japan.

The Adaptive Multi-Rateaudio codec is an audio compression format optimized for speech coding. AMR is a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s with toll quality speech starting at 7.4 kbit/s.

Enhanced Full Rate or EFR or GSM-EFR or GSM 06.60 is a speech coding standard that was developed in order to improve the quality of GSM.

Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech audio coding standard developed based on Adaptive Multi-Rate encoding, using a similar methodology to algebraic code-excited linear prediction (ACELP). AMR-WB provides improved speech quality due to a wider speech bandwidth of 50–7000 Hz compared to narrowband speech coders which in general are optimized for POTS wireline quality of 300–3400 Hz. AMR-WB was developed by Nokia and VoiceAge and it was first specified by 3GPP.

<span class="mw-page-title-main">G.729</span> ITU-T Recommendation

G.729 is a royalty-free narrow-band vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as Coding of speech at 8 kbit/s using code-excited linear prediction speech coding (CS-ACELP), and was introduced in 1996. The wide-band extension of G.729 is called G.729.1, which equals G.729 Annex J.

G.728 is an ITU-T standard for speech coding operating at 16 kbit/s. It is officially described as Coding of speech at 16 kbit/s using low-delay code excited linear prediction.

Extended Adaptive Multi-Rate – Wideband (AMR-WB+) is an audio codec that extends AMR-WB. It adds support for stereo signals and higher sampling rates. Another main improvement is the use of transform coding additionally to ACELP. This greatly improves the generic audio coding. Automatic switching between transform coding and ACELP provides both good speech and audio quality with moderate bit rates.

Selectable Mode Vocoder (SMV) is variable bitrate speech coding standard used in CDMA2000 networks. SMV provides multiple modes of operation that are selected based on input speech characteristics.

Harmonic Vector Excitation Coding, abbreviated as HVXC is a speech coding algorithm specified in MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in the fixed and variable bit rate mode and sampling frequency 8 kHz. It also operates at lower bitrates, such as 1.2 - 1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.

FS-1016 is a deprecated secure telephony speech encoding standard for Code-excited linear prediction (CELP) developed by the United States Department of Defense and finalized February 14, 1991.

Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction (RELP) and linear predictive coding (LPC) vocoders. Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 Audio speech coding. CELP is commonly used as a generic term for a class of algorithms and not for a particular codec.

Vector sum excited linear prediction (VSELP) is a speech coding method used in several cellular standards. The VSELP algorithm is an analysis-by-synthesis coding technique and belongs to the class of speech coding algorithms known as CELP.

Enhanced Variable Rate Codec B (EVRC-B) is a speech codec used by CDMA networks. EVRC-B is an enhancement to EVRC and compresses each 20 milliseconds of 8000 Hz, 16-bit sampled speech input into output frames of one of the four different sizes: Rate 1 - 171 bits, Rate 1/2 - 80 bits, Rate 1/4 - 40 bits, Rate 1/8 - 16 bits.

<span class="mw-page-title-main">G.729.1</span> ITU-T Recommendation

G.729.1 is an 8-32 kbit/s embedded speech and audio codec providing bitstream interoperability with G.729, G.729 Annex A and G.729 Annex B. Its official name is G.729-based embedded variable bit rate codec: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729. It was introduced in 2006.

<span class="mw-page-title-main">G.718</span> ITU-T Recommendation

G.718 is an ITU-T Recommendation embedded scalable speech and audio codec providing high quality narrowband speech over the lower bit rates and high quality wideband speech over the complete range of bit rates. In addition, G.718 is designed to be highly robust to frame erasures, thereby enhancing the speech quality when used in Internet Protocol (IP) transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both narrowband and wideband generic audio signals. The codec has an embedded scalable structure, enabling maximum flexibility in the transport of voice packets through IP networks of today and in future media-aware networks. In addition, the embedded structure of G.718 will easily allow the codec to be extended to provide a superwideband and stereo capability through additional layers which are currently under development in ITU-T Study Group 16. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 & 32 kbit/s.

<span class="mw-page-title-main">Audio coding format</span> Digitally coded format for audio signals

An audio coding format is a content representation format for storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to/from a specific audio coding format is called an audio codec; an example of an audio codec is LAME, which is one of several different codecs which implements encoding and decoding audio in the MP3 audio coding format in software.

References

  1. "Transfer of technology".
  2. ACELP map, VoiceAge Corporation, Archive.org
  3. "Related Standards Specifications". 14 October 2007. Archived from the original on 14 October 2007.
  4. VoiceAge Corporation (13 October 2007). "Codec Technologies". Archived from the original on 13 October 2007. Retrieved 20 September 2009.
  5. VoiceAge Corporation. "Codec Technologies". VoiceAge Corporation. Archived from the original on 18 October 2009. Retrieved 20 September 2009.
  6. VoiceAge Corporation. "ACELP.net — Beyond the Standards". Archived from the original on 14 October 2007. Retrieved 3 January 2010.
  7. USpatent 5717825,"Algebraic code-excited linear prediction speech coding method",issued 10 February 1998