Mixed-excitation linear prediction


Mixed-excitation linear prediction (MELP) is a United States Department of Defense speech coding standard used mainly in military applications and satellite communications, secure voice, and secure radio devices. Its standardization and later development were led and supported by the NSA and NATO. The current "enhanced" version is known as MELPe.


History

The initial MELP was invented by Alan McCree around 1995 [1] while he was a graduate student at the Center for Signal and Image Processing (CSIP) at Georgia Tech; the original MELP-related patents have since expired. That initial speech coder was standardized in 1997 and was known as MIL-STD-3005. [2] It surpassed other candidate vocoders in the US DoD competition, including: (a) Frequency Selective Harmonic Coder (FSHC), (b) Advanced Multi-Band Excitation (AMBE), (c) Enhanced Multiband Excitation (EMBE), (d) Sinusoidal Transform Coder (STC), and (e) Subband LPC Coder (SBC). [3] Because its complexity was lower than that of the Waveform Interpolative (WI) coder, the MELP vocoder won the DoD competition and was selected for MIL-STD-3005. [4]

MIL-STD-3005

Between 1998 and 2001, SignalCom (later acquired by Microsoft), Compandent, and AT&T Corporation created a new MELP-based vocoder at half the rate (1200 bit/s) and added substantial enhancements to MIL-STD-3005. These included (a) the new 1200 bit/s vocoder, (b) substantially improved encoding (analysis), (c) substantially improved decoding (synthesis), (d) noise pre-processing for removing background noise, (e) transcoding between the 2400 bit/s and 1200 bit/s bitstreams, and (f) a new postfilter. This fairly significant development aimed to create a coder at half the rate that would remain interoperable with the old MELP standard. This enhanced MELP (also known as MELPe) was adopted as the new MIL-STD-3005 in 2001 in the form of annexes and supplements to the original MIL-STD-3005, delivering the quality of the old 2400 bit/s MELP at half the rate. One of the greatest advantages of the new 2400 bit/s MELPe is that it shares the same bit format as MELP, and hence can interoperate with legacy MELP systems while delivering better quality at both ends. MELPe provides much better quality than all older military standards, especially in noisy environments such as battlefields, vehicles, and aircraft.

STANAG-4591 (NATO)

In 2002, following extensive competition and testing, the 2400 and 1200 bit/s US DoD MELPe was also adopted as a NATO standard, known as STANAG-4591. [5] The NATO testing performance measurements included voice intelligibility, voice quality, speaker recognition, language dependency, speaker dependency, 10 acoustic noise environments, a transmission channel with a 1% bit error rate (BER), tandeming with the 16 kbit/s CVSD vocoder, whispered speech, and real-time implementation. The testing data included over 36,000 files, or 500 hours of speech, under various conditions and languages.

As part of NATO testing for the new standard, MELPe was tested against other candidates such as France's HSX (Harmonic Stochastic eXcitation) and Turkey's SB-LPC (Split-Band Linear Predictive Coding), as well as old secure voice standards such as FS1015 LPC-10e (2.4 kbit/s), FS1016 CELP (4.8 kbit/s) and CVSD (16 kbit/s). MELPe also won the NATO competition, surpassing the quality of all other candidates as well as that of all the old secure voice standards (CVSD, CELP and LPC-10e). The NATO competition concluded that MELPe substantially improved performance (in terms of speech quality, intelligibility, and noise immunity) while reducing throughput requirements. The NATO testing also included interoperability tests, used over 200 hours of speech data, and was conducted by three test laboratories worldwide.

Compandent Inc., as part of MELPe-based projects performed for the NSA and NATO, provided both with a special test-bed platform known as the MELCODER device, which served as the golden reference for real-time implementation of MELPe. Compandent's low-cost FLEXI-232 Data Terminal Equipment (DTE), which is based on the MELCODER golden reference, is widely used for evaluating and testing MELPe in real time, over various channels and networks, and in field conditions.

In 2005, a new 600 bit/s MELPe variation by Thales Group (France) was added to the NATO standard STANAG-4591, without the extensive competition and testing performed for the 2400/1200 bit/s MELPe. [6]

300 bit/s MELP

In 2010, MIT Lincoln Labs, Compandent, BBN, and General Dynamics also developed a 300 bit/s MELP device for DARPA. [7] Its quality was better than that of the 600 bit/s MELPe, but its algorithmic delay was longer.

Implementations

MELPe has been implemented in many applications, including secure radio devices, satellite communications, VoIP, and cellphone applications. Such applications require additional expertise to combat channel errors, packet loss, and synchronization loss, which in turn requires understanding how sensitive each of MELPe's bits is to errors. The 2400 bit/s and 1200 bit/s MELPe include a synchronization bit, which is useful in serial communications.
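One generic way a receiver on a serial link could use such a synchronization bit is to try every possible bit offset and keep the one where the candidate sync bits behave as expected. The sketch below is a hypothetical illustration only: the frame length is taken from the 2400 bit/s payload size, but the sync bit's position (`SYNC_POS`) and its alternating 0/1 pattern are assumptions, not details taken from MIL-STD-3005 or STANAG-4591, and `alternation_score` and `find_frame_offset` are illustrative helper names.

```python
# Hypothetical sketch: locating frame boundaries in a serial bitstream
# by looking for an alternating synchronization bit.
# ASSUMPTIONS (illustrative only, not taken from MIL-STD-3005 / STANAG-4591):
#   * every frame is FRAME_BITS long (54 bits = the 2400 bit/s payload size),
#   * the sync bit sits at a fixed offset SYNC_POS inside each frame,
#   * the sync bit alternates 0, 1, 0, 1, ... from frame to frame.
FRAME_BITS = 54
SYNC_POS = 0  # hypothetical position of the sync bit within a frame


def alternation_score(bits, offset, frame_bits=FRAME_BITS, sync_pos=SYNC_POS):
    """Count sync-bit toggles between consecutive frames assumed to start at `offset`."""
    syncs = bits[offset + sync_pos::frame_bits]
    return sum(1 for prev, cur in zip(syncs, syncs[1:]) if prev != cur)


def find_frame_offset(bits, frame_bits=FRAME_BITS, sync_pos=SYNC_POS):
    """Return the bit offset (0..frame_bits-1) whose candidate sync bits toggle most often."""
    return max(range(frame_bits),
               key=lambda off: alternation_score(bits, off, frame_bits, sync_pos))


if __name__ == "__main__":
    # Toy stream: 5 junk bits, then 20 frames whose payload bits are all 1
    # and whose (assumed) sync bit alternates.
    frames = []
    for i in range(20):
        frames += [i % 2] + [1] * (FRAME_BITS - 1)
    stream = [1, 0, 1, 1, 0] + frames
    print(find_frame_offset(stream))
```

If payload bits behave roughly like random bits, wrong offsets toggle only about half the time, so a receiver would typically observe many frames before committing to an alignment; how many is a design choice not specified here.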

Compression level

MELPe is intended for the compression of speech. Given an audio input sampled at 8 kHz, the MELPe codec yields the following compression ratios over a 64 kbit/s μ-law G.711 datastream, discounting the effects of protocol overhead:

Bitrate | Compression ratio over G.711 | Payload size | Payload interval
2400 bit/s | 26.7× | 54 bits | 22.5 ms
1200 bit/s | 53.3× | 81 bits | 67.5 ms
600 bit/s | 106.7× | 54 bits | 90 ms
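The figures in the table follow from simple arithmetic: the compression ratio is the 64 kbit/s G.711 rate divided by the MELPe bit rate, and the payload size is the bit rate multiplied by the payload interval. A minimal Python check of those two relations (the function names are illustrative, not part of any MELPe implementation):

```python
# Reproduces the compression-ratio and payload-size columns from
# the bit rates and payload intervals listed for MELPe.
G711_BPS = 64_000  # 8 kHz sampling x 8-bit mu-law samples


def compression_ratio(bitrate_bps: float) -> float:
    """Ratio of the G.711 bit rate to the codec bit rate (protocol overhead ignored)."""
    return G711_BPS / bitrate_bps


def payload_bits(bitrate_bps: float, interval_ms: float) -> float:
    """Bits produced during one payload interval: bit rate x interval."""
    return bitrate_bps * interval_ms / 1000


for rate, interval in [(2400, 22.5), (1200, 67.5), (600, 90.0)]:
    print(f"{rate} bit/s: {compression_ratio(rate):.1f}x, "
          f"{payload_bits(rate, interval):.0f} bits per {interval} ms")
```

Rounded to one decimal place, the ratios match the 26.7, 53.3 and 106.7 listed above.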

Generally, speech coding involves a trade-off among bit rate, speech quality, delay (frame size and lookahead), computational complexity, robustness to different speakers and languages, robustness to different background noises, channel error robustness, and codec state recovery in the face of packet loss. Since MELPe's lower rates (600 and 1200 bit/s) are supersets of the 2400 bit/s rate, the algorithmic complexity (e.g., in MIPS) is about the same for all rates. The lower rates use longer frames, more lookahead, and larger codebooks, and therefore require more memory.
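The frame-length side of this trade-off can be read off the compression table: the 1200 and 600 bit/s payload intervals are exactly three and four times the 22.5 ms interval of the 2400 bit/s mode. The sketch below computes how many such base intervals each payload spans and how many 8 kHz input samples must be collected before one payload can be emitted. It deliberately leaves out lookahead, which the text does not quantify; the constants and helper names are illustrative assumptions, not definitions from the standard.

```python
# Buffering implied by the payload intervals in the compression table.
# Lookahead is NOT included: its size is not given in the text.
SAMPLE_RATE_HZ = 8_000
BASE_INTERVAL_MS = 22.5  # payload interval of the 2400 bit/s mode


def base_intervals_per_payload(interval_ms: float) -> int:
    """How many 22.5 ms base intervals one payload interval spans."""
    n = interval_ms / BASE_INTERVAL_MS
    if n != int(n):
        raise ValueError("interval is not a whole number of base intervals")
    return int(n)


def samples_per_payload(interval_ms: float) -> int:
    """Input samples collected during one payload interval at 8 kHz."""
    return round(SAMPLE_RATE_HZ * interval_ms / 1000)


for rate, interval in [(2400, 22.5), (1200, 67.5), (600, 90.0)]:
    print(f"{rate} bit/s: {base_intervals_per_payload(interval)} x 22.5 ms, "
          f"{samples_per_payload(interval)} samples buffered")
```

Longer payload intervals mean more buffered speech per payload, which is consistent with the higher memory use and longer delay of the lower rates described above.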

Intellectual property rights

MELPe (and/or its derivatives) is subject to IPR licensing from the following companies: Texas Instruments (2400 bit/s MELP algorithm / source code), Microsoft (1200 bit/s transcoder), Thales Group (600 bit/s rate), Compandent, and AT&T (Noise Pre-Processor, NPP).


References

  1. Alan V. McCree, Thomas P. Barnwell, "A Mixed Excitation LPC Vocoder Model for Low Bit Rate Speech Coding", IEEE Transactions on Speech and Audio Processing, 1995. (Original MELP)
  2. US DoD, "Analog-to-Digital Conversion of Voice by 2,400 Bit/Second Mixed Excitation Linear Prediction (MELP)", MIL-STD-3005. (Original MELP)
  3. M.R. Bielefeld, L.M. Supplee, "Developing a test program for the DoD 2400 bps vocoder selection process", Proc. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 2, pp. 1141-1144, 1996.
  4. L.M. Supplee, R.P. Cohn, J.S. Collura, A.V. McCree, "MELP: the new Federal Standard at 2400 bps", Proc. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97), vol. 2, pp. 1591-1594, 1997.
  5. NATO, "The 1200 and 2400 bit/s NATO Interoperable Narrow Band Voice Coder", STANAG-4591.
  6. NATO, "MELPe Variation for 600 bit/s NATO Narrow Band Voice Coder", STANAG-4591.
  7. Alan McCree, "A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters", Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, France, 2006, pp. I-705-I-708.