Selectable Mode Vocoder

Last updated September 04, 2021

Selectable Mode Vocoder (SMV) is a variable bitrate speech coding standard used in CDMA2000 networks.^[1] SMV provides multiple modes of operation that are selected based on input speech characteristics.

SMV for Wideband CDMA is based on four codecs: full rate at 8.5 kbit/s, half rate at 4 kbit/s, quarter rate at 2 kbit/s, and eighth rate at 800 bit/s.^[1] The full-rate and half-rate codecs are based on the CELP algorithm,^[1] using a combined closed-loop-open-loop analysis (COLA). In SMV, the signal frames are first classified as:

Silence/Background noise
Non-stationary unvoiced
Stationary unvoiced
Onset
Non-stationary voiced
Stationary voiced

The algorithm includes voice activity detection (VAD) followed by an elaborate frame classification scheme. Silence/background noise and stationary unvoiced frames are represented by spectrum-modulated noise and coded at 1/4 or 1/8 rate. SMV uses four subframes for full rate and two or three subframes for half rate. The stochastic (fixed) codebook structure is also elaborate and uses sub-codebooks, each tuned for a particular type of speech. The sub-codebooks have different degrees of pulse sparseness (sparser for noise-like excitation). SMV reaches a mean opinion score (MOS) of up to 3.6^[2] at full rate with clean speech.
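The classification described above can be sketched as a small lookup from frame class to candidate coding rates. This is only an illustrative sketch: the names `FrameClass`, `NOISE_CODED`, and `candidate_rates` are hypothetical, and the mapping of the remaining four classes to full or half rate is an assumption made for the example (the text only states the 1/4 and 1/8 rate assignment for the two noise-like classes).

```python
from enum import Enum


class FrameClass(Enum):
    """The six SMV frame classes listed in the text."""
    SILENCE_BACKGROUND_NOISE = "silence/background noise"
    NONSTATIONARY_UNVOICED = "non-stationary unvoiced"
    STATIONARY_UNVOICED = "stationary unvoiced"
    ONSET = "onset"
    NONSTATIONARY_VOICED = "non-stationary voiced"
    STATIONARY_VOICED = "stationary voiced"


# Classes the text says are represented by spectrum-modulated noise.
NOISE_CODED = {
    FrameClass.SILENCE_BACKGROUND_NOISE,
    FrameClass.STATIONARY_UNVOICED,
}


def candidate_rates(frame_class):
    """Return the candidate rates for a frame class, as fractions of full rate."""
    if frame_class in NOISE_CODED:
        # Stated in the text: coded at 1/4 or 1/8 rate.
        return ("1/4", "1/8")
    # ASSUMPTION for illustration: all other classes use the CELP-based
    # full or half rate. The actual SMV rate decision is more elaborate.
    return ("1", "1/2")


for cls in FrameClass:
    print(f"{cls.value}: {', '.join(candidate_rates(cls))}")
```

A real encoder would pick a single rate per frame from these candidates, also taking the selected operating mode into account; that decision logic is not covered here.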

The coder works on a frame of 160 speech samples (20 ms) and requires a look-ahead of 80 samples (10 ms) if noise-suppression option B is used. An additional 24 samples (3 ms) of look-ahead are required if noise-suppression option A is used. The algorithmic delay of the coder is therefore 30 ms with noise-suppression option B and 33 ms with noise-suppression option A.
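The delay figures follow directly from the frame and look-ahead sizes, and the same 20 ms frame gives the number of bits per frame at each nominal rate. The following is a minimal sketch of that arithmetic; the function and constant names are hypothetical, and the 8 samples per millisecond figure is derived from the stated 160 samples per 20 ms frame.

```python
FRAME_SAMPLES = 160                          # samples per frame (from the text)
FRAME_MS = 20                                # frame duration in ms (from the text)
SAMPLES_PER_MS = FRAME_SAMPLES // FRAME_MS   # 8 samples per ms, i.e. 8 kHz sampling

# Look-ahead in samples for each noise-suppression option (from the text).
LOOKAHEAD_SAMPLES = {"B": 80, "A": 80 + 24}

# Nominal bit rates in bit/s (from the text).
RATES_BIT_PER_S = {"full": 8500, "half": 4000, "quarter": 2000, "eighth": 800}


def algorithmic_delay_ms(option):
    """Frame length plus look-ahead, converted to milliseconds."""
    return (FRAME_SAMPLES + LOOKAHEAD_SAMPLES[option]) / SAMPLES_PER_MS


def bits_per_frame(rate_bit_per_s):
    """Bits available in one 20 ms frame at the given nominal bit rate."""
    return rate_bit_per_s * FRAME_MS // 1000


for option in ("B", "A"):
    print(f"option {option}: {algorithmic_delay_ms(option)} ms")

for name, rate in RATES_BIT_PER_S.items():
    print(f"{name} rate: {bits_per_frame(rate)} bits per frame")
```

Under these assumptions the nominal rates work out to 170, 80, 40, and 16 bits per 20 ms frame; the exact packet sizes are defined by the standard, not by this calculation.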

The next evolution of CDMA speech codecs is VMR-WB, which provides much higher speech quality through wideband coding while fitting into the same networks.

SMV can also be used in the 3GPP2 container file format (3G2).

Related Research Articles

Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Speex is an audio compression codec specifically tuned for the reproduction of human speech and also a free software speech codec that may be used on VoIP applications and podcasts. It is based on the CELP speech coding algorithm. Speex claims to be free of any patent restrictions and is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or directly transmitted over UDP/RTP. It may also be used with the FLV container format.

G.723.1 is an audio codec for voice that compresses voice audio in 30 ms frames. An algorithmic look-ahead of 7.5 ms duration means that total algorithmic delay is 37.5 ms. Its official name is Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. It is sometimes associated with a Truespeech trademark in coprocessors produced by DSP Group.

The Adaptive Multi-Rate (AMR) audio codec is an audio compression format optimized for speech coding. The AMR speech codec consists of a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s, with toll-quality speech starting at 7.4 kbit/s.

Full Rate was the first digital speech coding standard used in the GSM digital mobile phone system. It uses linear predictive coding (LPC). The bit rate of the codec is 13 kbit/s, or 1.625 bits/audio sample. The quality of the coded speech is quite poor by modern standards, but at the time of development it was a good compromise between computational complexity and quality, requiring only on the order of a million additions and multiplications per second. The codec is still widely used in networks around the world. Gradually FR will be replaced by Enhanced Full Rate (EFR) and Adaptive Multi-Rate (AMR) standards, which provide much higher speech quality with lower bit rate.

Half Rate is a speech coding system for GSM, developed in the early 1990s.

Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech audio coding standard developed based on Adaptive Multi-Rate encoding, using similar methodology as algebraic code-excited linear prediction (ACELP). AMR-WB provides improved speech quality due to a wider speech bandwidth of 50–7000 Hz compared to narrowband speech coders which in general are optimized for POTS wireline quality of 300–3400 Hz. AMR-WB was developed by Nokia and VoiceAge and it was first specified by 3GPP.

G.729 is a royalty-free narrow-band vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as Coding of speech at 8 kbit/s using code-excited linear prediction speech coding (CS-ACELP), and was introduced in 1996. The wide-band extension of G.729 is called G.729.1, which equals G.729 Annex J.

G.728 is an ITU-T standard for speech coding operating at 16 kbit/s. It is officially described as Coding of speech at 16 kbit/s using low-delay code excited linear prediction.

G.722.1 is a licensed royalty-free ITU-T standard audio codec providing high quality, moderate bit rate wideband (50 Hz – 7 kHz audio bandwidth, 16 ksps) audio coding. It is a partial implementation of the Siren 7 audio coding format developed by PictureTel Corp. Its official name is Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss. It uses a modified discrete cosine transform audio data compression algorithm.

Enhanced Variable Rate CODEC (EVRC) is a speech codec used in CDMA networks. It was developed in 1995 to replace the QCELP vocoder which used more bandwidth on the carrier's network, thus EVRC's primary goal was to offer the mobile carriers more capacity on their networks while not increasing the amount of bandwidth or wireless spectrum needed. EVRC uses RCELP technology.

Variable-Rate Multimode Wideband (VMR-WB) is a source-controlled variable-rate multimode codec designed for robust encoding/decoding of wideband/narrowband speech. The operation of VMR-WB is controlled by speech signal characteristics and by the traffic conditions of the network. Depending on the traffic conditions and the desired quality of service (QoS), one of the four operational modes is used. All operating modes of the existing VMR-WB standard are fully compliant with cdma2000 rate-set II. VMR-WB modes 0, 1, and 2 are cdma2000 native modes, with mode 0 providing the highest quality and mode 2 the lowest ADR. VMR-WB mode 3 is the AMR-WB interoperable mode, operating at an ADR slightly higher than mode 0 and providing a quality equal to or better than that of AMR-WB at 12.65 kbit/s when in an interoperable interconnection with AMR-WB at 12.65 kbit/s.

Harmonic Vector Excitation Coding, abbreviated as HVXC, is a speech coding algorithm specified in the MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in the fixed and variable bit rate modes at a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2–1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.

Vector sum excited linear prediction (VSELP) is a speech coding method used in several cellular standards. The VSELP algorithm is an analysis-by-synthesis coding technique and belongs to the class of speech coding algorithms known as CELP.

Enhanced Variable Rate Codec B (EVRC-B) is a speech codec used by CDMA networks. EVRC-B is an enhancement to EVRC and compresses each 20 milliseconds of 8000 Hz, 16-bit sampled speech input into output frames of one of the four different sizes: Rate 1 - 171 bits, Rate 1/2 - 80 bits, Rate 1/4 - 40 bits, Rate 1/8 - 16 bits.

G.719 is an ITU-T standard audio coding format providing high quality, moderate bit rate wideband audio coding at low computational load. It was produced through a collaboration between Polycom and Ericsson.

G.718 is an ITU-T Recommendation embedded scalable speech and audio codec providing high quality narrowband speech over the lower bit rates and high quality wideband speech over the complete range of bit rates. In addition, G.718 is designed to be highly robust to frame erasures, thereby enhancing the speech quality when used in internet protocol (IP) transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both narrowband and wideband generic audio signals. The codec has an embedded scalable structure, enabling maximum flexibility in the transport of voice packets through IP networks of today and in future media-aware networks. In addition, the embedded structure of G.718 will easily allow the codec to be extended to provide a superwideband and stereo capability through additional layers which are currently under development in ITU-T Study Group 16. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 & 32 kbit/s.

CDMA spectral efficiency refers to the system spectral efficiency in bit/s/Hz/site or Erlang/MHz/site that can be achieved in a certain CDMA based wireless communication system. CDMA techniques are characterized by a very low link spectral efficiency in (bit/s)/Hz as compared to non-spread spectrum systems, but a comparable system spectral efficiency.

References

1. "3GPP2 C.S0030-0 Selectable Mode Vocoder (SMV) Service Option for Wideband Spread Spectrum Communication Systems" (PDF). 3rd Generation Partnership Project 2. 2004. Archived from the original (PDF) on 2011-07-23. Retrieved 2009-05-26.
2. J. Makinen; P. Ojala; H. Toukomaa. "Performance Comparison of Source Controlled GSM AMR and SMV Vocoders" (PDF). Nokia Research Center, Multimedia Technologies Laboratory. Retrieved 2009-05-26. [permanent dead link]

External links

RFC 3558 - RTP Payload Format for Enhanced Variable Rate Codecs (EVRC) and Selectable Mode Vocoders (SMV)

