Continuously variable slope delta modulation

Last updated

Continuously variable slope delta modulation (CVSD or CVSDM) is a voice coding method. It is a delta modulation with variable step size (i.e., special case of adaptive delta modulation), first proposed by Greefkes and Riemens in 1970.

Contents

CVSD encodes at 1 bit per sample, so that audio sampled at 16 kHz is encoded at 16 kbit/s.

The encoder maintains a reference sample and a step size. Each input sample is compared to the reference sample. If the input sample is larger, the encoder emits a 1 bit and adds the step size to the reference sample. If the input sample is smaller, the encoder emits a 0 bit and subtracts the step size from the reference sample. The encoder also keeps the previous N bits of output (N = 3 or N = 4 are very common) to determine adjustments to the step size; if the previous N bits are all 1s or 0s, the step size is increased. Otherwise, the step size is decreased (usually in an exponential manner, with being in the range of 5 ms). The step size is adjusted for every input sample processed.

To allow for bit errors to fade out and to allow (re)synchronization to an ongoing bitstream, the output register (which keeps the reference sample) is normally realized as a leaky integrator with a time constant () of about 1 ms.

The decoder reverses this process, starting with the reference sample, and adding or subtracting the step size according to the bit stream. The sequence of adjusted reference samples are the reconstructed waveform, and the step size is adjusted according to the same all-1s-or-0s logic as in the encoder.

Adaptation of step size allows one to avoid slope overload (step of quantization increases when the signal rapidly changes) and decreases granular noise when the signal is constant (decrease of step of quantisation).

CVSD is sometimes called a compromise between simplicity, low bitrate, and quality. Common bitrates are 9.6–128 kbit/s.

Like other delta-modulation techniques, the output of the decoder does not exactly match the original input to the encoder.

Applications

12 kbit/s CVSD is used by Motorola's SECURENET line of digitally encrypted two-way radio products.

16 and 32 kbit/s CVSD were used by military TRI-TAC digital telephones (DNVT, DSVT) for use in deployed areas to provide voice recognition quality audio. 16 kbit/s rates were typically used by US Army forces to conserve bandwidth over tactical links. 32 kbit/s rates were typically used by US Air Force forces for improved voice quality.

64 kbit/s CVSD is one of the options to encode voice signals in telephony-related Bluetooth service profiles; e.g., between mobile phones and wireless headsets. The other options are PCM with logarithmic a-law or μ-law quantization, as well as mSBC codec featuring 16 kHz sample rate and best quality.

Numerous arcade games, such as Sinistar and Smash TV , and pinball machines, such as Gorgar or Space Shuttle , play pre-recorded speech through an HC-55516 CVSD decoder. [1] [2]

SBS application 24 kbit/s delta modulation

Delta modulation was used by Satellite Business Systems or SBS for its voice ports to provide long-distance phone service to large domestic corporations with a significant inter-corporation communications need (such as IBM). This system was in service throughout the 1980s. The voice ports used digitally implemented 24 kbit/s delta modulation with voice activity compression (VAC) and echo suppressors to control the half second echo path through the satellite. Listening tests were conducted to verify that the 24 kbit/s Delta Modulator achieved "full voice quality" with no discernible degradation as compared to a high quality phone line or the standard 64 kbit/s μ-law companded PCM. This provided an 8:3 improvement in satellite channel capacity. IBM developed the Satellite Communications Controller and the voice port functions.

The original proposal in 1974 used a state-of-the-art 24 kbit/s Delta Modulator with a single integrator and a Shindler compander modified for gain error recovery. This proved to have less than full phone line speech quality. In 1977, one engineer with two assistants in the IBM Research Triangle Park, NC laboratory was assigned to improve the quality.

The final implementation replaced the integrator with a predictor implemented with a two-pole complex-pair low-pass filter designed to approximate the long-term average speech spectrum. The theory was that ideally the integrator should be a predictor designed to match the signal spectrum. A nearly perfect Shindler compander replaced the modified version. It was found the modified compander resulted in a less than perfect step size at most signal levels and the fast gain error recovery increased the noise as determined by actual listening tests as compared to simple signal to noise measurements. The final compander achieved a very mild gain error recovery due to the natural truncation rounding error caused by 12-bit arithmetic.

The complete function of delta modulation, VAC, and echo control for 6 ports was implemented in a single digital integrated circuit chip with 12-bit arithmetic. A single DAC was shared by all 6 ports providing voltage compare functions for the modulators and feeding sample and hold circuits for the demodulator outputs. A single card held the chip, DAC, and all the analog circuits for the phone line interface including transformers.

See also

Related Research Articles

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

<span class="mw-page-title-main">Vocoder</span> Voice encryption, transformation, and synthesis device

A vocoder is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation.

<span class="mw-page-title-main">Analog-to-digital converter</span> System that converts an analog signal into a digital signal

In electronics, an analog-to-digital converter is a system that converts an analog signal, such as a sound picked up by a microphone or light entering a digital camera, into a digital signal. An ADC may also provide an isolated measurement such as an electronic device that converts an analog input voltage or current to a digital number representing the magnitude of the voltage or current. Typically the digital output is a two's complement binary number that is proportional to the input, but there are other possibilities.

<span class="mw-page-title-main">Delta modulation</span> Signal conversion technique

Delta modulation is an analog-to-digital and digital-to-analog signal conversion technique used for transmission of voice information where quality is not of primary importance. DM is the simplest form of differential pulse-code modulation (DPCM) where the difference between successive samples is encoded into n-bit data streams. In delta modulation, the transmitted data are reduced to a 1-bit data stream representing either up (↗) or down (↘). Its main features are:

μ-law algorithm Audio companding algorithm

The μ-law algorithm is a companding algorithm, primarily used in 8-bit PCM digital telecommunications systems in North America and Japan. It is one of the two companding algorithms in the G.711 standard from ITU-T, the other being the similar A-law. A-law is used in regions where digital telecommunication signals are carried on E-1 circuits, e.g. Europe.

<span class="mw-page-title-main">Digital audio</span> Technology that records, stores, and reproduces sound

Digital audio is a representation of sound recorded in, or converted into, digital form. In digital audio, the sound wave of the audio signal is typically encoded as numerical samples in a continuous sequence. For example, in CD audio, samples are taken 44,100 times per second, each with 16-bit sample depth. Digital audio is also the name for the entire technology of sound recording and reproduction using audio signals that have been encoded in digital form. Following significant advances in digital audio technology during the 1970s and 1980s, it gradually replaced analog audio technology in many areas of audio engineering, record production and telecommunications in the 1990s and 2000s.

<span class="mw-page-title-main">G.711</span> ITU-T recommendation

G.711 is a narrowband audio codec originally designed for use in telephony that provides toll-quality audio at 64 kbit/s. It is an ITU-T standard (Recommendation) for audio encoding, titled Pulse code modulation (PCM) of voice frequencies released for use in 1972.

<span class="mw-page-title-main">Sampling (signal processing)</span> Measurement of a signal at discrete time intervals

In signal processing, sampling is the reduction of a continuous-time signal to a discrete-time signal. A common example is the conversion of a sound wave to a sequence of "samples". A sample is a value of the signal at a point in time and/or space; this definition differs from the term's usage in statistics, which refers to a set of such values.

Mixed-excitation linear prediction (MELP) is a United States Department of Defense speech coding standard used mainly in military applications and satellite communications, secure voice, and secure radio devices. Its standardization and later development was led and supported by the NSA and NATO. The current "enhanced" version is known as MELPe.

Differential pulse-code modulation (DPCM) is a signal encoder that uses the baseline of pulse-code modulation (PCM) but adds some functionalities based on the prediction of the samples of the signal. The input can be an analog signal or a digital signal.

<span class="mw-page-title-main">Secure voice</span> Encrypted voice communication

Secure voice is a term in cryptography for the encryption of voice communication over a range of communication types such as radio, telephone or IP.

<span class="mw-page-title-main">Delta-sigma modulation</span> Method for converting signals between digital and analog

Delta-sigma modulation is an oversampling method for encoding signals into low bit depth digital signals at a very high sample-frequency as part of the process of delta-sigma analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). Delta-sigma modulation achieves high quality by utilizing a negative feedback loop during quantization to the lower bit depth that continuously corrects quantization errors and moves quantization noise to higher frequencies well above the original signal's bandwidth. Subsequent low-pass filtering for demodulation easily removes this high frequency noise and time averages to achieve high accuracy in amplitude which can be ultimately encoded as pulse-code modulation (PCM).

The dbx Model 700 Digital Audio Processor was a professional audio ADC/DAC combination unit, which digitized a stereo analog audio input into a bitstream, which was then encoded and encapsulated in an analog composite video signal, for recording to tape using a VCR as a transport. Unlike other similar pieces of equipment like the Sony PCM-F1, the Model 700 used a technique called Companded Predictive Delta Modulation, rather than the now-common pulse-code modulation. At the time of its introduction in the mid-1980s the device was the first commercial product to use this method, although it had been proposed in the 1960s and prototyped in the late '70s.

<span class="mw-page-title-main">Audio bit depth</span> Number of bits of information recorded for each digital audio sample

In digital audio using pulse-code modulation (PCM), bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution of each sample. Examples of bit depth include Compact Disc Digital Audio, which uses 16 bits per sample, and DVD-Audio and Blu-ray Disc, which can support up to 24 bits per sample.

Adaptive differential pulse-code modulation (ADPCM) is a variant of differential pulse-code modulation (DPCM) that varies the size of the quantization step, to allow further reduction of the required data bandwidth for a given signal-to-noise ratio.

<span class="mw-page-title-main">Sub-band coding</span>

In signal processing, sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands, typically by using a fast Fourier transform, and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.

Pulse-code modulation (PCM) is a method used to digitally represent analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps. Alec Reeves, Claude Shannon, Barney Oliver and John R. Pierce are credited with its invention.

Tandem Free Operation (TFO) is a part of ETSI's 3GPP standard specification, which has been included from R99 of the standards specifications onwards.

Pulse-density modulation, or PDM, is a form of modulation used to represent an analog signal with a binary signal. In a PDM signal, specific amplitude values are not encoded into codewords of pulses of different weight as they would be in pulse-code modulation (PCM); rather, the relative density of the pulses corresponds to the analog signal's amplitude. The output of a 1-bit DAC is the same as the PDM encoding of the signal.

References

  1. "MAME 0.36b7 changelog". Archived from the original on 2011-10-07. Retrieved 2010-10-02.
  2. Williams/Midway Y-Unit games