Audio signal processing

Last updated

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waveslongitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.



The motivation for audio signal processing began at the beginning of the 20th century with inventions like the telephone, phonograph, and radio that allowed for the transmission and storage of audio signals. Audio processing was necessary for early radio broadcasting, as there were many problems with studio-to-transmitter links. [1] The theory of signal processing and its application to audio was largely developed at Bell Labs in the mid 20th century. Claude Shannon and Harry Nyquist's early work on communication theory, sampling theory and pulse-code modulation (PCM) laid the foundations for the field. In 1957, Max Mathews became the first person to synthesize audio from a computer, giving birth to computer music.

Major developments in digital audio coding and audio data compression include differential pulse-code modulation (DPCM) by C. Chapin Cutler at Bell Labs in 1950, [2] linear predictive coding (LPC) by Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966, [3] adaptive DPCM (ADPCM) by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973, [4] [5] discrete cosine transform (DCT) coding by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974, [6] and modified discrete cosine transform (MDCT) coding by J. P. Princen, A. W. Johnson and A. B. Bradley at the University of Surrey in 1987. [7] LPC is the basis for perceptual coding and is widely used in speech coding, [8] while MDCT coding is widely used in modern audio coding formats such as MP3 [9] and Advanced Audio Coding (AAC). [10]

Analog signals

An analog audio signal is a continuous signal represented by an electrical voltage or current that is “analogous” to the sound waves in the air. Analog signal processing then involves physically altering the continuous signal by changing the voltage or current or charge via electrical circuits.

Historically, before the advent of widespread digital technology, analog was the only method by which to manipulate a signal. Since that time, as computers and software have become more capable and affordable, digital signal processing has become the method of choice. However, in music applications, analog technology is often still desirable as it often produces nonlinear responses that are difficult to replicate with digital filters.

Digital signals

A digital representation expresses the audio waveform as a sequence of symbols, usually binary numbers. This permits signal processing using digital circuits such as digital signal processors, microprocessors and general-purpose computers. Most modern audio systems use a digital approach as the techniques of digital signal processing are much more powerful and efficient than analog domain signal processing. [11]

Application areas

Processing methods and application areas include storage, data compression, music information retrieval, speech processing, localization, acoustic detection, transmission, noise cancellation, acoustic fingerprinting, sound recognition, synthesis, and enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.).

Audio broadcasting

Audio signal processing is used when broadcasting audio signals in order to enhance their fidelity or optimize for bandwidth or latency. In this domain, the most important audio processing takes place just before the transmitter. The audio processor here must prevent or minimize overmodulation, compensate for non-linear transmitters (a potential issue with medium wave and shortwave broadcasting), and adjust overall loudness to desired level.

Active noise control

Active noise control is a technique designed to reduce unwanted sound. By creating a signal that is identical to the unwanted noise but with the opposite polarity, the two signals cancel out due to destructive interference.

Audio synthesis

Audio synthesis is the electronic generation of audio signals. A musical instrument that accomplishes this is called a synthesizer. Synthesizers can either imitate sounds or generate new ones. Audio synthesis is also used to generate human speech using speech synthesis.

Audio effects

Audio effects are systems designed to alter how an audio signal sounds. Unprocessed audio is metaphorically referred to as dry, while processed audio is referred to as wet. [12]

See also

Related Research Articles

In signal processing, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.

Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a sequence of numbers that represent samples of a continuous variable in a domain such as time, space, or frequency. In digital electronics, a digital signal is represented as a pulse train, which is typically generated by the switching of a transistor.

Effects unit Electronic device that alters audio

An effects unit or effects pedal is an electronic device that alters the sound of a musical instrument or other audio source through audio signal processing.

Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Signal processing Academic subfield of electrical engineering

Signal processing is an electrical engineering subfield that focuses on analysing, modifying, and synthesizing signals such as sound, images, and scientific measurements. Signal processing techniques can be used to improve transmission, storage efficiency and subjective quality and to also emphasize or detect components of interest in a measured signal.

Vocoder Voice encryption, transformation and synthesis device

A vocoder is a category of voice codec that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation.

Sound effect Artificially created or enhanced sound

A sound effect is an artificially created or enhanced sound, or sound process used to emphasize artistic or other content of films, television shows, live performance, animation, video games, music, or other media. These are normally created with foley. In motion picture and television production, a sound effect is a sound recorded and presented to make a specific storytelling or creative point without the use of dialogue or music. The term often refers to a process applied to a recording, without necessarily referring to the recording itself. In professional motion picture and television production, dialogue, music, and sound effects recordings are treated as separate elements. Dialogue and music recordings are never referred to as sound effects, even though the processes applied to such as reverberation or flanging effects, often are called "sound effects".

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.

Digital audio Technology that records, stores, and reproduces sound

Digital audio is a representation of sound recorded in, or converted into, digital form. In digital audio, the sound wave of the audio signal is typically encoded as numerical samples in a continuous sequence. For example, in CD audio, samples are taken 44,100 times per second, each with 16-bit sample depth. Digital audio is also the name for the entire technology of sound recording and reproduction using audio signals that have been encoded in digital form. Following significant advances in digital audio technology during the 1970s and 1980s, it gradually replaced analog audio technology in many areas of audio engineering and telecommunications in the 1990s and 2000s.

Flanging is an audio effect produced by mixing two identical signals together, one signal delayed by a small and gradually changing period, usually smaller than 20 milliseconds. This produces a swept comb filter effect: peaks and notches are produced in the resulting frequency spectrum, related to each other in a linear harmonic series. Varying the time delay causes these to sweep up and down the frequency spectrum. A flanger is an effects unit that creates this effect.

Telephone hybrid

A telephone hybrid is the component at the ends of a subscriber line of the public switched telephone network (PSTN) that converts between two-wire and four-wire forms of bidirectional audio paths. When used in broadcast facilities to enable the airing of telephone callers, the broadcast-quality telephone hybrid is known as a broadcast telephone hybrid or telephone balance unit.

A phaser is an electronic sound processor used to filter a signal by creating a series of peaks and troughs in the frequency spectrum. The position of the peaks and troughs of the waveform being affected is typically modulated by an internal low-frequency oscillator so that they vary over time, creating a sweeping effect.

Pre-echo, sometimes called a forward echo, is a digital audio compression artifact where a sound is heard before it occurs. It is most noticeable in impulsive sounds from percussion instruments such as castanets or cymbals.


Moogerfooger is the trademark for a series of analog effects pedals manufactured by Moog Music. There are currently eight different pedals produced; however, one of these models is designed for processing control voltages rather than audio signal. A sixth model, the Analog Delay, was released in a limited edition of 1000 units and has become a collector's item. Moog Music announced on August 28, 2018 that the Moogerfooger, CP-251, Minifooger, Voyager synthesizers, and some other product lines were being built using the remaining parts on hand and discontinued thereafter.

Adaptive differential pulse-code modulation (ADPCM) is a variant of differential pulse-code modulation (DPCM) that varies the size of the quantization step, to allow further reduction of the required data bandwidth for a given signal-to-noise ratio.

Musical outboard equipment or outboard gear is used to alter how a musical instrument sounds. Outboard, external effects units can be used either during a live performance or in the recording studio. These are separate from the effects that may be applied by using a mixing console or a digital audio workstation. Some outboard effects units and digital signal processing (DSP) boxes commonly found in a studio are:

Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.

Eventide, Inc

Eventide, Inc. is an audio, broadcast and communications company in the United States whose audio division manufactures digital audio processors, DSP software, and guitar effects. Eventide was one of the first companies to manufacture digital audio processors, and its products are mainstays in sound recording and reproduction, post production, and broadcast studios.

Audio coding format Digitally coded format for audio signals

An audio coding format is a content representation format for storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to/from a specific audio coding format is called an audio codec; an example of an audio codec is LAME, which is one of several different codecs which implements encoding and decoding audio in the MP3 audio coding format in software.

Calf Studio Gear, often referred to as Calf Plugins, is a set of open source LV2 plugins for the Linux platform. The suite intends to be a complete set of plugins for audio mixing, virtual instruments and mastering. As of version 0.90.0 there are 47 plugins in the suite.


  1. Atti, Andreas Spanias, Ted Painter, Venkatraman (2006). Audio signal processing and coding ([Online-Ausg.] ed.). Hoboken, NJ: John Wiley & Sons. p. 464. ISBN   0-471-79147-4.
  2. USpatent 2605361,C. Chapin Cutler,"Differential Quantization of Communication Signals",issued 1952-07-29
  3. Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi: 10.1561/2000000036 . ISSN   1932-8346.
  4. P. Cummiskey, Nikil S. Jayant, and J. L. Flanagan, "Adaptive quantization in differential PCM coding of speech", Bell Syst. Tech. J., vol. 52, pp. 1105—1118, Sept. 1973
  5. Cummiskey, P.; Jayant, Nikil S.; Flanagan, J. L. (1973). "Adaptive quantization in differential PCM coding of speech". The Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x. ISSN   0005-8580.
  6. Nasir Ahmed; T. Natarajan; Kamisetty Ramamohan Rao (January 1974). "Discrete Cosine Transform" (PDF). IEEE Transactions on Computers. C-23 (1): 90–93. doi:10.1109/T-C.1974.223784.
  7. J. P. Princen, A. W. Johnson und A. B. Bradley: Subband/transform coding using filter bank designs based on time domain aliasing cancellation, IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987.
  8. Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN   9783319056609.
  9. Guckert, John (Spring 2012). "The Use of FFT and MDCT in MP3 Audio Compression" (PDF). University of Utah . Retrieved 14 July 2019.
  10. Brandenburg, Karlheinz (1999). "MP3 and AAC Explained" (PDF). Archived (PDF) from the original on 2017-02-13.
  11. Zölzer, Udo (1997). Digital Audio Signal Processing. John Wiley and Sons. ISBN   0-471-97226-6.
  12. Hodgson, Jay (2010). Understanding Records, p.95. ISBN   978-1-4411-5607-5.

Further reading