Dereverberation

Dereverberation is the process by which the effects of reverberation are removed from sound after the reverberant sound has been picked up by microphones. Dereverberation is a subtopic of acoustic digital signal processing and is most commonly applied to speech, but it is also relevant to some aspects of music processing. Dereverberation of audio (speech or music) is the counterpart of blind deconvolution of images, although the techniques used are usually very different. Reverberation itself is caused by sound reflections in a room (or other enclosed space) and is quantified by the room reverberation time and the direct-to-reverberant ratio. The effect of dereverberation is to increase the direct-to-reverberant ratio so that the sound is perceived as closer and clearer.
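
Both quantities named above can be computed from a measured room impulse response (RIR). The sketch below assumes a single-channel RIR stored as a NumPy array. The function names are hypothetical, and the ±2.5 ms direct-path window and the −5 dB to −25 dB fitting range (a T20-style estimate extrapolated to a 60 dB decay) are common conventions chosen for illustration, not values given in this article.

```python
import numpy as np


def direct_to_reverberant_ratio(rir, fs, direct_window_ms=2.5):
    """DRR in dB (hypothetical helper).

    Assumes the direct path is the largest-magnitude sample and that a
    +/- direct_window_ms window around it contains the direct sound;
    everything after that window counts as reverberation.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start, stop = max(peak - half, 0), peak + half + 1
    direct_energy = np.sum(rir[start:stop] ** 2)
    reverberant_energy = np.sum(rir[stop:] ** 2)
    return 10.0 * np.log10(direct_energy / reverberant_energy)


def reverberation_time(rir, fs, fit_start_db=-5.0, fit_end_db=-25.0):
    """RT60 in seconds from a straight-line fit to the energy decay curve."""
    rir = np.asarray(rir, dtype=float)
    # Schroeder backward integration: energy remaining after each sample.
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    edc = np.maximum(edc, np.finfo(float).tiny)  # avoid log10(0) on trailing zeros
    edc_db = 10.0 * np.log10(edc / edc[0])
    # Fit only the part of the decay between the two levels.
    mask = (edc_db <= fit_start_db) & (edc_db >= fit_end_db)
    t = np.nonzero(mask)[0] / fs
    slope_db_per_s, _ = np.polyfit(t, edc_db[mask], 1)
    # Extrapolate the fitted slope to a 60 dB decay.
    return -60.0 / slope_db_per_s
```

A dereverberation algorithm can then be judged by how much it raises the DRR of the effective impulse response from the talker to the processed output.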

A main application of dereverberation is in hands-free phones and desktop conferencing terminals because, in these cases, the microphones are not close to the source of sound (the talker’s mouth) but at arm’s length or farther away. Besides telecommunications, dereverberation is also important in automatic speech recognition, because speech recognizers tend to make more errors in reverberant conditions.

Dereverberation became established as a topic of scientific research between 2000 and 2005, [1] although a few notable early articles exist. [2] The first scientific textbook on the topic was published in 2010. [3] A global scientific study sponsored by the IEEE Technical Committee for Audio and Acoustic Signal Processing took place in 2014. [4]

Three different approaches can be followed [5] to perform dereverberation. In the first approach, reverberation is cancelled by exploiting a mathematical model of the acoustic system (or room): the parameters of the room acoustic model are estimated, and an estimate of the original signal is then formed from them. In the second approach, reverberation is suppressed by treating it as a type of (convolutional) noise and applying a de-noising process specifically adapted to reverberation. In the third approach, the dereverberated signal is estimated directly from the microphone signals using, for example, a deep neural network or a multichannel linear filter. Among the most effective state-of-the-art methods are approaches based on linear prediction. [6] [7]
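
The idea behind linear-prediction dereverberation can be illustrated with a deliberately simplified, single-channel, time-domain version of delayed linear prediction. This is a sketch under stated assumptions, not a reproduction of the cited methods; practical systems usually operate on short-time Fourier transform frames and often use several microphones. The helper name `delayed_lp_dereverb`, the white source and the exponentially decaying toy impulse response h[k] = a^k are all assumptions made for illustration.

```python
import numpy as np


def delayed_lp_dereverb(x, delay, order):
    """Toy single-channel delayed linear prediction (hypothetical helper).

    Each output sample is x[n] minus a least-squares prediction of x[n]
    from the `order` samples x[n - delay], ..., x[n - delay - order + 1].
    Returns the prediction error (the dereverberated estimate) and the
    predictor coefficients.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if delay < 1 or order < 1 or delay + order >= n:
        raise ValueError("need delay >= 1, order >= 1 and delay + order < len(x)")
    # Column k is x delayed by (delay + k) samples, zero-padded at the start.
    X = np.zeros((n, order))
    for k in range(order):
        lag = delay + k
        X[lag:, k] = x[: n - lag]
    # Least-squares predictor of the late part from sufficiently old samples.
    g, *_ = np.linalg.lstsq(X, x, rcond=None)
    return x - X @ g, g


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_samples, delay, a = 32000, 16, 0.98
    s = rng.standard_normal(n_samples)             # white "dry" source (assumption)
    h = a ** np.arange(2000)                        # toy exponentially decaying RIR
    x = np.convolve(s, h)[:n_samples]               # reverberant observation
    early = np.convolve(s, h[:delay])[:n_samples]   # early part we want to keep
    y, g = delayed_lp_dereverb(x, delay=delay, order=10)
    ratio = np.sum((y - early) ** 2) / np.sum((x - early) ** 2)
    print(f"remaining late-reverberation energy: {ratio:.4f} of original")
```

In this toy case the late tail is exactly predictable from samples at least `delay` steps old, so the prediction error ends up close to the early component. Real speech is not white and real rooms are not a single decaying exponential; the delay is what keeps the predictor from also removing the short-term correlation of the speech itself.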

References