Echo suppression and cancellation

Echo suppression and echo cancellation are methods used in telephony to improve voice quality by preventing echo from being created or removing it after it is already present. In addition to improving subjective audio quality, echo suppression increases the capacity achieved through silence suppression by preventing echo from traveling across a telecommunications network. Echo suppressors were developed in the 1950s in response to the first use of satellites for telecommunications.

Echo suppression and cancellation methods are commonly called acoustic echo suppression (AES) and acoustic echo cancellation (AEC), and more rarely line echo cancellation (LEC). In some cases, these terms are more precise, as there are various types and causes of echo with unique characteristics, including acoustic echo (sounds from a loudspeaker being reflected and recorded by a microphone, which can vary substantially over time) and line echo (electrical impulses caused by, for example, coupling between the sending and receiving wires, impedance mismatches, or electrical reflections, [1] which varies much less than acoustic echo). In practice, however, the same techniques are used to treat all types of echo, so an acoustic echo canceller can cancel line echo as well as acoustic echo. AEC in particular is commonly used to refer to echo cancellers in general, regardless of whether they were intended for acoustic echo, line echo, or both.

Although echo suppressors and echo cancellers have similar goals (preventing a speaking individual from hearing an echo of their own voice), the methods they use are different.

ITU standards G.168 and P.340 describe requirements and tests for echo cancellers in digital and PSTN applications, respectively.

History

In telephony, echo is the reflected copy of one's voice heard some time later. If the delay is fairly significant (more than a few hundred milliseconds), it is considered annoying. If the delay is very small (tens of milliseconds or less [3]), the phenomenon is called sidetone. If the delay is slightly longer, around 50 milliseconds, humans cannot hear the echo as a distinct sound, but instead hear a chorus effect. [3]

In the earlier days of telecommunications, echo suppression was used to reduce the objectionable nature of echoes to human users. The technique assumes a conversation in which one person speaks while the other listens, with the two taking turns. An echo suppressor attempts to determine which is the primary direction and allows that channel to go forward. In the reverse channel, it places attenuation to block or suppress any signal on the assumption that the signal is echo. Although the suppressor effectively deals with echo, this approach leads to several problems which may be frustrating for both parties to a call.
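The decision rule described above can be illustrated with a minimal Python sketch. It is not taken from any historical suppressor design: the frame length, the activity threshold, the attenuation factor and the function name `suppress_echo` are all assumptions chosen for illustration, and both inputs are assumed to be equal-length lists of samples.

```python
def suppress_echo(far_end, near_end, frame=80, threshold=1e-4, attenuation=0.1):
    """Attenuate the return (near-end) path while the far end is the primary talker.

    far_end, near_end: equal-length lists of float samples.
    All parameter values are illustrative assumptions.
    """
    out = []
    for start in range(0, len(near_end), frame):
        far_frame = far_end[start:start + frame]
        near_frame = near_end[start:start + frame]
        # Short-term power of each direction over this frame.
        far_power = sum(s * s for s in far_frame) / len(far_frame)
        near_power = sum(s * s for s in near_frame) / len(near_frame)
        # Treat the far end as the primary direction when it is active
        # and at least as loud as the return path.
        far_is_primary = far_power > threshold and far_power >= near_power
        gain = attenuation if far_is_primary else 1.0
        out.extend(gain * s for s in near_frame)
    return out


# Far end talks for 160 samples, then goes silent while the near end talks.
far_end = [0.5] * 160 + [0.0] * 160
near_end = [0.1] * 160 + [0.3] * 160   # weak echo first, then local speech
cleaned = suppress_echo(far_end, near_end)
```

In this sketch the weak echo in the first half is attenuated and the local speech in the second half passes unchanged. Any near-end speech that falls in a frame where the far end is louder is attenuated as well, which is one way such a scheme can frustrate callers.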

In response to this, Bell Labs developed echo canceller theory in the early 1960s, [4] [5] which then resulted in laboratory echo cancellers in the late 1960s and commercial echo cancellers in the 1980s. [6] An echo canceller works by generating an estimate of the echo from the talker's signal and subtracting that estimate from the return path. This technique requires an adaptive filter to generate a signal accurate enough to effectively cancel the echo, where the echo can differ from the original due to various kinds of degradation along the way. Since their invention at AT&T Bell Labs, [5] echo cancellation algorithms have been improved and honed. Like all echo cancelling processes, these first algorithms were designed to anticipate the signal which would inevitably re-enter the transmission path, and cancel it out.

Rapid advances in digital signal processing allowed echo cancellers to be made smaller and more cost-effective. In the 1990s, echo cancellers were implemented within voice switches for the first time (in the Northern Telecom DMS-250) rather than as standalone devices. The integration of echo cancellation directly into the switch meant that echo cancellers could be reliably turned on or off on a call-by-call basis, removing the need for separate trunk groups for voice and data calls. Today's telephony technology often employs echo cancellers in small or handheld communications devices via a software voice engine, which provides cancellation of either acoustic echo or the residual echo introduced by a far-end PSTN gateway system; such systems typically cancel echo reflections with up to 64 milliseconds delay.

Operation

An adaptive echo canceller for a telephone circuit. The function of H, the hybrid transformer, is to route incoming speech from the far end xk to the local telephone and route speech from the telephone to the far end. However, the hybrid is never perfect, so its output dk contains both the desired speech from the local telephone and filtered speech from the far end. The echo canceller is the adaptive filter fk, which attempts to minimize the error signal εk by filtering the incoming far-end speech into a replica yk of the far-end speech that leaks through the hybrid. Once the adaptation is complete, the error signal consists mostly of speech from the local telephone.

The echo cancellation process works as follows:

  1. A far-end signal is delivered to the system.
  2. The far-end signal is reproduced by the loudspeaker.
  3. The far-end signal is filtered and delayed to resemble the echo contained in the near-end signal.
  4. The filtered far-end signal is subtracted from the near-end signal.
  5. The resultant signal represents sounds present in the room excluding any direct or reverberated sound of the far-end signal.

The primary challenge for an echo canceller is determining the response characteristics of the filter to be applied to the far-end signal such that it resembles the resultant near-end echo. The filter is essentially a model of the speaker, the microphone and the room's acoustical attributes. Echo cancellers must be adaptive because the characteristics of the near-end's speaker and microphone are generally not known in advance. The acoustical attributes of the near-end's room are also not generally known in advance, and may change (e.g., if the microphone is moved relative to the speaker, or if individuals walk around the room causing changes in the acoustic reflections). [2] [7] By using the far-end signal as the stimulus, modern systems use an adaptive filter and can converge from providing no cancellation to 55 dB of cancellation in around 200 ms. [citation needed]
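As a rough illustration of such an adaptive filter, the following Python sketch uses the normalized least-mean-squares (NLMS) update, a common textbook choice and not necessarily what any particular product uses. The function name, the tap count, the step size `mu` and the synthetic `echo_path` are assumptions made for this example. There is no near-end speech in the simulation and no double-talk detection in the code.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return (error signal, final filter weights) for an NLMS echo canceller.

    far_end: samples sent to the loudspeaker or hybrid (the stimulus).
    mic:     samples picked up on the return path (echo plus any local sound).
    """
    weights = [0.0] * taps    # adaptive model of the echo path
    history = [0.0] * taps    # most recent far-end samples, newest first
    error = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Echo replica: far-end signal filtered by the current model.
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y             # what remains after subtracting the replica
        # Normalized step: scale the correction by the input power.
        power = sum(h * h for h in history) + eps
        step = mu * e / power
        weights = [w + step * h for w, h in zip(weights, history)]
        error.append(e)
    return error, weights


random.seed(0)
echo_path = [0.0, 0.6, 0.0, -0.3]   # hypothetical echo impulse response
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(echo_path[j] * far_end[n - j] for j in range(len(echo_path)) if n - j >= 0)
    for n in range(len(far_end))
]
error, weights = nlms_echo_canceller(far_end, mic)
```

With this noise-free synthetic input, the weights approach the assumed `echo_path` and the error signal shrinks towards zero. Real systems must also cope with near-end speech and changing echo paths, which this sketch does not attempt.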

Echo cancellation alone may be insufficient in many applications. Echo cancellation and suppression can work in conjunction to achieve acceptable performance.

Quantifying echo

Echo is measured as echo return loss (ERL). This is the ratio, expressed in decibels, of the original signal to its echo. [8] High values mean the echo is very weak, while low values mean the echo is very strong. Negative values indicate the echo is stronger than the original signal, which if left unchecked would cause audio feedback.

The performance of an echo canceller is measured in echo return loss enhancement (ERLE), [3] [9] which is the amount of additional signal loss applied by the echo canceller. Most echo cancellers are able to apply 18 to 35 dB ERLE.

The total signal loss of the echo (ACOM) is the sum of the ERL and ERLE. [9] [10]
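The relationship between the three quantities can be checked numerically. The short Python sketch below assumes that levels are compared as mean signal power; the helper name `level_ratio_db` and the test signals are made up for the example.

```python
import math


def level_ratio_db(reference, signal):
    """Ratio of the mean power of `reference` to that of `signal`, in decibels."""
    p_ref = sum(s * s for s in reference) / len(reference)
    p_sig = sum(s * s for s in signal) / len(signal)
    return 10.0 * math.log10(p_ref / p_sig)


original = [1.0, -1.0] * 50                # signal sent towards the echo path
echo = [0.1 * s for s in original]         # echo returned at one tenth the amplitude
residual = [0.01 * s for s in original]    # echo remaining after the canceller

erl = level_ratio_db(original, echo)       # loss of the echo path alone
erle = level_ratio_db(echo, residual)      # additional loss from the canceller
acom = erl + erle                          # total loss of the echo
```

Here both the ERL and the ERLE come out at about 20 dB, so ACOM is about 40 dB, which matches the ratio of the original signal to the residual echo computed directly.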

Current uses

Sources of echo are found in many everyday surroundings in which a loudspeaker and a microphone share the same acoustic space.

In some of these cases, sound from the loudspeaker enters the microphone almost unaltered. The difficulties in canceling echo stem from the alteration of the original sound by the ambient space. These changes can include certain frequencies being absorbed by soft furnishings and reflection of different frequencies at varying strength.

Implementing AEC requires engineering expertise and a fast processor, usually in the form of a digital signal processor (DSP). This processing capability may come at a premium; however, many embedded systems do have a fully functional AEC.

Smart speakers and interactive voice response systems that accept speech for input use AEC while speech prompts are played to prevent the system's own speech recognition from falsely recognizing the echoed prompts and other output.

Modems

Standard telephone lines use the same pair of wires to both send and receive audio, which results in a small amount of the outgoing signal being reflected back. This is useful for people talking on the phone, as it provides a signal to the speaker that their voice is making it through the system. However, this reflected signal causes problems for a modem, which is unable to distinguish between a signal from the remote modem and the echo of its own signal.

For this reason, earlier dial-up modems split the signal frequencies, so that the devices on either end used different tones, allowing each one to ignore any signals in the frequency range it was using for transmission. However, this diminished the amount of bandwidth available to both sides.

Echo cancellation mitigated this problem. During the call setup and negotiation period, both modems send a series of unique tones and then listen for them to return through the phone system. They measure the total delay time, then configure a delay line for that same period. Once the connection is completed, they send their signals into the phone lines as normal, but also into the delay line. When their signal is reflected back, it is mixed with the inverted signal from the delay line, which cancels out the echo. This allowed both modems to use the full spectrum available, doubling the possible speed.
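The training and cancellation phases described above can be sketched in Python. This is a simplified model, not the procedure of any specific modem standard: it assumes a single reflection, uses a pseudo-random probe in place of the tones, and estimates a reflection gain by correlation in addition to the delay. The function names and all numeric values are illustrative.

```python
import random


def measure_echo(probe, received, max_delay):
    """Estimate (delay, gain) of a single reflection of `probe` in `received`."""
    best_delay, best_corr = 0, 0.0
    for delay in range(max_delay + 1):
        # Correlate the returned signal with the probe shifted by `delay`.
        corr = sum(probe[n - delay] * received[n] for n in range(delay, len(received)))
        if abs(corr) > abs(best_corr):
            best_delay, best_corr = delay, corr
    energy = sum(p * p for p in probe[:len(received) - best_delay])
    return best_delay, best_corr / energy


def cancel_echo(transmitted, received, delay, gain):
    """Subtract the delayed, scaled copy of our own signal from the received one."""
    out = []
    for n, r in enumerate(received):
        delayed = transmitted[n - delay] if n >= delay else 0.0
        out.append(r - gain * delayed)
    return out


def line_echo(signal, delay, gain):
    """Hypothetical line model: one reflection with a fixed delay and gain."""
    return [gain * signal[n - delay] if n >= delay else 0.0 for n in range(len(signal))]


random.seed(1)
# Training: send a probe and listen for its reflection.
probe = [random.choice([-1.0, 1.0]) for _ in range(200)]
delay, gain = measure_echo(probe, line_echo(probe, 7, 0.25), max_delay=20)

# Data phase: both ends transmit at once over the same band.
transmitted = [random.choice([-1.0, 1.0]) for _ in range(200)]
remote = [0.5 * random.choice([-1.0, 1.0]) for _ in range(200)]
echoes = line_echo(transmitted, 7, 0.25)
received = [r + e for r, e in zip(remote, echoes)]
cleaned = cancel_echo(transmitted, received, delay, gain)
```

After cancellation, `cleaned` closely matches the remote modem's signal even though both ends transmitted at the same time in the same band.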

Echo cancellation is also applied by many telcos to the line itself, where it can corrupt modem data rather than improve the signal. Some telephone switches or converters (such as analog terminal adapters) disable echo suppression or echo cancellation when they detect the 2100 or 2225 Hz answer tones associated with such calls, in accordance with ITU-T recommendation G.164 or G.165.

ISDN and DSL modems operating at frequencies above the voice band over standard twisted-pair telephone wires also make use of automated echo cancellation to allow simultaneous bidirectional data communication. The computational complexity in implementing the adaptive filter is much reduced compared to voice echo cancelling because the transmit signal is a digital bit stream. Instead of a multiplication and an addition operation for every tap in the filter, only the addition is required. A RAM lookup table based echo cancelling scheme [11] [12] eliminates even the addition operation by simply addressing a memory with a truncated transmit bit stream to obtain the echo estimate. Echo cancellation is now commonly implemented with digital signal processor (DSP) techniques.
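The lookup-table idea can be illustrated with a small Python sketch. It is only a toy model under stated assumptions and does not reproduce the cited designs: the most recent `span` transmitted bits form a memory address, the stored value at that address is used directly as the echo estimate, and each entry is nudged towards the observed signal. The function name, the update rule, the step size and the synthetic echo are all hypothetical.

```python
import random


def lookup_table_canceller(tx_bits, received, span=3, mu=0.5):
    """Cancel echo using a table addressed by the last `span` transmitted bits."""
    table = [0.0] * (1 << span)    # one stored echo estimate per bit pattern
    mask = (1 << span) - 1
    address = 0
    residual = []
    for bit, r in zip(tx_bits, received):
        # Shift in the newest bit; the truncated bit history is the address.
        address = ((address << 1) | bit) & mask
        e = r - table[address]     # estimate is read out, not computed tap by tap
        table[address] += mu * e   # adapt only the entry that was used
        residual.append(e)
    return residual, table


random.seed(2)
bits = [random.randint(0, 1) for _ in range(4000)]
padded = [0, 0] + bits             # the line is assumed to start with zero bits


def symbol(b):
    return 2 * b - 1               # map bit 0/1 to line level -1/+1


# Hypothetical echo that depends only on the last three transmitted bits.
echo = [
    0.5 * symbol(padded[n + 2]) - 0.2 * symbol(padded[n + 1]) + 0.1 * symbol(padded[n])
    for n in range(len(bits))
]
residual, table = lookup_table_canceller(bits, echo)
```

Because the synthetic echo here is a fixed function of the last three bits, each table entry settles on the echo value for its bit pattern and the residual falls towards zero. A table of this kind grows exponentially with the number of bits it spans, which is why the bit stream is truncated.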

Some modems use separate incoming and outgoing frequencies or allocate separate time slots for transmitting and receiving to eliminate the need for echo cancellation. Higher frequencies beyond the original design limits of telephone cables suffer significant attenuation distortion due to bridge taps and incomplete impedance matching. Deep, narrow frequency gaps which cannot be remedied by echo cancellation often result. These are detected and mapped out during connection negotiation.


References

  1. "Octasic: Voice Quality Enhancement & Echo Cancellation". Archived from the original on 2014-08-21. Retrieved 14 April 2014.
  2. Eneroth, Peter (2001). Stereophonic Acoustic Echo Cancellation: Theory and Implementation (PDF) (Thesis). Lund University. ISBN 91-7874-110-6. ISSN 1402-8662. Retrieved 2015-06-25.
  3. "Echo in Voice over IP Systems". Retrieved 2 July 2014.
  4. Sondhi, Man Mohan (March 1967). "An adaptive echo canceler" (PDF). Bell System Technical Journal. 46 (3): 497–511. doi:10.1002/j.1538-7305.1967.tb04231.x. Archived from the original (PDF) on 2014-04-16. Retrieved 14 April 2014.
  5. US 3500000, Kelly Jr., John L., "Self-adaptive echo canceller", published 1970-03-10, assigned to Bell Telephone Laboratories, Inc.
  6. Murano, Kazuo; Unagami, Shigeyuki; Amano, Fumio (January 1990). "Echo Cancellation and Applications" (PDF). IEEE Communications Magazine. 28 (1): 49–55. doi:10.1109/35.46671. ISSN 0163-6804. S2CID 897792. Retrieved 14 April 2014.
  7. Åhgren, Per (November 2005). "Acoustic Echo Cancellation and Doubletalk Detection Using Estimated Loudspeaker Impulse Responses" (PDF). IEEE Transactions on Speech and Audio Processing. 13 (6): 1231–1237. CiteSeerX 10.1.1.530.4556. doi:10.1109/TSA.2005.851995. S2CID 2575877.
  8. "What is Echo Return Loss (ERL) and how does it affect voice quality?". Archived from the original on 2015-06-26.
  9. "Echo Analysis for Voice over IP". Cisco Systems. Retrieved 2 July 2014.
  10. Kosanovic, Bogdan (2002-04-11). "Echo Cancellation Part 1: The Basics and Acoustic Echo Cancellation". EE Times. Retrieved 7 July 2014.
  11. Holte, N.; Stueflotten, S. (1981). "A New Digital Echo Canceler for Two-Wire Subscriber Lines". IEEE Transactions on Communications. 29 (11): 1573–1581. doi:10.1109/TCOM.1981.1094923. ISSN 1558-0857.
  12. US 4237463, Bjor, Håkon E. & Raad, Bjørn H., "Directional coupler", published 1980-12-02, assigned to Elektrisk Bureau A/S.