Comfort noise

Comfort noise (or comfort tone) is synthetic background noise used in radio and wireless communications to fill the artificial silence in a transmission resulting from voice activity detection or from the audio clarity of modern digital lines. [1]

Some modern telephone systems (such as wireless and VoIP) use voice activity detection (VAD), a form of squelching in which the transmitting device ignores low volume levels. In digital audio transmissions, this saves bandwidth on the communications channel: nothing is transmitted while the source volume is below a certain threshold, so only louder sounds (such as the speaker's voice) are sent. However, improvements in background noise reduction technologies can occasionally remove all noise completely. Although maximizing call quality is of primary importance, exhaustive removal of noise may not properly simulate the typical behavior of terminals on the public switched telephone network (PSTN).
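
The threshold behavior described above can be sketched as a simple energy-based VAD. This is a minimal illustration, assuming 16-bit PCM at 8 kHz split into 160-sample (20 ms) frames and a hypothetical threshold of -45 dBFS; practical detectors (for example the one specified in G.729 Annex B) use more elaborate decision logic, which this sketch omits.

```python
import math
from typing import List, Sequence

FRAME_SIZE = 160          # assumed: 20 ms frames at 8 kHz sampling
THRESHOLD_DBFS = -45.0    # hypothetical VAD threshold, not from any standard


def frame_level_dbfs(frame: Sequence[int]) -> float:
    """RMS level of a frame of 16-bit samples, in dB relative to full scale."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / 32768.0)


def vad_decisions(samples: Sequence[int],
                  threshold_dbfs: float = THRESHOLD_DBFS) -> List[bool]:
    """One decision per complete frame: True means 'transmit this frame'."""
    decisions = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        decisions.append(frame_level_dbfs(frame) >= threshold_dbfs)
    return decisions
```

A frame is sent only when its RMS level reaches the threshold; every suppressed frame is silence that the receiving end must later fill.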

Issues with silence

Receiving total silence, especially for a prolonged period, has a number of unwanted effects on the listener; for example, the listener may conclude that the call has been dropped and hang up prematurely.

To counteract these effects, comfort noise is added, usually on the receiving end in wireless or VoIP systems, to fill in the silent portions of transmissions with artificial noise.

Noise

Generated comfort noise is at a low but audible volume level, and can vary based on the average volume level of received signals to minimize jarring transitions. [2]
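
A receiver-side comfort noise generator that follows the average level of received background audio might look like the following sketch. The class name, the exponential smoothing factor of 0.9, the default level, and the use of plain Gaussian white noise are all assumptions for illustration; real generators usually also shape the noise spectrum to resemble the actual background.

```python
import math
import random
from typing import List, Optional


class ComfortNoiseGenerator:
    """Hypothetical receiver-side CNG: tracks the level of received background
    audio and fills silent gaps with white noise at that level."""

    def __init__(self, smoothing: float = 0.9, default_rms: float = 30.0,
                 seed: Optional[int] = None) -> None:
        self.smoothing = smoothing    # weight kept from the previous estimate
        self.noise_rms = default_rms  # running estimate of background RMS
        self.rng = random.Random(seed)

    def observe(self, frame: List[int]) -> None:
        """Update the estimate from a received frame judged to be background
        (feeding speech frames here would inflate the noise level)."""
        if not frame:
            return
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        self.noise_rms = (self.smoothing * self.noise_rms
                          + (1 - self.smoothing) * rms)

    def generate(self, n: int) -> List[int]:
        """Produce n samples of Gaussian white noise at the estimated RMS,
        clipped to the 16-bit range."""
        out = []
        for _ in range(n):
            s = int(round(self.rng.gauss(0.0, self.noise_rms)))
            out.append(max(-32768, min(32767, s)))
        return out
```

The exponential average lets the generated level drift gradually toward the level of recent background audio, which avoids an abrupt jump in loudness when speech stops.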

In many VoIP products, users may control how VAD and comfort noise are configured, or disable the feature entirely. [1]

As part of the RTP audio video profile, RFC 3389 defines a standard for distributing comfort noise information in VoIP systems. [3]
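
RFC 3389 carries the noise level in the first byte of the comfort noise payload, as a magnitude in -dBov from 0 to 127 with the most significant bit set to 0, optionally followed by spectral information in the form of quantized reflection coefficients. The sketch below builds and parses only the level byte and passes any coefficient bytes through unchanged. It assumes 16-bit linear PCM with a full-scale square wave of amplitude 32767 as the 0 dBov reference; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

FULL_SCALE = 32767.0  # assumed 0 dBov reference for 16-bit linear PCM


def noise_level_dbov(frame: Sequence[int]) -> int:
    """Return the frame's level as a -dBov magnitude clamped to 0..127."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
    if rms <= 0:
        return 127  # quietest representable level
    level = -20 * math.log10(rms / FULL_SCALE)
    return max(0, min(127, int(round(level))))


def build_cn_payload(frame: Sequence[int], reflection_coeffs: bytes = b"") -> bytes:
    """Noise-level byte (MSB forced to 0) followed by optional,
    already-quantized reflection-coefficient bytes."""
    return bytes([noise_level_dbov(frame) & 0x7F]) + reflection_coeffs


def parse_cn_payload(payload: bytes) -> Tuple[int, bytes]:
    """Return (level in dBov, e.g. -40, and the raw coefficient bytes)."""
    if not payload:
        raise ValueError("empty CN payload")
    return -(payload[0] & 0x7F), payload[1:]
```

For example, a frame with a constant amplitude of 328 is about 40 dB below the assumed reference, so its level byte is 40.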

Examples

Many radio stations broadcast birdsong, city-traffic or other atmospheric comfort noise during periods of deliberate silence. For example, in the UK, silence is observed on Remembrance Sunday, and London's quiet city ambiance is used. This is to reassure the listener that the station is on-air, but primarily to prevent silence detection systems at transmitters from automatically starting backup tapes of music (designed to be broadcast in the case of transmission link failure). [4]
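
The silence detection at transmitters can be sketched as a level threshold combined with a hold timer, so that brief pauses do not trigger the backup source but sustained dead air does. The class name, threshold, hold time, and block length below are hypothetical values, not those of any particular product; the point is that low-level ambience such as city noise stays above the threshold and keeps resetting the timer.

```python
class SilenceDetector:
    """Hypothetical silence-detect logic: the alarm becomes active once the
    programme level has stayed below a threshold for a continuous hold period."""

    def __init__(self, threshold_dbfs: float = -50.0, hold_seconds: float = 10.0,
                 block_seconds: float = 0.5) -> None:
        self.threshold_dbfs = threshold_dbfs
        # Number of consecutive quiet measurement blocks needed to trigger.
        self.blocks_needed = int(round(hold_seconds / block_seconds))
        self.quiet_blocks = 0

    def update(self, level_dbfs: float) -> bool:
        """Feed one measured block level; return True while the alarm is active."""
        if level_dbfs < self.threshold_dbfs:
            self.quiet_blocks += 1
        else:
            self.quiet_blocks = 0  # any audible programme resets the timer
        return self.quiet_blocks >= self.blocks_needed
```

With these settings, true dead air (here -90 dBFS) trips the alarm on the 20th consecutive half-second block, while ambience at -40 dBFS never does.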

During the siege of Leningrad, the beat of a metronome was used as comfort noise on the Leningrad radio network, indicating that the network was still functioning. [5]

A similar concept is that of sidetone, the effect of sound that is picked up by a telephone's mouthpiece and introduced (at low level) into the earpiece of the same handset, acting as feedback.

References

  1. "Troubleshooting Hissing and Static: Comfort Noise and VAD". Cisco. Retrieved 18 July 2014.
  2. US patent 7649988, Suppapola, Seth; Ebenezer, Samuel Ponvara & Allen, Justin L., "Comfort noise generator using modified Doblinger noise estimate".
  3. "Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)". RFC 3389. September 2002. doi:10.17487/RFC3389.
  4. "RB-SD1 Silence Detect Unit". Sonifex. Retrieved 28 May 2013.
  5. "Radio". Encyclopaedia of St. Petersburg.