Packet loss concealment


Packet loss concealment (PLC) is a technique to mask the effects of packet loss in voice over IP (VoIP) communications. When a voice signal is sent as VoIP packets over an IP network, the packets may (and likely will) travel different routes. A packet might therefore arrive very late, arrive corrupted, or not arrive at all. The last case can occur, for example, when a packet is rejected by a server whose buffer is full and cannot accept any more data. Network congestion can also delay a packet so much that it arrives too late to be played out. In a VoIP connection, error-control techniques such as automatic repeat request (ARQ) are not feasible, because waiting for a retransmission adds more delay than a real-time conversation can tolerate, so the receiver must be able to cope with packet loss itself. Packet loss concealment refers to the methods built into a design to account and compensate for lost voice packets.


PLC techniques
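
Two of the simplest ways to conceal a lost frame are zero insertion (play silence in its place) and waveform substitution (replay the most recent audio, usually attenuated so that a burst of losses fades out instead of buzzing). The sketch below illustrates both under stated assumptions: frames are fixed-length lists of float samples, a lost frame is represented as `None`, and the names `conceal`, `frame_len`, `attenuation` and `strategy` are hypothetical, not taken from any codec. PLC in production codecs is generally more elaborate than this.

```python
from typing import List, Optional, Sequence

Frame = List[float]


def conceal(frames: Sequence[Optional[Frame]],
            frame_len: int,
            attenuation: float = 0.5,
            strategy: str = "repeat") -> List[Frame]:
    """Return a playout stream in which every lost frame (None) is filled in.

    strategy="zero":   replace a lost frame with silence.
    strategy="repeat": replay the previous output frame scaled by
                       `attenuation`, so consecutive losses fade out.
    """
    if strategy not in ("zero", "repeat"):
        raise ValueError(f"unknown strategy: {strategy}")

    silence = [0.0] * frame_len
    last = silence  # nothing played yet, so fall back to silence
    out: List[Frame] = []
    for frame in frames:
        if frame is not None:
            current = list(frame)  # received: play unchanged
        elif strategy == "zero":
            current = list(silence)
        else:
            # Illustrative assumption: scale the previous output frame.
            current = [s * attenuation for s in last]
        out.append(current)
        last = current
    return out


# Two consecutive frames are lost in the middle of the stream.
received = [[0.5, -0.5], None, None, [0.25, 0.25]]
print(conceal(received, frame_len=2))
# [[0.5, -0.5], [0.25, -0.25], [0.125, -0.125], [0.25, 0.25]]
```

Because each substituted frame is derived from the previous output frame rather than the last received one, the second lost frame is quieter than the first. That is a design choice of this sketch; passing `strategy="zero"` gives plain silence instead.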

Use

PLC is used in Skype with the Internet Low Bitrate Codec (iLBC) [1] [2] and SILK [3] codecs, in Jitsi with the SILK and Opus codecs, [4] [5] and in the pjsip stack used by CSipSimple. [6] Google Duo used WaveNetEQ, a generative PLC model based on DeepMind/Google AI’s WaveRNN. [7]

See also

Related Research Articles

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Speex is a free-software audio compression codec specifically tuned for the reproduction of human speech; it may be used in VoIP applications and podcasts. It is based on the CELP speech coding algorithm. Its creators claim Speex to be free of any patent restrictions, and it is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or transmitted directly over UDP/RTP. It may also be used with the FLV container format.

Voice over Internet Protocol (VoIP), also called IP telephony, is a set of technologies for voice calls: the delivery of voice communication sessions over Internet Protocol (IP) networks, such as the Internet.

G.711: ITU-T recommendation

G.711 is a narrowband audio codec originally designed for use in telephony that provides toll-quality audio at 64 kbit/s. It is an ITU-T standard (Recommendation) for audio encoding, titled Pulse code modulation (PCM) of voice frequencies, released for use in 1972.

Internet Low Bitrate Codec (iLBC) is a royalty-free narrowband speech audio coding format and an open-source reference implementation (codec), developed by Global IP Solutions (GIPS), formerly Global IP Sound. It was formerly freeware with limitations on commercial use, but since 2011 it has been available under a free software/open source license as part of the open source WebRTC project. It is suitable for VoIP applications, streaming audio, archival and messaging. The algorithm is a version of block-independent linear predictive coding, with a choice of data frame lengths of 20 and 30 milliseconds. The encoded blocks have to be encapsulated in a suitable protocol for transport, usually the Real-time Transport Protocol (RTP).

The following tables compare general and technical information for a variety of audio coding formats.

This is a comparison of voice over IP (VoIP) software used to conduct telephone-like voice conversations across Internet Protocol (IP) based networks. For residential markets, voice over IP phone service is often cheaper than traditional public switched telephone network (PSTN) service and can remove geographic restrictions to telephone numbers, e.g., have a PSTN phone number in a New York area code ring in Tokyo.

internet Speech Audio Codec (iSAC) is a wideband speech codec, developed by Global IP Solutions (GIPS). It is suitable for VoIP applications and streaming audio. The encoded blocks have to be encapsulated in a suitable protocol for transport, e.g. RTP.

The MPEG-4 Low Delay Audio Coder is an audio compression standard designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. It is closely derived from the MPEG-2 Advanced Audio Coding (AAC) standard. It was published in MPEG-4 Audio Version 2 and in its later revisions.

SVOPC is an audio compression method used by VoIP applications. It is a lossy speech compression codec designed specifically for communication channels that suffer from packet loss. It uses more bandwidth than the best bandwidth-optimised codecs, but in exchange it is resistant to packet loss.

Wideband audio, also known as wideband voice or HD voice, is high definition voice quality for telephony audio, contrasted with standard digital telephony "toll quality". It extends the frequency range of audio signals transmitted over telephone lines, resulting in higher quality speech. The range of the human voice extends from 100 Hz to 17 kHz but traditional, voiceband or narrowband telephone calls limit audio frequencies to the range of 300 Hz to 3.4 kHz. Wideband audio relaxes the bandwidth limitation and transmits in the audio frequency range of 50 Hz to 7 kHz. In addition, some wideband codecs may use a higher audio bit depth of 16 bits to encode samples, also resulting in much better voice quality.

Constrained Energy Lapped Transform (CELT) is an open, royalty-free lossy audio compression format and a free software codec with especially low algorithmic delay for use in low-latency audio communication. The algorithms are openly documented and may be used free of software patent restrictions. Development of the format was maintained by the Xiph.Org Foundation and later coordinated by the Opus working group of the Internet Engineering Task Force (IETF).

SILK is an audio compression format and audio codec developed by Skype Limited, now a Microsoft subsidiary. It was developed for use in Skype, as a replacement for the SVOPC codec. Since being licensed out, it has also been used by others. It has been extended to the Internet standard Opus codec.

Global IP Solutions was a United States-based corporation that developed real-time voice and video processing software for IP networks, before it was acquired by Google in May 2010. The company delivered embedded software that enabled real-time communications capabilities for video and voice over IP (VoIP). GIPS was perhaps best known for developing the narrowband iLBC and wideband iSAC speech codecs.

Opus (audio format): Lossy audio coding format

Opus is a lossy audio coding format developed by the Xiph.Org Foundation and standardized by the Internet Engineering Task Force, designed to efficiently code speech and general audio in a single format, while remaining low-latency enough for real-time interactive communication and low-complexity enough for low-end embedded processors. Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate until transparency is reached, including MP3, AAC, and HE-AAC.

Google Duo was a proprietary voice over IP (VoIP) and videotelephony service developed by Google, available for Android, iOS and web browsers. It let users make and receive one-to-one and group audio and video calls with other Duo users in high definition, using end-to-end encryption by default. Duo could be used either with a phone number or a Google account, allowing users to call someone from their contact list.

Enhanced Voice Services (EVS) is a superwideband speech audio coding standard that was developed for VoLTE. It offers up to 20 kHz audio bandwidth and has high robustness to delay jitter and packet losses due to its channel-aware coding and improved packet loss concealment. It was developed in 3GPP and is described in 3GPP TS 26.441. The application areas of EVS consist of improved telephony and teleconferencing, audiovisual conferencing services, and streaming audio. Source code of both decoder and encoder in ANSI C is available as 3GPP TS 26.442 and is updated regularly. Samsung uses the term HD+ for calls made using EVS.

Lyra is a lossy audio codec developed by Google that is designed for compressing speech at very low bitrates. Unlike most other audio formats, it compresses data using a machine learning-based algorithm.

Satin is a lossy speech codec developed by Microsoft. Satin was designed to supersede the earlier SILK codec in Microsoft's applications, and implements a neural network and novel signal processing to improve performance over its predecessor.

References

  1. "blog.radvision.com". Archived from the original on 2012-06-12.
  2. "Analysis and Evaluation of the Skype and Google-Talk VoIP Systems". CiteSeerX 10.1.1.81.4153.
  3. "SILK_RTP_PayloadFormat.pdf" (PDF).
  4. "Archived copy". java.net. Archived from the original on 30 December 2016. Retrieved 12 January 2022.
  5. "Opus Codec". opus-codec.org.
  6. "Google Code Archive - Long-term storage for Google Code Project Hosting". code.google.com.
  7. "Improving Audio Quality in Duo with WaveNetEQ". Google AI Blog. Retrieved 2020-04-01.