Internet Speech Audio Codec

Internet media type: audio/isac [1]
Developed by: Global IP Solutions, now Google Inc
Type of format: Audio compression format
Written in: C
Operating system: Cross-platform
Type: Audio codec, reference implementation
License: formerly proprietary, now 3-clause BSD
Website: webrtc.org

Internet Speech Audio Codec (iSAC) is a wideband speech codec developed by Global IP Solutions (GIPS), which was acquired by Google Inc in 2011. [2] [3] It is suitable for VoIP applications and streaming audio. The encoded blocks must be encapsulated in a suitable transport protocol, for example the Real-time Transport Protocol (RTP).
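The encapsulation step can be sketched in Python. This is a minimal illustration of prefixing already-encoded frames with the 12-byte RTP fixed header defined in RFC 3550, not the WebRTC implementation. The payload type 103, the 16 kHz RTP clock rate and the 30 ms frame duration are assumptions chosen for illustration (iSAC has no static RTP payload type, so the real value is negotiated dynamically), and the function names are hypothetical.

```python
import struct

RTP_VERSION = 2
ISAC_PAYLOAD_TYPE = 103   # hypothetical dynamic payload type (96-127 range)
ISAC_CLOCK_RATE = 16000   # assumption: 16 kHz RTP clock for wideband iSAC


def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
               payload_type: int = ISAC_PAYLOAD_TYPE, marker: bool = False) -> bytes:
    """Prepend a 12-byte RFC 3550 fixed header (no CSRCs, no extension)."""
    byte0 = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetize(frames, ssrc, frame_ms=30, clock_rate=ISAC_CLOCK_RATE):
    """Wrap encoded frames, advancing the sequence number and timestamp.

    Starts at 0 for readability; RFC 3550 recommends random initial values.
    """
    ticks_per_frame = clock_rate * frame_ms // 1000   # 480 ticks for 30 ms at 16 kHz
    return [rtp_packet(f, seq=i, timestamp=i * ticks_per_frame, ssrc=ssrc)
            for i, f in enumerate(frames)]
```

For example, `packetize([b"\x01\x02", b"\x03"], ssrc=0x1234)` yields two packets whose timestamps differ by 480 ticks, one 30 ms frame at the assumed 16 kHz clock.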


It is one of the codecs used by AIM Triton, Gizmo5, QQ, and Google Talk. It was formerly a proprietary codec licensed by Global IP Solutions. As of June 2011, it is part of the open-source WebRTC project, [4] which includes a royalty-free license for iSAC when using the WebRTC codebase. [5]

Parameters and features

See also

Related Research Articles

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.
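As an illustration of speech-specific parameter estimation, the short-term linear-prediction (LPC) filter used by many speech codecs can be estimated per frame with the autocorrelation method and the Levinson-Durbin recursion. The following is a minimal sketch in plain Python; real codecs typically add analysis windowing and further conditioning of the coefficients before quantizing them, which is omitted here, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no analysis window applied)."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] such that
    x[n] is approximated by sum(a[k] * x[n - k] for k in 1..order).

    Returns (coefficients, final prediction-error energy).
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k           # error energy shrinks at every stage
    return a[1:], err
```

For the autocorrelation sequence `[1.0, 0.5, 0.25]` of a first-order process with coefficient 0.5, `levinson_durbin(r, 2)` returns `([0.5, 0.0], 0.75)`: the second coefficient is zero because one past sample already explains all of the correlation.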

Speex is a free-software audio compression codec specifically tuned for reproducing human speech; it may be used in voice over IP applications and podcasts. It is based on the code-excited linear prediction (CELP) speech coding algorithm. Its creators claim Speex to be free of any patent restrictions, and it is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or transmitted directly over UDP/RTP. It may also be used with the FLV container format.

The Adaptive Multi-Rate (AMR) audio codec is an audio compression format optimized for speech coding. AMR is a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s, with toll-quality speech starting at 7.4 kbit/s.
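The mode set can be made concrete with a small Python sketch. AMR codes speech in 20 ms frames, so each mode's bit rate fixes the number of speech bits per frame; the eight rates listed are the standard AMR modes between the 4.75 and 12.2 kbit/s limits mentioned above. The `pick_mode` policy (take the highest mode that fits the available bit rate) is a hypothetical simplification: real AMR rate adaptation is driven by signalling between the endpoints, which is not modelled here.

```python
# The eight AMR narrowband speech modes, in kbit/s.
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
FRAME_MS = 20  # AMR codes speech in 20 ms frames


def bits_per_frame(kbps, frame_ms=FRAME_MS):
    """Speech bits in one frame (kbit/s is the same as bits per millisecond)."""
    return round(kbps * frame_ms)


def pick_mode(available_kbps):
    """Hypothetical policy: the highest mode that fits, else the lowest mode."""
    fitting = [m for m in AMR_MODES_KBPS if m <= available_kbps]
    return fitting[-1] if fitting else AMR_MODES_KBPS[0]
```

With 9 kbit/s available, for instance, the sketch selects the 7.95 kbit/s mode, which carries 159 speech bits per frame.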

Full Rate was the first digital speech coding standard used in the GSM digital mobile phone system. It uses linear predictive coding (LPC). The bit rate of the codec is 13 kbit/s, or 1.625 bits per audio sample. The quality of the coded speech is quite poor by modern standards, but at the time of development it was a good compromise between computational complexity and quality, requiring only on the order of a million additions and multiplications per second. The codec is still widely used in networks around the world. FR is gradually being replaced by the Enhanced Full Rate (EFR) and Adaptive Multi-Rate (AMR) standards, which provide much higher speech quality at lower bit rates.
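The 1.625 bits per sample figure follows from dividing the 13 kbit/s bit rate by the 8 kHz narrowband sampling rate. The short Python calculation below also assumes GSM's 20 ms speech frames (160 samples and 260 coded bits each) and compares the rate with 16-bit linear PCM at the same sampling rate; the variable names are illustrative.

```python
BIT_RATE = 13_000     # GSM Full Rate, bit/s
SAMPLE_RATE = 8_000   # narrowband sampling rate, Hz
FRAME_MS = 20         # GSM speech frame duration, ms

bits_per_sample = BIT_RATE / SAMPLE_RATE              # 13000 / 8000
samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000    # samples in one frame
bits_per_frame = BIT_RATE * FRAME_MS // 1000          # coded bits in one frame
compression_ratio = 16 * SAMPLE_RATE / BIT_RATE       # vs 16-bit linear PCM

print(bits_per_sample, samples_per_frame, bits_per_frame, round(compression_ratio, 2))
# prints: 1.625 160 260 9.85
```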

Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech audio coding standard based on Adaptive Multi-Rate encoding, using a methodology similar to algebraic code-excited linear prediction (ACELP). AMR-WB provides improved speech quality due to a wider speech bandwidth of 50–7000 Hz, compared with narrowband speech coders, which are generally optimized for POTS wireline quality of 300–3400 Hz. AMR-WB was developed by Nokia and VoiceAge and was first specified by 3GPP.

G.722: ITU-T Recommendation

G.722 is an ITU-T standard 7 kHz wideband audio codec operating at 48, 56 and 64 kbit/s. It was approved by ITU-T in November 1988. Technology of the codec is based on sub-band ADPCM (SB-ADPCM). The corresponding narrow-band codec based on the same technology is G.726.
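The three bit rates follow from the sub-band structure. G.722 samples the input at 16 kHz and splits it with a quadrature mirror filter bank into a lower and an upper sub-band, each at 8 kHz; the upper band is coded with 2 bits per sample and the lower band with 6 bits, of which the 56 and 48 kbit/s modes give up one or two bits per sample to an auxiliary data channel. The Python sketch below only reproduces this bit accounting, not the ADPCM coding itself; the names are illustrative.

```python
SUBBAND_RATE = 8_000     # each QMF sub-band is sampled at 8 kHz
HIGH_BAND_BITS = 2       # upper sub-band: 2 bits per sample in every mode
LOW_BAND_CODE_BITS = 6   # lower sub-band ADPCM code word length

# Lower-band bits per sample that carry audio in each mode (keyed by kbit/s).
LOW_BAND_AUDIO_BITS = {64: 6, 56: 5, 48: 4}


def audio_bit_rate(mode_kbps):
    """Audio bit rate in bit/s for a G.722 mode."""
    return SUBBAND_RATE * (LOW_BAND_AUDIO_BITS[mode_kbps] + HIGH_BAND_BITS)


def aux_data_rate(mode_kbps):
    """Bit rate left over for auxiliary data on the 64 kbit/s channel."""
    return SUBBAND_RATE * (LOW_BAND_CODE_BITS - LOW_BAND_AUDIO_BITS[mode_kbps])
```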

G.722.1: ITU-T Recommendation

G.722.1 is a licensed, royalty-free ITU-T standard audio codec providing high-quality, moderate-bit-rate wideband audio coding (50 Hz – 7 kHz audio bandwidth, 16 ksps). It is a partial implementation of the Siren 7 audio coding format developed by PictureTel Corp. Its official name is "Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss". It uses a modified discrete cosine transform (MDCT) audio data compression algorithm.
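The MDCT maps each block of 2N samples to N coefficients, and consecutive blocks overlap by half; the time-domain aliasing introduced by each inverse transform is cancelled when the overlapping outputs are added. The following pure-Python sketch shows this with a sine window. It is a generic textbook MDCT, not G.722.1's exact transform: the block size, window and scaling are illustrative assumptions, and the direct O(N²) sums stand in for the fast algorithms a real codec would use.

```python
import math


def sine_window(n):
    """2N-sample sine window; satisfies w[m]**2 + w[m + n]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi / (2 * n) * (m + 0.5)) for m in range(2 * n)]


def _kernel(n, m, k):
    # MDCT basis: cos(pi/N * (m + 1/2 + N/2) * (k + 1/2))
    return math.cos(math.pi / n * (m + 0.5 + n / 2) * (k + 0.5))


def mdct(block, n):
    """Window 2N input samples and transform them to N coefficients."""
    w = sine_window(n)
    return [sum(w[m] * block[m] * _kernel(n, m, k) for m in range(2 * n))
            for k in range(n)]


def imdct(coeffs, n):
    """N coefficients -> 2N windowed output samples (still time-aliased)."""
    w = sine_window(n)
    return [w[m] * (2.0 / n) * sum(coeffs[k] * _kernel(n, m, k) for k in range(n))
            for m in range(2 * n)]


def analyze_synthesize(x, n):
    """Hop by N with 2N-sample blocks, then overlap-add the IMDCT outputs."""
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n + 1, n):
        out = imdct(mdct(x[start:start + 2 * n], n), n)
        for i, v in enumerate(out):
            y[start + i] += v
    return y
```

With `n = 8`, every sample covered by two overlapping blocks is reconstructed to within floating-point rounding; only the first and last `n` samples, which lie in a single block, are not.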

Extended Adaptive Multi-Rate – Wideband (AMR-WB+) is an audio codec that extends AMR-WB. It adds support for stereo signals and higher sampling rates. Another main improvement is the use of transform coding in addition to ACELP, which greatly improves generic audio coding. Automatic switching between transform coding and ACELP provides good speech and audio quality at moderate bit rates.

Variable-Rate Multimode Wideband (VMR-WB) is a source-controlled, variable-rate multimode codec designed for robust encoding and decoding of wideband and narrowband speech. The operation of VMR-WB is controlled by the characteristics of the speech signal and by the traffic conditions of the network. Depending on the traffic conditions and the desired quality of service (QoS), one of four operational modes is used. All operating modes of the existing VMR-WB standard are fully compliant with cdma2000 rate-set II. VMR-WB modes 0, 1 and 2 are cdma2000 native modes, with mode 0 providing the highest quality and mode 2 the lowest average data rate (ADR). VMR-WB mode 3 is the AMR-WB interoperable mode; it operates at an ADR slightly higher than that of mode 0 and provides quality equal to or better than that of AMR-WB at 12.65 kbit/s when interconnected with AMR-WB at 12.65 kbit/s.

Internet Low Bitrate Codec (iLBC) is a royalty-free narrowband speech audio coding format and an open-source reference implementation (codec), developed by Global IP Solutions (GIPS), formerly Global IP Sound. It was formerly freeware with limitations on commercial use, but since 2011 it has been available under a free software/open source license as part of the open-source WebRTC project. It is suitable for VoIP applications, streaming audio, archival and messaging. The algorithm is a version of block-independent linear predictive coding, with a choice of data frame lengths of 20 and 30 milliseconds. The encoded blocks must be encapsulated in a suitable transport protocol, usually the Real-time Transport Protocol (RTP).
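The two frame lengths translate directly into sample counts and bit rates. The Python sketch below assumes 8 kHz sampling and the encoded frame sizes given in RFC 3951 (38 bytes per 20 ms frame and 50 bytes per 30 ms frame); the function name is illustrative.

```python
SAMPLE_RATE = 8_000               # iLBC is narrowband: 8 kHz sampling
FRAME_BYTES = {20: 38, 30: 50}    # encoded bytes per frame, per RFC 3951


def frame_stats(frame_ms):
    """Return (samples per frame, bits per frame, bit rate in kbit/s)."""
    samples = SAMPLE_RATE * frame_ms // 1000
    bits = FRAME_BYTES[frame_ms] * 8
    kbps = bits / frame_ms        # bits per millisecond equals kbit/s
    return samples, bits, kbps
```

`frame_stats(20)` gives 160 samples and 304 bits (15.2 kbit/s); `frame_stats(30)` gives 240 samples and 400 bits (about 13.33 kbit/s), so the longer frame trades latency for a lower bit rate.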

G.729.1: ITU-T Recommendation

G.729.1 is an 8–32 kbit/s embedded speech and audio codec providing bitstream interoperability with G.729, G.729 Annex A and G.729 Annex B. Its official name is "G.729-based embedded variable bit rate codec: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729". It was introduced in 2006.

Siren is a family of patented, transform-based, wideband audio coding formats and their audio codec implementations developed and licensed by PictureTel Corporation. There are three Siren codecs: Siren 7, Siren 14 and Siren 22.

G.719 is an ITU-T standard audio coding format providing high quality, moderate bit rate wideband audio coding at low computational load. It was produced through a collaboration between Polycom and Ericsson.

Wideband audio, also known as wideband voice or HD voice, is high definition voice quality for telephony audio, contrasted with standard digital telephony "toll quality". It extends the frequency range of audio signals transmitted over telephone lines, resulting in higher quality speech. The range of the human voice extends from 100 Hz to 17 kHz but traditional, voiceband or narrowband telephone calls limit audio frequencies to the range of 300 Hz to 3.4 kHz. Wideband audio relaxes the bandwidth limitation and transmits in the audio frequency range of 50 Hz to 7 kHz. In addition, some wideband codecs may use a higher audio bit depth of 16 bits to encode samples, also resulting in much better voice quality.
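The bandwidth figures fix the minimum sampling rate through the Nyquist criterion: the sampling rate must exceed twice the highest frequency carried. The Python sketch below checks this against the conventional 8 kHz (narrowband) and 16 kHz (wideband) telephony sampling rates and computes the resulting uncompressed bit rates at 16 bits per sample; the dictionary and function names are illustrative.

```python
# (lowest Hz, highest Hz, conventional sampling rate Hz)
BANDS = {
    "narrowband": (300, 3_400, 8_000),
    "wideband": (50, 7_000, 16_000),
}


def nyquist_rate(highest_hz):
    """Sampling must exceed twice the highest frequency to avoid aliasing."""
    return 2 * highest_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample=16):
    """Uncompressed linear-PCM bit rate in bit/s."""
    return sample_rate_hz * bits_per_sample


for name, (low, high, fs) in BANDS.items():
    assert fs > nyquist_rate(high), name
    print(f"{name}: {high - low} Hz of audio bandwidth, "
          f"{pcm_bit_rate(fs) // 1000} kbit/s as 16-bit PCM")
```

Doubling the sampling rate doubles the raw PCM rate (128 to 256 kbit/s), which is why wideband telephony relies on codecs such as G.722 or AMR-WB rather than uncompressed audio.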

G.718: ITU-T Recommendation

G.718 is an ITU-T Recommendation embedded scalable speech and audio codec providing high-quality narrowband speech at the lower bit rates and high-quality wideband speech over the complete range of bit rates. In addition, G.718 is designed to be highly robust to frame erasures, thereby enhancing speech quality when used in Internet Protocol (IP) transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both narrowband and wideband generic audio signals. Its embedded scalable structure enables maximum flexibility in the transport of voice packets through today's IP networks and future media-aware networks, and it will easily allow the codec to be extended to super-wideband and stereo capability through additional layers, which are currently under development in ITU-T Study Group 16. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 and 32 kbit/s.
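Because each layer only refines the layers below it, lowering the bit rate amounts to dropping the tail of each frame. The following Python sketch illustrates that idea under two assumptions that are not taken from the Recommendation: 20 ms frames, and a hypothetical byte-aligned layout in which the layers are stored contiguously in order of increasing bit rate. The function names are also hypothetical; the actual G.718 bitstream layout is defined in the ITU-T text.

```python
# Embedded layer bit rates in kbit/s; each layer refines the ones below it.
G718_RATES_KBPS = [8, 12, 16, 24, 32]
FRAME_MS = 20  # assumption: 20 ms frames


def layer_boundaries(frame_ms=FRAME_MS):
    """Cumulative byte offset at which each embedded rate ends in one frame."""
    return [rate * frame_ms // 8 for rate in G718_RATES_KBPS]


def truncate_frame(frame: bytes, target_kbps: int, frame_ms=FRAME_MS) -> bytes:
    """Keep only the prefix carrying the layers up to target_kbps
    (hypothetical contiguous, byte-aligned layout)."""
    if target_kbps not in G718_RATES_KBPS:
        raise ValueError(f"unsupported rate: {target_kbps} kbit/s")
    end = target_kbps * frame_ms // 8    # kbit/s * ms = bits; / 8 = bytes
    if len(frame) < end:
        raise ValueError("frame is shorter than the requested layer set")
    return frame[:end]
```

Under these assumptions a full 32 kbit/s frame is 80 bytes, and a network element could cut it to 30 bytes for the 12 kbit/s layer set without any signalling to the decoder.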

Adaptive differential pulse-code modulation (ADPCM) is a variant of differential pulse-code modulation (DPCM) that varies the size of the quantization step, to allow further reduction of the required data bandwidth for a given signal-to-noise ratio.
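The following toy ADPCM codec in Python illustrates the idea: each sample is predicted from the previous reconstructed sample, the prediction error is quantized to a 4-bit signed code, and the quantizer step grows after large codes and shrinks after small ones. It is not IMA ADPCM or G.726; the step multipliers, limits and initial step are illustrative assumptions, and the function names are hypothetical. Because the decoder applies exactly the same state update as the encoder, it tracks the encoder's reconstruction without any side information.

```python
# Step-size multipliers indexed by |code| (0..8): small codes shrink the
# step, large codes grow it. Illustrative values, not from any standard.
MULTIPLIERS = [0.9, 0.9, 0.9, 1.0, 1.2, 1.6, 2.0, 2.4, 2.4]
MIN_STEP, MAX_STEP = 1.0, 4096.0
INITIAL_STEP = 16.0


def _next_state(pred, step, code):
    """Shared encoder/decoder update: reconstruct, then adapt the step."""
    pred += code * step
    step = min(max(step * MULTIPLIERS[abs(code)], MIN_STEP), MAX_STEP)
    return pred, step


def encode(samples):
    """Quantize each prediction error to a 4-bit signed code in [-8, 7]."""
    pred, step, codes = 0.0, INITIAL_STEP, []
    for s in samples:
        code = max(-8, min(7, round((s - pred) / step)))
        codes.append(code)
        pred, step = _next_state(pred, step, code)
    return codes


def decode(codes):
    """Rebuild the signal by replaying the encoder's state updates."""
    pred, step, out = 0.0, INITIAL_STEP, []
    for code in codes:
        pred, step = _next_state(pred, step, code)
        out.append(pred)
    return out
```

Adapting the step lets the same 4 bits per sample follow both steep and nearly flat stretches of a signal, which is the bandwidth saving the paragraph above describes.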

SILK is an audio compression format and audio codec developed by Skype Limited, now a Microsoft subsidiary. It was developed for use in Skype as a replacement for the SVOPC codec. Since it was licensed out, it has also been used by others. An extended version of SILK forms part of the Internet-standard Opus codec.

Global IP Solutions was a United States-based corporation that developed real-time voice and video processing software for IP networks, before it was acquired by Google in May 2010. The company delivered embedded software that enabled real-time communications capabilities for video and voice over IP (VoIP). GIPS was perhaps best known for developing the narrowband iLBC and wideband iSAC speech codecs.

Opus (audio format): lossy audio coding format

Opus is a lossy audio coding format developed by the Xiph.Org Foundation and standardized by the Internet Engineering Task Force, designed to efficiently code speech and general audio in a single format, while remaining low-latency enough for real-time interactive communication and low-complexity enough for low-end embedded processors. Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher in quality than any other standard audio format at any given bitrate until transparency is reached, including MP3, AAC and HE-AAC.

References

  1. Grand, Tina le; Jones, Paul; Huart, Pascal; Shabestary, Turaj Zakizadeh; Alvestrand, Harald T. (2013). "RTP Payload Format for the iSAC Codec". Retrieved 2016-04-30.
  2. Dana Blankenhorn (2010-05-18). "Why Google bought Global IP Solutions". ZDNet. Archived from the original on 2010-05-21. Retrieved 2011-06-23.
  3. "iLBC Freeware". Archived from the original on 2011-07-05. Retrieved 2011-06-23.
  4. webrtc.org/faq/#what-is-the-isac-audio-codec. Archived 2011-06-07 at the Wayback Machine.
  5. webrtc.org/license/additional-ip-grant/. Archived 2017-11-13 at the Wayback Machine.
  6. "WebRTC FAQ - What are the parameters of iSAC?". Archived from the original on 2016-11-17. Retrieved 2011-06-23.
  7. "WebRTC components". Archived from the original on 2011-06-28. Retrieved 2011-06-23.