Siren (codec)

Siren is a family of patented, transform-based, wideband audio coding formats and their audio codec implementations developed and licensed by PictureTel Corporation (acquired by Polycom, Inc. in 2001). [1] There are three Siren codecs: Siren 7, Siren 14 and Siren 22.

Editions

Siren 7 (or Siren7 or simply Siren) provides 7 kHz audio at bit rates of 16, 24 and 32 kbit/s, with a sampling frequency of 16 kHz. Siren is derived from PictureTel's PT716plus algorithm. [2] In 1999, the ITU-T approved Recommendation G.722.1, which is based on the Siren 7 algorithm, after a four-year selection process involving extensive testing. [2] G.722.1 provides only the 24 and 32 kbit/s bit rates and does not support Siren 7's 16 kbit/s rate. [3] [4] The Siren 7 algorithm is identical to that of its successor, G.722.1, although the data formats differ slightly.

Siren 14 (or Siren14) provides 14 kHz audio in mono or stereo, at bit rates of 24, 32 and 48 kbit/s for mono and 48, 64 and 96 kbit/s for stereo, with a sampling frequency of 32 kHz. It has an algorithmic delay of 40 milliseconds, using 20 millisecond frames. The mono version of Siren 14 became ITU-T G.722.1C (14 kHz, 24/32/48 kbit/s) in April 2005. [5] [6] [7] The algorithm is based on transform coding, using a modulated lapped transform (MLT), [8] a type of discrete cosine transform (DCT) [9] or modified discrete cosine transform (MDCT). [10]
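
The relationship between the lapped transform and the 20 ms frame / 40 ms delay figures can be illustrated with a textbook MDCT using a sine window, which is how the MLT is usually described. The following is only a minimal sketch under that assumption: it is not the bit-exact transform specified for Siren 14 or G.722.1C, the function names are hypothetical, and the frame size of 8 samples is chosen purely for readability (a 20 ms frame at 32 kHz would hold 640 samples). Each block spans two frames, and a frame can only be reconstructed once both blocks that overlap it have been decoded and added.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2*N; it satisfies w[n]^2 + w[n+N]^2 == 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mlt_forward(block, window):
    """Windowed MDCT: 2*N time samples in, N coefficients out."""
    n_coeffs = len(block) // 2
    return [
        sum(
            window[n] * block[n]
            * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs)
        )
        for k in range(n_coeffs)
    ]


def mlt_inverse(coeffs, window):
    """Windowed inverse MDCT: N coefficients in, 2*N aliased samples out."""
    n_coeffs = len(coeffs)
    return [
        window[n] * (2.0 / n_coeffs) * sum(
            coeffs[k]
            * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs)
        )
        for n in range(2 * n_coeffs)
    ]


def roundtrip(signal, n_coeffs):
    """Transform 50%-overlapping blocks and overlap-add the decoded output.

    len(signal) is assumed to be a multiple of n_coeffs.
    """
    window = sine_window(n_coeffs)
    # Pad one frame of silence on each side so every real sample
    # is covered by exactly two overlapping blocks.
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * n_coeffs
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = padded[start:start + 2 * n_coeffs]
        decoded = mlt_inverse(mlt_forward(block, window), window)
        for i, value in enumerate(decoded):
            out[start + i] += value  # aliasing cancels in the overlap
    return out[n_coeffs:n_coeffs + len(signal)]


signal = [math.sin(0.3 * i) for i in range(32)]
rebuilt = roundtrip(signal, n_coeffs=8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(f"max reconstruction error: {max_error:.2e}")
```

No quantization is applied here, so the round trip is lossless up to floating-point error; an actual codec would quantize the coefficients between the forward and inverse transforms.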

Siren 22 (or Siren22) provides 22 kHz audio at bit rates of 64, 96 and 128 kbit/s for stereo and 32, 48 and 64 kbit/s for mono, with a sampling frequency of 48 kHz. It has an algorithmic delay of 40 milliseconds, using 20 millisecond frames. In May 2008, the ITU-T approved the G.719 full-band codec, which is based on Polycom's Siren 22 audio technology and Ericsson's advanced audio techniques. [11] [12]
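
The sampling frequencies, bit rates and 20 millisecond frame length quoted above for Siren 14 and Siren 22 determine how many samples each frame covers and how many bytes of coded audio each frame occupies. The sketch below works through that arithmetic for the mono bit rates. The helper names are hypothetical, and the byte counts are raw coded-audio sizes only, excluding any transport or container overhead.

```python
FRAME_MS = 20  # frame length quoted for Siren 14 and Siren 22

# Sampling rate in Hz and mono bit rates in bit/s, as listed in this section.
CODECS = {
    "Siren 14": (32_000, (24_000, 32_000, 48_000)),
    "Siren 22": (48_000, (32_000, 48_000, 64_000)),
}


def samples_per_frame(sample_rate_hz, frame_ms=FRAME_MS):
    """Number of PCM samples covered by one frame."""
    return sample_rate_hz * frame_ms // 1000


def payload_bytes_per_frame(bit_rate, frame_ms=FRAME_MS):
    """Coded bytes produced per frame at a constant bit rate."""
    return bit_rate * frame_ms // 1000 // 8


for name, (rate, bit_rates) in CODECS.items():
    sizes = [payload_bytes_per_frame(b) for b in bit_rates]
    print(name, samples_per_frame(rate), "samples/frame,", sizes, "bytes/frame")
```

The quoted 40 millisecond algorithmic delay corresponds to two such frames, which is consistent with a lapped transform whose blocks each span two frames.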

Software support

Siren 7 is commonly used in videoconferencing systems and is also part of Microsoft Office Communicator, which uses it for A/V conferencing. Microsoft Office Communications Server (OCS) uses Siren 7 during audio conferencing. With the default Office Communicator client, point-to-point audio uses Microsoft's proprietary RTAudio codec by default. When a call is promoted to an audio conference (whenever three or more participants have joined), the codec is switched on the fly to Siren for performance reasons. Even if the conference later drops below three participants, OCS does not demote it to a point-to-point call; it remains an A/V conference until it is terminated.
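
The promotion behaviour described above amounts to a one-way state change. The sketch below models only that rule; the class and method names are hypothetical and do not correspond to any Office Communicator or OCS API.

```python
class CallSession:
    """Hypothetical model of the codec selection rule described above."""

    CONFERENCE_THRESHOLD = 3  # three or more participants trigger promotion

    def __init__(self):
        self.participants = 0
        self.is_conference = False

    @property
    def codec(self):
        # Point-to-point calls use RTAudio; conferences use Siren.
        return "Siren" if self.is_conference else "RTAudio"

    def join(self):
        self.participants += 1
        if self.participants >= self.CONFERENCE_THRESHOLD:
            self.is_conference = True  # promotion happens on the fly

    def leave(self):
        self.participants = max(0, self.participants - 1)
        # Deliberately no demotion: the session stays a conference
        # until it is terminated.


session = CallSession()
session.join()
session.join()
print(session.codec)  # two participants: point-to-point
session.join()
print(session.codec)  # third participant: promoted to a conference
session.leave()
print(session.codec)  # back to two participants, still a conference
```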

In Windows XP and later versions of Windows, the Siren 7 codec is implemented in %systemroot%\system32\SIRENACM.DLL. It is used by MSN Messenger and Live Messenger for sending and receiving voice clips and also as one of the available codecs for the 'Computer Call' feature. [13] [14] [15]

FreeSWITCH, an open-source communications platform, can transcode, conference and bridge the Siren 7/G.722.1 and Siren 14/G.722.1C audio formats. [16] [17] [18]

aMSN, an open-source Windows Live Messenger clone, uses the "libsiren" library for Siren audio compression and decompression. libsiren is an open-source implementation of the codec written by aMSN developer Youness Alaoui (KaKaRoTo). [19] The library has also been copied into libmsn and into the msn-pecan project, which provides a plug-in for the Pidgin and Adium instant messaging clients. [19] [20] [21] [22] [23]

Licensing

Use of the Siren 7 and Siren 14 audio coding formats requires licensing patents from Polycom in most countries. A royalty-free licence for Siren 7 and Siren 14 is available from Polycom if certain fairly basic conditions are met. [4] [17] [24] [25] [26] [27] [28]

Usage of Siren 22 also requires the licensing of patents from Polycom. [26]

References

  1. Business Wire (2001-03-26). "PictureTel Announces New Siren Wideband Audio Technology Licensing Program". thefreelibrary.com. Archived from the original on 2012-10-13. Retrieved 2009-09-10.
  2. Business Wire (2000-07-19). "PictureTel Licenses Audio Technology Suite to Intel". thefreelibrary.com. Archived from the original on 2012-10-13. Retrieved 2009-09-10.
  3. "Polycom Enables Acceleration of HD Voice Adoption by Offering Royalty-Free Codec". 2008-08-05. Archived 2013-02-01 at archive.today. Retrieved 2009-09-07.
  4. "Polycom Siren/G 722.1 FAQs". Polycom, Inc. Retrieved 2009-09-07.
  5. Polycom, Inc. (2005-04-12). "ITU Approves Polycom Siren14 as New International Standard". Retrieved 2009-09-07.
  6. "Polycom Siren 14/G 722.1C". Polycom, Inc. Retrieved 2009-09-07.
  7. "ITU Approves Polycom Siren14 as New International Standard". BusinessWire.com. 2005-04-12. Retrieved 2009-09-10.
  8. "Siren 14 information for Prospective Licensees" (PDF). Retrieved 2010-06-08.
  9. Hersent, Olivier; Petit, Jean-Pierre; Gurle, David (2005). Beyond VoIP Protocols: Understanding Voice Technology and Networking Techniques for IP Telephony. John Wiley & Sons. p. 55. ISBN 9780470023631.
  10. Britanak, Vladimir; Rao, K. R. (2017). Cosine-/Sine-Modulated Filter Banks: General Properties, Fast Algorithms and Integer Approximations. Springer. p. 478. ISBN 9783319610801.
  11. "Polycom Siren 22". Polycom, Inc. Retrieved 2009-09-07.
  12. "G.719: The First ITU-T Standard for Full-Band Audio" (PDF). Polycom, Inc. April 2009. Retrieved 2009-09-07.
  13. "Siren". MultimediaWiki. Retrieved 2009-09-07.
  14. "MPlayer - Status of codecs support". MultimediaWiki. Retrieved 2009-09-07.
  15. Microsoft (November 2001). "Media Support in the Microsoft Windows Real-Time Communications Platform". Microsoft. Retrieved 2009-09-07.
  16. "FreeSWITCH First to Support Polycom's 32khz HD-Audio". FreeSWITCH. 2008-12-15. Archived from the original on 2009-05-08. Retrieved 2009-09-07.
  17. "libg722_1 - COPYING". FreeSWITCH. Retrieved 2014-07-19.
  18. "libg722_1 - README". FreeSWITCH. Retrieved 2014-07-19.
  19. KaKaRoTo (2008-02-12). "MSN Protocol documentation". Pidgin.im mailing list. Archived 2013-05-24 at the Wayback Machine. Retrieved 2009-09-08.
  20. "msn-pecan 0.0.18 released, now with voice clips support". msn-pecan. 2009-02-16. Retrieved 2014-07-19.
  21. "msn-pecan". msn-pecan. Retrieved 2009-09-07.
  22. "Libmsn - is a reusable, open-source, fully documented library for connecting to Microsoft's MSN Messenger service". Libmsn project at Sourceforge.net. 2009. Retrieved 2009-09-07.
  23. "SCM Repositories - libmsn - libsiren". Libmsn project at Sourceforge.net. 2009. Retrieved 2009-09-07.
  24. Xiph.Org Foundation (2009). "CELT - Codec Feature Comparison". Xiph.Org Foundation. Archived from the original on 2009-09-12. Retrieved 2009-09-07.
  25. Xiph.Org Foundation (2006). "Speex - Codec Quality Comparison". Xiph.Org Foundation. Retrieved 2009-09-07.
  26. Polycom, Inc. "Siren7/Siren14/G.719 License info". Polycom, Inc. Retrieved 2009-09-07.
  27. Polycom, Inc. "Polycom Siren 14/G 722.1C FAQs - What are the terms on the free license?". Polycom, Inc. Retrieved 2009-09-07.
  28. Greg Galitzine (2008-08-06). "Polycom CTO Discusses Siren 7 HD Voice Codec". TMCnet.com. Retrieved 2014-07-19.