G.729

Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)
Status: In force
Latest version: (10/17), October 2017
Organization: ITU-T
Committee: ITU-T Study Group 16
Related standards: G.191, G.711, G.729.1
Domain: audio compression
License: Freely available
Website: https://www.itu.int/rec/T-REC-G.729

G.729 is a royalty-free [1] narrow-band vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP), and was introduced in 1996. [2] The wide-band extension of G.729 is called G.729.1, which is equivalent to G.729 Annex J.

Because of its low bandwidth requirements, G.729 is mostly used in voice over Internet Protocol (VoIP) applications when bandwidth must be conserved. Standard G.729 operates at a bit rate of 8 kbit/s, but extensions provide rates of 6.4 kbit/s (Annex D, F, H, I, C+) and 11.8 kbit/s (Annex E, G, H, I, C+) for worse and better speech quality, respectively.
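The bit rates above, combined with the 10 ms frame length, fix the size of each coded frame. The short sketch below works through that arithmetic. The function names are illustrative, and rounding up to whole bytes is an assumption about packing, not something stated in the text above.

```python
FRAME_MS = 10  # frame length given for G.729


def bits_per_frame(bit_rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """kbit/s multiplied by milliseconds gives bits per frame."""
    return round(bit_rate_kbps * frame_ms)


def bytes_per_frame(bit_rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Whole bytes needed to hold one frame (assumes padding up to a byte)."""
    return (bits_per_frame(bit_rate_kbps, frame_ms) + 7) // 8


for rate in (6.4, 8, 11.8):
    print(rate, "kbit/s:", bits_per_frame(rate), "bits,",
          bytes_per_frame(rate), "bytes per frame")
```

At the standard 8 kbit/s rate this gives 80 bits, or 10 bytes, per 10 ms frame.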

G.729 has been extended with various features. The most common variants are designated G.729a (Annex A) and G.729b (Annex B); both are described below.

Dual-tone multi-frequency signaling (DTMF), fax transmissions, and high-quality audio cannot be transported reliably with this codec. DTMF requires the use of the named telephony events in the RTP payload for DTMF digits, telephony tones, and telephony signals as specified in RFC 4733.
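As a rough illustration of what a named telephony event looks like on the wire, the sketch below packs the 4-byte RFC 4733 event payload (event code, end bit, reserved bit, volume, duration). The function name and default values are hypothetical, and the RTP header and retransmission rules are left out.

```python
import struct

# DTMF event codes from RFC 4733: digits 0-9, then *, #, A-D.
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def pack_telephone_event(digit: str, duration: int,
                         volume: int = 10, end: bool = False) -> bytes:
    """Build the 4-byte telephone-event payload for one DTMF digit.

    duration is in RTP timestamp units; volume is 0..63 (in -dBm0).
    """
    event = DTMF_EVENTS[digit.upper()]
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second byte: E bit (bit 7), R bit (bit 6, always 0), volume (bits 0-5).
    flags_volume = (int(end) << 7) | volume
    return struct.pack("!BBH", event, flags_volume, duration)


payload = pack_telephone_event("5", duration=800, end=True)
print(payload.hex())
```

Because the digit travels as a small event packet and not as coded audio, it is unaffected by the distortion G.729 introduces in the tones themselves.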

G.729 annexes

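Functionality of the G.729 annexes [4] ("-" denotes the main body)
Annexes: -, A, B, C, D, E, F, G, H, I, C+, J
Low complexity: A, C
Fixed-point: -, A, B, D, E, F, G, H, I, J
Floating-point: C, C+
8 kbit/s: -, A, B, C, D, E, F, G, H, I, C+, J
6.4 kbit/s: D, F, H, I, C+
11.8 kbit/s: E, G, H, I, C+
DTX: B, F, G, I, C+
Embedded variable bit rate, wideband: J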

G.729 Annex A

G.729a is a compatible extension of G.729 that requires less computational power. This lower complexity, however, comes at the cost of marginally reduced speech quality.

G.729a was developed by a consortium of organizations: France Télécom, Mitsubishi Electric Corporation, and Nippon Telegraph and Telephone Corporation (NTT).

The features of G.729a are:

Some VoIP phones (including some Cisco and Linksys models) incorrectly use the description "G729a/8000" in SDP. G729a is only an alternative method of encoding the audio; the resulting bitstream can be decoded by either G729 or G729a, so the two are identical for the purposes of codec negotiation. Because the SDP RFC allows static payload types to be overridden by the textual rtpmap description, calls from these phones to endpoints that adhere to the RFC can fail: such endpoints will not recognise 'G729a' as 'G729' unless the codec is renamed in the phone's settings or the endpoint has a specific workaround for the bug.
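One possible shape for such a workaround is sketched below: the receiving endpoint parses the rtpmap attribute and treats the non-standard name as an alias for "G729". The function name and alias table are hypothetical and do not describe any particular product's behaviour.

```python
# Hypothetical alias table: non-standard encoding names mapped to the
# registered name. Keys are upper-cased for case-insensitive matching.
ENCODING_ALIASES = {"G729A": "G729"}


def normalize_rtpmap(line: str) -> tuple[int, str, int]:
    """Parse 'a=rtpmap:<pt> <name>/<clock>' and normalise the codec name."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    payload_type, encoding = line[len(prefix):].strip().split(None, 1)
    parts = encoding.split("/")
    name, clock_rate = parts[0], int(parts[1])
    name = ENCODING_ALIASES.get(name.upper(), name)
    return int(payload_type), name, clock_rate


print(normalize_rtpmap("a=rtpmap:18 G729a/8000"))  # (18, 'G729', 8000)
```

Normalising on receipt keeps the rest of the negotiation logic unchanged, since everything downstream only ever sees the registered name.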

G.729 Annex B

Annex B (G.729b) extends G.729 with a silence compression scheme. A voice activity detection (VAD) module detects whether the signal contains speech. A discontinuous transmission (DTX) module decides when to update the background noise parameters for non-speech (noise) frames, sending 2-byte Silence Insertion Descriptor (SID) frames to initiate comfort noise generation (CNG). If transmission simply stopped and the link went quiet whenever there was no speech, the receiving side might assume that the link had been cut. Comfort noise digitally simulates analog hiss during silence to assure the receiver that the link is active and operational.

G.729 Annex J (G.729.1)

G.729 Annex J, maintained as the separate Recommendation G.729.1, provides support for wideband speech and audio. Introduced in 2006, [3] it defines a variable bit-rate wideband enhancement using up to 12 hierarchical layers. The core layer is an 8 kbit/s G.729 bitstream, the second layer is a 4 kbit/s narrowband enhancement layer, and the third, 2 kbit/s, layer is a bandwidth enhancement layer. Further layers provide wideband enhancement in 2 kbit/s steps. G.729.1 uses three-stage coding: embedded code-excited linear prediction (CELP) coding of the lower band, parametric coding of the higher band by Time-Domain Bandwidth Extension (TDBWE), and enhancement of the full band by a predictive transform coding algorithm called time-domain aliasing cancellation (TDAC), also known as modified discrete cosine transform (MDCT) coding. [3] Bit rate and the resulting quality can be adjusted by simple bitstream truncation.

Licensing

As of January 1, 2017, the patent terms of most licensed patents under the G.729 Consortium have expired; the remaining unexpired patents are usable on a royalty-free basis. [5] G.729 includes patents from several companies, which until expiry were licensed by Sipro Lab Telecom, the authorized Intellectual Property Licensing Administrator for the G.729 technology and patent pool. [6] [7] [8] [9]

Past patent litigation

AIM IP LLC, a California limited liability company based in Mission Viejo, CA, [10] filed 17 patent infringement lawsuits [11] in the Central District of California, accusing 22 companies, including Cisco Systems and Polycom, of infringing U.S. Patent No. 5,920,853. [12] [13] The application for the '853 patent was filed with the United States Patent and Trademark Office in 1996 by Rockwell International. The inventors listed on the '853 patent are Adil Benyassine, Huan-Yu Su and Eyal Shlomot. [14]

In 2000, the '853 patent was assigned by Rockwell International to Conexant Systems, [15] an American software developer and fabless semiconductor company that began as a division of Rockwell before being spun off as its own public company. [16] In 2010, Conexant Systems sold the '853 patent to AIM IP LLC. [15]

The '853 patent contains claims that cover lookup tables used in G.729. Its term has since expired, so the patent is no longer in force. [17]

RTP payload type

G.729 is assigned the static payload type 18 for RTP by IANA. [18] The rtpmap parameter description for this payload type is "G729/8000".

Both G.729a and G.729b use the same rtpmap description as G.729. Use of Annex B is indicated with the annexb parameter, which takes the value annexb=yes or annexb=no. Annex B (G.729b) is the default when the annexb parameter is absent from the Session Description Protocol. [19]
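The default rule for annexb is easy to get wrong when parsing SDP. The sketch below is a minimal illustration that scans the fmtp attribute for payload type 18 and falls back to "yes" when the parameter is missing. The function name and sample SDP are made up, and dynamic payload types are not handled.

```python
def annexb_enabled(sdp: str, payload_type: int = 18) -> bool:
    """Return True if Annex B (VAD/DTX/CNG) is in use for this payload type.

    A missing annexb parameter means Annex B is assumed.
    """
    prefix = f"a=fmtp:{payload_type} "
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        for param in line[len(prefix):].split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "annexb":
                return value.strip().lower() != "no"
    return True  # default when no annexb parameter is present


offer = "\n".join([
    "m=audio 49170 RTP/AVP 18",
    "a=rtpmap:18 G729/8000",
    "a=fmtp:18 annexb=no",
])
print(annexb_enabled(offer))  # False
```

An endpoint that cannot handle SID frames would therefore need to send annexb=no explicitly, since omitting the parameter signals the opposite.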

References

  1. Michael Graves (March 6, 2017). "It's Official! The patents on G.729 have expired".
  2. "G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)". www.itu.int. Archived from the original on 2021-04-06. Retrieved 2021-04-06.
  3. Nagireddi, Sivannarayana (2008). VoIP Voice and Fax Signal Processing. John Wiley & Sons. p. 69. ISBN 9780470377864.
  4. ITU-T (January 2007). "G.729 : Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)" (PDF): i. Retrieved 2009-07-21.
  5. Sipro Lab Telecom (2017-01-28). "About G.729". Archived from the original on 2017-02-02.
  6. "Sipro Lab Telecom Website". Archived from the original on 2012-12-25. Retrieved 2007-03-31.
  7. VoiceAge Corporation (2007-10-14). "G.729 Licensing". Archived from the original on 2007-10-14. Retrieved 2009-09-17.
  8. Sipro Lab Telecom (2007-10-25). "FAQ G.729 and G.723.1". Archived from the original on 2007-10-25. Retrieved 2009-09-17.
  9. Sipro Lab Telecom (2006-10-29). "G.729 IPR Pool". Archived from the original on 2006-10-29. Retrieved 2009-09-17.
  10. "Business Search - Results". Business Search - Business Entities - Business Programs | California Secretary of State.
  11. "US 5,920,853 A - Signal compression using index mapping technique for the sharing of quantization tables | RPX Insight".
  12. "Patent Litigations Search | RPX Insight". insight.rpxcorp.com.
  13. "Aim Ip LLC v. Cisco Systems Inc et. al. patent lawsuit". Archived from the original on February 1, 2014.
  14. "Patent Public Search | USPTO". ppubs.uspto.gov.
  15. "United States Patent and Trademark Office". assignment.uspto.gov.
  16. Mark Lapedus (November 10, 1998). "Rockwell Semi spin-off Conexant will target communications IC market". EE Times.
  17. "US5920853A - Signal compression using index mapping technique for the sharing of quantization tables". Google Patents.
  18. "Real-Time Transport Protocol (RTP) Parameters". Iana.org. Retrieved 2013-09-18.
  19. S. Casner, P. Hoschka (July 2003). "MIME Type Registration of RTP Payload Formats". Retrieved 2013-02-27.