Wideband audio


Audio bands in telephony [1]
Name           Range (Hz)
Narrowband     300–3,400
Wideband       50–7,000
Superwideband  50–14,000
Fullband       20–20,000
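The table can be read as a lookup from a passband to a band name. The following Python sketch is illustrative only. The helper `classify_band` is hypothetical. It returns the first (narrowest) nominal band in the table whose range contains the given passband.

```python
# Nominal telephony audio bands in Hz, copied from the table above and
# ordered from narrowest to widest.
BANDS = [
    ("Narrowband", 300, 3_400),
    ("Wideband", 50, 7_000),
    ("Superwideband", 50, 14_000),
    ("Fullband", 20, 20_000),
]


def classify_band(low_hz: float, high_hz: float) -> str:
    """Return the first nominal band whose range contains [low_hz, high_hz].

    Hypothetical helper: real codecs are classified by their specifications,
    not by this simple containment test.
    """
    if low_hz >= high_hz:
        raise ValueError("low_hz must be below high_hz")
    for name, band_low, band_high in BANDS:
        if band_low <= low_hz and high_hz <= band_high:
            return name
    raise ValueError("passband exceeds the fullband range")


print(classify_band(300, 3_400))  # Narrowband
print(classify_band(50, 7_000))   # Wideband
print(classify_band(50, 14_000))  # Superwideband
```

Under this containment test, a passband such as 200–3,400 Hz falls outside the nominal narrowband range and is reported as wideband. This illustrates the limits of classifying codecs purely by their cutoff frequencies.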

Wideband audio, also known as wideband voice or HD voice, is high-definition voice quality for telephony audio, in contrast to the "toll quality" of standard digital telephony. It extends the frequency range of audio signals transmitted over telephone lines, resulting in higher-quality speech. The range of the human voice extends from 100 Hz to 17 kHz [2], but traditional voiceband or narrowband telephone calls limit audio frequencies to the range of 300 Hz to 3.4 kHz. Wideband audio relaxes this bandwidth limitation and transmits audio in the frequency range of 50 Hz to 7 kHz. [3] [1] In addition, some wideband codecs may use a higher audio bit depth of 16 bits to encode samples, which further improves voice quality.[citation needed]
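The benefit of a higher bit depth can be estimated with the textbook signal-to-quantization-noise ratio (SQNR) of a full-scale sine wave, which is roughly 6.02 dB per bit plus 1.76 dB. This Python sketch assumes ideal uniform (linear) quantization and does not model any particular codec. Narrowband telephony codecs such as G.711 use companded 8-bit samples rather than uniform quantization, so their effective SQNR differs.

```python
import math


def sqnr_db(bits: int) -> float:
    """Ideal SQNR in dB for a full-scale sine under uniform quantization.

    Signal power / noise power = 1.5 * 2**(2*bits), which gives the familiar
    approximation 6.02 * bits + 1.76 dB.
    """
    return 10 * math.log10(1.5 * 2 ** (2 * bits))


for bits in (8, 16):
    print(f"{bits}-bit: {sqnr_db(bits):.1f} dB")
# 8-bit: 49.9 dB
# 16-bit: 98.1 dB
```

Each additional bit halves the quantization step, which is where the roughly 6 dB per bit comes from.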


Wideband codecs have a typical sample rate of 16 kHz. For superwideband codecs the typical value is 32 kHz. [1]
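These sample rates follow from the Nyquist criterion. A sampled signal can represent frequencies only up to half its sample rate, so 7 kHz wideband content needs at least 14 kHz and 14 kHz superwideband content needs at least 28 kHz. The sketch below checks each band against its typical rate. The 8 kHz narrowband rate is the classic PSTN value and is an assumption here.

```python
# Upper audio frequency and typical sample rate (both in Hz) per band.
# The wideband and superwideband rates are the typical values from the text;
# the 8 kHz narrowband rate is the classic PSTN value (an assumption here).
TYPICAL_RATES = {
    "Narrowband": (3_400, 8_000),
    "Wideband": (7_000, 16_000),
    "Superwideband": (14_000, 32_000),
}


def nyquist_rate(upper_hz: float) -> float:
    """Lowest sample rate that can represent content up to upper_hz."""
    return 2 * upper_hz


for name, (upper, fs) in TYPICAL_RATES.items():
    need = nyquist_rate(upper)
    # The gap between fs and the Nyquist rate leaves room for the
    # transition band of the anti-aliasing filter.
    print(f"{name}: needs >= {need:.0f} Hz, uses {fs} Hz, margin {fs - need:.0f} Hz")
```

In each case the typical rate sits somewhat above twice the upper band edge. A practical filter cannot cut off instantly at exactly half the sample rate, so this margin is needed.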

History

In 1988, the International Telecommunication Union (ITU) standardized a version of wideband audio known as G.722. Radio broadcasters began using G.722 over Integrated Services Digital Network (ISDN) to provide high-quality audio for remote broadcasts, such as commentary from sports venues. AMR-WB (G.722.2) was developed by Nokia and VoiceAge and was first specified by 3GPP.

The traditional telephone network (PSTN) is generally limited to narrowband audio by the intrinsic nature of its transmission technology, TDM (time-division multiplexing), and by the analogue-to-digital converters used at the edge of the network, as well as the speakers, microphones and other elements in the endpoints themselves.
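One way to see the constraint is to compare bit rates against a single PSTN TDM channel. This sketch assumes the standard 64 kbit/s DS0 channel and 8 kHz, 8-bit companded G.711 audio for narrowband. Neither figure appears in the text. Uncompressed 16 kHz, 16-bit wideband audio needs four times that capacity. This is why wideband audio over such links relies on a codec such as G.722 ("7 kHz audio-coding within 64 kbit/s").

```python
DS0_BPS = 64_000  # capacity of one PSTN TDM voice channel (DS0), bit/s


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample


candidates = {
    "Narrowband, 8-bit companded PCM (G.711)": pcm_bit_rate(8_000, 8),
    "Wideband, 16-bit linear PCM": pcm_bit_rate(16_000, 16),
    "Wideband, G.722 coded": 64_000,
}

for name, bps in candidates.items():
    verdict = "fits" if bps <= DS0_BPS else "does not fit"
    print(f"{name}: {bps // 1000} kbit/s, {verdict} in one DS0")
```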

Wideband audio has been broadly deployed in conjunction with videoconferencing. Providers of this technology quickly discovered that despite the explicit emphasis on video transmission, the quality of the participant experience was significantly influenced by the fidelity of the associated audio signal.

Communications via Voice over Internet Protocol (VoIP) can readily employ wideband audio. When PC-to-PC calls are placed via VoIP services, such as Skype, and the participants use a high-quality headset, the resulting call quality can be noticeably superior to conventional PSTN calls. A number of audio codecs have emerged to support these services, supplementing G.722.

Manufacturers of audio conferencing equipment have introduced wideband-capable models that include support for G.722 over VoIP.

Conference calls benefit directly from the enhancements offered by wideband audio. Participants often struggle to figure out who is talking or to understand accented speakers, and misunderstandings are common, mainly because of poor audio quality and accumulated background noise.

Some cited listener benefits of wideband audio compared with traditional narrowband audio:

Despite the mobile telephone industry's reputation for poor audio quality, it has started to make some progress on wideband audio. The 3GPP standards group has designated G.722.2 as its wideband codec and calls it Adaptive Multi-Rate Wideband (AMR-WB). More than a hundred handsets supporting this codec have been introduced by manufacturers including Apple, Google, HTC, Nokia, Samsung and Sony, and network demonstrations have been conducted.[citation needed]

Deployment

VoIP

As business telephone systems have adopted VoIP technology, support for wideband audio has grown rapidly. Telephone sets from Avaya, Cisco, NEC Unified Solutions, Grandstream, Gigaset, Panasonic (which brands wideband audio "HD Sonic"), Polycom (which brands wideband audio "HD Voice"), Snom, AudioCodes (which brands wideband audio "HDVoIP") and others now incorporate G.722, as well as varying degrees of higher-quality audio components.

Suppliers of integrated circuits for telephony equipment, including DSP Group, Broadcom, Infineon, and Texas Instruments, include wideband audio in their feature portfolios. Some audio conferencing service providers support wideband connections from these and other VoIP endpoints while also allowing PSTN participants to join the conference in narrowband. sipXtapi is an open-source VoIP media processing engine that supports wideband and HD voice. It provides RTP and codecs through a plugin framework for use with SIP and other VoIP protocols. Skype uses an audio codec called SILK, which supports high-quality wideband audio.

A number of carriers around the world have rolled out HD voice services based on the G.722 wideband standard. In North America, hosted service providers have recently[when?] deployed the Aastra Hi-Q upgrade to their installed user bases and, as of January 2010, claimed around 70,000 HD voice endpoints. The consumer service provider ooma had an estimated 25,000 HD voice endpoints deployed following the rollout of its second-generation Telo hardware.

VoLTE

In cellular communication, "HD Voice" specifically refers to AMR-WB (G.722.2) in VoLTE. Because AMR-WB supports several bit rates, the label does not itself specify a quality or bit rate. The same applies to HD Voice+ and AMR-WB+. The GSMA holds an HD trademark and runs two certification programs around the HD and HD+ logos. [4]

AMR-WB has been natively supported in Android since Android Gingerbread [5] and on iOS devices since the iPhone 5. [6]

As of December 2015, a report counted 117 commercial mobile HD Voice networks launched in 76 countries. [7]

Many mobile networks, including AT&T [8] and Verizon, are discontinuing support for phones that do not support 4G and wideband audio.

Wideband audio coding standards

The following are wideband audio coding standards and audio codecs used in telecommunication. [9]

ITU-T

Year  Wideband audio coding standard          Wideband speech coding algorithm  Ref
1988  G.722                                   SB-ADPCM                          [10]
1999  G.722.1 (Siren 7)                       MDCT                              [11]
2003  G.722.2 (Adaptive Multi-Rate Wideband)  ACELP                             [12]
2006  G.729.1                                 MDCT                              [13]
2008  G.711.1                                 MDCT                              [14]
2008  G.718                                   MDCT                              [15]

GSMA

3GPP

Others

Related Research Articles

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Speex is a free software audio compression codec specifically tuned for reproducing human speech. It may be used in voice over IP applications and podcasts. It is based on the code-excited linear prediction speech coding algorithm. Its creators claim that Speex is free of any patent restrictions, and it is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or transmitted directly over UDP/RTP. It may also be used with the FLV container format.

Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for delivering voice communication sessions over Internet Protocol (IP) networks, such as the Internet.

The Adaptive Multi-Rate audio codec is an audio compression format optimized for speech coding. AMR is a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s, with toll-quality speech starting at 7.4 kbit/s.
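The multi-rate idea can be made concrete with the codec's mode table. This Python sketch assumes the eight standard AMR rates (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 and 12.2 kbit/s) and AMR's 20 ms frame length. Neither is listed in the text above. The "highest mode that fits the budget" policy in `best_mode` is a hypothetical illustration, not how any particular network chooses a mode.

```python
# AMR narrowband codec modes in bit/s (the eight standard AMR rates).
AMR_MODES_BPS = [4_750, 5_150, 5_900, 6_700, 7_400, 7_950, 10_200, 12_200]
FRAME_MS = 20             # AMR encodes speech in 20 ms frames
TOLL_QUALITY_BPS = 7_400  # per the text, toll quality starts at 7.4 kbit/s


def bits_per_frame(bps: int) -> int:
    """Number of coded bits in one 20 ms frame at the given mode."""
    return bps * FRAME_MS // 1000


def best_mode(budget_bps: int) -> int:
    """Highest AMR mode that fits the budget (hypothetical selection policy)."""
    fitting = [m for m in AMR_MODES_BPS if m <= budget_bps]
    if not fitting:
        raise ValueError("budget below the lowest AMR mode")
    return max(fitting)


mode = best_mode(9_000)
print(mode, bits_per_frame(mode), mode >= TOLL_QUALITY_BPS)  # 7950 159 True
```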

Full Rate was the first digital speech coding standard used in the GSM digital mobile phone system. It uses linear predictive coding (LPC). The bit rate of the codec is 13 kbit/s, or 1.625 bits per audio sample. The quality of the coded speech is quite poor by modern standards, but at the time of development it was a good compromise between computational complexity and quality, requiring only on the order of a million additions and multiplications per second. The codec is still widely used in networks around the world, but FR is gradually being replaced by the Enhanced Full Rate (EFR) and Adaptive Multi-Rate (AMR) standards, which provide much higher speech quality at lower bit rates.
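The figure of 1.625 bits per sample follows from dividing the 13 kbit/s bit rate by the 8 kHz narrowband sample rate. The sketch below also splits the stream into frames. The 8 kHz sample rate and the 20 ms GSM frame length are assumptions taken from standard GSM practice, not from the text.

```python
from fractions import Fraction

BIT_RATE = 13_000    # GSM Full Rate, bit/s (from the text)
SAMPLE_RATE = 8_000  # narrowband telephony sample rate, Hz (assumption)
FRAME_MS = 20        # GSM speech frame length, ms (assumption)

# Exact rational arithmetic avoids any floating-point rounding.
bits_per_sample = Fraction(BIT_RATE, SAMPLE_RATE)
samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000
bits_per_frame = BIT_RATE * FRAME_MS // 1000

print(float(bits_per_sample))             # 1.625
print(samples_per_frame, bits_per_frame)  # 160 260
```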

Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech audio coding standard developed based on Adaptive Multi-Rate encoding, using a similar methodology to algebraic code-excited linear prediction (ACELP). AMR-WB provides improved speech quality due to a wider speech bandwidth of 50–7000 Hz compared to narrowband speech coders which in general are optimized for POTS wireline quality of 300–3400 Hz. AMR-WB was developed by Nokia and VoiceAge and it was first specified by 3GPP.
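Because pitch perception is roughly logarithmic, the gain from 300–3400 Hz to 50–7000 Hz is easier to appreciate in octaves (doublings of frequency) than in hertz. This small Python sketch uses only the two ranges quoted above. The helper name `octaves` is illustrative.

```python
import math


def octaves(low_hz: float, high_hz: float) -> float:
    """Width of a frequency band in octaves (doublings of frequency)."""
    return math.log2(high_hz / low_hz)


wideband = octaves(50, 7_000)      # AMR-WB speech bandwidth
narrowband = octaves(300, 3_400)   # POTS wireline bandwidth
print(f"{wideband:.2f} vs {narrowband:.2f} octaves")  # 7.13 vs 3.50 octaves
```

Roughly 2.6 of the extra octaves come from the extension below 300 Hz (log2(300/50) ≈ 2.58). About one more comes from the extension above 3.4 kHz.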

G.729: ITU-T Recommendation

G.729 is a royalty-free narrow-band vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as Coding of speech at 8 kbit/s using code-excited linear prediction speech coding (CS-ACELP), and was introduced in 1996. The wide-band extension of G.729 is called G.729.1, which equals G.729 Annex J.
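The 8 kbit/s rate and 10 ms frame length fix the size of each coded frame at 80 bits, or 10 bytes. When frames are carried over IP, several are usually bundled into one packet. This sketch assumes, hypothetically, that each packet carries a whole number of frames. The 20 ms packet interval in the example is a common choice but is an assumption here.

```python
BIT_RATE = 8_000  # G.729 bit rate, bit/s
FRAME_MS = 10     # G.729 frame length, ms

FRAME_BYTES = BIT_RATE * FRAME_MS // 1000 // 8  # 80 bits -> 10 bytes


def payload_bytes(packet_ms: int) -> int:
    """Speech payload carried in one packet holding packet_ms of audio."""
    if packet_ms % FRAME_MS:
        raise ValueError("packet interval must be a whole number of frames")
    return (packet_ms // FRAME_MS) * FRAME_BYTES


print(FRAME_BYTES, payload_bytes(20))  # 10 20
```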

G.722: ITU-T Recommendation

G.722 is an ITU-T standard 7 kHz wideband audio codec operating at 48, 56 and 64 kbit/s. It was approved by ITU-T in November 1988. Technology of the codec is based on sub-band ADPCM (SB-ADPCM). The corresponding narrow-band codec based on the same technology is G.726.
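Sub-band ADPCM accounts for the three rates. As commonly described, G.722 samples audio at 16 kHz and splits it with a quadrature mirror filter bank into a lower and an upper sub-band, each sampled at 8 kHz. Each sub-band is then coded with ADPCM. The upper sub-band always uses 2 bits per sample, while the lower sub-band uses 6, 5 or 4. The Python sketch below performs only that bit-rate arithmetic and is not an encoder. The function name `g722_rates` is hypothetical.

```python
SUBBAND_RATE = 8_000  # each QMF sub-band is sampled at 8 kHz (16 kHz / 2)
HIGH_BAND_BITS = 2    # upper sub-band ADPCM bits per sample


def g722_rates(low_band_bits: int) -> tuple[int, int, int]:
    """Return (low-band, high-band, total) audio bit rates in bit/s."""
    if low_band_bits not in (4, 5, 6):
        raise ValueError("G.722 lower sub-band uses 6, 5 or 4 bits per sample")
    low = SUBBAND_RATE * low_band_bits
    high = SUBBAND_RATE * HIGH_BAND_BITS
    return low, high, low + high


for bits in (6, 5, 4):
    print(bits, g722_rates(bits))
# 6 (48000, 16000, 64000)
# 5 (40000, 16000, 56000)
# 4 (32000, 16000, 48000)
```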

G.722.1: ITU-T Recommendation

G.722.1 is a licensed royalty-free ITU-T standard audio codec providing high-quality, moderate bit rate wideband audio coding (50 Hz – 7 kHz audio bandwidth, 16 ksps). It is a partial implementation of the Siren 7 audio coding format developed by PictureTel Corp. Its official name is "Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss". It uses a modified discrete cosine transform audio data compression algorithm.

Extended Adaptive Multi-Rate – Wideband (AMR-WB+) is an audio codec that extends AMR-WB. It adds support for stereo signals and higher sampling rates. Another major improvement is the use of transform coding in addition to ACELP, which greatly improves generic audio coding. Automatic switching between transform coding and ACELP provides both good speech and good audio quality at moderate bit rates.

VoIP phone: Phone using one or more VoIP technologies

A VoIP phone or IP phone uses voice over IP technologies for placing and transmitting telephone calls over an IP network, such as the Internet. This is in contrast to a standard phone which uses the traditional public switched telephone network (PSTN).

H.324 is an ITU-T recommendation for voice, video and data transmission over regular analog phone lines. It uses a regular 33,600 bit/s modem for transmission, the H.263 codec for video encoding and G.723.1 for audio.

The following tables compare general and technical information for a variety of audio coding formats.

H.323: Audio-visual communication signaling protocol

H.323 is a recommendation from the ITU Telecommunication Standardization Sector (ITU-T) that defines the protocols to provide audio-visual communication sessions on any packet network. The H.323 standard addresses call signaling and control, multimedia transport and control, and bandwidth control for point-to-point and multi-point conferences.

Siren is a family of patented, transform-based, wideband audio coding formats and their audio codec implementations developed and licensed by PictureTel Corporation. There are three Siren codecs: Siren 7, Siren 14 and Siren 22.

G.719 is an ITU-T standard audio coding format providing high quality, moderate bit rate wideband audio coding at low computational load. It was produced through a collaboration between Polycom and Ericsson.

G.718: ITU-T Recommendation

G.718 is an ITU-T Recommendation embedded scalable speech and audio codec providing high quality narrowband speech over the lower bit rates and high quality wideband speech over the complete range of bit rates. In addition, G.718 is designed to be highly robust to frame erasures, thereby enhancing the speech quality when used in Internet Protocol (IP) transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both narrowband and wideband generic audio signals. The codec has an embedded scalable structure, enabling maximum flexibility in the transport of voice packets through IP networks of today and in future media-aware networks. In addition, the embedded structure of G.718 will easily allow the codec to be extended to provide a superwideband and stereo capability through additional layers which are currently under development in ITU-T Study Group 16. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 & 32 kbit/s.
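The truncation property can be illustrated with a toy model. This sketch assumes 20 ms frames whose layers are laid out back to back and byte-aligned at the cumulative rates listed above, so 8 kbit/s corresponds to 20 bytes per frame and 32 kbit/s to 80 bytes. The real G.718 bitstream layout is defined in the Recommendation and is not reproduced here. `truncate_frame` is a hypothetical helper, and the frame contents are dummy bytes.

```python
LAYER_RATES_BPS = [8_000, 12_000, 16_000, 24_000, 32_000]  # from the text
FRAME_MS = 20  # assumed G.718 frame length


def truncate_frame(frame: bytes, target_bps: int) -> bytes:
    """Keep only the embedded layers needed for target_bps.

    Because each layer follows the lower ones, dropping the tail of the
    frame lowers the bit rate without any separate signalling.
    """
    if target_bps not in LAYER_RATES_BPS:
        raise ValueError(f"unsupported rate {target_bps}")
    keep_bytes = target_bps * FRAME_MS // 1000 // 8
    if len(frame) < keep_bytes:
        raise ValueError("frame shorter than the requested layer set")
    return frame[:keep_bytes]


full = bytes(range(80))  # dummy stand-in for one 32 kbit/s frame (640 bits)
print(len(truncate_frame(full, 16_000)))  # 40
```

A network element can apply this kind of cut to any frame in transit, and the decoder simply decodes the layers it receives.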

Voice over LTE: High-speed wireless communication functionality

Voice over LTE (VoLTE) is an LTE high-speed wireless communication standard for voice calls using mobile phones and data terminals. VoLTE has up to three times more voice and data capacity than older 3G UMTS and up to six times more than 2G GSM. It uses less bandwidth because VoLTE's packet headers are smaller than those of unoptimized VoIP/LTE. VoLTE calls are usually charged at the same rate as other calls.
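The header saving can be estimated with simple arithmetic. The sketch compares an uncompressed IPv4/UDP/RTP header stack (20 + 8 + 12 = 40 bytes) with a compressed header, such as one produced by Robust Header Compression (ROHC), which VoLTE deployments commonly use. Two figures are illustrative assumptions, not taken from the text: the 3-byte compressed size, and the 32-byte payload (about one 20 ms AMR-WB frame at 12.65 kbit/s).

```python
UNCOMPRESSED_HEADER = 20 + 8 + 12  # IPv4 + UDP + RTP headers, bytes
COMPRESSED_HEADER = 3              # illustrative ROHC-compressed size (assumption)


def header_share(payload: int, header: int) -> float:
    """Fraction of each packet's bytes spent on headers."""
    return header / (header + payload)


payload = 32  # hypothetical speech payload per 20 ms packet, bytes
print(f"uncompressed: {header_share(payload, UNCOMPRESSED_HEADER):.0%}")  # 56%
print(f"compressed:   {header_share(payload, COMPRESSED_HEADER):.0%}")    # 9%
```

With small speech payloads, the headers dominate the packet, so shrinking them accounts for most of the bandwidth saving.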

Enhanced Voice Services (EVS) is a superwideband speech audio coding standard that was developed for VoLTE and VoNR. It offers up to 20 kHz audio bandwidth and has high robustness to delay jitter and packet losses due to its channel aware coding and improved packet loss concealment. It has been developed in 3GPP and is described in 3GPP TS 26.441. The application areas of EVS consist of improved telephony and teleconferencing, audiovisual conferencing services, and streaming audio. Source code of both decoder and encoder in ANSI C is available as 3GPP TS 26.442 and is being updated regularly. Samsung uses the term HD+ when doing a call using EVS.

References

  1. Cox, R. V.; Neto, S. F. De Campos; Lamblin, C.; Sherif, M. H. (October 2009). "ITU-T coders for wideband, superwideband, and fullband speech communication [Series Editorial]". IEEE Communications Magazine. 47 (10): 106–109. doi:10.1109/MCOM.2009.5273816.
  2. "Human Voice Frequency Range". SEAINDIA. 2020-06-14. Retrieved 2022-01-24.
  3. "Answering the call of HD Voice". Global IP Sound. Retrieved 2009-09-06.
  4. "HD Voice". GSMA.
  5. "MediaRecorder.AudioEncoder". Android Developers.
  6. Buster Hein (13 September 2012). "The iPhone 5 Supports HD Voice, But You'll Never Get To Use It In The U.S."
  7. "Mobile HD Voice: Global Update report". GSA. 2014-09-19. Retrieved 2014-09-24.
  8. "Get Ready, 3G is Going Away in 2022". AT&T.
  9. "Which wideband codec to choose?". TMCnet. Retrieved 2012-11-13.
  10. ITU-T Recommendation G.722 (11/88), "7 kHz audio-coding within 64 kbit/s". ITU-T.
  11. Lutzky, Manfred; Schuller, Gerald; Gayer, Marc; Krämer, Ulrich; Wabnik, Stefan (May 2004). A guideline to audio codec delay (PDF). 116th AES Convention. Fraunhofer IIS. Audio Engineering Society. Retrieved 24 October 2019.
  12. ACELP map. VoiceAge Corporation.
  13. Nagireddi, Sivannarayana (2008). VoIP Voice and Fax Signal Processing. John Wiley & Sons. p. 69. ISBN 9780470377864.
  14. Sasaki, Shigeaki; Mori, Takeshi; Hiwasaki, Yusuke; Ohmuro, Hitoshi (August 2008). "Global Standard for Wideband Speech Coding: ITU-T G.711.1 (G.711 wideband extension)". NTT Technical Review. Nippon Telegraph and Telephone.
  15. "ITU-T Work Programme". ITU.
  16. "HD Voice". Future Networks. Retrieved 2020-05-10.
  17. "Enhanced Voice Services Codec for LTE". www.3gpp.org. Retrieved 2020-05-10.