RTAudio

Last updated May 04, 2019

RTAudio is a Microsoft produced adaptive wide-band speech codec. It is used by Microsoft Office Communications Server (OCS) and the related OCS clients (Microsoft Office Communicator, and Microsoft Live Meeting Console).

Microsoft U.S.-headquartered technology company

Microsoft Corporation (MS) is an American multinational technology company with headquarters in Redmond, Washington. It develops, manufactures, licenses, supports and sells computer software, consumer electronics, personal computers, and related services. Its best known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface lineup of touchscreen personal computers. As of 2016, it is the world's largest software maker by revenue, and one of the world's most valuable companies. The word "Microsoft" is a portmanteau of "microcomputer" and "software". Microsoft is ranked No. 30 in the 2018 Fortune 500 rankings of the largest United States corporations by total revenue.

RTAudio was designed for real-time two-way Voice over IP (VoIP) applications. Some of the target applications include games, audio conferencing, and wireless applications over IP. RTAudio is the preferred Microsoft Real-Time audio codec, and is the default voice codec for Microsoft’s Unified Communications platforms.

Voice over Internet Protocol (VoIP), also called IP telephony, is a methodology and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms Internet telephony, broadband telephony, and broadband phone service specifically refer to the provisioning of communications services over the public Internet, rather than via the public switched telephone network (PSTN).

The RTAudio encoder is capable of encoding single-channel (mono), 16 bit per sample audio signals. The encoder can be configured to operate either in the Narrow Band mode (8 kHz sampling rate) or the Wide Band mode (16 kHz sampling rate). The RTAudio decoder has a built-in jitter control module as well as an error concealment module.

In electronics and telecommunications, jitter is the deviation from true periodicity of a presumably periodic signal, often in relation to a reference clock signal. In clock recovery applications it is called timing jitter. Jitter is a significant, and usually undesired, factor in the design of almost all communications links.

Licensing

RTAudio is a proprietary codec. Like RTVideo, this protocol can also be licensed from Microsoft.

RTVideo is Microsoft's default video codec for Office Communications Server 2007 and the Microsoft Office Communicator 2007 client. It is a Microsoft proprietary implementation of the VC-1 codec for real-time transmission purposes. Microsoft extensions to VC-1 are based on cached frame and SP-frame. Also it includes system-level enhancements for recovery of packet loss on IP networks - forward error correction and error concealment.

External links

Overview of the Microsoft RTAudio Speech codec

This article about software created or produced by Microsoft is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

Related Research Articles

Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Windows Media Audio (WMA) is a series of audio codecs and their corresponding audio coding formats developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs. The original WMA codec, known simply as WMA, was conceived as a competitor to the popular MP3 and RealAudio codecs. WMA Pro, a newer and more advanced codec, supports multichannel and high resolution audio. A lossless codec, WMA Lossless, compresses audio data without loss of audio fidelity. WMA Voice, targeted at voice content, applies compression using a range of low bit rates. Microsoft has also developed a digital container format called Advanced Systems Format to store audio encoded by WMA.

Speex is an audio compression format specifically tuned for the reproduction of human speech and also a free software speech codec that may be used on VoIP applications and podcasts. It is based on the CELP speech coding algorithm. Speex claims to be free of any patent restrictions and is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or directly transmitted over UDP/RTP. It may also be used with the FLV container format.

G.723.1 is an audio codec for voice that compresses voice audio in 30 ms frames. An algorithmic look-ahead of 7.5 ms duration means that total algorithmic delay is 37.5 ms. Its official name is Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. It is sometimes associated with a Truespeech trademark in coprocessors produced by DSP Group.

The Adaptive Multi-Rate audio codec is an audio compression format optimized for speech coding. AMR speech codec consists of a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s with toll quality speech starting at 7.4 kbit/s.

Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech audio coding standard developed based on Adaptive Multi-Rate encoding, using similar methodology as algebraic code excited linear prediction (ACELP). AMR-WB provides improved speech quality due to a wider speech bandwidth of 50–7000 Hz compared to narrowband speech coders which in general are optimized for POTS wireline quality of 300–3400 Hz. AMR-WB was developed by Nokia and VoiceAge and it was first specified by 3GPP.

G.722 is an ITU-T standard 7 kHz Wideband audio codec operating at 48, 56 and 64 kbit/s. It was approved by ITU-T in November 1988. Technology of the codec is based on sub-band ADPCM (SB-ADPCM). The corresponding narrow-band codec based on the same technology is G.726.

Extended Adaptive Multi-Rate – Wideband (AMR-WB+) is an audio codec that extends AMR-WB. It adds support for stereo signals and higher sampling rates. Another main improvement is the use of transform coding additionally to ACELP. This greatly improves the generic audio coding. Automatic switching between transform coding and ACELP provides both good speech and audio quality with moderate bit rates.

Variable-Rate Multimode Wideband (VMR-WB) is a source-controlled variable-rate multimode codec designed for robust encoding/decoding of wideband/narrowband speech. The operation of VMR-WB is controlled by speech signal characteristics and by traffic condition of the network. Depending on the traffic conditions and the desired quality of service (QoS), one of the 4 operational modes is used. All operating modes of the existing VMR-WB standard are fully compliant with cdma2000 rate-set II. VMR-WB modes 0, 1, and 2 are cdma2000 native modes with mode 0 providing the highest quality and mode 2 the lowest ADR. VMR-WB mode 3 is the AMR-WB interoperable mode operating at an ADR slightly higher than mode 0 and providing a quality equal or better than that of AMR-WB at 12.65 kbit/s when in an interoperable interconnection with AMR-WB at 12.65 kbit/s.

Internet Low Bitrate Codec (iLBC) is an open-source royalty-free narrowband speech audio coding format codec and reference implementation, developed by Global IP Solutions (GIPS) formerly Global IP Sound. It was formerly freeware with limitations on commercial use, but since 2011 it is available under a free software/open source license as a part of the open source WebRTC project. It is suitable for VoIP applications, streaming audio, archival and messaging. The algorithm is a version of block-independent linear predictive coding, with the choice of data frame lengths of 20 and 30 milliseconds. The encoded blocks have to be encapsulated in a suitable protocol for transport, usually the Real-time Transport Protocol (RTP).

internet Speech Audio Codec (iSAC) is a wideband speech codec, developed by Global IP Solutions (GIPS). It is suitable for VoIP applications and streaming audio. The encoded blocks have to be encapsulated in a suitable protocol for transport, e.g. RTP.

This article describes audio APIs and components in Microsoft Windows which are now obsolete or deprecated.

An acoustic model is used in automatic speech recognition to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The model is learned from a set of audio recordings and their corresponding transcripts. It is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word.

Siren is a family of patented, transform-based, wideband audio coding formats and their audio codec implementations developed and licensed by PictureTel Corporation. There are three Siren codecs: Siren 7, Siren 14 and Siren 22.

Wideband audio, also known as wideband voice or HD voice, is high definition voice quality for telephony audio, contrasted with standard digital telephony "toll quality". It extends the frequency range of audio signals transmitted over telephone lines, resulting in higher quality speech. The range of the human voice extends from 80 Hz to 14 kHz but traditional, voiceband or narrowband telephone calls limit audio frequencies to the range of 300 Hz to 3.4 kHz. Wideband audio relaxes the bandwidth limitation and transmits in the audio frequency range of 50 Hz to 7 kHz or even up to 22 kHz. In addition, some wideband codecs may use a higher audio bit depth of 16-bits to encode samples, also resulting in much better voice quality.

G.718 is an ITU-T recommendation embedded scalable speech and audio codec providing high quality narrowband speech over the lower bit rates and high quality wideband speech over the complete range of bit rates. In addition, G.718 is designed to be highly robust to frame erasures, thereby enhancing the speech quality when used in internet protocol (IP) transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both narrowband and wideband generic audio signals. The codec has an embedded scalable structure, enabling maximum flexibility in the transport of voice packets through IP networks of today and in future media-aware networks. In addition, the embedded structure of G.718 will easily allow the codec to be extended to provide a superwideband and stereo capability through additional layers which are currently under development in ITU-T Study Group 16. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 & 32 kbit/s.

In digital audio data reduction technology, aptX is a family of proprietary audio codec compression algorithms owned by Qualcomm.

Tandem Free Operation (TFO) is a part of ETSI's 3GPP standard specification, which has been included from R99 of the standards specifications onwards.

Opus is a lossy audio coding format developed by the Xiph.Org Foundation and standardized by the Internet Engineering Task Force, designed to efficiently code speech and general audio in a single format, while remaining low-latency enough for real-time interactive communication and low-complexity enough for low-end embedded processors. Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate until transparency is reached, including MP3, AAC, and HE-AAC.

Codec 2 is a low-bitrate speech audio codec that is patent free and open source. Codec 2 compresses speech using sinusoidal coding, a method specialized for human speech. Bit rates of 3200 to 450 bit/s have been successfully created. Codec 2 was designed to be used for amateur radio and other high compression voice applications.