Parametric stereo

Parametric stereo (abbreviated as PS) [1] is an audio compression technique and coding format for digital audio. It is defined as an Audio Object Type of MPEG-4 Part 3 (MPEG-4 Audio) and improves the coding efficiency of low-bandwidth stereo audio. Parametric stereo codes a stereo signal as a monaural signal plus a small amount of extra information. This extra information (the "parametric overhead") describes how the monaural signal is distributed across the two stereo channels, which lets the decoder reconstruct a stereo image on playback.

History

Background

Advanced Audio Coding Low Complexity (AAC LC) combined with Spectral Band Replication (SBR) and Parametric Stereo (PS) is defined as HE-AAC v2. An HE-AAC v1 decoder produces only mono sound when decoding an HE-AAC v2 bitstream. Parametric Stereo performs sparse coding in the spatial domain, somewhat as SBR does in the frequency domain.

To produce an HE-AAC v2 bitstream, the encoder downmixes the stereo audio to mono and adds 2–3 kbit/s of side information (the Parametric Stereo information) that describes how the decoder should generate the spatial intensity stereo and regenerate the ambience. With this side information alongside the mono audio stream, the decoder (player) can regenerate a faithful spatial approximation of the original stereo panorama at very low bitrates. Because only one audio channel is transmitted, nearly the whole bit budget goes to that channel: a 24 kbit/s signal coded with Parametric Stereo sounds substantially better than discrete stereo coded by conventional means at a similar bitrate.

However, the technique is only useful at the lowest bitrates (approximately 16–48 kbit/s, and down to 14.4 kbit/s in xHE-AAC as used in DRM). It improves perceived quality at those bitrates but generally does not achieve transparency: the parametric description can only approximate the stereo dynamics of the original, and that approximation limits perceived quality regardless of the bitrate.
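The bitrate argument above can be checked with simple arithmetic. The following is a minimal sketch, not a model of any real encoder: it assumes that discrete stereo splits its budget evenly between two channels (real encoders use joint-stereo tools and a shared bit reservoir), and the function name `core_bitrate_per_channel` is hypothetical.

```python
def core_bitrate_per_channel(total_kbps, side_info_kbps, coded_channels):
    """Bitrate left for each coded core channel once side info is subtracted.

    Assumption for illustration: the remaining budget is split evenly
    between the coded channels.
    """
    if coded_channels < 1:
        raise ValueError("need at least one coded channel")
    remaining = total_kbps - side_info_kbps
    if remaining <= 0:
        raise ValueError("side info consumes the whole budget")
    return remaining / coded_channels


TOTAL_KBPS = 24.0        # example total bitrate taken from the text
PS_SIDE_INFO_KBPS = 3.0  # upper end of the 2-3 kbit/s range quoted in the text

# Discrete stereo: two coded channels, no parametric side info.
discrete = core_bitrate_per_channel(TOTAL_KBPS, 0.0, 2)
# Parametric stereo: one coded mono channel plus the side info.
parametric = core_bitrate_per_channel(TOTAL_KBPS, PS_SIDE_INFO_KBPS, 1)

print(f"discrete stereo:   {discrete:.1f} kbit/s per channel")
print(f"parametric stereo: {parametric:.1f} kbit/s for the mono channel")
```

Under these assumptions the mono channel receives 21 kbit/s, compared with 12 kbit/s per channel for discrete stereo at the same total bitrate.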

Development

Parametric Stereo was developed out of the need to further improve the coding efficiency of low-bandwidth stereo audio. It has gone through several iterations and improvements, but it was first standardized when it was included in the feature set of MPEG-4 Audio. [1] Parametric Stereo was originally developed in Stockholm, Sweden, by the companies Philips and Coding Technologies, and was first unveiled in Naples, Italy, in 2004 at the 7th International Conference on Digital Audio Effects (DAFx'04). [2]

Approaches

The implementation in MPEG-4 specifies, for each frequency band of the mono downmix, the relative amount, delay, and correlation (coherence) of the left and right channels. Transient signals receive special handling, as the approach would otherwise cause unacceptable delays. Compared to intensity stereo coding, which records neither delay nor correlation, PS can provide more ambience. [2]
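The three per-band quantities described above can be illustrated with a small analysis routine. This is a sketch under stated assumptions, not the MPEG-4 algorithm: it uses a naive DFT on a single frame instead of the standard's filter bank, represents the delay as an inter-channel phase difference, and does no quantization or transient handling. The names `dft`, `analyse_frame`, and `band_edges` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive DFT; returns the non-negative-frequency bins of a real frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def analyse_frame(left, right, band_edges):
    """Downmix one frame to mono and estimate per-band stereo parameters.

    band_edges lists DFT bin indices; band i covers bins
    band_edges[i] .. band_edges[i + 1] - 1.
    Returns (mono, params) with one dict per band:
      level_db  - left/right energy ratio in dB ("relative amount")
      phase     - inter-channel phase difference in radians (stand-in for delay)
      coherence - normalised cross-correlation magnitude, between 0 and 1
    """
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    spec_l, spec_r = dft(left), dft(right)
    eps = 1e-12  # keeps silent bands from dividing by zero
    params = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_l = sum(abs(spec_l[k]) ** 2 for k in range(lo, hi)) + eps
        e_r = sum(abs(spec_r[k]) ** 2 for k in range(lo, hi)) + eps
        cross = sum(spec_l[k] * spec_r[k].conjugate() for k in range(lo, hi))
        params.append({
            "level_db": 10 * math.log10(e_l / e_r),
            "phase": cmath.phase(cross),
            "coherence": abs(cross) / math.sqrt(e_l * e_r),
        })
    return mono, params


# Usage: a tone that is twice as loud on the left as on the right.
n = 64
left = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
right = [0.5 * x for x in left]
mono, params = analyse_frame(left, right, [0, 8, 33])
print(params[0])
```

For the band containing the tone, the level difference comes out at about 6 dB (an amplitude ratio of 2), the phase difference is zero, and the coherence is 1 because the two channels are scaled copies of each other. A decoder would use such values to redistribute the mono signal between the two output channels.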

Modifications to PS continue to be proposed. [3] [4] [5]

MPEG Surround uses a technique related to PS.

Related Research Articles

MP3: Digital audio format

MP3 is a coding format for digital audio developed largely by the Fraunhofer Society in Germany under the lead of Karlheinz Brandenburg, with support from other digital scientists in other countries. Originally defined as the third audio format of the MPEG-1 standard, it was retained and further extended — defining additional bit-rates and support for more audio channels — as the third audio format of the subsequent MPEG-2 standard. A third version, known as MPEG-2.5 — extended to better support lower bit rates — is commonly implemented, but is not a recognized standard.

MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.

Windows Media Audio (WMA) is a series of audio codecs and their corresponding audio coding formats developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs. The original WMA codec, known simply as WMA, was conceived as a competitor to the popular MP3 and RealAudio codecs. WMA Pro, a newer and more advanced codec, supports multichannel and high resolution audio. A lossless codec, WMA Lossless, compresses audio data without loss of audio fidelity. WMA Voice, targeted at voice content, applies compression using a range of low bit rates. Microsoft has also developed a digital container format called Advanced Systems Format to store audio encoded by WMA.

Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC, in 1992. ATRAC allowed a relatively small disc like MiniDisc to have the same running time as CD while storing audio information with minimal perceptible loss in quality. Improvements to the codec in the form of ATRAC3, ATRAC3plus, and ATRAC Advanced Lossless followed in 1999, 2002, and 2006 respectively.

Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. Designed to be the successor of the MP3 format, AAC generally achieves higher sound quality than MP3 encoders at the same bit rate.

MPEG-4 Part 3 or MPEG-4 Audio is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.

Digital Radio Mondiale: Digital radio broadcasting standard

Digital Radio Mondiale is a set of digital audio broadcasting technologies designed to work over the bands currently used for analogue radio broadcasting including AM broadcasting—particularly shortwave—and FM broadcasting. DRM is more spectrally efficient than AM and FM, allowing more stations, at higher quality, into a given amount of bandwidth, using xHE-AAC audio coding format. Various other MPEG-4 codecs and Opus are also compatible, but the standard now specifies xHE-AAC.

Musepack or MPC is an open source lossy audio codec, specifically optimized for transparent compression of stereo audio at bitrates of 160–180 kbit/s. It was formerly known as MPEGplus, MPEG+ or MP+.

High-Efficiency Advanced Audio Coding: Audio codec

High-Efficiency Advanced Audio Coding (HE-AAC) is an audio coding format for lossy data compression of digital audio defined as an MPEG-4 Audio profile in ISO/IEC 14496–3. It is an extension of Low Complexity AAC (AAC-LC) optimized for low-bitrate applications such as streaming audio. The usage profile HE-AAC v1 uses spectral band replication (SBR) to enhance the modified discrete cosine transform (MDCT) compression efficiency in the frequency domain. The usage profile HE-AAC v2 couples SBR with Parametric Stereo (PS) to further enhance the compression efficiency of stereo signals.

FAAC or Freeware Advanced Audio Coder is a software project which includes the AAC encoder FAAC and decoder FAAD2. It supports MPEG-2 AAC as well as MPEG-4 AAC. It supports several MPEG-4 Audio object types, file formats, multichannel and gapless encoding/decoding and MP4 metadata tags. The encoder and decoder are compatible with standard-compliant audio applications using one or more of these object types and facilities. It also supports Digital Radio Mondiale.

Dolby Digital Plus, also known as Enhanced AC-3, is a digital audio compression scheme developed by Dolby Labs for the transport and storage of multi-channel digital audio. It is a successor to Dolby Digital (AC-3), and has a number of improvements over that codec, including support for a wider range of data rates, an increased channel count, and multi-program support, as well as additional tools (algorithms) for representing compressed data and counteracting artifacts. Whereas Dolby Digital (AC-3) supports up to five full-bandwidth audio channels at a maximum bitrate of 640 kbit/s, E-AC-3 supports up to 15 full-bandwidth audio channels at a maximum bitrate of 6.144 Mbit/s.

On December 28, 2005, XM Satellite Radio announced that it had teamed up with Neural Audio Corporation to introduce XM HD Surround, which broadcasts XM channels in 5.1 digital surround sound. For a time XM broadcast a variety of special shows and live music performances in this format. According to the press release, Denon, Onkyo, Pioneer Electronics (USA) Inc., and Yamaha would introduce home audio systems capable of playing XM HD Surround in 2006.

Perceptual Audio Coder (PAC) is a lossy audio compression algorithm. It is used by Sirius Satellite Radio for their digital audio radio service.

MPEG Surround, also known as Spatial Audio Coding (SAC), is a lossy compression format for surround sound that provides a method for extending mono or stereo audio services to multi-channel audio in a backwards-compatible fashion. The total bit rates used for the core and the MPEG Surround data are typically only slightly higher than the bit rates used for coding of the core. MPEG Surround adds a side-information stream to the core bit stream, containing spatial image data. Legacy stereo playback systems will ignore this side information, while players supporting MPEG Surround decoding will output the reconstructed multi-channel audio.

Constrained Energy Lapped Transform (CELT) is an open, royalty-free lossy audio compression format and a free software codec with especially low algorithmic delay for use in low-latency audio communication. The algorithms are openly documented and may be used free of software patent restrictions. Development of the format was maintained by the Xiph.Org Foundation and later coordinated by the Opus working group of the Internet Engineering Task Force (IETF).

ABNT NBR 15602

The audio and video compression aspects of the Brazilian Digital Terrestrial Television Standards are described in the three documents published by ABNT, the Brazilian Association of Technical Standards, the ABNT NBR 15602-1:2007 - Digital terrestrial television - Video coding, audio coding and multiplexing - Part 1: Video coding; ABNT NBR 15602-2:2007 - Digital terrestrial television - Video coding, audio coding and multiplexing - Part 2: Audio coding; and ABNT NBR 15602-3:2007 - Digital terrestrial television - Video coding, audio coding and multiplexing - Part 3: Multiplexing signals.

In audio engineering, joint encoding refers to a joining of several channels of similar information during encoding in order to obtain higher quality, a smaller file size, or both.

Unified Speech and Audio Coding (USAC) is an audio compression format and codec for music, speech, or any mix of the two at very low bit rates between 12 and 64 kbit/s. It was developed by the Moving Picture Experts Group (MPEG) and was published as the international standard ISO/IEC 23003-3 and also as an MPEG-4 Audio Object Type in ISO/IEC 14496-3:2009/Amd 3 in 2012.

Fraunhofer FDK AAC is an open-source library for encoding and decoding digital audio in the Advanced Audio Coding (AAC) format. Fraunhofer IIS developed this library for Android 4.1. It supports several Audio Object Types including MPEG-2 and MPEG-4 AAC LC, HE-AAC, and HE-AACv2, as well as AAC-LD and AAC-ELD for real-time communication. The encoding library supports sample rates up to 96 kHz and up to eight channels.

ECMA-407

ECMA-407 is the world's first approved international 3D audio standard for the unrestricted delivery of channel-based, object-based and scene-based signals up to NHK 22.2 developed by Ecma TC32-TG22 in close cooperation with France Télévisions, Radio France, École Polytechnique Fédérale de Lausanne and McGill University in Montreal.

References

  1. Breebaart, Jeroen; Par, Steven; Kohlrausch, Armin; Schuijers, Erik (2005-06-01). "Parametric Coding of Stereo Audio". EURASIP Journal on Advances in Signal Processing. 2005 (9): 561917. Bibcode:2005EJASP2005..284B. doi:10.1155/ASP.2005.1305.
  2. Purnhagen, Heiko (October 5–8, 2004). "Low Complexity Parametric Stereo Coding in MPEG-4" (PDF). 7th International Conference on Digital Audio Effects: 163–168.
  3. Jimmy Lapierre; R. Lefebvre (2006). On Improving Parametric Stereo Audio Coding. AES 120th Convention, Paris, France.
  4. Pang, Hee-Suk (5 October 2009). "Pilot-Based Coding Scheme for Parametric Stereo in Enhanced aacPlus". ETRI Journal. 31 (5): 613–615. doi:10.4218/etrij.09.0209.0193. S2CID 61177149.
  5. Elfitri, Ikhwana; Kurnia, Rahmadi; Harneldi, Defry (October 2014). Experimental study on improved parametric stereo for bit rate scalable audio coding (PDF). 2014 6th International Conference on Information Technology and Electrical Engineering (ICITEE). pp. 1–5. doi:10.1109/ICITEED.2014.7007922.