This article needs additional citations for verification .(April 2009) |
MPEG Surround (ISO/IEC 23003-1 [1] or MPEG-D Part 1 [2] [3] ), also known as Spatial Audio Coding (SAC) [4] [5] [6] [7] is a lossy compression format for surround sound that provides a method for extending mono or stereo audio services to multi-channel audio in a backwards compatible fashion. The total bit rates used for the (mono or stereo) core and the MPEG Surround data are typically only slightly higher than the bit rates used for coding of the (mono or stereo) core. MPEG Surround adds a side-information stream to the (mono or stereo) core bit stream, containing spatial image data. Legacy stereo playback systems will ignore this side-information while players supporting MPEG Surround decoding will output the reconstructed multi-channel audio.
Moving Picture Experts Group (MPEG) issued a call for proposals on MPEG Spatial Audio Coding in March 2004. The group decided that the technology that would be the starting point in standardization process, would be a combination of the submissions from two proponents - Fraunhofer IIS / Agere Systems and Coding Technologies / Philips. [5] The MPEG Surround standard was developed by the Moving Picture Experts Group (ISO/IEC JTC 1/SC29/WG11) and published as ISO/IEC 23003 in 2007. [1] It was the first standard of MPEG-D standards group, formally known as ISO/IEC 23003 - MPEG audio technologies.
MPEG Surround was also defined as one of the MPEG-4 Audio Object Types in 2007. [8] There is also the MPEG-4 No Delay MPEG Surround object type (LD MPEG Surround), which was published in 2010. [9] [10] The Spatial Audio Object Coding (SAOC) was published as MPEG-D Part 2 - ISO/IEC 23003–2 in 2010 and it extends MPEG Surround standard by re-using its spatial rendering capabilities while retaining full compatibility with existing receivers. MPEG SAOC system allows users on the decoding side to interactively control the rendering of each individual audio object (e.g. individual instruments, vocals, human voices). [2] [3] [11] [12] [13] [14] [15] There is also the Unified Speech and Audio Coding (USAC) which will be defined in MPEG-D Part 3 - ISO/IEC 23003-3 and ISO/IEC 14496-3:2009/Amd 3. [16] [17] MPEG-D MPEG Surround parametric coding tools are integrated into the USAC codec. [18]
The (mono or stereo) core could be coded with any (lossy or lossless) audio codec. Particularly low bitrates (64-96 kbit/s for 5.1 channels) are possible when using HE-AAC v2 as the core codec.
MPEG Surround coding uses our capacity to perceive sound in the 3D and captures that perception in a compact set of parameters. Spatial perception is primarily attributed to three parameters, or cues, describing how humans localize sound in the horizontal plane: Interaural level difference (ILD), Interaural time difference (ITD) and Interaural coherence (IC). This three concepts are illustrated in next image. Direct, or first-arrival, waveforms from the source hit the left ear at time, while direct sound received by the right ear is diffracted around the head, with time delay and level attenuation, associated. These two effects result in ITD and ILD are associated with the main source. At last, in a reverberant environment, reflected sound from the source, or sound from diffuse source, or uncorrelated sound can hit both ears, all of them are related with IC.
MPEG Surround uses interchannel differences in level, phase and coherence equivalent to the ILD, ITD and IC parameters. The spatial image is captured by a multichannel audio signal relative to a transmitted downmix signal. These parameters are encoded in a very compact form so as to decode the parameters and the transmitted signal and to synthesize a high quality multichannel representation.
MPEG Surround encoder receives a multichannel audio signal x1 to xN where the number of input channels is N. The most important aspect of the encoding process is that a downmix signal, xt1 and xt2, which is typically stereo, is derived from the multichannel input signal, and it is this downmix signal that is compressed for transmission over the channel rather than the multichannel signal. The encoder may be able to exploit the downmix process so as to be more advantageous. It not only creates a faithful equivalent of the multichannel signal in the mono or stereo downmix, but also creates the best possible multichannel decoding based on the downmix and encoded spatial cues as well. Alternatively, the downmix could be supplied externally (Artistic Downmix in before Diagram Block). The MPEG Surround encoding process could be ignored by the compression algorithm used for the transmitted channels (Audio Encoder and Audio Decoder in before Diagram Block). It could be any type of high-performance compression algorithms such as MPEG-1 Layer III, MPEG-4 AAC or MPEG-4 High Efficiency AAC, or it could even be PCM.
The spatial signals are generated and recovered in two types of filter modules. The reverse-OTT (one-to-two) generates one downmixed stream, one level difference, one coherence value, and an optional residue signal from one pair of signals. The reverse-TTT (two-to-three) element generates two downmixed streams, two level differences, one coherence value, and an optional residue signal. In both the forward (decoding) and reverse (encoding) directions, arranging these filters into a tree setup allows for arbitrary downmixing and recovery. [19]
The MPEG Surround technique allows for compatibility with existing and future stereo MPEG decoders by having the transmitted downmix (e.g. stereo) appear to stereo MPEG decoders to be an ordinary stereo version of the multichannel signal. Compatibility with stereo decoders is desirable since stereo presentation will remain pervasive due to the number of applications in which listening is primarily via headphones, such as portable music players.
MPEG Surround also supports a mode in which the downmix is compatible with popular matrix surround decoders, such as Dolby Pro-Logic. [19]
Due to the relatively small channel bandwidth, the relatively large cost of transmission equipment and transmission licenses and the desire to maximize user choices by providing many programs, the majority of existing or planned digital broadcasting systems cannot provide multichannel sound to the users.
DRM+ was designed [20] to be fully capable of transmitting MPEG Surround and such broadcasting was also successfully demonstrated. [21]
MPEG Surround's backward compatibility and relatively low overhead provides one way to add multichannel sound to DAB without severely reducing audio quality or impacting other services.
Currently, the majority of digital TV broadcasts use stereo audio coding. MPEG Surround could be used to extend these established services to surround sound, as with DAB.
Currently, a number of commercial music download services are available and working with considerable commercial success. Such services could be seamlessly extended to provide multichannel presentations while remaining compatible with stereo players: on computers with 5.1 channel playback systems the compressed sound files are presented in surround sound while on portable players the same files are reproduced in stereo.
Many Internet radios operate with severely constrained transmission bandwidth, such that they can offer only mono or stereo content. MPEG Surround Coding technology could extend this to a multichannel service while still remaining within the permissible operating range of bitrates. Since efficiency is of paramount importance in this application, compression of the transmitted audio signal is vital. Using recent MPEG compression technology (MPEG-4 High Efficiency Profile coding), full MPEG Surround systems have been demonstrated with bitrates as low as 48 kbit/s.
MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) practical.
MPEG-2 is a standard for "the generic coding of moving pictures and associated audio information". It describes a combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth. While MPEG-2 is not as efficient as newer standards such as H.264/AVC and H.265/HEVC, backwards compatibility with existing hardware and software means it is still widely used, for example in over-the-air digital television broadcasting and in the DVD-Video standard.
MPEG-4 is a group of international standards for the compression of digital audio and visual data, multimedia systems, and file storage formats. It was originally introduced in late 1998 as a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group (MPEG) under the formal standard ISO/IEC 14496 – Coding of audio-visual objects. Uses of MPEG-4 include compression of audiovisual data for Internet video and CD distribution, voice and broadcast television applications. The MPEG-4 standard was developed by a group led by Touradj Ebrahimi and Fernando Pereira.
Super Audio CD (SACD) is an optical disc format for audio storage introduced in 1999. It was developed jointly by Sony and Philips Electronics and intended to be the successor to the compact disc (CD) format.
MPEG-1 Audio Layer II or MPEG-2 Audio Layer II is a lossy audio compression format defined by ISO/IEC 11172-3 alongside MPEG-1 Audio Layer I and MPEG-1 Audio Layer III (MP3). While MP3 is much more popular for PC and Internet applications, MP2 remains a dominant standard for audio broadcasting.
Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was designed to be the successor of the MP3 format and generally achieves higher sound quality than MP3 at the same bit rate.
MPEG-4 Part 3 or MPEG-4 Audio is the third part of the ISO/IEC MPEG-4 international standard developed by Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.
Harmonic Vector Excitation Coding, abbreviated as HVXC is a speech coding algorithm specified in MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in the fixed and variable bit rate mode and sampling frequency of 8 kHz. It also operates at lower bitrates, such as 1.2 - 1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.
High-Efficiency Advanced Audio Coding (HE-AAC) is an audio coding format for lossy data compression of digital audio defined as an MPEG-4 Audio profile in ISO/IEC 14496–3. It is an extension of Low Complexity AAC (AAC-LC) optimized for low-bitrate applications such as streaming audio. The usage profile HE-AAC v1 uses spectral band replication (SBR) to enhance the modified discrete cosine transform (MDCT) compression efficiency in the frequency domain. The usage profile HE-AAC v2 couples SBR with Parametric Stereo (PS) to further enhance the compression efficiency of stereo signals.
TwinVQ is an audio compression technique developed by Nippon Telegraph and Telephone Corporation (NTT) Human Interface Laboratories in 1994. The compression technique has been used in both standardized and proprietary designs.
FAAC is a software project which includes the AAC encoder FAAC and decoder FAAD2. It supports MPEG-2 AAC as well as MPEG-4 AAC. It supports several MPEG-4 Audio object types, file formats, multichannel and gapless encoding/decoding and MP4 metadata tags. The encoder and decoder is compatible with standard-compliant audio applications using one or more of these object types and facilities. It also supports Digital Radio Mondiale.
Parametric stereo is an audio compression algorithm used as an audio coding format for digital audio. It is considered an Audio Object Type of MPEG-4 Part 3 that serves to enhance the coding efficiency of low bandwidth stereo audio media. Parametric Stereo digitally codes a stereo audio signal by storing the audio as monaural alongside a small amount of extra information. This extra information describes how the monaural signal will behave across both stereo channels, which allows for the signal to exist in true stereo upon playback.
MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 audio standard to allow lossless audio compression. The extension was finalized in December 2005 and published as ISO/IEC 14496-3:2005/Amd 2:2006 in 2006. The latest description of MPEG-4 ALS was published as subpart 11 of the MPEG-4 Audio standard in December 2019.
MPEG Multichannel, also known as MPEG-2 Backwards Compatible, or MPEG-2 BC, is an extension to the MPEG-1 Layer II audio compression specification, as defined in the MPEG-2 Audio standard which allows it provide up to 5.1-channels of audio. To maintain backwards compatibility with the older 2-channel (stereo) audio specification, it uses a channel matrixing scheme, where the additional channels are mixed into the two backwards compatible channels. Extra information in the data stream contains signals to process extra channels from the matrix.
MPEG-4 SLS, or MPEG-4 Scalable to Lossless as per ISO/IEC 14496-3:2005/Amd 3:2006 (Scalable Lossless Coding), is an extension to the MPEG-4 Part 3 (MPEG-4 Audio) standard to allow lossless audio compression scalable to lossy MPEG-4 General Audio coding methods (e.g., variations of AAC). It was developed jointly by the Institute for Infocomm Research (I2R) and Fraunhofer, which commercializes its implementation of a limited subset of the standard under the name of HD-AAC. Standardization of the HD-AAC profile for MPEG-4 Audio is under development (as of September 2009).
The MPEG-4 Low Delay Audio Coder is audio compression standard designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. It is closely derived from the MPEG-2 Advanced Audio Coding (AAC) standard. It was published in MPEG-4 Audio Version 2 and in its later revisions.
MPEG-D is a group of standards for audio coding formally known as ISO/IEC 23003 - MPEG audio technologies, published since 2007.
Part 3 of the MPEG-2 standard defines audio coding:
Unified Speech and Audio Coding (USAC) is an audio compression format and codec for both music and speech or any mix of speech and audio using very low bit rates between 12 and 64 kbit/s. It was developed by Moving Picture Experts Group (MPEG) and was published as an international standard ISO/IEC 23003-3 and also as an MPEG-4 Audio Object Type in ISO/IEC 14496-3:2009/Amd 3 in 2012.
ECMA-407 is the world's first approved international 3D audio standard for the unrestricted delivery of channel-based, object-based and scene-based signals up to NHK 22.2 developed by Ecma TC32-TG22 in close cooperation with France Télévisions, Radio France, École Polytechnique Fédérale de Lausanne and McGill University in Montreal.
{{cite web}}
: CS1 maint: numeric names: authors list (link){{citation}}
: CS1 maint: numeric names: authors list (link){{cite web}}
: CS1 maint: numeric names: authors list (link)