MPEG-4 Structured Audio

Last updated September 12, 2024

MPEG-4 Structured Audio is an ISO/IEC standard for describing sound. It was published as subpart 5 of MPEG-4 Part 3 (ISO/IEC 14496-3:1999) in 1999.^[1]^[2]^[3]^[4]

It allows the transmission of synthetic music and sound effects at very low bit rates (from 0.01 to 10 kbit/s), and the description of parametric sound post-production for mixing multiple streams and adding effects to audio scenes. It does not standardize a particular set of synthesis methods, but a method for describing synthesis methods.

The sound descriptions generate audio when compiled (or interpreted) by a compliant decoder. MPEG-4 Structured Audio consists of the following major elements:

Structured Audio Orchestra Language (SAOL), an audio programming language. SAOL is historically related to Csound and other so-called Music-N languages. It was created by MIT Media Lab grad student Eric Scheirer while he was studying under Barry Vercoe during the 1990s.
Structured Audio Score Language (SASL) - is used to describe the manner in which algorithms described in SAOL are used to produce sound.
Structured Audio Sample Bank Format (SASBF) - allows for the transmission of banks of audio samples to be used in 'wavetable' sample-based synthesis (based on SoundFont and DownLoadable Sounds)^[5]
A normative Structured Audio scheduler description - it is the supervisory run-time element of the Structured Audio decoding process.
MIDI support - provides important backward-compatibility with existing content and authoring tools.

MPEG-4 Structured Audio was cited by CNN as one of the top-25 innovations to arise at the Media Laboratory.^[6]

Related Research Articles

MPEG-4 is a group of international standards for the compression of digital audio and visual data, multimedia systems, and file storage formats. It was originally introduced in late 1998 as a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group (MPEG) under the formal standard ISO/IEC 14496 – Coding of audio-visual objects. Uses of MPEG-4 include compression of audiovisual data for Internet video and CD distribution, voice and broadcast television applications. The MPEG-4 standard was developed by a group led by Touradj Ebrahimi and Fernando Pereira.

Wavetable synthesis is a sound synthesis technique used to create quasi-periodic waveforms often used in the production of musical tones or notes.

Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was designed to be the successor of the MP3 format and generally achieves higher sound quality than MP3 at the same bit rate.

MPEG-4 Part 3 or MPEG-4 Audio is the third part of the ISO/IEC MPEG-4 international standard developed by Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.

Harmonic Vector Excitation Coding, abbreviated as HVXC is a speech coding algorithm specified in MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in the fixed and variable bit rate mode and sampling frequency of 8 kHz. It also operates at lower bitrates, such as 1.2 - 1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.

High-Efficiency Advanced Audio Coding (HE-AAC) is an audio coding format for lossy data compression of digital audio defined as an MPEG-4 Audio profile in ISO/IEC 14496–3. It is an extension of Low Complexity AAC (AAC-LC) optimized for low-bitrate applications such as streaming audio. The usage profile HE-AAC v1 uses spectral band replication (SBR) to enhance the modified discrete cosine transform (MDCT) compression efficiency in the frequency domain. The usage profile HE-AAC v2 couples SBR with Parametric Stereo (PS) to further enhance the compression efficiency of stereo signals.

TwinVQ is an audio compression technique developed by Nippon Telegraph and Telephone Corporation (NTT) Human Interface Laboratories in 1994. The compression technique has been used in both standardized and proprietary designs.

MPEG-4 Part 2, MPEG-4 Visual is a video compression format developed by the Moving Picture Experts Group (MPEG). It belongs to the MPEG-4 ISO/IEC standards. It uses block-wise motion compensation and a discrete cosine transform (DCT), similar to previous standards such as MPEG-1 Part 2 and H.262/MPEG-2 Part 2.

MPEG-4 Part 17, or MPEG-4 Timed Text (MP4TT), or MPEG-4 Streaming text format is the text-based subtitle format for MPEG-4, published as ISO/IEC 14496-17 in 2006. It was developed in response to the need for a generic method for coding of text as one of the multimedia components within audiovisual presentations.

MPEG-4 Part 11Scene description and application engine was published as ISO/IEC 14496-11 in 2005. MPEG-4 Part 11 is also known as BIFS, XMT, MPEG-J. It defines:

MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 audio standard to allow lossless audio compression. The extension was finalized in December 2005 and published as ISO/IEC 14496-3:2005/Amd 2:2006 in 2006. The latest description of MPEG-4 ALS was published as subpart 11 of the MPEG-4 Audio standard in December 2019.

MPEG Multichannel, also known as MPEG-2 Backwards Compatible, or MPEG-2 BC, is an extension to the MPEG-1 Layer II audio compression specification, as defined in the MPEG-2 Audio standard which allows it provide up to 5.1-channels of audio. To maintain backwards compatibility with the older 2-channel (stereo) audio specification, it uses a channel matrixing scheme, where the additional channels are mixed into the two backwards compatible channels. Extra information in the data stream contains signals to process extra channels from the matrix.

<span class="mw-page-title-main">MPEG-4 SLS</span> Extension to the MPEG-4 Audio standard

MPEG-4 SLS, or MPEG-4 Scalable to Lossless as per ISO/IEC 14496-3:2005/Amd 3:2006 (Scalable Lossless Coding), is an extension to the MPEG-4 Part 3 (MPEG-4 Audio) standard to allow lossless audio compression scalable to lossy MPEG-4 General Audio coding methods (e.g., variations of AAC). It was developed jointly by the Institute for Infocomm Research (I²R) and Fraunhofer, which commercializes its implementation of a limited subset of the standard under the name of HD-AAC. Standardization of the HD-AAC profile for MPEG-4 Audio is under development (as of September 2009).

MPEG-4 Part 14, or MP4, is a digital multimedia container format most commonly used to store video and audio, but it can also be used to store other data such as subtitles and still images. Like most modern container formats, it allows streaming over the Internet. The only filename extension for MPEG-4 Part 14 files as defined by the specification is .mp4. MPEG-4 Part 14 is a standard specified as a part of MPEG-4.

The MPEG-4 Low Delay Audio Coder is audio compression standard designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. It is closely derived from the MPEG-2 Advanced Audio Coding (AAC) standard. It was published in MPEG-4 Audio Version 2 and in its later revisions.

MPEG Surround, also known as Spatial Audio Coding (SAC) is a lossy compression format for surround sound that provides a method for extending mono or stereo audio services to multi-channel audio in a backwards compatible fashion. The total bit rates used for the core and the MPEG Surround data are typically only slightly higher than the bit rates used for coding of the core. MPEG Surround adds a side-information stream to the core bit stream, containing spatial image data. Legacy stereo playback systems will ignore this side-information while players supporting MPEG Surround decoding will output the reconstructed multi-channel audio.

Structured Audio Orchestra Language (SAOL) is an imperative, MUSIC-N programming language designed for describing virtual instruments, processing digital audio, and applying sound effects. It was published as subpart 5 of MPEG-4 Part 3 in 1999.

The ISO base media file format (ISOBMFF) is a container file format that defines a general structure for files that contain time-based multimedia data such as video and audio. It is standardized in ISO/IEC 14496-12, a.k.a. MPEG-4 Part 12, and was formerly also published as ISO/IEC 15444-12, a.k.a. JPEG 2000 Part 12.

Part 3 of the MPEG-2 standard defines audio coding:

Unified Speech and Audio Coding (USAC) is an audio compression format and codec for both music and speech or any mix of speech and audio using very low bit rates between 12 and 64 kbit/s. It was developed by Moving Picture Experts Group (MPEG) and was published as an international standard ISO/IEC 23003-3 and also as an MPEG-4 Audio Object Type in ISO/IEC 14496-3:2009/Amd 3 in 2012.

References

↑ ISO (1999). "ISO/IEC 14496-3:1999 - Information technology -- Coding of audio-visual objects -- Part 3: Audio". ISO. Retrieved 2009-10-06.
↑ ISO/IEC JTC 1/SC 29/WG 11 (1998-05-15), ISO/IEC FCD 14496-3 Subpart 5 - Information Technology - Coding of Audiovisual Objects – Low Bitrate Coding of Multimedia Objects, Part 3: Audio, Subpart 5: Structured Audio, Final Committee Draft, N2203SA (PDF), retrieved 2009-10-10{{citation}}: CS1 maint: numeric names: authors list (link)
↑ D. Thom, H. Purnhagen, and the MPEG Audio Subgroup (October 1998). "MPEG Audio FAQ Version 9 - MPEG-4". chiariglione.org. Archived from the original on 2009-09-26. Retrieved 2009-10-06.{{cite web}}: CS1 maint: multiple names: authors list (link)
↑ Heiko Purnhagen (2001-06-01). "The MPEG-4 Audio Standard: Overview and Applications". Heiko Purnhagen. Retrieved 2009-10-07.^{[ dead link ]}
↑ Scheirer, Eric D.; Ray, Lee (1998). "Algorithmic and Wavetable Synthesis in the MPEG-4 Multimedia Standard". Audio Engineering Society Convention 105, 1998. CiteSeerX 10.1.1.35.2773 . 2.2 Wavetable synthesis with SASBF: The SASBF wavetable-bank format had a somewhat complex history of development. The original specification was contributed by E-Mu Systems and was based on their "SoundFont" format [15]. After integration of this component in the MPEG-4 reference software was complete, the MIDI Manufacturers Association (MMA) approached MPEG requesting that MPEG-4 SASBF be compatible with their "Downloaded Sounds" format [13]. E-Mu agreed that this compatibility was desirable, and so a new format was negotiated and designed collaboratively by all parties.
↑ "THE BIG I: MIT Media Lab Turns 25". CNN. Archived from the original on January 2, 2013.

External links

This multimedia software-related article is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[mpeg4audio-iso-1999-1] ISO (1999). "ISO/IEC 14496-3:1999 - Information technology -- Coding of audio-visual objects -- Part 3: Audio". ISO. Retrieved 2009-10-06.

[mpeg4audio1-draft-2] ISO/IEC JTC 1/SC 29/WG 11 (1998-05-15), ISO/IEC FCD 14496-3 Subpart 5 - Information Technology - Coding of Audiovisual Objects – Low Bitrate Coding of Multimedia Objects, Part 3: Audio, Subpart 5: Structured Audio, Final Committee Draft, N2203SA (PDF), retrieved 2009-10-10{{citation}}: CS1 maint: numeric names: authors list (link)

[mpeg-audio-faq-3] D. Thom, H. Purnhagen, and the MPEG Audio Subgroup (October 1998). "MPEG Audio FAQ Version 9 - MPEG-4". chiariglione.org. Archived from the original on 2009-09-26. Retrieved 2009-10-06.{{cite web}}: CS1 maint: multiple names: authors list (link)

[mpeg4audio1-4] Heiko Purnhagen (2001-06-01). "The MPEG-4 Audio Standard: Overview and Applications". Heiko Purnhagen. Retrieved 2009-10-07.^{[ dead link ]}

[scheirerray-5] Scheirer, Eric D.; Ray, Lee (1998). "Algorithmic and Wavetable Synthesis in the MPEG-4 Multimedia Standard". Audio Engineering Society Convention 105, 1998. CiteSeerX 10.1.1.35.2773 . 2.2 Wavetable synthesis with SASBF: The SASBF wavetable-bank format had a somewhat complex history of development. The original specification was contributed by E-Mu Systems and was based on their "SoundFont" format [15]. After integration of this component in the MPEG-4 reference software was complete, the MIDI Manufacturers Association (MMA) approached MPEG requesting that MPEG-4 SASBF be compatible with their "Downloaded Sounds" format [13]. E-Mu agreed that this compatibility was desirable, and so a new format was negotiated and designed collaboratively by all parties.

[CNN-6] "THE BIG I: MIT Media Lab Turns 25". CNN. Archived from the original on January 2, 2013.

[1]

[2]

[3]

[4]

[5]

[6]

MPEG-4 Structured Audio

Contents

See also

Related Research Articles

References

External links