Audio coding format

Figure: Comparison of coding efficiency between popular audio formats

An audio coding format [1] (or sometimes audio compression format) is a content representation format for the storage or transmission of digital audio (such as in digital television, digital radio, and audio and video files). Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to/from a specific audio coding format is called an audio codec; an example of an audio codec is LAME, one of several different codecs that implement encoding and decoding of audio in the MP3 audio coding format in software.

Some audio coding formats are documented by a detailed technical specification document known as an audio coding specification. Some such specifications are written and approved by standardization organizations as technical standards, and are thus known as audio coding standards. The term "standard" is sometimes also used for de facto standards as well as formal standards.

Audio content encoded in a particular audio coding format is normally encapsulated within a container format. As such, the user normally doesn't have a raw AAC file, but instead has a .m4a audio file, which is an MPEG-4 Part 14 container containing AAC-encoded audio. The container also contains metadata such as title and other tags, and perhaps an index for fast seeking. [2] A notable exception is MP3 files, which consist of raw coded audio without a container format. De facto standards for adding metadata tags such as title and artist to MP3s, such as ID3, are hacks which work by appending the tags to the MP3 file and relying on the MP3 player to recognize the chunk as malformed audio data and therefore skip it. In video files with audio, the encoded audio content is bundled with video (in a video coding format) inside a multimedia container format.
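
As a concrete illustration of this appended-tag approach, the short Python sketch below reads an ID3v1 tag, the simplest such scheme, which stores a fixed 128-byte record at the very end of the file. The file name is hypothetical, and real tagging libraries also handle the newer, prepended ID3v2 format.

```python
# Minimal sketch: read an ID3v1 tag, which is simply appended as the
# last 128 bytes of an MP3 file (decoders ignore it as non-audio data).
import struct

def read_id3v1(path):
    with open(path, "rb") as f:
        f.seek(-128, 2)              # 128 bytes before the end of the file
        block = f.read(128)
    if block[:3] != b"TAG":          # no ID3v1 tag appended
        return None
    # Layout: "TAG" + 30-byte title, artist, album + 4-byte year + comment/genre.
    title, artist, album, year = struct.unpack("3x30s30s30s4s31x", block)
    decode = lambda b: b.rstrip(b"\x00 ").decode("latin-1", "replace")
    return {"title": decode(title), "artist": decode(artist),
            "album": decode(album), "year": decode(year)}

print(read_id3v1("example.mp3"))     # hypothetical file name
```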

An audio coding format does not dictate all of the algorithms used by a codec implementing the format. An important part of how lossy audio compression works is removing data in ways humans cannot hear, according to a psychoacoustic model; the implementer of an encoder has some freedom of choice in which data to remove, according to their psychoacoustic model.
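
The toy sketch below illustrates this idea in a deliberately crude way: it transforms a block of samples to the frequency domain and discards coefficients below an assumed audibility threshold. It is not any real codec's psychoacoustic model; the threshold, tone frequencies, and block size are all arbitrary.

```python
# Toy illustration of perceptually motivated data removal (NOT a real
# psychoacoustic model): discard frequency components assumed to be inaudible.
import numpy as np

rate, N = 44100, 2048
n = np.arange(N)
loud = np.sin(2 * np.pi * (20 * rate / N) * n / rate)           # strong tone
quiet = 1e-4 * np.sin(2 * np.pi * (232 * rate / N) * n / rate)  # -80 dB tone
block = loud + quiet

spectrum = np.fft.rfft(block)
magnitude = np.abs(spectrum)

# Assumed audibility threshold: 60 dB below the strongest component.
threshold = magnitude.max() * 10 ** (-60 / 20)
kept = np.where(magnitude >= threshold, spectrum, 0)

print("coefficients kept:", np.count_nonzero(kept), "of", kept.size)
reconstructed = np.fft.irfft(kept, n=N)   # the quiet tone is gone for good
```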

Lossless, lossy, and uncompressed audio coding formats

A lossless audio coding format reduces the total data needed to represent a sound but can be decoded to its original, uncompressed form. A lossy audio coding format additionally discards some of the audio information itself, typically by quantizing it more coarsely under the guidance of a psychoacoustic model, which results in far less data at the cost of irretrievably lost information.
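
The following sketch contrasts the two behaviors using generic tools rather than real audio codecs: zlib stands in for a lossless coder (the round trip is bit-exact), while crude requantization to 8 bits stands in for the irreversible step of a lossy coder. The tone and bit depths are arbitrary.

```python
# Contrast lossless vs. lossy coding on a block of 16-bit PCM samples.
import zlib
import numpy as np

rate = 44100
t = np.arange(rate) / rate
pcm = (0.5 * np.sin(2 * np.pi * 441 * t) * 32767).astype(np.int16)

# Lossless: smaller on disk, but the round trip is bit-exact.
packed = zlib.compress(pcm.tobytes())
assert zlib.decompress(packed) == pcm.tobytes()
print("lossless compression ratio:", len(pcm.tobytes()) / len(packed))

# "Lossy": keep only the top 8 bits of each sample; the rest is gone for good.
coarse = (pcm >> 8).astype(np.int8)
restored = coarse.astype(np.int16) << 8
print("max error after lossy round trip:", int(np.abs(pcm - restored).max()))
```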

Transmitted (streamed) audio is most often compressed using lossy audio codecs as the smaller size is far more convenient for distribution. The most widely used audio coding formats are MP3 and Advanced Audio Coding (AAC), both of which are lossy formats based on modified discrete cosine transform (MDCT) and perceptual coding algorithms.

Lossless audio coding formats such as FLAC and Apple Lossless are sometimes available, though at the cost of larger files.

Uncompressed audio formats, such as pulse-code modulation (PCM, commonly stored in .wav files), are also sometimes used. PCM was the standard format for Compact Disc Digital Audio (CDDA).
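
For reference, the snippet below writes one second of uncompressed PCM at the CD parameters (44.1 kHz, 16-bit samples, two channels) into a .wav container using Python's standard wave module; the tone and output file name are arbitrary.

```python
# Write one second of uncompressed PCM at CD-quality parameters
# (44,100 Hz, 16-bit samples, 2 channels) into a .wav container.
import math
import struct
import wave

rate, bits, channels = 44100, 16, 2
frames = bytearray()
for n in range(rate):                              # one second of audio
    sample = int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / rate))
    frames += struct.pack("<hh", sample, sample)   # same tone on both channels

with wave.open("tone.wav", "wb") as w:             # arbitrary output file name
    w.setnchannels(channels)
    w.setsampwidth(bits // 8)
    w.setframerate(rate)
    w.writeframes(bytes(frames))
```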

History

Figure: Solidyne 922, the world's first commercial audio bit compression sound card for PC (1990)

In 1950, Bell Labs filed the patent on differential pulse-code modulation (DPCM). [3] Adaptive DPCM (ADPCM) was introduced by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973. [4] [5]

Perceptual coding was first used for speech coding compression, with linear predictive coding (LPC). [6] Initial concepts for LPC date back to the work of Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966. [7] During the 1970s, Bishnu S. Atal and Manfred R. Schroeder at Bell Labs developed a form of LPC called adaptive predictive coding (APC), a perceptual coding algorithm that exploited the masking properties of the human ear, followed in the early 1980s by the code-excited linear prediction (CELP) algorithm, which achieved a significant compression ratio for its time. [6] Perceptual coding is used by modern audio compression formats such as MP3 [6] and AAC.
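
As a rough illustration of the linear-prediction idea behind LPC (not the full APC or CELP pipeline), the sketch below fits prediction coefficients to a block of samples with the textbook autocorrelation method and checks how much smaller the prediction residual is than the signal itself; the test signal, sample rate, and predictor order are all illustrative choices.

```python
# Toy LPC analysis (autocorrelation method): predict each sample from the
# previous `order` samples and measure the energy of the prediction residual.
import numpy as np

rate, order = 8000, 8
t = np.arange(1024) / rate
# Two steady sinusoids standing in for a voiced-speech block (illustrative only).
x = np.sin(2 * np.pi * 300 * t) + 0.6 * np.sin(2 * np.pi * 900 * t)
x *= np.hamming(x.size)

# Autocorrelation sequence and the Toeplitz normal equations R a = r.
r = np.array([np.dot(x[:x.size - k], x[k:]) for k in range(order + 1)])
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
a = np.linalg.solve(R, r[1:])

# Prediction residual: x[n] minus the weighted sum of the previous samples.
pred = np.zeros_like(x)
for i, coeff in enumerate(a, start=1):
    pred[i:] += coeff * x[:-i]
residual = x - pred

print("residual energy / signal energy:",
      float(np.sum(residual**2) / np.sum(x**2)))
```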

The discrete cosine transform (DCT), developed by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974, [8] provided the basis for the modified discrete cosine transform (MDCT), which was proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987, [10] following earlier work by Princen and Bradley in 1986. [11] The MDCT is used by modern audio compression formats such as Dolby Digital, [12] [13] MP3, [9] and Advanced Audio Coding (AAC). [14]
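
For concreteness, the sketch below implements the MDCT and its inverse directly from the defining cosine sums (without the windowing a real codec would add) and checks the lapped-transform property: overlap-adding the inverse transforms of two 50%-overlapping blocks reconstructs the shared samples exactly. The block length and test signal are arbitrary.

```python
# Naive MDCT/IMDCT from the defining formulas, plus a check of the
# time-domain aliasing cancellation (TDAC) property of lapped transforms.
import numpy as np

def mdct(x):
    """2N real samples -> N coefficients."""
    N = x.size // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return x @ basis

def imdct(X):
    """N coefficients -> 2N samples (still containing time-domain aliasing)."""
    N = X.size
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (basis @ X) / N

N = 8
x = np.random.default_rng(0).standard_normal(3 * N)

# Two 50%-overlapping blocks of length 2N.
y1 = imdct(mdct(x[0:2 * N]))
y2 = imdct(mdct(x[N:3 * N]))

# The aliasing in the overlapping halves cancels when the blocks are added,
# recovering the middle N input samples exactly (up to rounding error).
assert np.allclose(y1[N:] + y2[:N], x[N:2 * N])
```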

List of lossy formats

General

| Basic compression algorithm | Audio coding standard | Abbreviation | Introduction | Market share (2019) [15] | Ref |
|---|---|---|---|---|---|
| Modified discrete cosine transform (MDCT) | Dolby Digital (AC-3) | AC3 | 1991 | 58% | [12] [16] |
| MDCT | Adaptive Transform Acoustic Coding | ATRAC | 1992 | Unknown | [12] |
| MDCT | MPEG Layer III | MP3 | 1993 | 49% | [9] [17] |
| MDCT | Advanced Audio Coding (MPEG-2 / MPEG-4) | AAC | 1997 | 88% | [14] [12] |
| MDCT | Windows Media Audio | WMA | 1999 | Unknown | [12] |
| MDCT | Ogg Vorbis | Ogg | 2000 | 7% | [18] [12] |
| MDCT | Constrained Energy Lapped Transform | CELT | 2011 | | [19] |
| MDCT | Opus | Opus | 2012 | 8% | [20] |
| MDCT | LDAC | LDAC | 2015 | Unknown | [21] [22] |
| Adaptive differential pulse-code modulation (ADPCM) | aptX / aptX-HD | aptX | 1989 | Unknown | [23] |
| ADPCM | Digital Theater Systems | DTS | 1990 | 14% | [24] [25] |
| ADPCM | Master Quality Authenticated | MQA | 2014 | Unknown | |
| Sub-band coding (SBC) | MPEG-1 Audio Layer II | MP2 | 1993 | Unknown | |
| SBC | Musepack | MPC | 1997 | | |

Speech

List of lossless formats


Related Research Articles

An audio file format is a file format for storing digital audio data on a computer system. The bit layout of the audio data is called the audio coding format and can be uncompressed, or compressed to reduce the file size, often using lossy compression. The data can be a raw bitstream in an audio coding format, but it is usually embedded in a container format or an audio data format with defined storage layer.

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals or sound power level is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.

A codec is a device or computer program that encodes or decodes a data stream or signal. Codec is a portmanteau of coder/decoder.

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.


In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content. This is opposed to lossless data compression, which does not degrade the data. The amount of data reduction possible using lossy compression is much higher than with lossless techniques.


MP3 is a coding format for digital audio developed largely by the Fraunhofer Society in Germany under the lead of Karlheinz Brandenburg, with support from scientists in other countries. Originally defined as the third audio format of the MPEG-1 standard, it was retained and further extended (defining additional bit rates and support for more audio channels) as the third audio format of the subsequent MPEG-2 standard. A third version, known as MPEG-2.5 and extended to better support lower bit rates, is commonly implemented but is not a recognized standard.

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Windows Media Audio (WMA) is a series of audio codecs and their corresponding audio coding formats developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs. The original WMA codec, known simply as WMA, was conceived as a competitor to the popular MP3 and RealAudio codecs. WMA Pro, a newer and more advanced codec, supports multichannel and high resolution audio. A lossless codec, WMA Lossless, compresses audio data without loss of audio fidelity. WMA Voice, targeted at voice content, applies compression using a range of low bit rates. Microsoft has also developed a digital container format called Advanced Systems Format to store audio encoded by WMA.

Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC, in 1992. ATRAC allowed a relatively small disc like MiniDisc to have the same running time as CD while storing audio information with minimal perceptible loss in quality. Improvements to the codec in the form of ATRAC3, ATRAC3plus, and ATRAC Advanced Lossless followed in 1999, 2002, and 2006 respectively.

Transform coding is a type of data compression for "natural" data like audio signals or photographic images. The transformation is typically lossless on its own but is used to enable better quantization, which then results in a lower quality copy of the original input.


Digital audio is a representation of sound recorded in, or converted into, digital form. In digital audio, the sound wave of the audio signal is typically encoded as numerical samples in a continuous sequence. For example, in CD audio, samples are taken 44,100 times per second, each with 16-bit sample depth. Digital audio is also the name for the entire technology of sound recording and reproduction using audio signals that have been encoded in digital form. Following significant advances in digital audio technology during the 1970s and 1980s, it gradually replaced analog audio technology in many areas of audio engineering, record production and telecommunications in the 1990s and 2000s.

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images, digital video, digital audio, digital television, digital radio, and speech coding. DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.
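
To make the "sum of cosines" description concrete, the short sketch below computes a DCT-II, the most common DCT variant, directly from its defining sum and verifies that the expansion can be inverted; the input sequence is arbitrary.

```python
# Naive DCT-II and its inverse, written directly as sums of cosines.
import numpy as np

def dct2(x):
    N = x.size
    n = np.arange(N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5) * k[:, None]) @ x

def idct2(X):
    N = X.size
    n = np.arange(N)
    k = np.arange(1, N)
    terms = np.cos(np.pi / N * (n[:, None] + 0.5) * k[None, :]) @ X[1:]
    return (X[0] + 2 * terms) / N

x = np.array([4.0, 3.0, 5.0, 10.0, 7.0, 2.0, 1.0, 1.0])
X = dct2(x)
assert np.allclose(idct2(X), x)   # the cosine expansion is exactly invertible
print(np.round(X, 2))             # most energy lands in the low-frequency terms
```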

Dolby Digital, originally synonymous with Dolby AC-3, is the name for a family of lossy audio compression technologies developed by Dolby Laboratories; it was called Dolby Stereo Digital until 1995. The first use of Dolby Digital was to provide digital sound in cinemas from 35 mm film prints. It has since also been used for TV broadcast, radio broadcast via satellite, digital video streaming, DVDs, Blu-ray discs and game consoles.

Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was designed to be the successor of the MP3 format and generally achieves higher sound quality than MP3 at the same bit rate.

The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT, and Opus.

Constrained Energy Lapped Transform (CELT) is an open, royalty-free lossy audio compression format and a free software codec with especially low algorithmic delay for use in low-latency audio communication. The algorithms are openly documented and may be used free of software patent restrictions. Development of the format was maintained by the Xiph.Org Foundation and later coordinated by the Opus working group of the Internet Engineering Task Force (IETF).

A video coding format is a content representation format of digital video content, such as in a data file or bitstream. It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A specific software, firmware, or hardware implementation capable of compression or decompression in a specific video coding format is called a video codec.


Nasir Ahmed is an Indian-American electrical engineer and computer scientist. He is Professor Emeritus of Electrical and Computer Engineering at University of New Mexico (UNM). He is best known for inventing the discrete cosine transform (DCT) in the early 1970s. The DCT is the most widely used data compression transformation, the basis for most digital media standards and commonly used in digital signal processing. He also described the discrete sine transform (DST), which is related to the DCT.

References

  1. The term "audio coding" can be seen in e.g. the name Advanced Audio Coding, and is analogous to the term video coding.
  2. "Video – Where is synchronization information stored in container formats?".
  3. US patent 2605361, C. Chapin Cutler, "Differential Quantization of Communication Signals", issued 1952-07-29.
  4. Cummiskey, P.; Jayant, N. S.; Flanagan, J. L. (1973). "Adaptive Quantization in Differential PCM Coding of Speech". Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x.
  5. Cummiskey, P.; Jayant, Nikil S.; Flanagan, J. L. (1973). "Adaptive quantization in differential PCM coding of speech". The Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x. ISSN 0005-8580.
  6. Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN 9783319056609.
  7. Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346.
  8. Nasir Ahmed; T. Natarajan; Kamisetty Ramamohan Rao (January 1974). "Discrete Cosine Transform" (PDF). IEEE Transactions on Computers. C-23 (1): 90–93. doi:10.1109/T-C.1974.223784. S2CID 149806273. Archived from the original (PDF) on 2016-12-08. Retrieved 2019-10-20.
  9. Guckert, John (Spring 2012). "The Use of FFT and MDCT in MP3 Audio Compression" (PDF). University of Utah. Retrieved 14 July 2019.
  10. Princen, J.; Johnson, A.; Bradley, A. (1987). "Subband/Transform coding using filter bank designs based on time domain aliasing cancellation". ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 12. pp. 2161–2164. doi:10.1109/ICASSP.1987.1169405. S2CID 58446992.
  11. Princen, J.; Bradley, A. (1986). "Analysis/Synthesis filter bank design based on time domain aliasing cancellation". IEEE Transactions on Acoustics, Speech, and Signal Processing. 34 (5): 1153–1161. doi:10.1109/TASSP.1986.1164954.
  12. Luo, Fa-Long (2008). Mobile Multimedia Broadcasting Standards: Technology and Practice. Springer Science & Business Media. p. 590. ISBN 9780387782638.
  13. Britanak, V. (2011). "On Properties, Relations, and Simplified Implementation of Filter Banks in the Dolby Digital (Plus) AC-3 Audio Coding Standards". IEEE Transactions on Audio, Speech, and Language Processing. 19 (5): 1231–1241. doi:10.1109/TASL.2010.2087755. S2CID 897622.
  14. Brandenburg, Karlheinz (1999). "MP3 and AAC Explained" (PDF). Archived (PDF) from the original on 2017-02-13.
  15. "Video Developer Report 2019" (PDF). Bitmovin. 2019. Retrieved 5 November 2019.
  16. Britanak, V. (2011). "On Properties, Relations, and Simplified Implementation of Filter Banks in the Dolby Digital (Plus) AC-3 Audio Coding Standards". IEEE Transactions on Audio, Speech, and Language Processing. 19 (5): 1231–1241. doi:10.1109/TASL.2010.2087755. S2CID 897622.
  17. Stanković, Radomir S.; Astola, Jaakko T. (2012). "Reminiscences of the Early Work in DCT: Interview with K.R. Rao" (PDF). Reprints from the Early Days of Information Sciences. 60. Retrieved 13 October 2019.
  18. Xiph.Org Foundation (2009-06-02). "Vorbis I specification - 1.1.2 Classification". Xiph.Org Foundation. Retrieved 2009-09-22.
  19. Terriberry, Timothy B. Presentation of the CELT codec (PDF).
  20. Valin, Jean-Marc; Maxwell, Gregory; Terriberry, Timothy B.; Vos, Koen (October 2013). High-Quality, Low-Delay Music Coding in the Opus Codec. 135th AES Convention. Audio Engineering Society. arXiv:1602.04845.
  21. Darko, John H. (2017-03-29). "The inconvenient truth about Bluetooth audio". DAR__KO. Archived from the original on 2018-01-14. Retrieved 2018-01-13.
  22. Ford, Jez (2015-08-24). "What is Sony LDAC, and how does it do it?". AVHub. Retrieved 2018-01-13.
  23. Ford, Jez (2016-11-22). "aptX HD - lossless or lossy?". AVHub. Retrieved 2018-01-13.
  24. "Digital Theater Systems Audio Formats". Library of Congress. 27 December 2011. Retrieved 10 November 2019.
  25. Spanias, Andreas; Painter, Ted; Atti, Venkatraman (2006). Audio Signal Processing and Coding. John Wiley & Sons. p. 338. ISBN 9780470041963.