Developed by | Xiph.Org Foundation |
---|---|
Type of format | Audio |
Contained by | Ogg |
Extended to | Opus |
Standard | Documentation |
Developer(s) | Xiph.org Foundation, Jean-Marc Valin |
---|---|
Preview release | 0.11.1 / February 15, 2011 |
Type | Audio codec, reference implementation |
License | 2-clause BSD |
Website | opus-codec |
Constrained Energy Lapped Transform (CELT) is an open, royalty-free lossy audio compression format and a free software codec with especially low algorithmic delay for use in low-latency audio communication. The algorithms are openly documented and may be used free of software patent restrictions. Development of the format was maintained by the Xiph.Org Foundation (as part of the Ogg codec family) and later coordinated by the Opus working group of the Internet Engineering Task Force (IETF).
CELT was meant to bridge the gap between Vorbis and Speex for applications where both high quality audio and low delay are desired. [1] It is suitable for both speech and music. It borrows ideas from the CELP algorithm, but avoids some of its limitations by operating in the frequency domain exclusively. [1]
The original stand-alone CELT has been merged into Opus, so CELT as a stand-alone format is now abandoned and obsolete. Development continues only for its hybridised form as a layer of Opus, integrated with SILK. This article covers the historic, stand-alone format; for the integrated form and its evolution since the integration into Opus, see the article on Opus.
CELT's central feature is low algorithmic delay. It allows for latencies of typically 3 to 9 ms but is configurable to below 2 ms, at the cost of a higher bitrate to reach similar audio quality. [2] CELT supports mono and stereo audio and is applicable to both speech and music. It can use a sampling rate from 32 kHz to 48 kHz and above and an adaptive bitrate from 24 kbit/s to 128 kbit/s per channel and above. [2]
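The link between block size and latency can be made concrete with a little arithmetic. The sketch below assumes that algorithmic delay is the frame length plus the encoder look-ahead, both counted in samples. The function name and the example figures (256-sample frames with 128 samples of look-ahead at 48 kHz) are illustrative assumptions, not values taken from the CELT specification.

```python
def algorithmic_delay_ms(frame_size, lookahead, sample_rate_hz):
    """Return the delay in milliseconds for a frame plus look-ahead, both in samples."""
    return 1000.0 * (frame_size + lookahead) / sample_rate_hz


# Hypothetical configuration: 256-sample frames, 128 samples of look-ahead, 48 kHz.
delay = algorithmic_delay_ms(256, 128, 48000)
print(f"{delay:.2f} ms")  # prints "8.00 ms"
```

Halving the frame size in this model halves the frame's share of the delay, which is why very short blocks are attractive for low-latency use.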
There are no known intellectual property issues pertaining to the CELT algorithm, and its reference implementation is published under a permissive open-source license (the 2-clause BSD). [1] [3]
Like Vorbis, CELT is a fullband (entire human hearing range) general-purpose codec, i.e. not specialized for particular types of audio signals, which distinguishes it from its sibling project Speex. The format enables transparent results at high bitrates as well as very decent quality at lower bitrates. Overall, its compression capabilities are said to be significantly superior to those of MP3. As a further useful feature for real-time applications like telephony, CELT's audio quality at lower bitrates is even on par with HE-AAC v1, thanks to band folding. [4] [5] In comparative double-blind listening tests it proved noticeably superior to HE-AAC v1 at ~64 kbit/s. [6]
It has a comparatively low computational complexity that resembles that of the low-delay variant of AAC (AAC-LD) and stays significantly below the complexity of Vorbis. [7]
It supports both constant and variable bitrate. If the signal disappears into the noise floor, as in speech pauses and similar cases, transmission can be limited to signalling the decoder to output comfort noise. Most settings of the inherently streaming-capable format can be changed on the fly without interrupting transmission.
The format is robust to transmission errors. The loss of whole packets, as well as bit errors, can be masked with a gradual degradation of audio quality (packet loss concealment, PLC).
CELT is a transform codec based on the modified discrete cosine transform (MDCT) and concepts from CELP (with a code book for excitation, but in the frequency domain).
The initial PCM-coded signal is processed in relatively small, overlapping blocks, which are windowed and transformed to frequency coefficients by the MDCT. Choosing an especially short block size enables low latency but also leads to poor frequency resolution, which has to be compensated for. To reduce the algorithmic delay further at the expense of a minor sacrifice in audio quality, the inherent 50% overlap between blocks is effectively cut in half by silencing the signal during one eighth of the block at each end. [2]
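One way to picture the reduced overlap is a window whose first and last eighth are zero, with a flat top in the middle and sloped sections in between. The sketch below is an illustration under assumptions: the sine-shaped slopes and the function names are chosen here for simplicity and are not taken from the CELT reference code. It also checks the Princen-Bradley condition that MDCT windows need for alias cancellation.

```python
import math


def low_overlap_window(block_len):
    """Build a window with zeroed ends; block_len must be a multiple of 8."""
    if block_len % 8:
        raise ValueError("block_len must be a multiple of 8")
    zeros = block_len // 8   # silenced part at each end of the block
    slope = block_len // 4   # rising and falling sections (assumed sine-shaped)
    flat = block_len // 4    # flat top in the middle
    rise = [math.sin(0.5 * math.pi * (k + 0.5) / slope) for k in range(slope)]
    return [0.0] * zeros + rise + [1.0] * flat + rise[::-1] + [0.0] * zeros


def satisfies_princen_bradley(window, tol=1e-12):
    """Check w[n]^2 + w[n + N]^2 == 1 for a window of length 2N."""
    half = len(window) // 2
    return all(
        abs(window[n] ** 2 + window[n + half] ** 2 - 1.0) <= tol
        for n in range(half)
    )


window = low_overlap_window(64)
print(satisfies_princen_bradley(window))  # prints "True"
```

With this shape, adjacent blocks effectively overlap over a quarter of the block length rather than half of it.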
The coefficients are grouped to resemble the critical bands of the human auditory system. The total energy of each group is analysed, and the values are quantised for data reduction and compressed through prediction by transmitting only the difference from the predicted values (delta encoding).
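The band energy step can be sketched as follows. This is a simplified illustration, assuming energies are expressed in decibels, quantised with a uniform step, and predicted from the previous band only. The band edges, step size and function names are hypothetical and do not reproduce the layout or predictor used by libcelt.

```python
import math


def band_energies_db(coeffs, band_edges):
    """Sum the squared coefficients of each band and convert to decibels."""
    energies = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        energy = sum(c * c for c in coeffs[lo:hi])
        energies.append(10.0 * math.log10(energy + 1e-12))  # offset avoids log(0)
    return energies


def quantise(values, step_db):
    """Uniformly quantise values to integer indices."""
    return [round(v / step_db) for v in values]


def delta_encode(indices):
    """Transmit only the difference from the prediction (the previous index)."""
    deltas, prediction = [], 0
    for index in indices:
        deltas.append(index - prediction)
        prediction = index
    return deltas


def delta_decode(deltas):
    """Rebuild the indices by accumulating the differences."""
    indices, prediction = [], 0
    for delta in deltas:
        prediction += delta
        indices.append(prediction)
    return indices


coeffs = [4.0, 3.0, 2.0, 2.0, 1.0, 1.0, 0.5, 0.5]
edges = [0, 2, 4, 8]  # hypothetical band layout: three bands
indices = quantise(band_energies_db(coeffs, edges), step_db=1.5)
deltas = delta_encode(indices)
print(indices, deltas)
```

Because neighbouring bands tend to have similar energy, the differences are usually smaller than the indices themselves and therefore cheaper to code.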
The (unquantised) band energy values are removed from the raw MDCT coefficients (normalisation). The coefficients of the resulting residual signal (the so-called “band shape”) are coded by Pyramid Vector Quantisation (PVQ, a spherical vector quantisation). [8] This encoding leads to code words of fixed (predictable) length, which in turn provides robustness against bit errors and leaves no need for entropy encoding. [5] Finally, all output of the encoder is coded into one bitstream by a range encoder. [9] In connection with the PVQ, CELT uses a technique known as band folding, which delivers an effect similar to spectral band replication (SBR) by reusing coefficients of lower bands for higher ones, but has much less impact on the algorithmic delay and computational complexity than SBR. This works against “birdie” artifacts by preserving more richness in the affected frequency bands.
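The split into gain and shape, followed by PVQ, can be sketched as below. This is a minimal illustration under assumptions: the greedy pulse allocation is a simple textbook-style search, not the search used in libcelt, and the function names and the pulse count are hypothetical. A PVQ code word is an integer vector whose absolute values sum to a fixed number of pulses.

```python
import math


def normalise_band(coeffs):
    """Split a band into its gain (L2 norm) and a unit-norm shape vector."""
    gain = math.sqrt(sum(c * c for c in coeffs))
    if gain == 0.0:
        return 0.0, [0.0] * len(coeffs)
    return gain, [c / gain for c in coeffs]


def pvq_quantise(shape, pulses):
    """Pick an integer vector y with sum(|y_i|) == pulses that follows the shape."""
    l1 = sum(abs(v) for v in shape)
    if l1 == 0.0:
        return [pulses] + [0] * (len(shape) - 1)
    scaled = [pulses * abs(v) / l1 for v in shape]
    mags = [math.floor(s) for s in scaled]
    # Hand out the remaining pulses where the rounding error is largest.
    for _ in range(pulses - sum(mags)):
        i = max(range(len(shape)), key=lambda j: scaled[j] - mags[j])
        mags[i] += 1
    return [int(math.copysign(m, v)) if m else 0 for m, v in zip(mags, shape)]


def pvq_dequantise(codeword, gain):
    """Project the code word back onto the unit sphere and restore the gain."""
    norm = math.sqrt(sum(y * y for y in codeword))
    return [gain * y / norm for y in codeword]


gain, shape = normalise_band([3.0, -4.0, 0.0, 0.0])
codeword = pvq_quantise(shape, pulses=7)
print(codeword, pvq_dequantise(codeword, gain))
```

Since the number of pulses and the band size fix the number of possible code words, the index of a code word needs a predictable number of bits.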
The decoder unpacks the individual components from the range-coded bitstream, multiplies the band shape coefficients by the band energy and transforms them back (via iMDCT) to PCM data. The individual blocks are rejoined using weighted overlap-add (WOLA). Many parameters are not explicitly coded, but are instead reconstructed by using the same functions as the encoder.
For channel coupling, CELT may use M/S stereo or intensity stereo. Blocks can be described independently of adjacent frames (intra-frame), for example to enable a decoder to jump into a running stream. With transform codecs, so-called pre-echo artifacts can become audible, because the quantisation error of sharp, energy-heavy sounds (transients) can spread over the entire MDCT block, and the transient does not mask it backward in time as well as it does forward. With CELT, each block can be further divided to thwart such artifacts.
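The idea behind M/S (mid/side) stereo can be shown in its textbook form: the two channels are replaced by their average and half their difference, and the decoder inverts the mapping. This sketch works on plain sample lists and uses illustrative function names; how CELT applies the coupling to its band coefficients is not reproduced here.

```python
def ms_encode(left, right):
    """Convert left/right into mid (average) and side (half difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1.0, 0.5, -0.25]
right = [0.5, 0.5, 0.25]
mid, side = ms_encode(left, right)
print(mid, side)
print(ms_decode(mid, side))
```

When the two channels are similar, the side signal is small and needs few bits.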
The first work on plans and drafts for a Vorbis successor was done in 2005 at Xiph.org as part of the Ghost project (initially talked about as “Vorbis II”). This discussion, together with Vorbis creator Christopher Montgomery, led to Jean-Marc Valin's interest in a particularly low-latency codec. Valin has worked on CELT since 2007. [5] In December 2007, the first draft version of libcelt was published as version 0.0.1, initially named “Code-Excited Lapped Transform”. [10] [11] CELT was established as an IETF technology in July 2009 [3] [12] [13] [14] under the "ietfcodec" working group. In May 2009, a draft of the RTP payload format for the CELT codec was published. [15]
In version 0.9, the frequency-domain pitch prediction used until then was replaced by a less complex solution with a pre- and postfilter pair in the time domain, [16] which was contributed by Raymond Chen of Broadcom. [5]
With CELT 0.11 from February 4, 2011 the format was tentatively frozen (“soft freeze”) – reserving the possibility of unexpectedly necessary last changes.
Shortly after the advent of the CELT/SILK hybrid codec Opus (formerly known as Harmony), the development of CELT as a separate project was halted; it lives on as a layer of Opus, [17] which aims to treat the lower part of the spectral range in the time domain with linear prediction (SILK) and the higher part in the frequency domain with the MDCT. The draft for Opus has been registered at the IETF since September 2010.
The software library libcelt serves as the reference implementation for CELT, written in C and published as free software under Xiph's own 3-clause BSD-ish license.
Although the format had not been finally frozen, it was used in many VoIP applications such as Ekiga [18] and FreeSWITCH, [19] which switched to CELT upon entering soft-freeze in January 2009, as well as Mumble, TeamSpeak and other software. [20] In April 2011, support for CELT was included in FFmpeg. [21] [22]
CELT is also supported or used by: [20]