Lapped transform

In signal processing, a lapped transform is a linear discrete block transform whose basis functions extend across the block boundaries, yet the total number of coefficients produced by a series of overlapping block transforms is the same as if a non-overlapping block transform had been used. [1] [2] [3] [4]

Lapped transforms substantially reduce the blocking artifacts that otherwise occur with block transform coding techniques, in particular those using the discrete cosine transform. The best known example is the modified discrete cosine transform used in the MP3, Vorbis, AAC, and Opus audio codecs. [5]
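The overlap and the unchanged coefficient count can be illustrated with a small MDCT sketch. It is a minimal, unoptimized illustration, not the implementation used by any of the codecs named above. It assumes a sine window, 50% overlap, an even hop size `n`, an inverse scaled by 2/N, and zero padding of one hop at each end; the function names (`mdct`, `imdct`, `sine_window`, `analyze`, `synthesize`) are hypothetical.

```python
import math

def sine_window(n):
    # Window of length 2n satisfying w[i]^2 + w[i+n]^2 = 1 (Princen-Bradley).
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]

def mdct(block):
    # Forward MDCT: 2n input samples -> n coefficients.
    n = len(block) // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]

def imdct(coeffs):
    # Inverse MDCT: n coefficients -> 2n time-aliased samples.
    # The 2/n scaling is an assumption that matches windowing on both sides.
    n = len(coeffs)
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                      for k in range(n))
        for i in range(2 * n)
    ]

def analyze(signal, n):
    # Slide a 2n-sample window in hops of n; each hop yields n coefficients.
    if len(signal) % n != 0:
        raise ValueError("signal length must be a multiple of n")
    w = sine_window(n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    frames = []
    for start in range(0, len(padded) - n, n):
        frames.append(mdct([padded[start + i] * w[i] for i in range(2 * n)]))
    return frames

def synthesize(frames, n):
    # Window each inverse block again and overlap-add; the aliasing terms
    # of neighbouring blocks cancel in the overlapped halves.
    w = sine_window(n)
    out = [0.0] * (n * (len(frames) + 1))
    for j, coeffs in enumerate(frames):
        y = imdct(coeffs)
        for i in range(2 * n):
            out[j * n + i] += y[i] * w[i]
    return out[n:-n]  # drop the padding added in analyze

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
frames = analyze(signal, 4)
rebuilt = synthesize(frames, 4)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # tiny rounding error
print(sum(len(f) for f in frames), len(signal))
```

Each block spans 2N samples but contributes only N coefficients, so for a long signal the coefficient count matches that of a non-overlapping transform; in this sketch the only excess is one extra block caused by the edge padding.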

Although the best-known application of lapped transforms has been for audio coding, they have also been used for video and image coding and various other applications. They are used in video coding for coding I-frames in VC-1 and for image coding in the JPEG XR format. More recently, a form of lapped transform has also been used in the development of the Daala video coding format. [5]

Related Research Articles

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals or sound level is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.

Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a sequence of numbers that represent samples of a continuous variable in a domain such as time, space, or frequency. In digital electronics, a digital signal is represented as a pulse train, which is typically generated by the switching of a transistor.

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

Image compression: Reduction of image size to save storage and transmission costs

Image compression is a type of data compression applied to digital images, to reduce their cost for storage or transmission. Algorithms may take advantage of visual perception and the statistical properties of image data to provide superior results compared with generic data compression methods which are used for other digital data.

Transform coding is a type of data compression for "natural" data like audio signals or photographic images. The transformation is typically lossless on its own but is used to enable better quantization, which then results in a lower quality copy of the original input.

Motion compensation: Video compression technique, used to efficiently predict and generate video frames

Motion compensation in computing is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames, by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video data for video compression, for example in the generation of MPEG-2 files. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. The reference picture may be previous in time or even from the future. When images can be accurately synthesized from previously transmitted or stored images, the compression efficiency can be improved.

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images, digital video, digital audio, digital television, digital radio, and speech coding. DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.
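As a rough illustration of why the DCT suits compression, the following sketch computes an orthonormal DCT-II of a short ramp and measures how much of the signal energy falls into the two lowest-frequency coefficients. The choice of the DCT-II variant, the orthonormal scaling, the test signal, and the function name `dct2_orthonormal` are all assumptions made for this example.

```python
import math

def dct2_orthonormal(x):
    # Direct O(n^2) DCT-II with orthonormal scaling, so energy is preserved.
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k)
                               for i in range(n)))
    return out

ramp = [float(i) for i in range(8)]           # smooth, highly correlated input
coeffs = dct2_orthonormal(ramp)
total_energy = sum(c * c for c in coeffs)     # equals the energy of the input
low_energy = coeffs[0] ** 2 + coeffs[1] ** 2  # two lowest-frequency terms
print(low_energy / total_energy)
```

For smooth inputs such as this ramp, nearly all of the energy lands in a few low-frequency coefficients, which is what allows the remaining coefficients to be quantized coarsely.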

The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT, and Opus.
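One common way of writing the transform, given here as a sketch because normalization conventions vary between references, maps a block of 2N samples x_0, ..., x_{2N-1} to N coefficients and back:

$$
X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, \dots, N-1
$$

$$
y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad n = 0, \dots, 2N-1
$$

The block y is not equal to x; it contains time-domain aliasing. Adding the second half of one inverse block to the first half of the next cancels the aliasing and recovers the original samples. When a symmetric window w is applied both before the forward transform and after the inverse, cancellation requires the Princen-Bradley condition

$$
w_n^2 + w_{n+N}^2 = 1, \qquad n = 0, \dots, N-1,
$$

and, under the scaling assumed here, the factor 1/N in the inverse becomes 2/N.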

Welch's method, named after Peter D. Welch, is an approach for spectral density estimation. It is used in physics, engineering, and applied mathematics for estimating the power of a signal at different frequencies. The method is based on the concept of using periodogram spectrum estimates, which are the result of converting a signal from the time domain to the frequency domain. Welch's method is an improvement on the standard periodogram spectrum estimating method and on Bartlett's method, in that it reduces noise in the estimated power spectra in exchange for reducing the frequency resolution. Due to the noise caused by imperfect and finite data, the noise reduction from Welch's method is often desired.

Wavelet transform: Mathematical technique used in data compression and analysis

In mathematics, a wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform.

JPEG XR is an image compression standard for continuous tone photographic images, based on the HD Photo specifications that Microsoft originally developed and patented. It supports both lossy and lossless compression, and is the preferred image format for Ecma-388 Open XML Paper Specification documents.

K. R. Rao: Indian-American electrical engineer

Kamisetty Ramamohan Rao was an Indian-American electrical engineer. He was a professor of Electrical Engineering at the University of Texas at Arlington. Known academically as K. R. Rao, he is credited with the co-invention of the discrete cosine transform (DCT), along with Nasir Ahmed and T. Natarajan, on the basis of their landmark publication, Discrete Cosine Transform.

The modulated complex lapped transform (MCLT) is a lapped transform, similar to the modified discrete cosine transform, that explicitly represents the phase (complex values) of the signal.

Constrained Energy Lapped Transform (CELT) is an open, royalty-free lossy audio compression format and a free software codec with especially low algorithmic delay for use in low-latency audio communication. The algorithms are openly documented and may be used free of software patent restrictions. Development of the format was maintained by the Xiph.Org Foundation and later coordinated by the Opus working group of the Internet Engineering Task Force (IETF).

A video coding format is a content representation format for storage or transmission of digital video content. It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A specific software, firmware, or hardware implementation capable of compression or decompression to/from a specific video coding format is called a video codec.

Nasir Ahmed (engineer): Indian-American electrical engineer and computer scientist

Nasir Ahmed is an Indian-American electrical engineer and computer scientist. He is Professor Emeritus of Electrical and Computer Engineering at the University of New Mexico (UNM). He is best known for inventing the discrete cosine transform (DCT) in the early 1970s. The DCT is the most widely used data compression transformation, the basis for most digital media standards, and commonly used in digital signal processing. He also described the discrete sine transform (DST), which is related to the DCT.

Daala is a video coding format under development by the Xiph.Org Foundation, led by Timothy B. Terriberry and sponsored mainly by the Mozilla Corporation. Like Theora and Opus, Daala is available free of any royalties and its reference implementation is being developed as free and open-source software. The name is taken from the fictional character Admiral Natasi Daala from the Star Wars universe.

Audio coding format: Digitally coded format for audio signals

An audio coding format is a content representation format for storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of audio compression and decompression to/from a specific audio coding format is called an audio codec; an example is LAME, one of several codecs that implement encoding and decoding of audio in the MP3 audio coding format in software.

Sergio Barbarossa: Italian professor, engineer and inventor

Sergio Barbarossa is an Italian professor, engineer and inventor. He is a professor at Sapienza University of Rome, Italy.

References

  1. Malvar, H. S. (1992). Signal Processing with Lapped Transforms. Artech House.
  2. de Queiroz, Ricardo L. "On Lapped Transforms". CiteSeerX 10.1.1.91.7148. Retrieved August 20, 2023.
  3. Malvar, H. S. (November 1992). "Extended Lapped Transforms: Properties, Applications, and Fast Algorithms" (PDF). IEEE Transactions on Signal Processing. 40 (11): 2703–2714. Bibcode:1992ITSP...40.2703M. doi:10.1109/78.165657.
  4. Tran, Trac D.; Liang, Jie; Tu, Chengjie (June 2003). "Lapped Transform via Time-Domain Pre- and Post-Filtering" (PDF). IEEE Transactions on Signal Processing. 51 (6): 1557. Bibcode:2003ITSP...51.1557T. doi:10.1109/TSP.2003.811222. Archived from the original (PDF) on 2016-03-04. Retrieved 2013-06-22.
  5. "Next generation video: Introducing Daala". xiph.org. June 20, 2013.