Quantization (image processing)

Last updated December 06, 2024

Quantization, in image processing, is a lossy compression technique that maps a range of values to a single quantum (discrete) value. When the number of discrete symbols in a given stream is reduced, the stream becomes more compressible. For example, reducing the number of colors required to represent a digital image makes it possible to reduce its file size. Specific applications include DCT data quantization in JPEG and DWT data quantization in JPEG 2000.

Color quantization

Color quantization reduces the number of colors used in an image; this is important for displaying images on devices that support a limited number of colors and for efficiently compressing certain kinds of images. Most bitmap editors and many operating systems have built-in support for color quantization. Popular modern color quantization algorithms include the nearest color algorithm (for fixed palettes), the median cut algorithm, and an algorithm based on octrees.
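The nearest color approach for a fixed palette can be sketched in Python as follows. This is a minimal illustration rather than any particular editor's implementation: the palette is made up for the example, and squared Euclidean distance in RGB is an assumed distance measure (other color spaces and metrics are possible).

```python
# Nearest-color quantization against a fixed palette (illustrative sketch).
# PALETTE is a made-up example palette, not taken from any standard.
PALETTE = [
    (0, 0, 0),        # black
    (255, 255, 255),  # white
    (255, 0, 0),      # red
    (0, 255, 0),      # green
    (0, 0, 255),      # blue
]


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_color(pixel, palette=PALETTE):
    """Return the palette entry closest to `pixel`."""
    return min(palette, key=lambda color: squared_distance(pixel, color))


def quantize_image(pixels, palette=PALETTE):
    """Map every pixel in a list of rows to its nearest palette color."""
    return [[nearest_color(p, palette) for p in row] for row in pixels]


# A tiny 2x2 test image with colors close to red, black, white, and blue.
image = [[(250, 10, 10), (20, 20, 20)],
         [(200, 200, 200), (10, 30, 240)]]
quantized_image = quantize_image(image)
```

After quantization every pixel holds one of the five palette colors, so the image can be stored as small palette indices instead of full RGB triples.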

It is common to combine color quantization with dithering to create an impression of a larger number of colors and eliminate banding artifacts.
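As a rough illustration of why dithering helps, the sketch below quantizes one row of gray values to two levels while carrying the rounding error forward to the next pixel. One-dimensional error diffusion is only one simple form of dithering, chosen here as an assumption for brevity; practical tools typically use two-dimensional methods.

```python
def dither_row(row, levels=(0, 255)):
    """Quantize a row of intensities to `levels`, diffusing the error forward.

    Simplified one-dimensional error diffusion (illustrative only).
    """
    output = []
    error = 0.0
    for value in row:
        target = value + error                       # add error carried over
        chosen = min(levels, key=lambda lv: abs(lv - target))
        output.append(chosen)
        error = target - chosen                      # pass the new error on
    return output


flat_gray = [128] * 8
# Nearest level for every pixel, with no error carried forward.
without_dither = [min((0, 255), key=lambda lv: abs(lv - v)) for v in flat_gray]
with_dither = dither_row(flat_gray)
```

Without dithering, every pixel of the mid-gray row snaps to 255. With error diffusion the output alternates between 255 and 0, so its average (127.5) stays close to the original intensity of 128.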

Grayscale quantization

Grayscale quantization, also known as gray-level quantization, reduces the number of unique intensity levels (shades of gray) in an image while preserving its essential visual information. It is commonly used to simplify images, reduce storage requirements, and facilitate later processing operations. An image with N intensity levels is converted into an image with L levels, where L<N, by mapping each pixel's original intensity value to one of the new intensity levels.

One of the simplest methods of grayscale quantization is uniform quantization, in which the intensity range is divided into equal intervals and each interval is represented by a single intensity value. For example, an 8-bit grayscale image has intensity levels ranging from 0 to 255. Quantizing it to 4 levels gives the intervals [0-63], [64-127], [128-191], and [192-255]. Each interval is represented by an intensity near its midpoint, giving levels of about 31, 95, 159, and 223 (the exact midpoints are 31.5, 95.5, 159.5, and 223.5).

The formula for uniform quantization is:

$Q(x)=\left\lfloor {\frac {x}{\Delta }}\right\rfloor \times \Delta +{\frac {\Delta }{2}}$

where:

Q(x) is the quantized intensity value.
x is the original intensity value.
Δ is the size of each quantization interval.

As an example, quantize an original intensity value of 147 to 3 intensity levels.

Original intensity value: x=147

Desired intensity levels: L=3

First, calculate the size of each quantization interval. The 256 possible values of an 8-bit image are divided into L equal intervals:

$\Delta ={\frac {256}{L}}={\frac {256}{3}}\approx 85.33$

Using the uniform quantization formula:

$Q(x)=\left\lfloor {\frac {147}{85.33}}\right\rfloor \times 85.33+{\frac {85.33}{2}}$

$Q(x)=\left\lfloor 1.7227\right\rfloor \times 85.33+42.67$

$Q(x)=1\times 85.33+42.67=128$

So, the intensity value 147 quantized to 3 levels is 128, the midpoint of the middle interval (roughly 85.33 to 170.67).
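The uniform quantization formula translates directly into code. The sketch below is a minimal Python version; the function name `uniform_quantize` is chosen for illustration, and the interval size Δ is passed in explicitly. It is applied to the 4-level, 8-bit case described earlier, where each interval spans 64 values.

```python
import math


def uniform_quantize(x, delta):
    """Q(x) = floor(x / delta) * delta + delta / 2."""
    return math.floor(x / delta) * delta + delta / 2


# 8-bit input quantized to 4 levels: each interval covers 64 values.
DELTA = 64
levels = sorted({uniform_quantize(x, DELTA) for x in range(256)})
sample = uniform_quantize(147, DELTA)  # 147 lies in the interval [128-191]
```

Every input from 0 to 255 maps to one of four outputs: 32, 96, 160, or 224. The formula places each representative value at the center of an interval of width Δ, which corresponds to the exact midpoints 31.5, 95.5, 159.5, and 223.5 rounded up.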

Frequency quantization for image compression

The human eye is fairly good at seeing small differences in brightness over a relatively large area, but not so good at distinguishing the exact strength of a high frequency (rapidly varying) brightness variation. This fact allows one to reduce the amount of information required by ignoring the high frequency components. This is done by simply dividing each component in the frequency domain by a constant for that component, and then rounding to the nearest integer. This is the main lossy operation in the whole process. As a result of this, it is typically the case that many of the higher frequency components are rounded to zero, and many of the rest become small positive or negative numbers.
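The divide-and-round step, and the information it discards, can be seen in a short Python sketch. The coefficient values and divisors below are hypothetical, chosen only to show the effect; a decoder can recover each component only as a multiple of its divisor.

```python
# Hypothetical frequency-domain coefficients, ordered from low to high frequency.
coefficients = [-415.0, 58.0, -12.0, 3.0]
# Hypothetical per-component divisors, larger for the higher frequencies.
steps = [16, 24, 61, 55]

# Quantize: divide each component by its constant and round to an integer.
# (None of these quotients is an exact .5 tie, so Python's round-half-to-even
# rule does not affect the result here.)
quantized = [round(c / s) for c, s in zip(coefficients, steps)]

# Dequantize: multiply back. The rounding error cannot be recovered.
reconstructed = [q * s for q, s in zip(quantized, steps)]
errors = [c - r for c, r in zip(coefficients, reconstructed)]
```

Here the two high-frequency components are rounded to zero and reconstructed as zero, while the low-frequency components come back as -416 and 48 instead of -415 and 58.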

As human vision is also more sensitive to luminance than chrominance, further compression can be obtained by working in a non-RGB color space which separates the two (e.g., YCbCr), and quantizing the channels separately.^[1]

Quantization matrices

A typical video codec works by breaking the picture into discrete blocks (8×8 pixels in the case of MPEG^[1]). These blocks can then be subjected to a discrete cosine transform (DCT) to calculate the frequency components, both horizontally and vertically.^[1] The resulting block (the same size as the original block) is then pre-multiplied by the quantization scale code and divided element-wise by the quantization matrix, and each resulting element is rounded. The quantization matrix is designed to give more resolution to more perceivable frequency components than to less perceivable ones (usually lower frequencies over high frequencies), and to transform as many components as possible to 0, which can be encoded with the greatest efficiency. Many video encoders (such as DivX, Xvid, and 3ivx) and compression standards (such as MPEG-2 and H.264/AVC) allow custom matrices to be used. The extent of the reduction may be varied by changing the quantizer scale code, which takes up much less bandwidth than a full quantizer matrix.^[1]

This is an example of a DCT coefficient matrix:

${\begin{bmatrix}-415&-33&-58&35&58&-51&-15&-12\\5&-34&49&18&27&1&-5&3\\-46&14&80&-35&-50&19&7&-18\\-53&21&34&-20&2&34&36&12\\9&-2&9&-5&-32&-15&45&37\\-8&15&-16&7&-8&11&4&7\\19&-28&-2&-26&-2&7&-44&-21\\18&25&-12&-44&35&48&-37&-3\end{bmatrix}}$

A common quantization matrix is:

${\begin{bmatrix}16&11&10&16&24&40&51&61\\12&12&14&19&26&58&60&55\\14&13&16&24&40&57&69&56\\14&17&22&29&51&87&80&62\\18&22&37&56&68&109&103&77\\24&35&55&64&81&104&113&92\\49&64&78&87&103&121&120&101\\72&92&95&98&112&100&103&99\end{bmatrix}}$

Dividing the DCT coefficient matrix element-wise by this quantization matrix and rounding each result to the nearest integer gives:

${\begin{bmatrix}-26&-3&-6&2&2&-1&0&0\\0&-3&4&1&1&0&0&0\\-3&1&5&-1&-1&0&0&0\\-4&1&2&-1&0&0&0&0\\1&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\end{bmatrix}}$

For example, the DC coefficient −415 is divided by 16 and rounded to the nearest integer:

$\mathrm {round} \left({\frac {-415}{16}}\right)=\mathrm {round} \left(-25.9375\right)=-26$
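The whole element-wise division can be reproduced with a few lines of Python. This is a sketch using the two matrices given above; the helper `round_half_away` is an assumption made for the example, because the matrix shown rounds exact halves away from zero (49/14 = 3.5 becomes 4 and 9/18 = 0.5 becomes 1), whereas Python's built-in `round` resolves such ties to the nearest even integer.

```python
import math

DCT = [
    [-415, -33, -58,  35,  58, -51, -15, -12],
    [   5, -34,  49,  18,  27,   1,  -5,   3],
    [ -46,  14,  80, -35, -50,  19,   7, -18],
    [ -53,  21,  34, -20,   2,  34,  36,  12],
    [   9,  -2,   9,  -5, -32, -15,  45,  37],
    [  -8,  15, -16,   7,  -8,  11,   4,   7],
    [  19, -28,  -2, -26,  -2,   7, -44, -21],
    [  18,  25, -12, -44,  35,  48, -37,  -3],
]

QUANT = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]


def round_half_away(value):
    """Round to the nearest integer, with exact halves going away from zero."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def quantize_block(block, matrix):
    """Divide `block` element-wise by `matrix` and round each result."""
    return [[round_half_away(c / q) for c, q in zip(block_row, matrix_row)]
            for block_row, matrix_row in zip(block, matrix)]


quantized = quantize_block(DCT, QUANT)
nonzero_count = sum(1 for row in quantized for v in row if v != 0)
```

Only 20 of the 64 entries remain non-zero after this step, all of them in the upper-left region of the block.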

Typically this process will result in matrices with values primarily in the upper left (low frequency) corner. By using a zig-zag ordering to group the non-zero entries and run length encoding, the quantized matrix can be much more efficiently stored than the non-quantized version.^[1]
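The effect of zig-zag ordering followed by run-length encoding can be sketched as follows. The scan order below is the usual diagonal zig-zag starting at the top-left corner. The encoding into (zero run, value) pairs with an end-of-block marker is a simplified illustration and an assumption of this example, not the exact entropy coding of any specific standard.

```python
QUANTIZED = [
    [-26, -3, -6,  2,  2, -1, 0, 0],
    [  0, -3,  4,  1,  1,  0, 0, 0],
    [ -3,  1,  5, -1, -1,  0, 0, 0],
    [ -4,  1,  2, -1,  0,  0, 0, 0],
    [  1,  0,  0,  0,  0,  0, 0, 0],
    [  0,  0,  0,  0,  0,  0, 0, 0],
    [  0,  0,  0,  0,  0,  0, 0, 0],
    [  0,  0,  0,  0,  0,  0, 0, 0],
]


def zigzag_indices(n):
    """(row, col) pairs of an n x n block in zig-zag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


def run_length_encode(values):
    """Encode as (zeros_before, value) pairs; 'EOB' marks trailing zeros."""
    pairs, zero_run = [], 0
    for v in values:
        if v == 0:
            zero_run += 1
        else:
            pairs.append((zero_run, v))
            zero_run = 0
    pairs.append("EOB")  # any remaining zeros are implied by the marker
    return pairs


scan = zigzag_scan(QUANTIZED)
encoded = run_length_encode(scan)
```

For this block the scan places the last non-zero value at position 26 of 64, so the remaining 38 zeros collapse into the single end-of-block marker and the 64 entries are represented by 20 pairs plus that marker.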

References

[1] John Wiseman, An Introduction to MPEG Video Compression, https://web.archive.org/web/20111115004238/http://www.john-wiseman.com/technical/MPEG_tutorial.htm

[2] Smith, Steven W. (2003). Digital signal processing: a practical guide for engineers and scientists. Demystifying technology series. Amsterdam Boston: Newnes. ISBN 978-0-7506-7444-7.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
