Color layout descriptor

Last updated

In digital image and video processing, a color layout descriptor (CLD) is designed to capture the spatial distribution of color in an image. The feature extraction process consists of two parts: grid based representative color selection and discrete cosine transform with quantization.

Contents

Color is the most basic quality of the visual contents, therefore it is possible to use colors to describe and represent an image. The MPEG-7 standard has tested the most efficient procedure to describe the color and has selected those that have provided more satisfactory results. This standard proposes different methods to obtain these descriptors, and one tool defined to describe the color is the CLD, that permits describing the color relation between sequences or group of images.

The CLD captures the spatial layout of the representative colors on a grid superimposed on a region or image. Representation is based on coefficients of the discrete cosine transform (DCT). This is a very compact descriptor being highly efficient in fast browsing and search applications. It can be applied to still images as well as to video segments.

Definition

The CLD is a very compact and resolution-invariant representation of color for high-speed image retrieval and it has been designed to efficiently represent the spatial distribution of colors. This feature can be used for a wide variety of similarity-based retrieval, content filtering and visualization. It is especially useful for spatial structure-based retrieval applications. This descriptor is obtained by applying the DCT transformation on a 2-D array of local representative colors in Y or Cb or Cr color space. The functionalities of the CLD are basically the matching:

– Image-to-image matching
– Video clip-to-video clip matching

Remark that the CLD is one of the most precise and fast color descriptor.

Extraction process of the CLD Extraction process of the CLD.jpg
Extraction process of the CLD

Extraction

The extraction process of this color descriptor consists of four stages:

The standard MPEG-7 recommends using the YCbCr color space for the CLD.

Image partitioning Image partitioning in CLD 2.jpg
Image partitioning

Image partitioning

In the image partitioning stage, the input picture (on RGB color space) is divided into 64 blocks to guarantee the invariance to resolution or scale. The inputs and outputs of this step are summarized in the following table:

Input stage 1Output stage 1
Input picture [M x N]Input picture divided into
64 blocks [M/8xN/8]
Representative color selection Representative color selection 2.jpg
Representative color selection

Representative color selection

After the image partitioning stage, a single representative color is selected from each block. Any method to select the representative color can be applied, but the standard recommends the use of the average of the pixel colors in a block as the corresponding representative color, since it is simpler and the description accuracy is sufficient in general. The selection results in a tiny image icon of size 8x8. The next figure shows this process. Note that in the image of the figure, the resolution of the original image has been maintained only in order to facilitate its representation. The inputs and outputs of this stage are summarized in the next table:

Input stage 2Output stage 2
Input picture divided into 64 blocks [M/8xN/8]Tiny image icon [8x8]

Once the tiny image icon is obtained, the color space conversion between RGB and YCbCr is applied.

Input stage 3Output stage 3
Tiny image icon [8x8] in RGB color spaceTiny image icon [8x8] in YCbCr color space

DCT transformation

In the fourth stage, the luminance (Y) and the blue and red chrominance (Cb and Cr) are transformed by 8x8 DCT, so three sets of 64 DCT coefficients are obtained. To calculate the DCT in a 2D array, the formulas below are used.

The inputs and outputs of this stage are summarized in the next table:

Input stage 4Output stage 4
Tiny image icon [8x8]
in YCbCr color space
3 [8x8] matrix of 64 coefficients
(DCTY, DCTCb, DCTCr)
Zigzag scanning Zigzag scanning.jpg
Zigzag scanning

Zigzag scanning

A zigzag scanning is performed with these three sets of 64 DCT coefficients, following the schema presented in the figure. The purpose of the zigzag scan is to group the low frequency coefficients of the 8x8 matrix. The inputs and outputs of this stage are summarized in the next table:

Input stage 5Output stage 5
3 [8x8] matrix of 64 coefficients
(DCTY, DCTCb, DCTCr)
3 zigzag scanned matrix
(DY, DCb, DCr)

Finally, these three set of matrices correspond to the CLD of the input image.

Matching

The matching process helps to evaluate if two elements are equal comparing both elements and calculating the distance between them. In the case of color descriptors the matching process helps to evaluate if two images are similar. Its procedure is the following:

– Given an image as an input, the application attempts to find an image with a similar descriptor in a data base of images.

If we consider two CLDs:

{DY, DCb, DCr}
{ DY‟, DCb‟, DCr‟ },

The distance between the two descriptors can be computed as:

The subscript i represents the zigzag-scanning order of the coefficients. Furthermore, notice that is possible to weight the coefficients (w) in order to adjust the performance of the matching process. These weights let us give to some components of the descriptor more importance than others. Observing the formula, it can be extracted that:

– 2 images are the same if the distance is 0
– 2 images are similar if the distance is near to 0

Therefore, this matching process will let to identify images with similar color descriptors. Since the complexity of the similarity matching process shown above is low, high-speed image matching can be achieved.

See also

Related Research Articles

<span class="mw-page-title-main">Binomial coefficient</span> Number of subsets of a given size

In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers nk ≥ 0 and is written It is the coefficient of the xk term in the polynomial expansion of the binomial power (1 + x)n; this coefficient can be computed by the multiplicative formula

<span class="mw-page-title-main">Ellipse</span> Plane curve: conic section

In mathematics, an ellipse is a plane curve surrounding two focal points, such that for all points on the curve, the sum of the two distances to the focal points is a constant. It generalizes a circle, which is the special type of ellipse in which the two focal points are the same. The elongation of an ellipse is measured by its eccentricity , a number ranging from to .

<span class="mw-page-title-main">JPEG</span> Lossy compression method for reducing the size of digital images

JPEG is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality. Since its introduction in 1992, JPEG has been the most widely used image compression standard in the world, and the most widely used digital image format, with several billion JPEG images produced every day as of 2015.

The propagation constant of a sinusoidal electromagnetic wave is a measure of the change undergone by the amplitude and phase of the wave as it propagates in a given direction. The quantity being measured can be the voltage, the current in a circuit, or a field vector such as electric field strength or flux density. The propagation constant itself measures the change per unit length, but it is otherwise dimensionless. In the context of two-port networks and their cascades, propagation constant measures the change undergone by the source quantity as it propagates from one port to the next.

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images, digital video, digital audio, digital television, digital radio, and speech coding. DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.

<span class="mw-page-title-main">Chebyshev polynomials</span> Polynomial sequence

The Chebyshev polynomials are two sequences of polynomials related to the cosine and sine functions, notated as and . They can be defined in several equivalent ways, one of which starts with trigonometric functions:

The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such as AAC-LD (LD-MDCT), G.722.1, G.729.1, CELT, and Opus.

<span class="mw-page-title-main">Window function</span> Function used in signal processing

In signal processing and statistics, a window function is a mathematical function that is zero-valued outside of some chosen interval, normally symmetric around the middle of the interval, usually near a maximum in the middle, and usually tapering away from the middle. Mathematically, when another function or waveform/data-sequence is "multiplied" by a window function, the product is also zero-valued outside the interval: all that is left is the part where they overlap, the "view through the window". Equivalently, and in actual practice, the segment of data within the window is first isolated, and then only that data is multiplied by the window function values. Thus, tapering, not segmentation, is the main purpose of window functions.

In signal processing, a finite impulse response (FIR) filter is a filter whose impulse response is of finite duration, because it settles to zero in finite time. This is in contrast to infinite impulse response (IIR) filters, which may have internal feedback and may continue to respond indefinitely.

<span class="mw-page-title-main">Theta function</span> Special functions of several complex variables

In mathematics, theta functions are special functions of several complex variables. They show up in many topics, including Abelian varieties, moduli spaces, quadratic forms, and solitons. As Grassmann algebras, they appear in quantum field theory.

In mathematics, the Jacobi elliptic functions are a set of basic elliptic functions. They are found in the description of the motion of a pendulum, as well as in the design of electronic elliptic filters. While trigonometric functions are defined with reference to a circle, the Jacobi elliptic functions are a generalization which refer to other conic sections, the ellipse in particular. The relation to trigonometric functions is contained in the notation, for example, by the matching notation for . The Jacobi elliptic functions are used more often in practical problems than the Weierstrass elliptic functions as they do not require notions of complex analysis to be defined and/or understood. They were introduced by Carl Gustav Jakob Jacobi (1829). Carl Friedrich Gauss had already studied special Jacobi elliptic functions in 1797, the lemniscate elliptic functions in particular, but his work was published much later.

In statistics, the Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations.

In mathematics, an algebraic function is a function that can be defined as the root of a polynomial equation. Quite often algebraic functions are algebraic expressions using a finite number of terms, involving only the algebraic operations addition, subtraction, multiplication, division, and raising to a fractional power. Examples of such functions are:

<span class="mw-page-title-main">Multiple integral</span> Generalization of definite integrals to functions of multiple variables

In mathematics (specifically multivariable calculus), a multiple integral is a definite integral of a function of several real variables, for instance, f(x, y) or f(x, y, z). Integrals of a function of two variables over a region in (the real-number plane) are called double integrals, and integrals of a function of three variables over a region in (real-number 3D space) are called triple integrals. For multiple integrals of a single-variable function, see the Cauchy formula for repeated integration.

Ripple in electronics is the residual periodic variation of the DC voltage within a power supply which has been derived from an alternating current (AC) source. This ripple is due to incomplete suppression of the alternating waveform after rectification. Ripple voltage originates as the output of a rectifier or from generation and commutation of DC power.

In image processing, ridge detection is the attempt, via software, to locate ridges in an image, defined as curves whose points are local maxima of the function, akin to geographical ridges.

In the 1760s, Johann Heinrich Lambert was the first to prove that the number π is irrational, meaning it cannot be expressed as a fraction , where and are both integers. In the 19th century, Charles Hermite found a proof that requires no prerequisite knowledge beyond basic calculus. Three simplifications of Hermite's proof are due to Mary Cartwright, Ivan Niven, and Nicolas Bourbaki. Another proof, which is a simplification of Lambert's proof, is due to Miklós Laczkovich. Many of these are proofs by contradiction.

<span class="mw-page-title-main">Bicentric quadrilateral</span> Convex, 4-sided shape with an incircle and a circumcircle

In Euclidean geometry, a bicentric quadrilateral is a convex quadrilateral that has both an incircle and a circumcircle. The radii and center of these circles are called inradius and circumradius, and incenter and circumcenter respectively. From the definition it follows that bicentric quadrilaterals have all the properties of both tangential quadrilaterals and cyclic quadrilaterals. Other names for these quadrilaterals are chord-tangent quadrilateral and inscribed and circumscribed quadrilateral. It has also rarely been called a double circle quadrilateral and double scribed quadrilateral.

In applied mathematics, the discrete Chebyshev transform (DCT), named after Pafnuty Chebyshev, is either of two main varieties of DCTs: the discrete Chebyshev transform on the 'roots' grid of the Chebyshev polynomials of the first kind and the discrete Chebyshev transform on the 'extrema' grid of the Chebyshev polynomials of the first kind.

In mathematics, the exponential response formula (ERF), also known as exponential response and complex replacement, is a method used to find a particular solution of a non-homogeneous linear ordinary differential equation of any order. The exponential response formula is applicable to non-homogeneous linear ordinary differential equations with constant coefficients if the function is polynomial, sinusoidal, exponential or the combination of the three. The general solution of a non-homogeneous linear ordinary differential equation is a superposition of the general solution of the associated homogeneous ODE and a particular solution to the non-homogeneous ODE. Alternative methods for solving ordinary differential equations of higher order are method of undetermined coefficients and method of variation of parameters.