Phase correlation

Last updated January 26, 2024

Phase correlation is an approach to estimate the relative translative offset between two similar images (digital image correlation) or other data sets. It is commonly used in image registration and relies on a frequency-domain representation of the data, usually calculated by fast Fourier transforms. The term is applied particularly to a subset of cross-correlation techniques that isolate the phase information from the Fourier-space representation of the cross-correlogram.

Example

The following image demonstrates the usage of phase correlation to determine relative translative movement between two images corrupted by independent Gaussian noise. The image was translated by (30,33) pixels. Accordingly, one can clearly see a peak in the phase-correlation representation at approximately (30,33).

Method

Given two input images $\ g_{a}$ and $\ g_{b}$ :

Apply a window function (e.g., a Hamming window) on both images to reduce edge effects (this may be optional depending on the image characteristics). Then, calculate the discrete 2D Fourier transform of both images.

\ \mathbf {G} _{a}={\mathcal {F}}\{g_{a}\},\;\mathbf {G} _{b}={\mathcal {F}}\{g_{b}\}

Calculate the cross-power spectrum by taking the complex conjugate of the second result, multiplying the Fourier transforms together elementwise, and normalizing this product elementwise.

\ R={\frac {\mathbf {G} _{a}\circ \mathbf {G} _{b}^{*}}{|\mathbf {G} _{a}\circ \mathbf {G} _{b}^{*}|}}

Where $\circ$ is the Hadamard product (entry-wise product) and the absolute values are taken entry-wise as well. Written out entry-wise for element index $(j,k)$ :

\ R_{jk}={\frac {G_{a,jk}\cdot G_{b,jk}^{*}}{|G_{a,jk}\cdot G_{b,jk}^{*}|}}

Obtain the normalized cross-correlation by applying the inverse Fourier transform.

\ r={\mathcal {F}}^{-1}\{R\}

Determine the location of the peak in $\ r$ .

\ (\Delta x,\Delta y)=\arg \max _{(x,y)}\{r\}

Commonly, interpolation methods are used to estimate the peak location in the cross-correlogram to non-integer values, despite the fact that the data are discrete, and this procedure is often termed 'subpixel registration'. A large variety of subpixel interpolation methods are given in the technical literature. Common peak interpolation methods such as parabolic interpolation have been used, and the OpenCV computer vision package uses a centroid-based method, though these generally have inferior accuracy compared to more sophisticated methods.

Because the Fourier representation of the data has already been computed, it is especially convenient to use the Fourier shift theorem with real-valued (sub-integer) shifts for this purpose, which essentially interpolates using the sinusoidal basis functions of the Fourier transform. An especially popular FT-based estimator is given by Foroosh et al.^[1] In this method, the subpixel peak location is approximated by a simple formula involving peak pixel value and the values of its nearest neighbors, where $r_{(0,0)}$ is the peak value and $r_{(1,0)}$ is the nearest neighbor in the x direction (assuming, as in most approaches, that the integer shift has already been found and the comparand images differ only by a subpixel shift).

\ \Delta x={\frac {r_{(1,0)}}{r_{(1,0)}\pm r_{(0,0)}}}

^{[ clarification needed ]}

The Foroosh et al. method is quite fast compared to most methods, though it is not always the most accurate. Some methods shift the peak in Fourier space and apply non-linear optimization to maximize the correlogram peak, but these tend to be very slow since they must apply an inverse Fourier transform or its equivalent in the objective function.^[2]

It is also possible to infer the peak location from phase characteristics in Fourier space without the inverse transformation, as noted by Stone.^[3] These methods usually use a linear least squares (LLS) fit of the phase angles to a planar model. The long latency of the phase angle computation in these methods is a disadvantage, but the speed can sometimes be comparable to the Foroosh et al. method depending on the image size. They often compare favorably in speed to the multiple iterations of extremely slow objective functions in iterative non-linear methods.

Since all subpixel shift computation methods are fundamentally interpolative, the performance of a particular method depends on how well the underlying data conform to the assumptions in the interpolator. This fact also may limit the usefulness of high numerical accuracy in an algorithm, since the uncertainty due to interpolation method choice may be larger than any numerical or approximation error in the particular method.

Subpixel methods are also particularly sensitive to noise in the images, and the utility of a particular algorithm is distinguished not only by its speed and accuracy but its resilience to the particular types of noise in the application.

Rationale

The method is based on the Fourier shift theorem.

Let the two images $\ g_{a}$ and $\ g_{b}$ be circularly-shifted versions of each other:

\ g_{b}(x,y)\ {\stackrel {\mathrm {def} }{=}}\ g_{a}((x-\Delta x){\bmod {M}},(y-\Delta y){\bmod {N}})

(where the images are $\ M\times N$ in size).

Then, the discrete Fourier transforms of the images will be shifted relatively in phase:

\mathbf {G} _{b}(u,v)=\mathbf {G} _{a}(u,v)e^{-2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}

One can then calculate the normalized cross-power spectrum to factor out the phase difference:

{\begin{aligned}R(u,v)&={\frac {\mathbf {G} _{a}\mathbf {G} _{b}^{*}}{|\mathbf {G} _{a}\mathbf {G} _{b}^{*}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{*}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{*}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{*}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{*}|}}\\&=e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}\end{aligned}}

since the magnitude of an imaginary exponential always is one, and the phase of $\ \mathbf {G} _{a}\mathbf {G} _{a}^{*}$ always is zero.

The inverse Fourier transform of a complex exponential is a Dirac delta function, i.e. a single peak:

\ r(x,y)=\delta (x+\Delta x,y+\Delta y)

This result could have been obtained by calculating the cross correlation directly. The advantage of this method is that the discrete Fourier transform and its inverse can be performed using the fast Fourier transform, which is much faster than correlation for large images.

Benefits

Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects typical of medical or satellite images.^[4]

The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.^[5]^[6]

Limitations

In practice, it is more likely that $\ g_{b}$ will be a simple linear shift of $\ g_{a}$ , rather than a circular shift as required by the explanation above. In such cases, $\ r$ will not be a simple delta function, which will reduce the performance of the method. In such cases, a window function (such as a Gaussian or Tukey window) should be employed during the Fourier transform to reduce edge effects, or the images should be zero padded so that the edge effects can be ignored. If the images consist of a flat background, with all detail situated away from the edges, then a linear shift will be equivalent to a circular shift, and the above derivation will hold exactly. The peak can be sharpened by using edge or vector correlation.^[7]

For periodic images (such as a chessboard or picket fence), phase correlation may yield ambiguous results with several peaks in the resulting output.

Applications

Phase correlation is the preferred method for television standards conversion, as it leaves the fewest artifacts.

Related Research Articles

In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex-valued function of frequency. The interval at which the DTFT is sampled is the reciprocal of the duration of the input sequence. An inverse DFT (IDFT) is a Fourier series, using the DTFT samples as coefficients of complex sinusoids at the corresponding DTFT frequencies. It has the same sample-values as the original input sequence. The DFT is therefore said to be a frequency domain representation of the original input sequence. If the original sequence spans all the non-zero values of a function, its DTFT is continuous, and the DFT provides discrete samples of one cycle. If the original sequence is one cycle of a periodic function, the DFT provides all the non-zero values of one DTFT cycle.

In mathematical analysis, the Dirac delta function, also known as the unit impulse, is a generalized function on the real numbers, whose value is zero everywhere except at zero, and whose integral over the entire real line is equal to one. Since there is no function having this property, to model the delta "function" rigorously involves the use of limits or, as is common in mathematics, measure theory and the theory of distributions.

<span class="mw-page-title-main">Fourier series</span> Decomposition of periodic functions into sums of simpler sinusoidal forms

A Fourier series is an expansion of a periodic function into a sum of trigonometric functions. The Fourier series is an example of a trigonometric series, but not all trigonometric series are Fourier series. By expressing a function as a sum of sines and cosines, many problems involving the function become easier to analyze because trigonometric functions are well understood. For example, Fourier series were first used by Joseph Fourier to find solutions to the heat equation. This application is possible because the derivatives of trigonometric functions fall into simple patterns. Fourier series cannot be used to approximate arbitrary functions, because most functions have infinitely many terms in their Fourier series, and the series do not always converge. Well-behaved functions, for example smooth functions, have Fourier series that converge to the original function. The coefficients of the Fourier series are determined by integrals of the function multiplied by trigonometric functions, described in Common forms of the Fourier series below.

<span class="mw-page-title-main">Heat equation</span> Partial differential equation describing the evolution of temperature in a region

In mathematics and physics, the heat equation is a certain partial differential equation. Solutions of the heat equation are sometimes known as caloric functions. The theory of the heat equation was first developed by Joseph Fourier in 1822 for the purpose of modeling how a quantity such as heat diffuses through a given region.

In physics, the screened Poisson equation is a Poisson equation, which arises in the Klein–Gordon equation, electric field screening in plasmas, and nonlocal granular fluidity in granular flow.

In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the base form

Fourier optics is the study of classical optics using Fourier transforms (FTs), in which the waveform being considered is regarded as made up of a combination, or superposition, of plane waves. It has some parallels to the Huygens–Fresnel principle, in which the wavefront is regarded as being made up of a combination of spherical wavefronts whose sum is the wavefront being studied. A key difference is that Fourier optics considers the plane waves to be natural modes of the propagation medium, as opposed to Huygens–Fresnel, where the spherical waves originate in the physical medium.

In physics, a wave packet is a short burst of localized wave action that travels as a unit, outlined by an envelope. A wave packet can be analyzed into, or can be synthesized from, a potentially-infinite set of component sinusoidal waves of different wavenumbers, with phases and amplitudes such that they interfere constructively only over a small region of space, and destructively elsewhere. Any signal of a limited width in time or space requires many frequency components around a center frequency within a bandwidth inversely proportional to that width; even a gaussian function is considered a wave packet because its Fourier transform is a "packet" of waves of frequencies clustered around a central frequency. Each component wave function, and hence the wave packet, are solutions of a wave equation. Depending on the wave equation, the wave packet's profile may remain constant or it may change (dispersion) while propagating.

In mathematics, the Radon transform is the integral transform which takes a function f defined on the plane to a function Rf defined on the (two-dimensional) space of lines in the plane, whose value at a particular line is equal to the line integral of the function over that line. The transform was introduced in 1917 by Johann Radon, who also provided a formula for the inverse transform. Radon further included formulas for the transform in three dimensions, in which the integral is taken over planes. It was later generalized to higher-dimensional Euclidean spaces and more broadly in the context of integral geometry. The complex analogue of the Radon transform is known as the Penrose transform. The Radon transform is widely applicable to tomography, the creation of an image from the projection data associated with cross-sectional scans of an object.

In physics, the reciprocal lattice emerges from the Fourier transform of another lattice. The direct lattice or real lattice is a periodic function in physical space, such as a crystal system. The reciprocal lattice exists in the mathematical space of spatial frequencies, known as reciprocal space or k space, where $refers to the wavevector.$

In solid-state physics, the tight-binding model is an approach to the calculation of electronic band structure using an approximate set of wave functions based upon superposition of wave functions for isolated atoms located at each atomic site. The method is closely related to the LCAO method used in chemistry. Tight-binding models are applied to a wide variety of solids. The model gives good qualitative results in many cases and can be combined with other models that give better results where the tight-binding model fails. Though the tight-binding model is a one-electron model, the model also provides a basis for more advanced calculations like the calculation of surface states and application to various kinds of many-body problem and quasiparticle calculations.

In mathematics, a wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform.

Ewald summation, named after Paul Peter Ewald, is a method for computing long-range interactions in periodic systems. It was first developed as the method for calculating the electrostatic energies of ionic crystals, and is now commonly used for calculating long-range interactions in computational chemistry. Ewald summation is a special case of the Poisson summation formula, replacing the summation of interaction energies in real space with an equivalent summation in Fourier space. In this method, the long-range interaction is divided into two parts: a short-range contribution, and a long-range contribution which does not have a singularity. The short-range contribution is calculated in real space, whereas the long-range contribution is calculated using a Fourier transform. The advantage of this method is the rapid convergence of the energy compared with that of a direct summation. This means that the method has high accuracy and reasonable speed when computing long-range interactions, and it is thus the de facto standard method for calculating long-range interactions in periodic systems. The method requires charge neutrality of the molecular system to accurately calculate the total Coulombic interaction. A study of the truncation errors introduced in the energy and force calculations of disordered point-charge systems is provided by Kolafa and Perram.

In condensed matter physics and crystallography, the static structure factor is a mathematical description of how a material scatters incident radiation. The structure factor is a critical tool in the interpretation of scattering patterns obtained in X-ray, electron and neutron diffraction experiments.

Pulse compression is a signal processing technique commonly used by radar, sonar and echography to either increase the range resolution when pulse length is constrained or increase the signal to noise ratio when the peak power and the bandwidth of the transmitted signal are constrained. This is achieved by modulating the transmitted pulse and then correlating the received signal with the transmitted pulse.

Phase retrieval is the process of algorithmically finding solutions to the phase problem. Given a complex signal $, of amplitude, and phase :$

Digital image correlation and tracking is an optical method that employs tracking and image registration techniques for accurate 2D and 3D measurements of changes in images. This method is often used to measure full-field displacement and strains, and it is widely applied in many areas of science and engineering. Compared to strain gauges and extensometers, digital image correlation methods provide finer details about deformation, due to the ability to provide both local and average data.

In physics and engineering, the envelope of an oscillating signal is a smooth curve outlining its extremes. The envelope thus generalizes the concept of a constant amplitude into an instantaneous amplitude. The figure illustrates a modulated sine wave varying between an upper envelope and a lower envelope. The envelope function may be a function of time, space, angle, or indeed of any variable.

In optics, the Fraunhofer diffraction equation is used to model the diffraction of waves when the diffraction pattern is viewed at a long distance from the diffracting object, and also when it is viewed at the focal plane of an imaging lens.

<span class="mw-page-title-main">Fluctuation X-ray scattering</span>

Fluctuation X-ray scattering (FXS) is an X-ray scattering technique similar to small-angle X-ray scattering (SAXS), but is performed using X-ray exposures below sample rotational diffusion times. This technique, ideally performed with an ultra-bright X-ray light source, such as a free electron laser, results in data containing significantly more information as compared to traditional scattering methods.

References

↑ H. Foroosh (Shekarforoush), J.B. Zerubia, and M. Berthod, "Extension of Phase Correlation to Subpixel Registration," IEEE Transactions on Image Processing, V. 11, No. 3, Mar. 2002, pp. 188-200.
↑ E.g. M. Sjödahl and L.R. Benckert, "Electronic speckle photography: analysis of an algorithm giving the displacement with subpixel accuracy," Appl Opt. 1993 May 1;32(13):2278-84. doi : 10.1364/AO.32.002278
↑ Harold S. Stone, "A Fast Direct Fourier-Based Algorithm for Subpixel Registration of Images", IEEE Transactions on Geoscience and Remote Sensing, V. 39, No. 10, Oct. 2001, pp.2235-2242
↑ S. Nithyanadam, S. Amaresan and N. Mohamed Haris "An Innovative Normalization Process by Phase Correlation Method of Iris Images for the block size of 32*32"
↑ E. De Castro and C. Morandi "Registration of Translated and Rotated Images Using Finite Fourier Transforms", IEEE Transactions on Pattern Analysis and Machine Intelligence, Sept. 1987
↑ B. S Reddy and B. N. Chatterji, “An FFT-based technique for translation, rotation, and scale-invariant image registration”, IEEE Transactions on Image Processing 5, no. 8 (1996): 1266–1271.
↑ Sarvaiya, Jignesh Natvarlal; Patnaik, Suprava; Kothari, Kajal (2012). "Image Registration Using Log Polar Transform and Phase Correlation to Recover Higher Scale". JPRR. 7 (1): 90–105. CiteSeerX 10.1.1.730.9105 . doi:10.13176/11.355.

External links

Using Matlab to perform normalized cross-correlation on images

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] H. Foroosh (Shekarforoush), J.B. Zerubia, and M. Berthod, "Extension of Phase Correlation to Subpixel Registration," IEEE Transactions on Image Processing, V. 11, No. 3, Mar. 2002, pp. 188-200.

[2] E.g. M. Sjödahl and L.R. Benckert, "Electronic speckle photography: analysis of an algorithm giving the displacement with subpixel accuracy," Appl Opt. 1993 May 1;32(13):2278-84. doi : 10.1364/AO.32.002278

[3] Harold S. Stone, "A Fast Direct Fourier-Based Algorithm for Subpixel Registration of Images", IEEE Transactions on Geoscience and Remote Sensing, V. 39, No. 10, Oct. 2001, pp.2235-2242

[4] S. Nithyanadam, S. Amaresan and N. Mohamed Haris "An Innovative Normalization Process by Phase Correlation Method of Iris Images for the block size of 32*32"

[5] E. De Castro and C. Morandi "Registration of Translated and Rotated Images Using Finite Fourier Transforms", IEEE Transactions on Pattern Analysis and Machine Intelligence, Sept. 1987

[6] B. S Reddy and B. N. Chatterji, “An FFT-based technique for translation, rotation, and scale-invariant image registration”, IEEE Transactions on Image Processing 5, no. 8 (1996): 1266–1271.

[7] Sarvaiya, Jignesh Natvarlal; Patnaik, Suprava; Kothari, Kajal (2012). "Image Registration Using Log Polar Transform and Phase Correlation to Recover Higher Scale". JPRR. 7 (1): 90–105. CiteSeerX 10.1.1.730.9105 . doi:10.13176/11.355.

[avs-8] 1 2 Also used in China's DVB-S/S2 network.

[mobaho-9] 1 2 Defuncted.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[note 1]

[note 2]