The eight-point algorithm is an algorithm used in computer vision to estimate the essential matrix or the fundamental matrix related to a stereo camera pair from a set of corresponding image points. It was introduced by Christopher Longuet-Higgins in 1981 for the case of the essential matrix. In theory, this algorithm can be used also for the fundamental matrix, but in practice the normalized eight-point algorithm, described by Richard Hartley in 1997, is better suited for this case.
The algorithm's name derives from the fact that it estimates the essential matrix or the fundamental matrix from a set of eight (or more) corresponding image points. However, variations of the algorithm can be used for fewer than eight points.
One may express the epipolar geometry of two cameras and a point in space with an algebraic equation. Observe that, no matter where the point $P$ is in space, the vectors $\overrightarrow{O_1 P}$, $\overrightarrow{O_2 P}$ and $\overrightarrow{O_1 O_2}$ belong to the same plane. Call $\mathbf{x}$ the coordinates of point $P$ in the left eye's reference frame, call $\mathbf{x}'$ the coordinates of $P$ in the right eye's reference frame, and call $R, \mathbf{T}$ the rotation and translation between the two reference frames s.t. $\mathbf{x}' = R(\mathbf{x} - \mathbf{T})$ is the relationship between the coordinates of $P$ in the two reference frames. The following equation always holds because the vector $\mathbf{T} \times \mathbf{x}$ is orthogonal to both $\mathbf{T}$ and $\mathbf{x}$ (and hence to $\mathbf{x} - \mathbf{T}$, which lies in their plane):

$$ (\mathbf{x} - \mathbf{T})^\top (\mathbf{T} \times \mathbf{x}) = 0 $$

Because $\mathbf{x} - \mathbf{T} = R^\top \mathbf{x}'$, we get

$$ (R^\top \mathbf{x}')^\top (\mathbf{T} \times \mathbf{x}) = \mathbf{x}'^\top R\,(\mathbf{T} \times \mathbf{x}) = 0 $$

Replacing the operator $\mathbf{T} \times$ with the skew-symmetric matrix $[\mathbf{T}]_\times$, we get

$$ \mathbf{x}'^\top R\, [\mathbf{T}]_\times\, \mathbf{x} = 0 $$

Observe that $R\,[\mathbf{T}]_\times$ may be thought of as a matrix; Longuet-Higgins used the symbol $E$ to denote it. The product $R\,[\mathbf{T}]_\times$ is often called the essential matrix and denoted with $E$.

The projection vectors are parallel to the vectors $\mathbf{x}$ and $\mathbf{x}'$, and therefore the coplanarity constraint holds if we substitute these vectors. If we call $\mathbf{y}$ and $\mathbf{y}'$ the coordinates of the projections of $P$ onto the left and right image planes, then the coplanarity constraint may be written as

$$ \mathbf{y}'^\top E\, \mathbf{y} = 0 $$
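To make the construction concrete, here is a minimal NumPy sketch (not part of the original description) that builds $E = R\,[\mathbf{T}]_\times$ from a synthetic rotation and translation and checks the coplanarity constraint on one synthetic point; all numeric values and the helper name `skew` are illustrative.

```python
import numpy as np

def skew(t):
    """Return the skew-symmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Synthetic relative pose (illustrative values): rotation about the y-axis and a translation T.
theta = 0.1
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
T = np.array([1.0, 0.2, 0.1])

E = R @ skew(T)                      # essential matrix E = R [T]_x

x = np.array([0.5, -0.3, 4.0])       # point P in the left camera frame
xp = R @ (x - T)                     # the same point in the right camera frame

y, yp = x / x[2], xp / xp[2]         # normalized image coordinates (projections)
print(yp @ E @ y)                    # ~0: the coplanarity constraint y'^T E y = 0
```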
The basic eight-point algorithm is here described for the case of estimating the essential matrix $E$. It consists of three steps. First, it formulates a homogeneous linear equation, where the solution is directly related to $E$, and then solves the equation, taking into account that it may not have an exact solution. Finally, the internal constraints of the resulting matrix are managed. The first step is described in Longuet-Higgins' paper, the second and third steps are standard approaches in estimation theory.
The constraint defined by the essential matrix $E$ is

$$ (\mathbf{y}')^\top E\, \mathbf{y} = 0 $$

for corresponding image points represented in normalized image coordinates $\mathbf{y}, \mathbf{y}'$. The problem which the algorithm solves is to determine $E$ for a set of matching image points. In practice, the image coordinates of the image points are affected by noise and the solution may also be over-determined, which means that it may not be possible to find an $E$ which satisfies the above constraint exactly for all points. This issue is addressed in the second step of the algorithm.
With

$$ \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}, \qquad \mathbf{y}' = \begin{pmatrix} y'_1 \\ y'_2 \\ y'_3 \end{pmatrix}, \qquad E = \begin{pmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \end{pmatrix} $$

the constraint can also be rewritten as

$$ y'_1 y_1 e_{11} + y'_1 y_2 e_{12} + y'_1 y_3 e_{13} + y'_2 y_1 e_{21} + y'_2 y_2 e_{22} + y'_2 y_3 e_{23} + y'_3 y_1 e_{31} + y'_3 y_2 e_{32} + y'_3 y_3 e_{33} = 0 $$

or

$$ \mathbf{e} \cdot \tilde{\mathbf{y}} = 0 $$

where

$$ \tilde{\mathbf{y}} = \begin{pmatrix} y'_1 y_1 & y'_1 y_2 & y'_1 y_3 & y'_2 y_1 & y'_2 y_2 & y'_2 y_3 & y'_3 y_1 & y'_3 y_2 & y'_3 y_3 \end{pmatrix}^\top
\quad \text{and} \quad
\mathbf{e} = \begin{pmatrix} e_{11} & e_{12} & e_{13} & e_{21} & e_{22} & e_{23} & e_{31} & e_{32} & e_{33} \end{pmatrix}^\top $$

that is, $\mathbf{e}$ represents the essential matrix in the form of a 9-dimensional vector and this vector must be orthogonal to the vector $\tilde{\mathbf{y}}$, which can be seen as a vector representation of the $3 \times 3$ matrix $\mathbf{y}' \mathbf{y}^\top$.
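As an illustration of this rewriting, $\tilde{\mathbf{y}}$ is simply the flattened outer product of the two image points. A short NumPy sketch with arbitrary point values:

```python
import numpy as np

y  = np.array([0.1, -0.2, 1.0])    # point in the left image (normalized homogeneous coordinates)
yp = np.array([0.15, -0.18, 1.0])  # corresponding point in the right image

y_tilde = np.outer(yp, y).flatten()   # entries y'_i * y_j in row-major order: the vector y~
# equivalently: y_tilde = np.kron(yp, y)

# For any 3x3 matrix E, the constraint yp^T E y equals e . y~ with e the row-major flattening of E
E = np.random.rand(3, 3)
print(np.allclose(yp @ E @ y, E.flatten() @ y_tilde))   # True
```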
Each pair of corresponding image points produces a vector $\tilde{\mathbf{y}}$. Given a set of 3D points $P_k$ this corresponds to a set of vectors $\tilde{\mathbf{y}}_k$ and all of them must satisfy

$$ \mathbf{e} \cdot \tilde{\mathbf{y}}_k = 0 $$

for the vector $\mathbf{e}$. Given sufficiently many (at least eight) linearly independent vectors $\tilde{\mathbf{y}}_k$ it is possible to determine $\mathbf{e}$ in a straightforward way. Collect all vectors $\tilde{\mathbf{y}}_k$ as the columns of a matrix $\mathbf{Y}$ and it must then be the case that

$$ \mathbf{e}^\top \mathbf{Y} = \mathbf{0} $$

This means that $\mathbf{e}$ is the solution to a homogeneous linear equation.
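One possible way to assemble $\mathbf{Y}$ in code, continuing the sketch above; `pts_left` and `pts_right` are assumed to be $N \times 3$ arrays of corresponding homogeneous image points (illustrative names):

```python
import numpy as np

def build_Y(pts_left, pts_right):
    """Stack one vector y~ per correspondence as the columns of the 9 x N matrix Y."""
    cols = [np.outer(yp, y).flatten() for y, yp in zip(pts_left, pts_right)]
    return np.stack(cols, axis=1)   # shape (9, N)
```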
A standard approach to solving this equation implies that $\mathbf{e}$ is a left singular vector of $\mathbf{Y}$ corresponding to a singular value that equals zero. Provided that at least eight linearly independent vectors $\tilde{\mathbf{y}}_k$ are used to construct $\mathbf{Y}$, it follows that this singular vector is unique (disregarding scalar multiplication) and, consequently, $\mathbf{e}$ and then $E$ can be determined.
In the case that more than eight corresponding points are used to construct $\mathbf{Y}$ it is possible that it does not have any singular value equal to zero. This case occurs in practice when the image coordinates are affected by various types of noise. A common approach to deal with this situation is to describe it as a total least squares problem: find $\mathbf{e}$ which minimizes

$$ \| \mathbf{e}^\top \mathbf{Y} \| $$

when $\| \mathbf{e} \| = 1$. The solution is to choose $\mathbf{e}$ as the left singular vector corresponding to the smallest singular value of $\mathbf{Y}$. A reordering of this $\mathbf{e}$ back into a $3 \times 3$ matrix gives the result of this step, here referred to as $\mathbf{E}_{\rm est}$.
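Continuing the sketch, this total least squares solution can be read off from an SVD of $\mathbf{Y}$; the function name `solve_homogeneous` is an illustrative choice:

```python
import numpy as np

def solve_homogeneous(Y):
    """Left singular vector of Y for the smallest singular value, reshaped to a 3x3 matrix
    (this gives E_est in the essential-matrix case)."""
    U, S, Vt = np.linalg.svd(Y)      # Y = U diag(S) Vt, singular values in descending order
    e = U[:, -1]                     # left singular vector with the smallest singular value
    return e.reshape(3, 3)           # row-major reshape matches the ordering of e used above
```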
Another consequence of dealing with noisy image coordinates is that the resulting matrix may not satisfy the internal constraint of the essential matrix, that is, two of its singular values are equal and nonzero and the other is zero. Depending on the application, smaller or larger deviations from the internal constraint may or may not be a problem. If it is critical that the estimated matrix satisfies the internal constraints, this can be accomplished by finding the matrix $\mathbf{E}'$ of rank 2 which minimizes

$$ \| \mathbf{E}' - \mathbf{E}_{\rm est} \| $$

where $\mathbf{E}_{\rm est}$ is the resulting matrix from Step 2 and the Frobenius matrix norm is used. The solution to the problem is given by first computing a singular value decomposition of $\mathbf{E}_{\rm est}$:

$$ \mathbf{E}_{\rm est} = \mathbf{U} \mathbf{S} \mathbf{V}^\top $$

where $\mathbf{U}, \mathbf{V}$ are orthogonal matrices and $\mathbf{S}$ is a diagonal matrix which contains the singular values of $\mathbf{E}_{\rm est}$. In the ideal case, one of the diagonal elements of $\mathbf{S}$ should be zero, or at least small compared to the other two, which should be equal. In any case, set

$$ \mathbf{S}' = \begin{pmatrix} s_1 & 0 & 0 \\ 0 & s_2 & 0 \\ 0 & 0 & 0 \end{pmatrix} $$

where $s_1, s_2$ are the largest and second largest singular values in $\mathbf{S}$, respectively. Finally, $\mathbf{E}'$ is given by

$$ \mathbf{E}' = \mathbf{U} \mathbf{S}' \mathbf{V}^\top $$

The matrix $\mathbf{E}'$ is the resulting estimate of the essential matrix provided by the algorithm.
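A sketch of this constraint-enforcement step under the same assumptions; zeroing the smallest singular value yields the closest rank-2 matrix in the Frobenius norm:

```python
import numpy as np

def enforce_rank_two(E_est):
    """Project E_est onto the rank-2 matrices by zeroing its smallest singular value."""
    U, s, Vt = np.linalg.svd(E_est)
    s[2] = 0.0                       # keep the two largest singular values, drop the smallest
    return U @ np.diag(s) @ Vt
```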
The basic eight-point algorithm can in principle be used also for estimating the fundamental matrix $\mathbf{F}$. The defining constraint for $\mathbf{F}$ is

$$ (\mathbf{y}')^\top \mathbf{F}\, \mathbf{y} = 0 $$

where $\mathbf{y}, \mathbf{y}'$ are the homogeneous representations of corresponding image coordinates (not necessarily normalized). This means that it is possible to form a matrix $\mathbf{Y}$ in a similar way as for the essential matrix and solve the equation

$$ \mathbf{f}^\top \mathbf{Y} = \mathbf{0} $$

for $\mathbf{f}$, which is a reshaped version of $\mathbf{F}$. By following the procedure outlined above, it is then possible to determine $\mathbf{F}$ from a set of eight matching points. In practice, however, the resulting fundamental matrix may not be useful for determining epipolar constraints.
The problem is that the resulting $\mathbf{Y}$ often is ill-conditioned. In theory, $\mathbf{Y}$ should have one singular value equal to zero and the rest be non-zero. In practice, however, some of the non-zero singular values can become small relative to the larger ones. If more than eight corresponding points are used to construct $\mathbf{Y}$, where the coordinates are only approximately correct, there may not be a well-defined singular value which can be identified as approximately zero. Consequently, the solution of the homogeneous linear system of equations may not be sufficiently accurate to be useful.
Hartley addressed this estimation problem in his 1997 article. His analysis shows that the problem is caused by the poor distribution of the homogeneous image coordinates in their space, $\mathbb{R}^3$. A typical homogeneous representation of the 2D image coordinate $(y_1, y_2)$ is

$$ \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} $$

where both $y_1$ and $y_2$ lie in the range 0 to 1000–2000 for a modern digital camera. This means that the first two coordinates in $\mathbf{y}$ vary over a much larger range than the third coordinate. Furthermore, if the image points which are used to construct $\mathbf{Y}$ lie in a relatively small region of the image, the vector $\mathbf{y}$ again points in more or less the same direction for all points. As a consequence, $\mathbf{Y}$ will have one large singular value and the remaining ones will be small.
As a solution to this problem, Hartley proposed that the coordinate system of each of the two images should be transformed, independently, into a new coordinate system according to the following principle: the new coordinate system has its origin at the centroid of the image points, and the coordinates are then uniformly scaled so that the mean distance from a point to this origin equals $\sqrt{2}$.
This principle results, normally, in a distinct coordinate transformation for each of the two images. As a result, new homogeneous image coordinates $\bar{\mathbf{y}}, \bar{\mathbf{y}}'$ are given by

$$ \bar{\mathbf{y}} = \bar{T}\, \mathbf{y}, \qquad \bar{\mathbf{y}}' = \bar{T}'\, \mathbf{y}' $$

where $\bar{T}, \bar{T}'$ are the transformations (translation and scaling) from the old to the new normalized image coordinates. This normalization is only dependent on the image points which are used in a single image and is, in general, distinct from normalized image coordinates produced by a normalized camera.
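One possible implementation of such a normalizing transformation for a single image, assuming `pts` is an $N \times 2$ array of pixel coordinates (an illustrative name):

```python
import numpy as np

def normalizing_transform(pts):
    """3x3 similarity T_bar: translate the centroid to the origin, then scale so that
    the mean distance from the origin to a point is sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2) / mean_dist
    return np.array([[s, 0, -s * centroid[0]],
                     [0, s, -s * centroid[1]],
                     [0, 0, 1.0]])
```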
The epipolar constraint based on the fundamental matrix can now be rewritten as

$$ \bar{\mathbf{y}}'^\top\, \bar{T}'^{-\top}\, \mathbf{F}\, \bar{T}^{-1}\, \bar{\mathbf{y}} = \bar{\mathbf{y}}'^\top\, \bar{\mathbf{F}}\, \bar{\mathbf{y}} = 0 $$

where $\bar{\mathbf{F}} = \bar{T}'^{-\top}\, \mathbf{F}\, \bar{T}^{-1}$. This means that it is possible to use the normalized homogeneous image coordinates $\bar{\mathbf{y}}, \bar{\mathbf{y}}'$ to estimate the transformed fundamental matrix $\bar{\mathbf{F}}$ using the basic eight-point algorithm described above.
The purpose of the normalization transformations is that the matrix $\bar{\mathbf{Y}}$, constructed from the normalized image coordinates, in general has a better condition number than $\mathbf{Y}$ has. This means that the solution $\bar{\mathbf{f}}$ is more well-defined as a solution of the homogeneous equation $\bar{\mathbf{f}}^\top \bar{\mathbf{Y}} = \mathbf{0}$ than $\mathbf{f}$ is relative to $\mathbf{Y}$. Once $\bar{\mathbf{f}}$ has been determined and reshaped into $\bar{\mathbf{F}}$, the latter can be de-normalized to give $\mathbf{F}$ according to

$$ \mathbf{F} = \bar{T}'^\top\, \bar{\mathbf{F}}\, \bar{T} $$

In general, this estimate of the fundamental matrix is a better one than would have been obtained by estimating $\mathbf{F}$ from the un-normalized coordinates.
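Putting the pieces together, a minimal sketch of the normalized eight-point estimate of $\mathbf{F}$, reusing the illustrative helpers from the earlier sketches (`normalizing_transform`, `build_Y`, `solve_homogeneous`, `enforce_rank_two`); it assumes `pts_left` and `pts_right` are $N \times 2$ arrays of matching pixel coordinates with $N \geq 8$:

```python
import numpy as np

def eight_point_fundamental(pts_left, pts_right):
    """Normalized eight-point estimate of the fundamental matrix F (defined up to scale)."""
    Tl = normalizing_transform(pts_left)
    Tr = normalizing_transform(pts_right)

    # Homogeneous, normalized coordinates: y_bar = T_bar y
    hom = lambda p: np.hstack([p, np.ones((len(p), 1))])
    yl = hom(pts_left) @ Tl.T
    yr = hom(pts_right) @ Tr.T

    # Solve the homogeneous system for F_bar, then enforce rank 2
    F_bar = solve_homogeneous(build_Y(yl, yr))
    F_bar = enforce_rank_two(F_bar)

    # De-normalize: F = T_bar'^T F_bar T_bar
    return Tr.T @ F_bar @ Tl
```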
Each point pair contributes one constraining equation on the elements of $E$. Since $E$ has five degrees of freedom, five point pairs should therefore be sufficient to determine $E$. David Nistér proposed an efficient solution to estimate the essential matrix from a set of five point pairs, known as the five-point algorithm. [1] Hartley et al. later proposed a modified and more stable five-point algorithm based on Nistér's algorithm. [2]