Triangulation (computer vision)

Last updated

In computer vision, triangulation refers to the process of determining a point in 3D space given its projections onto two, or more, images. In order to solve this problem it is necessary to know the parameters of the camera projection function from 3D to 2D for the cameras involved, in the simplest case represented by the camera matrices. Triangulation is sometimes also referred to as reconstruction or intersection.

Contents

The triangulation problem is in principle trivial. Since each point in an image corresponds to a line in 3D space, all points on the line in 3D are projected to the point in the image. If a pair of corresponding points in two, or more images, can be found it must be the case that they are the projection of a common 3D point x. The set of lines generated by the image points must intersect at x (3D point) and the algebraic formulation of the coordinates of x (3D point) can be computed in a variety of ways, as is presented below.

In practice, however, the coordinates of image points cannot be measured with arbitrary accuracy. Instead, various types of noise, such as geometric noise from lens distortion or interest point detection error, lead to inaccuracies in the measured image coordinates. As a consequence, the lines generated by the corresponding image points do not always intersect in 3D space. The problem, then, is to find a 3D point which optimally fits the measured image points. In the literature there are multiple proposals for how to define optimality and how to find the optimal 3D point. Since they are based on different optimality criteria, the various methods produce different estimates of the 3D point x when noise is involved.

Introduction

In the following, it is assumed that triangulation is made on corresponding image points from two views generated by pinhole cameras. Generalization from these assumptions are discussed here.

The ideal case of epipolar geometry. A 3D point x is projected onto two camera images through lines (green) which intersect with each camera's focal point, O1 and O2. The resulting image points are y1 and y2. The green lines intersect at x. TriangulationIdeal.svg
The ideal case of epipolar geometry. A 3D point x is projected onto two camera images through lines (green) which intersect with each camera's focal point, O1 and O2. The resulting image points are y1 and y2. The green lines intersect at x.
In practice, the image points y1 and y2 cannot be measured with arbitrary accuracy. Instead points y'1 and y'2 are detected and used for the triangulation. The corresponding projection lines (blue) do not, in general, intersect in 3D space and may also not intersect with point x. TriangulationReal.svg
In practice, the image points y1 and y2 cannot be measured with arbitrary accuracy. Instead points y'1 and y'2 are detected and used for the triangulation. The corresponding projection lines (blue) do not, in general, intersect in 3D space and may also not intersect with point x.

The image to the left illustrates the epipolar geometry of a pair of stereo cameras of pinhole model. A point x (3D point) in 3D space is projected onto the respective image plane along a line (green) which goes through the camera's focal point, and , resulting in the two corresponding image points and . If and are given and the geometry of the two cameras are known, the two projection lines (green lines) can be determined and it must be the case that they intersect at point x (3D point). Using basic linear algebra that intersection point can be determined in a straightforward way.

The image to the right shows the real case. The position of the image points and cannot be measured exactly. The reason is a combination of factors such as

As a consequence, the measured image points are and instead of and . However, their projection lines (blue) do not have to intersect in 3D space or come close to x. In fact, these lines intersect if and only if and satisfy the epipolar constraint defined by the fundamental matrix. Given the measurement noise in and it is rather likely that the epipolar constraint is not satisfied and the projection lines do not intersect.

This observation leads to the problem which is solved in triangulation. Which 3D point xest is the best estimate of x given and and the geometry of the cameras? The answer is often found by defining an error measure which depends on xest and then minimizing this error. In the following sections, some of the various methods for computing xest presented in the literature are briefly described.

All triangulation methods produce xest = x in the case that and , that is, when the epipolar constraint is satisfied (except for singular points, see below). It is what happens when the constraint is not satisfied which differs between the methods.

Properties

A triangulation method can be described in terms of a function such that

where are the homogeneous coordinates of the detected image points and are the camera matrices. x (3D point) is the homogeneous representation of the resulting 3D point. The sign implies that is only required to produce a vector which is equal to x up to a multiplication by a non-zero scalar since homogeneous vectors are involved.

Before looking at the specific methods, that is, specific functions , there are some general concepts related to the methods that need to be explained. Which triangulation method is chosen for a particular problem depends to some extent on these characteristics.

Singularities

Some of the methods fail to correctly compute an estimate of x (3D point) if it lies in a certain subset of the 3D space, corresponding to some combination of . A point in this subset is then a singularity of the triangulation method. The reason for the failure can be that some equation system to be solved is under-determined or that the projective representation of xest becomes the zero vector for the singular points.

Invariance

In some applications, it is desirable that the triangulation is independent of the coordinate system used to represent 3D points; if the triangulation problem is formulated in one coordinate system and then transformed into another the resulting estimate xest should transform in the same way. This property is commonly referred to as invariance. Not every triangulation method assures invariance, at least not for general types of coordinate transformations.

For a homogeneous representation of 3D coordinates, the most general transformation is a projective transformation, represented by a matrix . If the homogeneous coordinates are transformed according to

then the camera matrices must transform as (Ck)

to produce the same homogeneous image coordinates (yk)

If the triangulation function is invariant to then the following relation must be valid

from which follows that

  for all

For each triangulation method, it can be determined if this last relation is valid. If it is, it may be satisfied only for a subset of the projective transformations, for example, rigid or affine transformations.

Computational complexity

The function is only an abstract representation of a computation which, in practice, may be relatively complex. Some methods result in a which is a closed-form continuous function while others need to be decomposed into a series of computational steps involving, for example, SVD or finding the roots of a polynomial. Yet another class of methods results in which must rely on iterative estimation of some parameters. This means that both the computation time and the complexity of the operations involved may vary between the different methods.

Methods

Mid-point method

Each of the two image points and has a corresponding projection line (blue in the right image above), here denoted as and , which can be determined given the camera matrices . Let be a distance function between a (3D line) L and a x (3D point) such that is the Euclidean distance between and . The midpoint method finds the point xest which minimizes

It turns out that xest lies exactly at the middle of the shortest line segment which joins the two projection lines.

Direct linear transformation

Via the essential matrix

The problem to be solved there is how to compute given corresponding normalized image coordinates and . If the essential matrix is known and the corresponding rotation and translation transformations have been determined, this algorithm (described in Longuet-Higgins' paper) provides a solution.

Let denote row k of the rotation matrix :

Combining the above relations between 3D coordinates in the two coordinate systems and the mapping between 3D and 2D points described earlier gives

or

Once is determined, the other two coordinates can be computed as

The above derivation is not unique. It is also possible to start with an expression for and derive an expression for according to

In the ideal case, when the camera maps the 3D points according to a perfect pinhole camera and the resulting 2D points can be detected without any noise, the two expressions for are equal. In practice, however, they are not and it may be advantageous to combine the two estimates of , for example, in terms of some sort of average.

There are also other types of extensions of the above computations which are possible. They started with an expression of the primed image coordinates and derived 3D coordinates in the unprimed system. It is also possible to start with unprimed image coordinates and obtain primed 3D coordinates, which finally can be transformed into unprimed 3D coordinates. Again, in the ideal case the result should be equal to the above expressions, but in practice they may deviate.

A final remark relates to the fact that if the essential matrix is determined from corresponding image coordinate, which often is the case when 3D points are determined in this way, the translation vector is known only up to an unknown positive scaling. As a consequence, the reconstructed 3D points, too, are undetermined with respect to a positive scaling.

See also

Related Research Articles

<span class="mw-page-title-main">3-sphere</span> Mathematical object

In mathematics, a 3-sphere, glome or hypersphere is a higher-dimensional analogue of a sphere. It may be embedded in 4-dimensional Euclidean space as the set of points equidistant from a fixed central point. Analogous to how the boundary of a ball in three dimensions is an ordinary sphere, the boundary of a ball in four dimensions is a 3-sphere. A 3-sphere is an example of a 3-manifold and an n-sphere.

<span class="mw-page-title-main">Four-vector</span> 4-dimensional vector in relativity

In special relativity, a four-vector is an object with four components, which transform in a specific way under Lorentz transformations. Specifically, a four-vector is an element of a four-dimensional vector space considered as a representation space of the standard representation of the Lorentz group, the representation. It differs from a Euclidean vector in how its magnitude is determined. The transformations that preserve this magnitude are the Lorentz transformations, which include spatial rotations and boosts.

<span class="mw-page-title-main">Minkowski space</span> Spacetime used in theory of relativity

In mathematical physics, Minkowski space combines inertial space and time manifolds (x,y) with a non-inertial reference frame of space and time (x',t') into a four-dimensional model relating a position to the field (physics). A four-vector (x,y,z,t) consisting of coordinate axes such as a Euclidean space plus time may be used with the non-inertial frame to illustrate specifics of motion, but should not be confused with the spacetime model generally. The model helps show how a spacetime interval between any two events is independent of the inertial frame of reference in which they are recorded. Although initially developed by mathematician Hermann Minkowski for Maxwell's equations of electromagnetism, the mathematical structure of Minkowski spacetime was shown to be implied by the postulates of special relativity.

<span class="mw-page-title-main">3D projection</span> Design technique

A 3D projection is a design technique used to display a three-dimensional (3D) object on a two-dimensional (2D) surface. These projections rely on visual perspective and aspect analysis to project a complex object for viewing capability on a simpler plane.

<span class="mw-page-title-main">Discretization</span> Process of transferring continuous functions into discrete counterparts

In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers. Dichotomization is the special case of discretization in which the number of discrete classes is 2, which can approximate a continuous variable as a binary variable.

In mathematics, the covariant derivative is a way of specifying a derivative along tangent vectors of a manifold. Alternatively, the covariant derivative is a way of introducing and working with a connection on a manifold by means of a differential operator, to be contrasted with the approach given by a principal connection on the frame bundle – see affine connection. In the special case of a manifold isometrically embedded into a higher-dimensional Euclidean space, the covariant derivative can be viewed as the orthogonal projection of the Euclidean directional derivative onto the manifold's tangent space. In this case the Euclidean derivative is broken into two parts, the extrinsic normal component and the intrinsic covariant derivative component.

<span class="mw-page-title-main">Cross-correlation</span> Covariance and correlation

In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long signal for a shorter, known feature. It has applications in pattern recognition, single particle analysis, electron tomography, averaging, cryptanalysis, and neurophysiology. The cross-correlation is similar in nature to the convolution of two functions. In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of zero, and its size will be the signal energy.

<span class="mw-page-title-main">Osculating circle</span> Circle of immediate corresponding curvature of a curve at a point

In differential geometry of curves, the osculating circle of a sufficiently smooth plane curve at a given point p on the curve has been traditionally defined as the circle passing through p and a pair of additional points on the curve infinitesimally close to p. Its center lies on the inner normal line, and its curvature defines the curvature of the given curve at that point. This circle, which is the one among all tangent circles at the given point that approaches the curve most tightly, was named circulus osculans by Leibniz.

In a relativistic theory of physics, a Lorentz scalar is an expression, formed from items of the theory, which evaluates to a scalar, invariant under any Lorentz transformation. A Lorentz scalar may be generated from e.g., the scalar product of vectors, or from contracting tensors of the theory. While the components of vectors and tensors are in general altered under Lorentz transformations, Lorentz scalars remain unchanged.

Linear dynamical systems are dynamical systems whose evaluation functions are linear. While dynamical systems, in general, do not have closed-form solutions, linear dynamical systems can be solved exactly, and they have a rich set of mathematical properties. Linear systems can also be used to understand the qualitative behavior of general dynamical systems, by calculating the equilibrium points of the system and approximating it as a linear system around each such point.

In computer vision the motion field is an ideal representation of 3D motion as it is projected onto a camera image. Given a simplified camera model, each point in the image is the projection of some point in the 3D scene but the position of the projection of a fixed point in space can vary with time. The motion field can formally be defined as the time derivative of the image position of all image points given that they correspond to fixed 3D points. This means that the motion field can be represented as a function which maps image coordinates to a 2-dimensional vector. The motion field is an ideal description of the projected 3D motion in the sense that it can be formally defined but in practice it is normally only possible to determine an approximation of the motion field from the image data.

In computer vision, the essential matrix is a matrix, that relates corresponding points in stereo images assuming that the cameras satisfy the pinhole camera model.

The intent of this article is to highlight the important points of the derivation of the Navier–Stokes equations as well as its application and formulation for different families of fluids.

In computer vision a camera matrix or (camera) projection matrix is a matrix which describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image.

<span class="mw-page-title-main">Pinhole camera model</span>

The pinhole camera model describes the mathematical relationship between the coordinates of a point in three-dimensional space and its projection onto the image plane of an ideal pinhole camera, where the camera aperture is described as a point and no lenses are used to focus light. The model does not include, for example, geometric distortions or blurring of unfocused objects caused by lenses and finite sized apertures. It also does not take into account that most practical cameras have only discrete image coordinates. This means that the pinhole camera model can only be used as a first order approximation of the mapping from a 3D scene to a 2D image. Its validity depends on the quality of the camera and, in general, decreases from the center of the image to the edges as lens distortion effects increase.

The eight-point algorithm is an algorithm used in computer vision to estimate the essential matrix or the fundamental matrix related to a stereo camera pair from a set of corresponding image points. It was introduced by Christopher Longuet-Higgins in 1981 for the case of the essential matrix. In theory, this algorithm can be used also for the fundamental matrix, but in practice the normalized eight-point algorithm, described by Richard Hartley in 1997, is better suited for this case.

The Cauchy momentum equation is a vector partial differential equation put forth by Cauchy that describes the non-relativistic momentum transport in any continuum.

In mathematics, the moduli stack of elliptic curves, denoted as or , is an algebraic stack over classifying elliptic curves. Note that it is a special case of the moduli stack of algebraic curves . In particular its points with values in some field correspond to elliptic curves over the field, and more generally morphisms from a scheme to it correspond to elliptic curves over . The construction of this space spans over a century because of the various generalizations of elliptic curves as the field has developed. All of these generalizations are contained in .

Steered-Response Power Phase Transform (SRP-PHAT) is a popular algorithm for acoustic source localization, well known for its robust performance in adverse acoustic environments. The algorithm can be interpreted as a beamforming-based approach that searches for the candidate position that maximizes the output of a steered delay-and-sum beamformer.

Tau functions are an important ingredient in the modern theory of integrable systems, and have numerous applications in a variety of other domains. They were originally introduced by Ryogo Hirota in his direct method approach to soliton equations, based on expressing them in an equivalent bilinear form. The term Tau function, or -function, was first used systematically by Mikio Sato and his students in the specific context of the Kadomtsev–Petviashvili equation, and related integrable hierarchies. It is a central ingredient in the theory of solitons. Tau functions also appear as matrix model partition functions in the spectral theory of Random Matrices, and may also serve as generating functions, in the sense of combinatorics and enumerative geometry, especially in relation to moduli spaces of Riemann surfaces, and enumeration of branched coverings, or so-called Hurwitz numbers.

References