Epipolar geometry

[Figure: Typical use case for epipolar geometry. Two cameras take a picture of the same scene from different points of view. The epipolar geometry then describes the relation between the two resulting views.]

Epipolar geometry is the geometry of stereo vision. When two cameras view a 3D scene from two distinct positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points. These relations are derived based on the assumption that the cameras can be approximated by the pinhole camera model.

Definitions

The figure below depicts two pinhole cameras looking at point X. In real cameras, the image plane is actually behind the focal center, and produces an image that is symmetric about the focal center of the lens. Here, however, the problem is simplified by placing a virtual image plane in front of the focal center, i.e. the optical center, of each camera lens to produce an image not transformed by the symmetry. OL and OR represent the centers of symmetry of the two cameras' lenses. X represents the point of interest in both cameras. Points xL and xR are the projections of point X onto the image planes.

[Figure: Epipolar geometry]

Each camera captures a 2D image of the 3D world. This conversion from 3D to 2D is referred to as a perspective projection and is described by the pinhole camera model. It is common to model this projection operation by rays that emanate from the camera, passing through its focal center. Each emanating ray corresponds to a single point in the image.
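
To make the projection concrete, here is a minimal Python sketch (NumPy only; the focal length and point coordinates are made-up values) of a ray from the optical center through a 3D point hitting the virtual image plane:

    import numpy as np

    # Made-up example: optical center at the origin, camera looking along +z,
    # virtual image plane at distance f in front of the center.
    f = 1.0                        # focal length (arbitrary units)
    X = np.array([2.0, 1.0, 4.0])  # 3D point in the camera frame

    # The ray from the optical center through X meets the plane z = f at
    # (f*X/Z, f*Y/Z, f); the first two entries are the image coordinates.
    x = f * X[:2] / X[2]
    print(x)                       # [0.5  0.25]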

Epipole or epipolar point

Since the optical centers of the cameras' lenses are distinct, each center projects onto a distinct point in the other camera's image plane. These two image points, denoted by eL and eR, are called epipoles or epipolar points. Both epipoles eL and eR in their respective image planes and both optical centers OL and OR lie on a single 3D line.
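
A minimal numeric sketch of this definition, assuming two hypothetical 3×4 camera projection matrices PL and PR (camera matrices are discussed later in this article): each epipole is simply the image of the other camera's optical center.

    import numpy as np

    # Hypothetical cameras: identity intrinsics; the right camera's optical
    # center OR is placed at (1, 0, 2) in the left camera's frame.
    PL = np.hstack([np.eye(3), np.zeros((3, 1))])
    PR = np.hstack([np.eye(3), -np.array([[1.0], [0.0], [2.0]])])

    OL = np.array([0.0, 0.0, 0.0, 1.0])  # homogeneous optical centers
    OR = np.array([1.0, 0.0, 2.0, 1.0])

    eL = PL @ OR   # epipole in the left image: projection of OR
    eR = PR @ OL   # epipole in the right image: projection of OL
    # Dividing by the third coordinate gives the image coordinates.
    print(eL[:2] / eL[2], eR[:2] / eR[2])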

Epipolar line

The line OLX is seen by the left camera as a point because it is directly in line with the optical center of that camera's lens. However, the right camera sees this line as a line in its image plane. That line (eRxR) in the right camera is called an epipolar line. Symmetrically, the line ORX is seen by the right camera as a point and is seen as epipolar line eLxL by the left camera.

An epipolar line is a function of the position of point X in the 3D space, i.e. as X varies, a set of epipolar lines is generated in both images. Since the 3D line OLX passes through the optical center of the lens OL, the corresponding epipolar line in the right image must pass through the epipole eR (and correspondingly for epipolar lines in the left image). All epipolar lines in one image contain the epipolar point of that image. In fact, any line which contains the epipolar point is an epipolar line since it can be derived from some 3D point X.
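
In terms of the fundamental matrix F introduced later in this article, the epipolar line in the right image of a left-image point x is l′ = Fx, and the fact that all epipolar lines contain the epipole corresponds to the epipoles being the null vectors of F and Fᵀ. A sketch, with F built from a made-up relative pose (identity intrinsics, so F coincides with the essential matrix):

    import numpy as np

    # Made-up relative pose; tx is the cross-product matrix of t.
    R = np.eye(3)
    t = np.array([1.0, 0.2, 0.5])
    tx = np.array([[0, -t[2], t[1]],
                   [t[2], 0, -t[0]],
                   [-t[1], t[0], 0]])
    F = tx @ R

    # The epipoles are the null vectors of F and F^T (smallest singular vectors).
    eL = np.linalg.svd(F)[2][-1]    # F   @ eL ~ 0
    eR = np.linalg.svd(F.T)[2][-1]  # F^T @ eR ~ 0

    # Every epipolar line l' = F x in the right image contains eR.
    x = np.array([0.3, -0.7, 1.0])  # arbitrary left-image point (homogeneous)
    print(eR @ (F @ x))             # ~0: eR lies on the epipolar line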

Epipolar plane

As an alternative visualization, consider the points X, OL and OR, which form a plane called the epipolar plane. The epipolar plane intersects each camera's image plane in a line, the epipolar line. The epipolar plane and all epipolar lines pass through the epipoles regardless of where X is located.
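
A quick numeric check of this picture, with made-up coordinates: the epipolar plane through X, OL and OR has normal n = (OR − OL) × (X − OL), and the image point xL, lying on the ray from OL to X, satisfies the same plane equation.

    import numpy as np

    OL = np.array([0.0, 0.0, 0.0])   # made-up optical centers and scene point
    OR = np.array([1.0, 0.0, 0.0])
    X = np.array([0.5, 0.8, 3.0])

    n = np.cross(OR - OL, X - OL)    # normal of the epipolar plane

    # 3D location of xL on the left virtual image plane (focal length 1,
    # camera at OL looking along +z): a point on the ray OL -> X.
    xL = OL + (X - OL) / (X - OL)[2]
    print(np.dot(n, xL - OL))        # 0.0: xL lies in the epipolar plane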

Epipolar constraint and triangulation

If the relative position of the two cameras is known, this leads to two important observations:

Assume the projection point xL is known and the epipolar line eRxR is known. The point X projects into the right image onto a point xR which must lie on this particular epipolar line. Thus, for each point observed in one image, the same point must be observed in the other image on a known epipolar line. This provides an epipolar constraint: the projection of X on the right camera plane xR must be contained in the eRxR epipolar line. Points that do not satisfy this constraint cannot correspond to the same 3D point, so the constraint can be used to test candidate matches. Epipolar constraints can also be described by the essential matrix or the fundamental matrix between the two cameras.

If the points xL and xR are known, their projection lines are also known, and if the two image points correspond to the same 3D point X, the projection lines must intersect precisely at X. This means that X can be calculated from the coordinates of the two image points, a process called triangulation. The sketch below illustrates the first observation; a triangulation sketch appears later in this article.
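
The following sketch checks the epipolar constraint numerically via the essential matrix E = [t]×R for a made-up relative pose, with intrinsics taken as identity for simplicity:

    import numpy as np

    def skew(v):
        """Cross-product matrix: skew(v) @ u == np.cross(v, u)."""
        return np.array([[0, -v[2], v[1]],
                         [v[2], 0, -v[0]],
                         [-v[1], v[0], 0]])

    # Made-up relative pose: right-camera coordinates = R @ (left) + t.
    a = 0.1
    R = np.array([[np.cos(a), 0, np.sin(a)],
                  [0, 1, 0],
                  [-np.sin(a), 0, np.cos(a)]])
    t = np.array([-1.0, 0.0, 0.1])
    E = skew(t) @ R                      # essential matrix

    X_left = np.array([0.4, 0.2, 5.0])   # scene point in the left camera frame
    X_right = R @ X_left + t             # the same point in the right frame

    xL = X_left / X_left[2]              # normalized image coordinates
    xR = X_right / X_right[2]
    print(xR @ E @ xL)                   # ~0: the epipolar constraint holds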

Simplified cases

The epipolar geometry is simplified if the two camera image planes coincide. In this case, the epipolar lines also coincide (eLxL = eRxR). Furthermore, the epipolar lines are parallel to the line OLOR between the centers of projection, and can in practice be aligned with the horizontal axes of the two images. This means that for each point in one image, its corresponding point in the other image can be found by looking only along a horizontal line. If the cameras cannot be positioned in this way, the image coordinates from the cameras may be transformed to emulate having a common image plane. This process is called image rectification.
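
For such a rectified pair, the search is one-dimensional and the horizontal disparity d = xL − xR is inversely proportional to depth, d = f·B/Z for focal length f and baseline B. A small numeric sketch with made-up values:

    import numpy as np

    f, B = 1.0, 0.5                     # focal length and baseline (made up)
    X = np.array([0.3, 0.4, 4.0])       # scene point in the left camera frame

    xL = f * X[:2] / X[2]               # left camera at the origin
    xR = f * (X[:2] - [B, 0]) / X[2]    # right camera shifted by B along x

    print(xL[1] - xR[1])                # 0.0: same image row in both views
    print(xL[0] - xR[0], f * B / X[2])  # disparity equals f*B/Z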

Epipolar geometry of pushbroom sensor

In contrast to the conventional frame camera, which uses a two-dimensional CCD, a pushbroom camera uses an array of one-dimensional CCDs to produce a long, continuous image strip called an "image carpet". The epipolar geometry of this sensor is quite different from that of pinhole projection cameras. First, the epipolar line of a pushbroom sensor is not straight, but a hyperbola-like curve. Second, epipolar "curve" pairs do not exist. [1] However, under some special conditions, the epipolar geometry of satellite images can be considered as a linear model. [2]
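
To see where the curvature comes from, one can simulate an idealized linear pushbroom sensor and trace the image of a slanted viewing ray from another sensor. This is only an illustrative toy model with made-up poses and parameters, not any specific satellite geometry:

    import numpy as np

    # Toy pushbroom: the sensor flies along direction d2 at height h looking
    # straight down; each instant t exposes one image row.
    h, f, theta = 10.0, 1.0, np.radians(30)
    c20 = np.array([0.0, 0.0, h])                         # start position
    d2 = np.array([np.cos(theta), np.sin(theta), 0.0])    # ground track
    lat = np.array([-np.sin(theta), np.cos(theta), 0.0])  # sensor line

    def image2(P):
        """Row (time) and column of ground point P in the pushbroom image."""
        t = np.dot(P - c20, d2)          # P crosses the scan plane at time t
        c2 = c20 + t * d2                # sensor position at that time
        v = f * np.dot(P - c2, lat) / (h - P[2])
        return t, v

    # A slanted viewing ray from the other sensor (arbitrary values).
    c1 = np.array([5.0, -3.0, h])
    ray = np.array([0.2, 1.0, -1.0])
    for s in np.linspace(2.0, 8.0, 4):
        print(image2(c1 + s * ray))      # column is not affine in row:
                                         # the epipolar locus is curved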

Related Research Articles

Optical aberration

In optics, aberration is a property of optical systems, such as lenses, that causes light to be spread out over some region of space rather than focused to a point. Aberrations cause the image formed by a lens to be blurred or distorted, with the nature of the distortion depending on the type of aberration. Aberration can be defined as a departure of the performance of an optical system from the predictions of paraxial optics. In an imaging system, it occurs when light from one point of an object does not converge into a single point after transmission through the system. Aberrations occur because the simple paraxial theory is not a completely accurate model of the effect of an optical system on light, rather than due to flaws in the optical elements.

The focal length of an optical system is a measure of how strongly the system converges or diverges light; it is the inverse of the system's optical power. A positive focal length indicates that a system converges light, while a negative focal length indicates that the system diverges light. A system with a shorter focal length bends the rays more sharply, bringing them to a focus in a shorter distance or diverging them more quickly. For the special case of a thin lens in air, a positive focal length is the distance over which initially collimated (parallel) rays are brought to a focus, or alternatively a negative focal length indicates how far in front of the lens a point source must be located to form a collimated beam. For more general optical systems, the focal length has no intuitive meaning; it is simply the inverse of the system's optical power.

Camera lens

A camera lens is an optical lens or assembly of lenses used in conjunction with a camera body and mechanism to make images of objects either on photographic film or on other media capable of storing an image chemically or electronically.

3D projection

A 3D projection is a design technique used to display a three-dimensional (3D) object on a two-dimensional (2D) surface. These projections rely on visual perspective and aspect analysis to project a complex object for viewing capability on a simpler plane.

Ray casting

Ray casting is the methodological basis for 3D CAD/CAM solid modeling and image rendering. It is essentially the same as ray tracing for computer graphics, where virtual light rays are "cast" or "traced" on their path from the focal point of a camera through each pixel in the camera sensor to determine what is visible along the ray in the 3D scene. The term "ray casting" was introduced by Scott Roth while at the General Motors Research Labs from 1978–1980. His paper, "Ray Casting for Modeling Solids", describes how to model solid objects by combining primitive solids, such as blocks and cylinders, using the set operators union (+), intersection (&), and difference (-). The general idea of using these binary operators for solid modeling is largely due to Voelcker and Requicha's geometric modelling group at the University of Rochester. See solid modeling for a broad overview of solid modeling methods.

Vanishing point

A vanishing point is a point on the image plane of a perspective rendering where the two-dimensional perspective projections of mutually parallel lines in three-dimensional space appear to converge. When the set of parallel lines is perpendicular to a picture plane, the construction is known as one-point perspective, and their vanishing point corresponds to the oculus, or "eye point", from which the image should be viewed for correct perspective geometry. Traditional linear drawings use objects with one to three sets of parallels, defining one to three vanishing points.
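
Numerically, the vanishing point of lines with common direction d is the image of the point at infinity (d, 0); under a camera matrix P it is independent of where the lines sit in space. A sketch with a made-up camera:

    import numpy as np

    K = np.array([[800.0, 0, 320],      # made-up intrinsics (pixels)
                  [0, 800, 240],
                  [0, 0, 1]])
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # camera at the origin

    d = np.array([1.0, 0.0, 1.0])       # common direction of the parallel lines
    v = P @ np.append(d, 0.0)           # image of the point at infinity
    print(v[:2] / v[2])                 # vanishing point, here (1120, 240)

    # A far-away point on any line with direction d maps near the same pixel.
    far = P @ np.append(np.array([3.0, -2.0, 5.0]) + 1e6 * d, 1.0)
    print(far[:2] / far[2])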

A light field is a vector function that describes the amount of light flowing in every direction through every point in a space. The space of all possible light rays is given by the five-dimensional plenoptic function, and the magnitude of each ray is given by its radiance. Michael Faraday was the first to propose that light should be interpreted as a field, much like the magnetic fields on which he had been working. The term light field was coined by Andrey Gershun in a classic 1936 paper on the radiometric properties of light in three-dimensional space.

Confocal microscopy

Confocal microscopy, most frequently confocal laser scanning microscopy (CLSM) or laser scanning confocal microscopy (LSCM), is an optical imaging technique for increasing optical resolution and contrast of a micrograph by means of using a spatial pinhole to block out-of-focus light in image formation. Capturing multiple two-dimensional images at different depths in a sample enables the reconstruction of three-dimensional structures within an object. This technique is used extensively in the scientific and industrial communities and typical applications are in life sciences, semiconductor inspection and materials science.

In geometric optics, distortion is a deviation from rectilinear projection; a projection in which straight lines in a scene remain straight in an image. It is a form of optical aberration.

In Gaussian optics, the cardinal points consist of three pairs of points located on the optical axis of a rotationally symmetric, focal, optical system. These are the focal points, the principal points, and the nodal points; there are two of each. For ideal systems, the basic imaging properties such as image size, location, and orientation are completely determined by the locations of the cardinal points; in fact, only four points are necessary: the two focal points and either the principal points or the nodal points. The only ideal system that has been achieved in practice is a plane mirror, however the cardinal points are widely used to approximate the behavior of real optical systems. Cardinal points provide a way to analytically simplify an optical system with many components, allowing the imaging characteristics of the system to be approximately determined with simple calculations.

In computer vision, the fundamental matrix is a 3×3 matrix which relates corresponding points in stereo images. In epipolar geometry, with homogeneous image coordinates x and x′ of corresponding points in a stereo image pair, Fx describes an epipolar line on which the corresponding point x′ in the other image must lie. That means that all pairs of corresponding points satisfy x′ᵀFx = 0.
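
F can be estimated linearly from eight or more correspondences (the classical eight-point algorithm). A minimal sketch under that assumption; the coordinate normalization that a practical implementation needs is omitted:

    import numpy as np

    def eight_point(xs, xs_prime):
        """Linear estimate of F from n >= 8 matches (x'^T F x = 0).

        xs, xs_prime: (n, 3) arrays of homogeneous image points."""
        # Each correspondence gives one row: kron(x', x) dotted with the
        # row-major flattening of F equals x'^T F x.
        A = np.array([np.kron(xp, x) for x, xp in zip(xs, xs_prime)])
        F = np.linalg.svd(A)[2][-1].reshape(3, 3)
        # Enforce rank 2: a fundamental matrix must be singular.
        U, S, Vt = np.linalg.svd(F)
        S[2] = 0.0
        return U @ np.diag(S) @ Vt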

Camera resectioning is the process of estimating the parameters of a pinhole camera model approximating the camera that produced a given photograph or video; it determines which incoming light ray is associated with each pixel on the resulting image. Basically, the process determines the pose of the pinhole camera.
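
A common linear approach is the direct linear transformation (DLT), which recovers the 3×4 camera matrix P from six or more 3D-to-2D correspondences; a minimal, unnormalized sketch:

    import numpy as np

    def dlt_resection(Xs, xs):
        """Estimate P from world points Xs (n, 4) and image points xs (n, 3),
        both homogeneous, n >= 6, using the constraint x cross (P X) = 0."""
        rows = []
        for X, x in zip(Xs, xs):
            rows.append(np.hstack([np.zeros(4), -x[2] * X, x[1] * X]))
            rows.append(np.hstack([x[2] * X, np.zeros(4), -x[0] * X]))
        # The stacked system A p = 0 is solved by the smallest singular vector.
        return np.linalg.svd(np.array(rows))[2][-1].reshape(3, 4)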

Image rectification

Image rectification is a transformation process used to project images onto a common image plane. This process has several degrees of freedom and there are many strategies for transforming images to the common plane. Image rectification is used in computer stereo vision to simplify the problem of finding matching points between images, and in geographic information systems to merge images taken from multiple perspectives into a common map coordinate system.

In computer vision a camera matrix or (camera) projection matrix is a matrix which describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image.
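
As a concrete sketch with made-up parameters, the camera matrix factors as P = K[R | t], where K holds the intrinsic parameters and [R | t] the camera pose; dividing the projected homogeneous point by its third coordinate yields pixel coordinates:

    import numpy as np

    K = np.array([[800.0, 0, 320],        # made-up focal lengths and
                  [0, 800, 240],          # principal point, in pixels
                  [0, 0, 1]])
    R = np.eye(3)                         # camera orientation
    t = np.array([[0.0], [0.0], [2.0]])   # camera translation
    P = K @ np.hstack([R, t])             # 3x4 camera matrix

    X = np.array([0.5, 0.25, 8.0, 1.0])   # homogeneous 3D point
    x = P @ X
    print(x[:2] / x[2])                   # pixel coordinates (360, 260)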

Pinhole camera model

The pinhole camera model describes the mathematical relationship between the coordinates of a point in three-dimensional space and its projection onto the image plane of an ideal pinhole camera, where the camera aperture is described as a point and no lenses are used to focus light. The model does not include, for example, geometric distortions or blurring of unfocused objects caused by lenses and finite sized apertures. It also does not take into account that most practical cameras have only discrete image coordinates. This means that the pinhole camera model can only be used as a first order approximation of the mapping from a 3D scene to a 2D image. Its validity depends on the quality of the camera and, in general, decreases from the center of the image to the edges as lens distortion effects increase.

In computer vision, triangulation refers to the process of determining a point in 3D space given its projections onto two, or more, images. In order to solve this problem it is necessary to know the parameters of the camera projection function from 3D to 2D for the cameras involved, in the simplest case represented by the camera matrices. Triangulation is sometimes also referred to as reconstruction or intersection.
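
A standard linear solution is homogeneous (DLT) triangulation: each view contributes two rows stating that the image point and the projected 3D point are parallel. A minimal sketch, assuming two camera matrices P1, P2 and matched pixel coordinates x1, x2:

    import numpy as np

    def triangulate(P1, P2, x1, x2):
        """DLT triangulation of one point from two views.

        P1, P2: 3x4 camera matrices; x1, x2: (x, y) image coordinates.
        Returns the 3D point in inhomogeneous coordinates."""
        A = np.array([x1[0] * P1[2] - P1[0],
                      x1[1] * P1[2] - P1[1],
                      x2[0] * P2[2] - P2[0],
                      x2[1] * P2[2] - P2[1]])
        X = np.linalg.svd(A)[2][-1]   # null vector of the stacked system
        return X[:3] / X[3]

Together with the epipolar-constraint sketch earlier, this realizes the two observations from the epipolar constraint and triangulation section above.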

Camera auto-calibration is the process of determining internal camera parameters directly from multiple uncalibrated images of unstructured scenes. In contrast to classic camera calibration, auto-calibration does not require any special calibration objects in the scene. In the visual effects industry, camera auto-calibration is often part of the "Match Moving" process where a synthetic camera trajectory and intrinsic projection model are solved to reproject synthetic content into video.

Collinearity equation

The collinearity equations are a set of two equations, used in photogrammetry and computer stereo vision, to relate coordinates in a sensor plane to object coordinates. The equations originate from the central projection of a point of the object through the optical centre of the camera to the image on the sensor plane.

3D reconstruction from multiple images

3D reconstruction from multiple images is the creation of three-dimensional models from a set of images. It is the reverse process of obtaining 2D images from 3D scenes.

Homography (computer vision)

In the field of computer vision, any two images of the same planar surface in space are related by a homography. This has many practical applications, such as image rectification, image registration, or camera motion—rotation and translation—between two images. Once camera resectioning has been done from an estimated homography matrix, this information may be used for navigation, or to insert models of 3D objects into an image or video, so that they are rendered with the correct perspective and appear to have been part of the original scene.
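
As a sketch of why a planar surface induces a homography (made-up pose and plane, identity intrinsics for simplicity): for points on the plane nᵀX = d in the first camera's frame, X2 = RX1 + t = (R + tnᵀ/d)X1, so the two images are related, up to scale, by H = R + tnᵀ/d:

    import numpy as np

    R = np.eye(3)                            # made-up relative pose
    t = np.array([0.3, 0.0, 0.1])
    n, d = np.array([0.0, 0.0, 1.0]), 5.0    # plane n.X = d in camera-1 frame

    H = R + np.outer(t, n) / d               # plane-induced homography

    X1 = np.array([1.0, -2.0, 5.0])          # a point on the plane (z = 5)
    X2 = R @ X1 + t                          # the point in the second camera
    x1, x2 = X1 / X1[2], X2 / X2[2]          # normalized image points
    x2_mapped = H @ x1
    print(x2_mapped / x2_mapped[2], x2)      # equal up to scale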

References
