Pinhole camera model

Last updated
A diagram of a pinhole camera. Pinhole-camera.svg
A diagram of a pinhole camera.

The pinhole camera model describes the mathematical relationship between the coordinates of a point in three-dimensional space and its projection onto the image plane of an ideal pinhole camera, where the camera aperture is described as a point and no lenses are used to focus light. The model does not include, for example, geometric distortions or blurring of unfocused objects caused by lenses and finite sized apertures [1] . It also does not take into account that most practical cameras have only discrete image coordinates. This means that the pinhole camera model can only be used as a first order approximation of the mapping from a 3D scene to a 2D image. Its validity depends on the quality of the camera and, in general, decreases from the center of the image to the edges as lens distortion effects increase.

Contents

Some of the effects that the pinhole camera model does not take into account can be compensated, for example by applying suitable coordinate transformations on the image coordinates; other effects are sufficiently small to be neglected if a high quality camera is used. This means that the pinhole camera model often can be used as a reasonable description of how a camera depicts a 3D scene, for example in computer vision and computer graphics.

Geometry

The geometry of a pinhole camera. Note: the x1x2x3 coordinate system in the figure is left-handed, that is the direction of the OZ axis is in reverse to the system the reader may be used to. Pinhole.svg
The geometry of a pinhole camera. Note: the x1x2x3 coordinate system in the figure is left-handed, that is the direction of the OZ axis is in reverse to the system the reader may be used to.

The geometry related to the mapping of a pinhole camera is illustrated in the figure. The figure contains the following basic objects:

The pinhole aperture of the camera, through which all projection lines must pass, is assumed to be infinitely small, a point. In the literature this point in 3D space is referred to as the optical (or lens or camera) center. [3]

Formulation

Next we want to understand how the coordinates of point Q depend on the coordinates of point P. This can be done with the help of the following figure which shows the same scene as the previous figure but now from above, looking down in the negative direction of the X2 axis.

The geometry of a pinhole camera as seen from the X2 axis Pinhole2.svg
The geometry of a pinhole camera as seen from the X2 axis

In this figure we see two similar triangles, both having parts of the projection line (green) as their hypotenuses. The catheti of the left triangle are and f and the catheti of the right triangle are and . Since the two triangles are similar it follows that

or

A similar investigation, looking in the negative direction of the X1 axis gives

or

This can be summarized as

which is an expression that describes the relation between the 3D coordinates of point P and its image coordinates given by point Q in the image plane.

Rotated image and the virtual image plane

The mapping from 3D to 2D coordinates described by a pinhole camera is a perspective projection followed by a 180° rotation in the image plane. This corresponds to how a real pinhole camera operates; the resulting image is rotated 180° and the relative size of projected objects depends on their distance to the focal point and the overall size of the image depends on the distance f between the image plane and the focal point. In order to produce an unrotated image, which is what we expect from a camera, there are two possibilities:

In both cases, the resulting mapping from 3D coordinates to 2D image coordinates is given by the expression above, but without the negation, thus

In homogeneous coordinates

The mapping from 3D coordinates of points in space to 2D image coordinates can also be represented in homogeneous coordinates. Let be a representation of a 3D point in homogeneous coordinates (a 4-dimensional vector), and let be a representation of the image of this point in the pinhole camera (a 3-dimensional vector). Then the following relation holds

where is the camera matrix and the means equality between elements of projective spaces. This implies that the left and right hand sides are equal up to a non-zero scalar multiplication. A consequence of this relation is that also can be seen as an element of a projective space; two camera matrices are equivalent if they are equal up to a scalar multiplication. This description of the pinhole camera mapping, as a linear transformation instead of as a fraction of two linear expressions, makes it possible to simplify many derivations of relations between 3D and 2D coordinates.[ citation needed ]

See also

Related Research Articles

<span class="mw-page-title-main">Cartesian coordinate system</span> Most common coordinate system (geometry)

In geometry, a Cartesian coordinate system in a plane is a coordinate system that specifies each point uniquely by a pair of real numbers called coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, called coordinate lines, coordinate axes or just axes of the system. The point where they meet is called the origin and has (0, 0) as coordinates.

<span class="mw-page-title-main">Ellipse</span> Plane curve: conic section

In mathematics, an ellipse is a plane curve surrounding two focal points, such that for all points on the curve, the sum of the two distances to the focal points is a constant. It generalizes a circle, which is the special type of ellipse in which the two focal points are the same. The elongation of an ellipse is measured by its eccentricity , a number ranging from to .

<span class="mw-page-title-main">Azimuth</span> Horizontal angle from north or other reference cardinal direction

An azimuth is the angular measurement in a spherical coordinate system which represents the horizontal angle from a cardinal direction, most commonly north.

<span class="mw-page-title-main">3D projection</span> Design technique

A 3D projection is a design technique used to display a three-dimensional (3D) object on a two-dimensional (2D) surface. These projections rely on visual perspective and aspect analysis to project a complex object for viewing capability on a simpler plane.

<span class="mw-page-title-main">Bilinear interpolation</span> Method of interpolating functions on a 2D grid

In mathematics, bilinear interpolation is a method for interpolating functions of two variables using repeated linear interpolation. It is usually applied to functions sampled on a 2D rectilinear grid, though it can be generalized to functions defined on the vertices of arbitrary convex quadrilaterals.

<span class="mw-page-title-main">Barycentric coordinate system</span> Coordinate system that is defined by points instead of vectors

In geometry, a barycentric coordinate system is a coordinate system in which the location of a point is specified by reference to a simplex. The barycentric coordinates of a point can be interpreted as masses placed at the vertices of the simplex, such that the point is the center of mass of these masses. These masses can be zero or negative; they are all positive if and only if the point is inside the simplex.

<span class="mw-page-title-main">Radius</span> Segment in a circle or sphere from its center to its perimeter or surface and its length

In classical geometry, a radius of a circle or sphere is any of the line segments from its center to its perimeter, and in more modern usage, it is also their length. The name comes from the Latin radius, meaning ray but also the spoke of a chariot wheel. The typical abbreviation and mathematical variable name for radius is R or r. By extension, the diameter D is defined as twice the radius:

In geometry, the Hessian curve is a plane curve similar to folium of Descartes. It is named after the German mathematician Otto Hesse. This curve was suggested for application in elliptic curve cryptography, because arithmetic in this curve representation is faster and needs less memory than arithmetic in standard Weierstrass form.

<span class="mw-page-title-main">Hyperboloid model</span> Model of n-dimensional hyperbolic geometry

In geometry, the hyperboloid model, also known as the Minkowski model after Hermann Minkowski, is a model of n-dimensional hyperbolic geometry in which points are represented by points on the forward sheet S+ of a two-sheeted hyperboloid in (n+1)-dimensional Minkowski space or by the displacement vectors from the origin to those points, and m-planes are represented by the intersections of (m+1)-planes passing through the origin in Minkowski space with S+ or by wedge products of m vectors. Hyperbolic space is embedded isometrically in Minkowski space; that is, the hyperbolic distance function is inherited from Minkowski space, analogous to the way spherical distance is inherited from Euclidean distance when the n-sphere is embedded in (n+1)-dimensional Euclidean space.

In computer vision, the motion field is an ideal representation of motion in three-dimensional space (3D) as it is projected onto a camera image. Given a simplified camera model, each point in the image is the projection of some point in the 3D scene but the position of the projection of a fixed point in space can vary with time. The motion field can formally be defined as the time derivative of the image position of all image points given that they correspond to fixed 3D points. This means that the motion field can be represented as a function which maps image coordinates to a 2-dimensional vector. The motion field is an ideal description of the projected 3D motion in the sense that it can be formally defined but in practice it is normally only possible to determine an approximation of the motion field from the image data.

In computer vision, the essential matrix is a matrix, that relates corresponding points in stereo images assuming that the cameras satisfy the pinhole camera model.

<span class="mw-page-title-main">Line–line intersection</span> Common point(s) shared by two lines in Euclidean geometry

In Euclidean geometry, the intersection of a line and a line can be the empty set, a point, or another line. Distinguishing these cases and finding the intersection have uses, for example, in computer graphics, motion planning, and collision detection.

In computer vision a camera matrix or (camera) projection matrix is a matrix which describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image.

The eight-point algorithm is an algorithm used in computer vision to estimate the essential matrix or the fundamental matrix related to a stereo camera pair from a set of corresponding image points. It was introduced by Christopher Longuet-Higgins in 1981 for the case of the essential matrix. In theory, this algorithm can be used also for the fundamental matrix, but in practice the normalized eight-point algorithm, described by Richard Hartley in 1997, is better suited for this case.

In computer vision, triangulation refers to the process of determining a point in 3D space given its projections onto two, or more, images. In order to solve this problem it is necessary to know the parameters of the camera projection function from 3D to 2D for the cameras involved, in the simplest case represented by the camera matrices. Triangulation is sometimes also referred to as reconstruction or intersection.

Computer stereo vision is the extraction of 3D information from digital images, such as those obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examining the relative positions of objects in the two panels. This is similar to the biological process of stereopsis.

In mathematics, the Jacobi curve is a representation of an elliptic curve different from the usual one defined by the Weierstrass equation. Sometimes it is used in cryptography instead of the Weierstrass form because it can provide a defence against simple and differential power analysis style (SPA) attacks; it is possible, indeed, to use the general addition formula also for doubling a point on an elliptic curve of this form: in this way the two operations become indistinguishable from some side-channel information. The Jacobi curve also offers faster arithmetic compared to the Weierstrass curve.

In mathematics, the Montgomery curve is a form of elliptic curve introduced by Peter L. Montgomery in 1987, different from the usual Weierstrass form. It is used for certain computations, and in particular in different cryptography applications.

<span class="mw-page-title-main">Twisted Edwards curve</span>

In algebraic geometry, the twisted Edwards curves are plane models of elliptic curves, a generalisation of Edwards curves introduced by Bernstein, Birkner, Joye, Lange and Peters in 2008. The curve set is named after mathematician Harold M. Edwards. Elliptic curves are important in public key cryptography and twisted Edwards curves are at the heart of an electronic signature scheme called EdDSA that offers high performance while avoiding security problems that have surfaced in other digital signature schemes.

<span class="mw-page-title-main">Intersection (geometry)</span> Shape formed from points common to other shapes

In geometry, an intersection is a point, line, or curve common to two or more objects. The simplest case in Euclidean geometry is the line–line intersection between two distinct lines, which either is one point or does not exist. Other types of geometric intersection include:

References

  1. Szeliski, Richard (2022). Computer Vision: Algorithms and Applications (2 ed.). Springer Nature. p. 74. ISBN   3030343723 . Retrieved 30 December 2023.
  2. Carlo Tomasi (2016-08-09). "A Simple Camera Model" (PDF). cs.duke.edu. Retrieved 2021-02-18.
  3. Andrea Fusiello (2005-12-27). "Elements of Geometric Computer Vision". Homepages.inf.ed.ac.uk. Retrieved 2013-12-18.

Bibliography