Camera resectioning

Last updated November 24, 2024

Camera resectioning is the process of estimating the parameters of a pinhole camera model approximating the camera that produced a given photograph or video; it determines which incoming light ray is associated with each pixel on the resulting image. Informally, the process determines the pose of the pinhole camera, and often its intrinsic parameters as well.

This process is often called geometric camera calibration or simply camera calibration, although that term may also refer to photometric camera calibration or be restricted to the estimation of the intrinsic parameters only. Exterior orientation and interior orientation refer to the determination of only the extrinsic and intrinsic parameters, respectively.

Classic camera calibration requires special objects in the scene, whereas camera auto-calibration does not. Camera resectioning is often used in stereo vision, where the camera projection matrices of two cameras are used to calculate the 3D world coordinates of a point viewed by both cameras.

Formulation

The camera projection matrix is derived from the intrinsic and extrinsic parameters of the camera, and is often represented by a series of transformations, e.g., a matrix of camera intrinsic parameters, a 3 × 3 rotation matrix, and a translation vector. The camera projection matrix can be used to associate points in a camera's image space with locations in 3D world space.

Homogeneous coordinates

In this context, $[u\ v\ 1]^{T}$ represents a 2D point position in pixel coordinates and $[x_{w}\ y_{w}\ z_{w}\ 1]^{T}$ represents a 3D point position in world coordinates. Both are expressed in homogeneous coordinates (i.e. they have an additional last component, which is initially, by convention, 1), the most common notation in robotics and rigid body transforms.
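As a minimal illustration, the conversion to and from homogeneous coordinates can be sketched in Python with NumPy. The helper names `to_homogeneous` and `from_homogeneous` are hypothetical and chosen only for this example.

```python
import numpy as np

def to_homogeneous(p):
    """Append a final component equal to 1 (hypothetical helper)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(p_h):
    """Divide by the last component and drop it (hypothetical helper)."""
    p_h = np.asarray(p_h, dtype=float)
    return p_h[:-1] / p_h[-1]

pixel_h = to_homogeneous([480.0, 320.0])          # [u, v, 1]
pixel = from_homogeneous([960.0, 640.0, 2.0])     # same pixel, scaled by w = 2
```

Any non-zero multiple of a homogeneous vector denotes the same point, which is why the second call returns the same pixel as the first one encodes.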

Projection

Referring to the pinhole camera model, a camera matrix $M$ is used to denote a projective mapping from world coordinates to pixel coordinates.

$${\begin{bmatrix}wu\\wv\\w\end{bmatrix}}=K\,{\begin{bmatrix}R&T\end{bmatrix}}{\begin{bmatrix}x_{w}\\y_{w}\\z_{w}\\1\end{bmatrix}}=M{\begin{bmatrix}x_{w}\\y_{w}\\z_{w}\\1\end{bmatrix}}$$

where $M=K\,{\begin{bmatrix}R&T\end{bmatrix}}$. By convention, $u,v$ are the x and y coordinates of the pixel in the camera, $K$ is the intrinsic matrix as described below, and $R$ and $T$ form the extrinsic matrix as described below. $x_{w},y_{w},z_{w}$ are the world coordinates, relative to the origin of the world, of the source of the light ray that hits the camera sensor. Dividing the matrix product by $w$ gives the theoretical value of the pixel coordinates.
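The projection above can be sketched in a few lines of Python with NumPy. The function name `project` and all numeric values are assumptions made for illustration, not part of any particular library or camera.

```python
import numpy as np

def project(K, R, T, X_world):
    """Project a 3D world point to pixel coordinates with M = K [R | T]."""
    M = K @ np.hstack([R, T.reshape(3, 1)])   # 3x4 camera matrix
    X_h = np.append(X_world, 1.0)             # homogeneous world point
    wu, wv, w = M @ X_h                       # [w*u, w*v, w]
    return np.array([wu / w, wv / w])         # divide by w to obtain (u, v)

# Illustrative values (assumed): 800 px focal length, principal point (320, 240),
# camera axes aligned with the world axes, world origin 5 units in front of the camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])

uv = project(K, R, T, np.array([1.0, 0.5, 0.0]))  # (480, 320) for these values
```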

Intrinsic parameters

$$K={\begin{bmatrix}\alpha _{x}&\gamma &u_{0}\\0&\alpha _{y}&v_{0}\\0&0&1\end{bmatrix}}$$

The intrinsic matrix $K$ contains 5 intrinsic parameters of the specific camera model. These parameters encompass focal length, image sensor format, and camera principal point. The parameters $\alpha _{x}=f\cdot m_{x}$ and $\alpha _{y}=f\cdot m_{y}$ represent focal length in terms of pixels, where $m_{x}$ and $m_{y}$ are the inverses of the width and height of a pixel on the projection plane and $f$ is the focal length in terms of distance.^[1] $\gamma$ represents the skew coefficient between the x and y axes, and is often 0. $u_{0}$ and $v_{0}$ represent the principal point, which would ideally be at the center of the image.
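As a small worked example, $K$ can be assembled from physical quantities as follows. The function name `intrinsic_matrix` and the sample lens and sensor values (a 4 mm lens with 5 µm square pixels) are assumptions chosen only to make the arithmetic concrete.

```python
import numpy as np

def intrinsic_matrix(f, pixel_width, pixel_height, u0, v0, skew=0.0):
    """Build K from focal length and pixel size (same length unit), plus principal point."""
    m_x = 1.0 / pixel_width      # pixels per unit length along x
    m_y = 1.0 / pixel_height     # pixels per unit length along y
    alpha_x = f * m_x            # focal length in pixels along x
    alpha_y = f * m_y            # focal length in pixels along y
    return np.array([[alpha_x, skew, u0],
                     [0.0, alpha_y, v0],
                     [0.0, 0.0, 1.0]])

# Assumed example: f = 4 mm, 0.005 mm square pixels, principal point at (320, 240).
K = intrinsic_matrix(f=4.0, pixel_width=0.005, pixel_height=0.005, u0=320.0, v0=240.0)
```

With these values both $\alpha_x$ and $\alpha_y$ come out at about 800 pixels.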

Nonlinear intrinsic parameters such as lens distortion are also important, although they cannot be included in the linear camera model described by the intrinsic parameter matrix. Many modern camera calibration algorithms estimate these parameters as well, using non-linear optimisation techniques that jointly optimise the camera and distortion parameters, an approach generally known as bundle adjustment.
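The quantity that such an optimisation drives towards zero is the reprojection residual. The sketch below assumes a two-coefficient polynomial radial distortion model with coefficients `k1` and `k2`; this model, the function names, and the numeric values are illustrative assumptions, and the article does not prescribe a particular distortion model.

```python
import numpy as np

def project_radial(K, R, T, k1, k2, X_world):
    """Pinhole projection with an assumed polynomial radial distortion model."""
    X_cam = R @ X_world + T
    x, y = X_cam[0] / X_cam[2], X_cam[1] / X_cam[2]   # normalized image coordinates
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 ** 2                  # radial scaling factor
    return (K @ np.array([x * d, y * d, 1.0]))[:2]    # last row of K is [0, 0, 1]

def reprojection_residuals(K, R, T, k1, k2, world_pts, observed_px):
    """Stacked differences between predicted and observed pixel positions."""
    predicted = np.array([project_radial(K, R, T, k1, k2, X) for X in world_pts])
    return (predicted - np.asarray(observed_px, dtype=float)).ravel()

# Assumed example values.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.array([0.0, 0.0, 5.0])
world_pts = np.array([[1.0, 0.5, 0.0], [-1.0, 0.5, 0.0]])

# Synthetic observations generated with k1 = 0.1, k2 = 0.
observed = np.array([project_radial(K, R, T, 0.1, 0.0, X) for X in world_pts])
res_true = reprojection_residuals(K, R, T, 0.1, 0.0, world_pts, observed)   # all zeros
res_ideal = reprojection_residuals(K, R, T, 0.0, 0.0, world_pts, observed)  # non-zero
```

A non-linear least-squares solver would adjust the camera and distortion parameters to minimise the sum of squares of these residuals over all images.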

Extrinsic parameters

$${\begin{bmatrix}R_{3\times 3}&T_{3\times 1}\\0_{1\times 3}&1\end{bmatrix}}_{4\times 4}$$

$R,T$ are the extrinsic parameters, which denote the coordinate system transformation from 3D world coordinates to 3D camera coordinates. Equivalently, the extrinsic parameters define the position of the camera center and the camera's heading in world coordinates. $T$ is the position of the origin of the world coordinate system expressed in coordinates of the camera-centered coordinate system. $T$ is often mistakenly considered the position of the camera. The position, $C$, of the camera expressed in world coordinates is $C=-R^{-1}T=-R^{T}T$ (since $R$ is a rotation matrix). This can be verified by checking that the point $[-R^{-1}T,\ 1]^{T}$ is transformed to $[0,0,0,1]^{T}$, which is what is expected (since the camera's location is, in the camera's coordinates, the origin).
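The relationship between $T$ and the camera position can be checked numerically. In this sketch the rotation angle, the translation, and the function name `camera_center` are arbitrary assumptions.

```python
import numpy as np

def camera_center(R, T):
    """Camera position in world coordinates: C = -R^T T."""
    return -R.T @ T

# Assumed example: 30 degree rotation about the z axis and an arbitrary translation.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([1.0, 2.0, 3.0])

C = camera_center(R, T)

# 4x4 extrinsic matrix [[R, T], [0, 1]] applied to the homogeneous camera center.
E = np.eye(4)
E[:3, :3] = R
E[:3, 3] = T
C_cam = E @ np.append(C, 1.0)   # numerically [0, 0, 0, 1]
```

Note that `C` differs from `T` whenever the rotation is not the identity, which is the source of the common mistake mentioned above.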

Camera calibration is often used as an early stage in computer vision.

When a camera is used, light from the environment is focused on an image plane and captured. This process reduces the dimensions of the data taken in by the camera from three to two (light from a 3D scene is stored on a 2D image). Each pixel on the image plane therefore corresponds to a ray of light from the original scene.

Algorithms

There are many different approaches to calculate the intrinsic and extrinsic parameters for a specific camera setup. The most common ones are:

Direct linear transformation (DLT) method
Zhang's method
Tsai's method
Selby's method (for X-ray cameras)
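As an illustration of the first entry in this list, the following is a minimal, unnormalized sketch of the direct linear transformation for estimating a 3 × 4 camera matrix from at least six non-coplanar 3D to 2D correspondences. The function name `dlt_projection_matrix` and the synthetic data are assumptions; practical implementations usually also normalize the coordinates and refine the result non-linearly.

```python
import numpy as np

def dlt_projection_matrix(world_pts, pixel_pts):
    """Estimate M (up to scale) from 3D-2D correspondences by solving A m = 0."""
    rows = []
    for (x, y, z), (u, v) in zip(world_pts, pixel_pts):
        X = [x, y, z, 1.0]
        # From u = (m1 . X) / (m3 . X) and v = (m2 . X) / (m3 . X)
        rows.append([*X, 0.0, 0.0, 0.0, 0.0, *(-u * c for c in X)])
        rows.append([0.0, 0.0, 0.0, 0.0, *X, *(-v * c for c in X)])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)     # right singular vector of the smallest singular value

# Synthetic check with an assumed ground-truth camera.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
M_true = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])

rng = np.random.default_rng(0)
world_pts = rng.uniform(-1.0, 1.0, size=(10, 3))
proj = (M_true @ np.column_stack([world_pts, np.ones(len(world_pts))]).T).T
pixel_pts = proj[:, :2] / proj[:, 2:3]

M_est = dlt_projection_matrix(world_pts, pixel_pts)
M_est = M_est / M_est[2, 3] * M_true[2, 3]   # fix the arbitrary scale for comparison
```

Because the solution is defined only up to scale, the estimate is rescaled before it is compared with the ground truth.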

Zhang's method

Zhang's method^[2]^[3] is a camera calibration method that uses traditional calibration techniques (known calibration points) and self-calibration techniques (correspondence between the calibration points when they are in different positions). To perform a full calibration by the Zhang method, at least three different images of the calibration target/gauge are required, either by moving the gauge or the camera itself. If some of the intrinsic parameters are given as data (orthogonality of the image or optical center coordinates), the number of images required can be reduced to two.

In a first step, an estimate of the projection matrix $H$ between the calibration target and the image plane is determined using the DLT method.^[4] Subsequently, self-calibration techniques are applied to obtain the matrix of the image of the absolute conic.^[5] The main contribution of Zhang's method is how to extract, given $n$ poses of the calibration target, a constrained intrinsic matrix $K$ along with $n$ instances of the calibration parameters $R$ and $T$.
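Once $K$ is known, the $R$ and $T$ of each view can be recovered from that view's homography. The sketch below assumes the calibration target lies in the plane $z_{w}=0$, so that $H$ is proportional to $K\,[r_{1}\ r_{2}\ T]$, and that the target is in front of the camera. The function name `pose_from_homography` and the test values are assumptions.

```python
import numpy as np

def pose_from_homography(K, H):
    """Recover R, T from H ~ K [r1 r2 T] for a planar target at z_w = 0."""
    A = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(A[:, 0])      # scale that makes r1 a unit vector
    if A[2, 2] < 0:                          # assumed: target in front of camera (T_z > 0)
        lam = -lam
    r1, r2, T = lam * A[:, 0], lam * A[:, 1], lam * A[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)              # project onto the nearest rotation matrix
    return U @ Vt, T

# Synthetic check with assumed values: rotation of 0.3 rad about the x axis.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
T_true = np.array([0.1, -0.2, 5.0])
H = -2.5 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], T_true])  # arbitrary scale

R_est, T_est = pose_from_homography(K, H)
```

With noisy homographies the first two columns are not exactly orthonormal, which is why the result is projected onto the nearest rotation matrix.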

Derivation

Assume we have a homography ${\textbf {H}}$ that maps points $x_{\pi }$ on a "probe plane" $\pi$ to points $x$ on the image.

The circular points $I,J={\begin{bmatrix}1&\pm j&0\end{bmatrix}}^{\mathrm {T} }$ lie on both our probe plane $\pi$ and on the absolute conic $\Omega _{\infty }$. Lying on $\Omega _{\infty }$ of course means they are also projected onto the image of the absolute conic (IAC) $\omega$, thus $x_{1}^{T}\omega x_{1}=0$ and $x_{2}^{T}\omega x_{2}=0$. The circular points project as

$${\begin{aligned}x_{1}&={\textbf {H}}I={\begin{bmatrix}h_{1}&h_{2}&h_{3}\end{bmatrix}}{\begin{bmatrix}1\\j\\0\end{bmatrix}}=h_{1}+jh_{2}\\x_{2}&={\textbf {H}}J={\begin{bmatrix}h_{1}&h_{2}&h_{3}\end{bmatrix}}{\begin{bmatrix}1\\-j\\0\end{bmatrix}}=h_{1}-jh_{2}.\end{aligned}}$$

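It suffices to consider $x_{1}$, since $x_{2}$ is its complex conjugate and yields the same constraints. Substituting the expression for $x_{1}$ and using the symmetry of $\omega$ gives:

$${\begin{aligned}x_{1}^{T}\omega x_{1}&=\left(h_{1}+jh_{2}\right)^{T}\omega \left(h_{1}+jh_{2}\right)\\&=\left(h_{1}^{T}+jh_{2}^{T}\right)\omega \left(h_{1}+jh_{2}\right)\\&=h_{1}^{T}\omega h_{1}-h_{2}^{T}\omega h_{2}+2j\,h_{1}^{T}\omega h_{2}\\&=0\end{aligned}}$$

Since the real and imaginary parts must vanish separately, each homography yields two constraints that are linear in the entries of $\omega$: $h_{1}^{T}\omega h_{2}=0$ and $h_{1}^{T}\omega h_{1}=h_{2}^{T}\omega h_{2}$.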
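A minimal NumPy sketch of how this condition can be used in practice: its real and imaginary parts give two linear equations per homography, $h_{1}^{T}\omega h_{2}=0$ and $h_{1}^{T}\omega h_{1}=h_{2}^{T}\omega h_{2}$, which are stacked over at least three views and solved for the six distinct entries of the symmetric matrix $\omega$. Assuming $\omega =K^{-T}K^{-1}$, $K$ then follows from a Cholesky factorization. The function names and the synthetic poses are assumptions, and lens distortion and noise are ignored.

```python
import numpy as np

def _v(H, i, j):
    """Row vector such that _v(H, i, j) @ b == h_i^T omega h_j,
    with b = [w11, w12, w22, w13, w23, w33]."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(homographies):
    rows = []
    for H in homographies:
        rows.append(_v(H, 0, 1))                  # h1^T omega h2 = 0
        rows.append(_v(H, 0, 0) - _v(H, 1, 1))    # h1^T omega h1 = h2^T omega h2
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    w11, w12, w22, w13, w23, w33 = Vt[-1]
    omega = np.array([[w11, w12, w13], [w12, w22, w23], [w13, w23, w33]])
    if omega[0, 0] < 0:                           # fix the sign so omega is positive definite
        omega = -omega
    L = np.linalg.cholesky(omega)                 # omega = L L^T, L proportional to K^-T
    K = np.linalg.inv(L).T
    return K / K[2, 2]

def rot_xy(a, b):
    """Rotation about x by a, composed with rotation about y by b (radians)."""
    ca, sa, cb, sb = np.cos(a), np.sin(a), np.cos(b), np.sin(b)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    return Rx @ Ry

# Synthetic check: four assumed target poses seen by an assumed camera.
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
T = np.array([0.1, -0.2, 5.0])
Hs = []
for a, b in [(0.3, 0.2), (-0.4, 0.5), (0.6, -0.3), (0.2, -0.6)]:
    R = rot_xy(a, b)
    Hs.append(K_true @ np.column_stack([R[:, 0], R[:, 1], T]))

K_est = intrinsics_from_homographies(Hs)
```

Target poses whose planes are parallel to one another contribute no independent constraints, so the views should differ in orientation.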

Tsai's algorithm

Tsai's algorithm determines a camera's orientation and position in 3D space together with its internal parameters. The procedure can be broken down into three main stages:

Initial Calibration

The process begins with the initial calibration stage, where a series of images is captured by the camera. These images, often featuring a known calibration pattern like a checkerboard, are used to estimate intrinsic camera parameters such as focal length and optical center.^[6] In some applications, variants of the chessboard target are used which are robust to partial occlusions. Targets such as ChArUco^[7] and PuzzleBoard^[8] simplify the measurement of distortions in the corners of the camera sensor.

Pose Estimation

Following initial calibration, the algorithm undertakes pose estimation. This involves calculating the camera's position and orientation relative to a known object in the scene. The process typically requires identifying specific points in the calibration pattern and solving for the camera's rotation and translation vectors.

Refinement of Parameters

The final phase is the refinement of parameters. In this stage, the algorithm refines the lens distortion coefficients, addressing radial and tangential distortions. Further optimization of internal and external camera parameters is performed to enhance the calibration accuracy.

Tsai's algorithm is widely used in both academic research and practical applications within robotics and industrial metrology.

Selby's method (for X-ray cameras)

Selby's camera calibration method^[9] addresses the auto-calibration of X-ray camera systems. X-ray camera systems, consisting of the X-ray generating tube and a solid-state detector, can be modelled as pinhole camera systems comprising 9 intrinsic and extrinsic camera parameters. Intensity-based registration between an arbitrary X-ray image and a reference model (such as a tomographic dataset) can then be used to determine the relative camera parameters without the need for a special calibration body or any ground-truth data.

References

[1] Richard Hartley and Andrew Zisserman (2003). Multiple View Geometry in Computer Vision. Cambridge University Press. pp. 155–157. ISBN 0-521-54051-8.
[2] Z. Zhang, "A flexible new technique for camera calibration", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, pages 1330–1334, 2000.
[3] P. Sturm and S. Maybank, "On plane-based camera calibration: a general algorithm, singularities, applications", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 432–437, Fort Collins, CO, USA, June 1999.
[4] Y.I. Abdel-Aziz and H.M. Karara, "Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry", Proceedings of the Symposium on Close-Range Photogrammetry (pp. 1–18), Falls Church, VA: American Society of Photogrammetry, 1971.
[5] Q.-T. Luong and O.D. Faugeras (1997-03-01). "Self-Calibration of a Moving Camera from Point Correspondences and Fundamental Matrices". International Journal of Computer Vision. 22 (3): 261–289. doi:10.1023/A:1007982716991. ISSN 1573-1405.
[6] Roger Y. Tsai, "A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses", IEEE Journal of Robotics and Automation, Vol. RA-3, No. 4, August 1987.
[7] OpenCV. https://docs.opencv.org/3.4/df/d4a/tutorial_charuco_detection.html
[8] P. Stelldinger, et al., "PuzzleBoard: A New Camera Calibration Pattern with Position Encoding", German Conference on Pattern Recognition, 2024. https://users.informatik.haw-hamburg.de/~stelldinger/pub/PuzzleBoard/
[9] Boris Peter Selby et al., "Patient positioning with X-ray detector self-calibration for image guided therapy", Australasian Physical & Engineering Science in Medicine, Vol. 34, No. 3, pages 391–400, 2011.

External links

Zhang's Camera Calibration Method with Software
Camera Calibration - Augmented reality lecture at TU Muenchen, Germany

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
