Perspective-n-Point

Last updated November 28, 2023

Perspective-n-Point^[1] is the problem of estimating the pose of a calibrated camera given a set of $n$ 3D points in the world and their corresponding 2D projections in the image. The camera pose consists of 6 degrees of freedom (DOF): the rotation (roll, pitch, and yaw) and the 3D translation of the camera with respect to the world. This problem originates from camera calibration and has many applications in computer vision and other areas, including 3D pose estimation, robotics and augmented reality.^[2] A commonly used solution exists for $n = 3$, called P3P, and many solutions are available for the general case of $n \geq 3$. A solution for $n = 2$ exists if feature orientations are available at the two points.^[3] Implementations of these solutions are also available in open source software.

Problem Specification

Definition

Given a set of $n$ 3D points in a world reference frame and their corresponding 2D image projections as well as the calibrated intrinsic camera parameters, determine the 6 DOF pose of the camera in the form of its rotation and translation with respect to the world. This follows the perspective projection model for cameras:

$$s\,p_{c}=K\,[\,R\,|\,T\,]\,p_{w},$$

where $p_{w}={\begin{bmatrix}x&y&z&1\end{bmatrix}}^{T}$ is the homogeneous world point, $p_{c}={\begin{bmatrix}u&v&1\end{bmatrix}}^{T}$ is the corresponding homogeneous image point, $K$ is the matrix of intrinsic camera parameters (where $f_{x}$ and $f_{y}$ are the scaled focal lengths, $\gamma$ is the skew parameter, which is sometimes assumed to be 0, and $(u_{0},\,v_{0})$ is the principal point), $s$ is a scale factor for the image point, and $R$ and $T$ are the desired 3D rotation and 3D translation of the camera (extrinsic parameters) that are being calculated. Written out in full, the model is:

$$s{\begin{bmatrix}u\\v\\1\end{bmatrix}}={\begin{bmatrix}f_{x}&\gamma &u_{0}\\0&f_{y}&v_{0}\\0&0&1\end{bmatrix}}{\begin{bmatrix}r_{11}&r_{12}&r_{13}&t_{1}\\r_{21}&r_{22}&r_{23}&t_{2}\\r_{31}&r_{32}&r_{33}&t_{3}\end{bmatrix}}{\begin{bmatrix}x\\y\\z\\1\end{bmatrix}}.$$
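As a minimal sketch of the projection model above, the following Python snippet (using NumPy) maps one world point to pixel coordinates. The function name `project` and all numeric values for $K$, $R$ and $T$ are illustrative assumptions, not taken from a real camera.

```python
import numpy as np

def project(K, R, T, p_w):
    """Project a 3D world point to pixels via s * p_c = K [R | T] p_w."""
    p_w_h = np.append(p_w, 1.0)             # homogeneous world point [x, y, z, 1]
    Rt = np.hstack([R, T.reshape(3, 1)])    # 3x4 extrinsic matrix [R | T]
    sp = K @ Rt @ p_w_h                     # equals s * [u, v, 1]
    s = sp[2]                               # scale factor (third component)
    return sp[:2] / s, s

# Illustrative intrinsics and pose (assumed values)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])

uv, s = project(K, R, T, np.array([1.0, 0.5, 0.0]))
```

With these assumed values the point lands at pixel (480, 320) with scale factor $s = 5$, which here equals the depth of the point along the optical axis.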

Assumptions and Data Characteristics

There are a few preliminary aspects of the problem that are common to all solutions of PnP. The assumption made in most solutions is that the camera is already calibrated. Thus, its intrinsic properties, such as the focal length, principal point, and skew parameter, are already known. Some methods, such as UPnP^[4] or the Direct Linear Transform (DLT) applied to the projection model, are exceptions to this assumption, as they estimate these intrinsic parameters in addition to the extrinsic parameters that make up the camera pose the original PnP problem is trying to find.

For each solution to PnP, the chosen point correspondences cannot be collinear. In addition, PnP can have multiple solutions, and choosing a particular solution requires post-processing of the solution set. RANSAC is also commonly used with a PnP method to make the solution robust to outliers in the set of point correspondences. P3P methods assume that the data is noise-free, whereas most PnP methods assume Gaussian noise on the inlier set.

Methods

The following section describes common methods for solving the PnP problem that are readily available in open source software, and how RANSAC can be used to deal with outliers in the data set.

P3P

When $n = 3$ , the PnP problem is in its minimal form of P3P and can be solved with three point correspondences. However, with just three point correspondences, P3P yields up to four real, geometrically feasible solutions. For low noise levels a fourth correspondence can be used to remove ambiguity. The setup for the problem is as follows.

Let P be the center of projection for the camera, and let A, B, and C be 3D world points with corresponding image points u, v, and w. Let X = |PA|, Y = |PB|, Z = |PC|, $\alpha =\angle BPC$, $\beta =\angle APC$, $\gamma =\angle APB$, $p=2\cos \alpha$, $q=2\cos \beta$, $r=2\cos \gamma$, $a'=|AB|$, $b'=|BC|$, $c'=|AC|$. Applying the law of cosines to triangles PBC, PAC, and PAB gives the equation system for P3P:

$${\begin{cases}Y^{2}+Z^{2}-YZp-b'^{2}=0\\Z^{2}+X^{2}-XZq-c'^{2}=0\\X^{2}+Y^{2}-XYr-a'^{2}=0\end{cases}}$$
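The system can be checked numerically on a synthetic configuration. In the sketch below, the camera center and the three world points are hypothetical values chosen for illustration. In an actual P3P solver, $p$, $q$, $r$ would come from the calibrated image points and $a'$, $b'$, $c'$ from the world points, while $X$, $Y$, $Z$ are the unknowns. Here everything is computed from known geometry only to confirm that the three residuals vanish.

```python
import numpy as np

# Hypothetical camera center P and world points A, B, C
P = np.array([0.0, 0.0, 0.0])
A = np.array([1.0, 0.0, 4.0])
B = np.array([-1.0, 1.0, 5.0])
C = np.array([0.5, -1.0, 6.0])

def cos_angle(U, V):
    """Cosine of the angle at P between the rays PU and PV."""
    u, v = U - P, V - P
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Distances from the center of projection to each world point
X, Y, Z = (np.linalg.norm(Q - P) for Q in (A, B, C))

p = 2 * cos_angle(B, C)   # alpha = angle BPC
q = 2 * cos_angle(A, C)   # beta  = angle APC
r = 2 * cos_angle(A, B)   # gamma = angle APB

# Side lengths a' = |AB|, b' = |BC|, c' = |AC|
a = np.linalg.norm(A - B)
b = np.linalg.norm(B - C)
c = np.linalg.norm(A - C)

residuals = np.array([
    Y**2 + Z**2 - Y * Z * p - b**2,
    Z**2 + X**2 - X * Z * q - c**2,
    X**2 + Y**2 - X * Y * r - a**2,
])
```

All three entries of `residuals` are zero up to floating-point rounding, as expected from the law of cosines.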

Solving the P3P system results in up to four geometrically feasible real solutions for $R$ and $T$. The oldest published solution dates to 1841.^[5] An algorithm for solving the problem, together with a complete solution classification, is given in the 2003 IEEE Transactions on Pattern Analysis and Machine Intelligence paper by Gao et al.^[6] An open source implementation of Gao's P3P solver can be found in OpenCV's calib3d module in the solvePnP function.^[7] Several faster and more accurate versions have been published since, including Lambda Twist P3P,^[8] which achieved state-of-the-art performance in 2018 with a 50-fold increase in speed and a 400-fold decrease in numerical failures. Lambda Twist is available as open source in OpenMVG and at https://github.com/midjji/pnp.

EPnP

Efficient PnP (EPnP) is a method developed by Lepetit et al. in their 2008 International Journal of Computer Vision paper^[9] that solves the general problem of PnP for $n \geq 4$. The method is based on the notion that each of the $n$ points (called reference points) can be expressed as a weighted sum of four virtual control points. The coordinates of these control points thus become the unknowns of the problem, and the final camera pose is recovered from them.

As an overview of the process, first note that each of the $n$ reference points in the world frame, $p_{i}^{w}$, and their corresponding image points, $p_{i}^{c}$, are weighted sums of the four control points, $c_{j}^{w}$ and $c_{j}^{c}$ respectively, and the weights are normalized per reference point as shown below. All points are expressed in homogeneous form.

$$p_{i}^{w}=\sum _{j=1}^{4}{\alpha _{ij}c_{j}^{w}}$$

$$p_{i}^{c}=\sum _{j=1}^{4}{\alpha _{ij}c_{j}^{c}}$$

$$\sum _{j=1}^{4}{\alpha _{ij}}=1$$
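The weights $\alpha_{ij}$ can be computed by solving a small linear system per reference point. The sketch below is a hedged illustration: the control points are chosen as the centroid plus unit steps along the coordinate axes, which is a simplifying assumption made here (other choices are possible), and the helper name `barycentric` is hypothetical. It also checks that the same weights relate the transformed points $p_i^c$ to the transformed control points $c_j^c$ after a rigid motion, which is what allows the weights to be reused.

```python
import numpy as np

rng = np.random.default_rng(0)
pts_w = rng.uniform(-1.0, 1.0, size=(6, 3))     # n = 6 reference points (world frame)

# Assumed control points: centroid plus unit steps along each axis
c0 = pts_w.mean(axis=0)
ctrl_w = np.vstack([c0, c0 + np.eye(3)])         # shape (4, 3)

def barycentric(points, ctrl):
    """Solve [c_1 .. c_4; 1 1 1 1] alpha_i = [p_i; 1] for every reference point."""
    C = np.vstack([ctrl.T, np.ones(4)])               # 4 x 4 system matrix
    P = np.vstack([points.T, np.ones(len(points))])   # 4 x n right-hand sides
    return np.linalg.solve(C, P).T                    # row i holds alpha_i1..alpha_i4

alpha = barycentric(pts_w, ctrl_w)

# Apply an arbitrary rigid motion (illustrative R and T) to points and control points
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.1, -0.2, 4.0])
pts_c = pts_w @ R.T + T
ctrl_c = ctrl_w @ R.T + T
```

Because each row of `alpha` sums to 1, `alpha @ ctrl_w` reproduces `pts_w` and `alpha @ ctrl_c` reproduces `pts_c`.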

From this, the projection of each reference point onto the image becomes

$$s_{i}\,p_{i}^{img}=K\sum _{j=1}^{4}{\alpha _{ij}c_{j}^{c}},$$

where $p_{i}^{img}$ is the image reference point with pixel coordinates ${\begin{bmatrix}u_{i}&v_{i}&1\end{bmatrix}}^{T}$. The homogeneous image control point has the form $c_{j}^{c}={\begin{bmatrix}x_{j}^{c}&y_{j}^{c}&z_{j}^{c}\end{bmatrix}}^{T}$. Taking the skew $\gamma$ as 0 and eliminating $s_{i}$ using the third row of the equation yields the following two linear equations for each reference point:

$$\sum _{j=1}^{4}\left(\alpha _{ij}f_{x}x_{j}^{c}+\alpha _{ij}(u_{0}-u_{i})z_{j}^{c}\right)=0$$

$$\sum _{j=1}^{4}\left(\alpha _{ij}f_{y}y_{j}^{c}+\alpha _{ij}(v_{0}-v_{i})z_{j}^{c}\right)=0$$

Using these two equations for each of the $n$ reference points, the system $\textstyle Mx=0$ can be formed where $\textstyle x={\begin{bmatrix}c_{1}^{c^{T}}&c_{2}^{c^{T}}&c_{3}^{c^{T}}&c_{4}^{c^{T}}\end{bmatrix}}^{T}$ . The solution for the control points exists in the null space of $M$ and is expressed as

$$x=\sum _{i=1}^{N}{\beta _{i}v_{i}}$$

where $N$ is the number of null singular values in $M$ and each $v_{i}$ is the corresponding right singular vector of $M$ . $N$ can range from 0 to 4. After calculating the initial coefficients $\beta _{i}$ , the Gauss-Newton algorithm is used to refine them. The $R$ and $T$ matrices that minimize the reprojection error of the world reference points, $p_{i}^{w}$ , and their corresponding actual image points $p_{i}^{c}$ , are then calculated.
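To make the linear system concrete, the following sketch builds $M$ from the two equations per reference point and confirms, on noise-free synthetic data, that the stacked control points lie in its null space. All numeric values and the helper name `build_M` are assumptions for illustration. The coefficient is recovered here from the known ground truth purely as a check; it is not how the method computes the $\beta_i$, and the Gauss-Newton refinement is omitted.

```python
import numpy as np

def build_M(alpha, pixels, K):
    """Stack the two linear equations per reference point into a 2n x 12 matrix."""
    fx, fy, u0, v0 = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    n = alpha.shape[0]
    M = np.zeros((2 * n, 12))
    for i in range(n):
        u, v = pixels[i]
        for j in range(4):
            a = alpha[i, j]
            # Columns 3j..3j+2 multiply (x_j, y_j, z_j) of control point j
            M[2 * i,     3 * j:3 * j + 3] = [a * fx, 0.0, a * (u0 - u)]
            M[2 * i + 1, 3 * j:3 * j + 3] = [0.0, a * fy, a * (v0 - v)]
    return M

# Synthetic, noise-free setup (illustrative values, zero skew)
rng = np.random.default_rng(1)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
ctrl_c = np.array([[-1.0, -1.0, 4.0],
                   [1.0, -1.0, 5.0],
                   [0.0, 1.0, 6.0],
                   [0.0, 0.0, 8.0]])              # control points c_j^c
alpha = rng.uniform(0.0, 1.0, size=(8, 4))
alpha /= alpha.sum(axis=1, keepdims=True)         # weights sum to 1 per point
pts_c = alpha @ ctrl_c                            # weighted sums of control points
proj = pts_c @ K.T
pixels = proj[:, :2] / proj[:, 2:3]               # pixel coordinates (u_i, v_i)

M = build_M(alpha, pixels, K)
x_true = ctrl_c.reshape(-1)                       # [c_1^T c_2^T c_3^T c_4^T]^T

# The right singular vector of the smallest singular value spans the null space here
_, _, Vt = np.linalg.svd(M)
v = Vt[-1]
beta = (v @ x_true) / (v @ v)                     # ground-truth coefficient, check only
```

In this configuration the null space is one-dimensional, so `beta * v` matches `x_true` up to rounding.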

This solution has $O(n)$ complexity and works in the general case of PnP for both planar and non-planar control points. Open source software implementations of this method can be found in OpenCV's Camera Calibration and 3D Reconstruction module in the solvePnP function^[7] as well as from the code published by Lepetit, et al. at their website, CVLAB at EPFL.^[10]

This method is not robust against outliers and generally compares poorly to RANSAC P3P followed by nonlinear refinement.^[citation needed]

SQPnP

SQPnP was described by Terzakis and Lourakis in an ECCV 2020 paper.^[11] It is a non-minimal, non-polynomial solver which casts PnP as a non-linear quadratic program. SQPnP identifies regions in the parameter space of 3D rotations (i.e., the 8-sphere) that contain unique minima with guarantees that at least one of them is the global one. Each regional minimum is computed with sequential quadratic programming that is initiated at nearest orthogonal approximation matrices.

SQPnP has similar or even higher accuracy compared to state-of-the-art polynomial solvers, is globally optimal and computationally very efficient, being practically linear in the number of supplied points $n$. A C++ implementation is available on GitHub, which has also been ported to OpenCV and included in the Camera Calibration and 3D Reconstruction module (solvePnP function).^[12]

Using RANSAC

PnP is prone to errors if there are outliers in the set of point correspondences. Thus, RANSAC can be used in conjunction with existing solutions to make the final solution for the camera pose more robust to outliers. An open source implementation of PnP methods with RANSAC can be found in OpenCV's Camera Calibration and 3D Reconstruction module in the solvePnPRansac function.^[12]
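The RANSAC loop itself is simple to sketch. The example below is a self-contained illustration in NumPy and is not OpenCV's implementation. As the inner solver it substitutes a basic Direct Linear Transform on six correspondences for a P3P or EPnP solver, and the function names (`dlt_pose`, `reprojection_error`, `ransac_pnp`), the threshold and the iteration count are all assumptions chosen for this example.

```python
import numpy as np

def dlt_pose(world, pixels, K):
    """Estimate [R | T] from >= 6 correspondences with a basic DLT (stand-in solver)."""
    ones = np.ones((len(world), 1))
    Xh = np.hstack([world, ones])                           # homogeneous world points
    xn = np.hstack([pixels, ones]) @ np.linalg.inv(K).T     # normalized image coords
    rows = []
    for X, (x, y, _) in zip(Xh, xn):
        rows.append(np.concatenate([X, np.zeros(4), -x * X]))
        rows.append(np.concatenate([np.zeros(4), X, -y * X]))
    P = np.linalg.svd(np.asarray(rows))[2][-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:                         # resolve the sign ambiguity
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    return U @ Vt, P[:, 3] / S.mean()                       # nearest rotation, rescaled T

def reprojection_error(R, T, world, pixels, K):
    """Pixel distance between projected world points and observed image points."""
    proj = (world @ R.T + T) @ K.T
    return np.linalg.norm(proj[:, :2] / proj[:, 2:3] - pixels, axis=1)

def ransac_pnp(world, pixels, K, iters=200, thresh=2.0, sample=6, seed=0):
    rng = np.random.default_rng(seed)
    best = np.zeros(len(world), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(world), size=sample, replace=False)
        R, T = dlt_pose(world[idx], pixels[idx], K)
        inliers = reprojection_error(R, T, world, pixels, K) < thresh
        if inliers.sum() > best.sum():                      # keep the largest consensus set
            best = inliers
    R, T = dlt_pose(world[best], pixels[best], K)           # refit on the consensus set
    return R, T, best

# Synthetic check (assumed values): 30 exact correspondences plus 10 gross outliers
rng = np.random.default_rng(42)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.4
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
T_true = np.array([0.2, -0.1, 6.0])
world = rng.uniform(-1.0, 1.0, size=(40, 3))
cam = world @ R_true.T + T_true
pixels = (cam @ K.T)[:, :2] / cam[:, 2:3]
pixels[:10] += rng.uniform(50.0, 150.0, size=(10, 2))       # corrupt the first 10 points

R_est, T_est, inliers = ransac_pnp(world, pixels, K)
```

On this synthetic data the consensus set excludes the ten corrupted correspondences and the refit recovers the assumed pose. A practical pipeline would typically follow this with a nonlinear refinement of the reprojection error on the inliers.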

References

[1] Fischler, M. A.; Bolles, R. C. (1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Communications of the ACM. 24 (6): 381–395. doi:10.1145/358669.358692. S2CID 972888.
[2] Apple, ARKit team (2018). "Understanding ARKit Tracking and Detection". WWDC.
[3] Fabbri, Ricardo; Giblin, Peter; Kimia, Benjamin (2012). "Camera Pose Estimation Using First-Order Curve Differential Geometry". Computer Vision – ECCV 2012 (PDF). Lecture Notes in Computer Science. Vol. 7575. pp. 231–244. doi:10.1007/978-3-642-33765-9_17. ISBN 978-3-642-33764-2. S2CID 15402824.
[4] Penate-Sanchez, A.; Andrade-Cetto, J.; Moreno-Noguer, F. (2013). "Exhaustive Linearization for Robust Camera Pose and Focal Length Estimation". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (10): 2387–2400. doi:10.1109/TPAMI.2013.36. hdl:2117/22931. PMID 23969384. S2CID 9614348.
[5] Quan, Long; Lan, Zhong-Dan (1999). "Linear N-Point Camera Pose Determination" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Gao, Xiao-Shan; Hou, Xiao-Rong; Tang, Jianliang; Cheng, Hang-Fei (2003). "Complete Solution Classification for the Perspective-Three-Point Problem". IEEE Transactions on Pattern Analysis and Machine Intelligence. 25 (8): 930–943. doi:10.1109/tpami.2003.1217599. S2CID 15869446.
[7] "Camera Calibration and 3D Reconstruction". OpenCV.
[8] Persson, Mikael; Nordberg, Klas (2018). "Lambda Twist: An Accurate Fast Robust Perspective Three Point (P3P) Solver" (PDF). The European Conference on Computer Vision (ECCV).
[9] Lepetit, V.; Moreno-Noguer, F.; Fua, P. (2009). "EPnP: An Accurate O(n) Solution to the PnP Problem". International Journal of Computer Vision. 81 (2): 155–166. doi:10.1007/s11263-008-0152-6. hdl:2117/10327. S2CID 207252029.
[10] "EPnP: Efficient Perspective-n-Point Camera Pose Estimation". EPFL-CVLAB.
[11] Terzakis, George; Lourakis, Manolis (2020). "A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem". Computer Vision – ECCV 2020. Lecture Notes in Computer Science. Vol. 12346. pp. 478–494. doi:10.1007/978-3-030-58452-8_28. ISBN 978-3-030-58451-1. S2CID 226239551.
[12] "Camera Calibration and 3D Reconstruction". OpenCV.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
