Quaternion estimator algorithm

Last updated August 12, 2023

The quaternion estimator algorithm (QUEST) is an algorithm designed to solve Wahba's problem, which consists of finding a rotation matrix between two coordinate systems from two sets of observations sampled in each system respectively. The key idea behind the algorithm is to express the loss function of Wahba's problem as a quadratic form, and then to use the Cayley–Hamilton theorem and the Newton–Raphson method to solve the resulting eigenvalue problem efficiently and construct a numerically stable representation of the solution.

The algorithm was introduced by Malcolm D. Shuster in 1981, while he was working at Computer Sciences Corporation.^[1] Although in principle less robust than other methods such as Davenport's q method or singular value decomposition, the algorithm is significantly faster and is reliable in practical applications,^[2]^[3] and it is used for attitude determination in fields such as robotics and avionics.^[4]^[5]^[6]

Formulation of the problem

Wahba's problem consists of finding a rotation matrix $\mathbf {A} ^{*}$ that minimises the loss function

l\left(\mathbf {A} \right)={\frac {1}{2}}\sum _{i=1}^{n}a_{i}\left\|\mathbf {w} _{i}-\mathbf {A} \mathbf {v} _{i}\right\|^{2}

where $\mathbf {w} _{i}$ are the vector observations in the reference frame, $\mathbf {v} _{i}$ are the vector observations in the body frame, $\mathbf {A}$ is a rotation matrix between the two frames, and $a_{i}$ are a set of weights such that $\textstyle \sum _{i}a_{i}=1$ . It is possible to rewrite this as a maximisation problem of a gain function $g$

g\left(\mathbf {A} \right)=1-l\left(\mathbf {A} \right)=\sum _{i}a_{i}\mathbf {w} _{i}^{\top }\mathbf {A} \mathbf {v} _{i}

defined in such a way that the loss $l$ attains a minimum when $g$ is maximised. The gain $g$ can in turn be rewritten as

g\left(\mathbf {A} \right)=\operatorname {tr} \left(\mathbf {A} \mathbf {B} ^{\top }\right)

where $\mathbf {B} =\textstyle \sum _{i}a_{i}\mathbf {w} _{i}\mathbf {v} _{i}^{\top }$ is known as the attitude profile matrix.
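The relations above can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function names and the sample observations are hypothetical, and the observations are assumed to be unit vectors, which is what makes $g=1-l$ hold when the weights sum to 1.

```python
import math

def mat_vec(A, v):
    """Product of a 3x3 matrix (list of rows) with a 3-vector."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def attitude_profile(weights, ws, vs):
    """Attitude profile matrix B = sum_i a_i w_i v_i^T."""
    B = [[0.0] * 3 for _ in range(3)]
    for a, w, v in zip(weights, ws, vs):
        for i in range(3):
            for j in range(3):
                B[i][j] += a * w[i] * v[j]
    return B

def gain_from_profile(A, B):
    """g(A) = tr(A B^T), which equals the sum of A_ij * B_ij."""
    return sum(A[i][j] * B[i][j] for i in range(3) for j in range(3))

def loss(A, weights, ws, vs):
    """l(A) = 1/2 * sum_i a_i * |w_i - A v_i|^2."""
    total = 0.0
    for a, w, v in zip(weights, ws, vs):
        r = [wi - avi for wi, avi in zip(w, mat_vec(A, v))]
        total += 0.5 * a * dot(r, r)
    return total

# Hypothetical data: a trial rotation of 30 degrees about the third axis
# and two pairs of unit-vector observations that it does not fit exactly.
t = math.radians(30.0)
A = [[math.cos(t), math.sin(t), 0.0],
     [-math.sin(t), math.cos(t), 0.0],
     [0.0, 0.0, 1.0]]
vs = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]
ws = [[0.0, 1.0, 0.0], [0.6, 0.0, 0.8]]
weights = [0.7, 0.3]  # weights sum to 1

B = attitude_profile(weights, ws, vs)
g_trace = gain_from_profile(A, B)
g_sum = sum(a * dot(w, mat_vec(A, v)) for a, w, v in zip(weights, ws, vs))
l_value = loss(A, weights, ws, vs)
```

Here `g_trace` and `g_sum` agree up to rounding, and both equal `1 - l_value`.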

In order to reduce the number of variables, the problem can be reformulated by parametrising the rotation as a unit quaternion $\mathbf {q} =\left(v_{1},v_{2},v_{3},q\right)$ with vector part $\mathbf {v} =\left(v_{1},v_{2},v_{3}\right)$ and scalar part $q$ , subject to the unity constraint $\mathbf {q} ^{\top }\mathbf {q} =1$ . The quaternion represents the rotation of angle $\theta =2\cos ^{-1}q$ around an axis whose direction is described by the vector $\textstyle {\frac {1}{\sin {\frac {\theta }{2}}}}\mathbf {v}$ . It is now possible to express $\mathbf {A}$ in terms of the quaternion parametrisation as

\mathbf {A} =\left(q^{2}-\mathbf {v} \cdot \mathbf {v} \right)\mathbf {I} +2\mathbf {v} \mathbf {v} ^{\top }+2q\mathbf {V} _{\times }

where $\mathbf {V} _{\times }$ is the skew-symmetric matrix

\mathbf {V} _{\times }={\begin{pmatrix}0&v_{3}&-v_{2}\\-v_{3}&0&v_{1}\\v_{2}&-v_{1}&0\end{pmatrix}}.
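As an illustration, the quaternion parametrisation of $\mathbf {A}$ can be written out directly in Python. This is a sketch under the conventions stated above (vector part first, scalar part last, and the sign convention of $\mathbf {V} _{\times }$ given in the text); the function names are hypothetical.

```python
import math

def axis_angle_to_quat(axis, theta):
    """Unit quaternion (v1, v2, v3, q) for a rotation of angle theta about axis."""
    n = math.sqrt(sum(c * c for c in axis))
    s = math.sin(theta / 2.0)
    return [axis[0] / n * s, axis[1] / n * s, axis[2] / n * s,
            math.cos(theta / 2.0)]

def quat_to_matrix(quat):
    """A = (q^2 - v.v) I + 2 v v^T + 2 q V_x, with V_x as defined in the text."""
    v1, v2, v3, q = quat
    v = (v1, v2, v3)
    vv = v1 * v1 + v2 * v2 + v3 * v3
    V = [[0.0, v3, -v2],
         [-v3, 0.0, v1],
         [v2, -v1, 0.0]]
    A = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            A[i][j] = 2.0 * v[i] * v[j] + 2.0 * q * V[i][j]
            if i == j:
                A[i][j] += q * q - vv
    return A

# A quarter turn about the third axis.
A_quarter = quat_to_matrix(axis_angle_to_quat([0.0, 0.0, 1.0], math.pi / 2.0))
```

With this sign convention, `A_quarter` is, up to rounding, the matrix with rows `(0, 1, 0)`, `(-1, 0, 0)` and `(0, 0, 1)`.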

Substituting $\mathbf {A}$ with the quaternion representation and simplifying the resulting expression, the gain function can be written as a quadratic form in $\mathbf {q}$

g(\mathbf {q} )=\mathbf {q} ^{\top }\mathbf {K} \mathbf {q}

where the $4\times 4$ matrix

\mathbf {K} ={\begin{pmatrix}\mathbf {S} -\sigma \mathbf {I} &\mathbf {z} \\\mathbf {z} ^{\top }&\sigma \end{pmatrix}}

is defined from the quantities

{\begin{aligned}\mathbf {S} &=\mathbf {B} +\mathbf {B} ^{\top }\\\mathbf {z} &=\sum _{i}a_{i}\left(\mathbf {w} _{i}\times \mathbf {v} _{i}\right)\\\sigma &=\operatorname {tr} \mathbf {B} .\end{aligned}}
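The equivalence between the quadratic form and the original gain can be checked numerically. The sketch below is only an illustration: it assembles $\mathbf {K}$ from hypothetical observations and compares $\mathbf {q} ^{\top }\mathbf {K} \mathbf {q}$ with $\textstyle \sum _{i}a_{i}\mathbf {w} _{i}^{\top }\mathbf {A} \mathbf {v} _{i}$ for an arbitrary unit quaternion. All function names and data are assumptions made for the example.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def build_K(weights, ws, vs):
    """Assemble the 4x4 matrix K from S = B + B^T, z and sigma = tr B."""
    B = [[sum(a * w[i] * v[j] for a, w, v in zip(weights, ws, vs))
          for j in range(3)] for i in range(3)]
    sigma = B[0][0] + B[1][1] + B[2][2]
    z = [0.0, 0.0, 0.0]
    for a, w, v in zip(weights, ws, vs):
        c = cross(w, v)
        for i in range(3):
            z[i] += a * c[i]
    K = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            K[i][j] = B[i][j] + B[j][i] - (sigma if i == j else 0.0)
        K[i][3] = z[i]
        K[3][i] = z[i]
    K[3][3] = sigma
    return K

def quadratic_gain(K, quat):
    """g(q) = q^T K q."""
    return sum(quat[i] * K[i][j] * quat[j] for i in range(4) for j in range(4))

def direct_gain(quat, weights, ws, vs):
    """g(A) = sum_i a_i w_i^T A v_i, with A built from the quaternion."""
    v1, v2, v3, q = quat
    v = (v1, v2, v3)
    vv = v1 * v1 + v2 * v2 + v3 * v3
    V = [[0.0, v3, -v2], [-v3, 0.0, v1], [v2, -v1, 0.0]]
    A = [[2.0 * v[i] * v[j] + 2.0 * q * V[i][j]
          + (q * q - vv if i == j else 0.0)
          for j in range(3)] for i in range(3)]
    total = 0.0
    for a, w, obs in zip(weights, ws, vs):
        Av = [sum(A[i][j] * obs[j] for j in range(3)) for i in range(3)]
        total += a * sum(w[i] * Av[i] for i in range(3))
    return total

# Hypothetical unit-vector observations and an arbitrary unit quaternion.
vs = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]
ws = [[0.0, 1.0, 0.0], [0.6, 0.0, 0.8]]
weights = [0.7, 0.3]
raw = [0.1, -0.2, 0.3, 0.9]
norm = math.sqrt(sum(c * c for c in raw))
quat = [c / norm for c in raw]

K = build_K(weights, ws, vs)
g_quadratic = quadratic_gain(K, quat)
g_direct = direct_gain(quat, weights, ws, vs)
```

The two values agree up to rounding for any unit quaternion, and `K` is symmetric with zero trace.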

This quadratic form can be optimised under the unity constraint by adding a Lagrange multiplier $-\lambda \mathbf {q} ^{\top }\mathbf {q}$ , obtaining an unconstrained gain function

{\hat {g}}\left(\mathbf {q} \right)=\mathbf {q} ^{\top }\mathbf {K} \mathbf {q} -\lambda \mathbf {q} ^{\top }\mathbf {q}

that attains a maximum when

\mathbf {K} \mathbf {q} =\lambda \mathbf {q} .

This implies that the optimal rotation is parametrised by the quaternion $\mathbf {q} ^{*}$ that is the eigenvector associated to the largest eigenvalue $\lambda _{\text{max}}$ of $\mathbf {K}$ .^[1]^[2]
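The reasoning behind this statement can be spelled out in one line. The derivation below relies only on the symmetry of $\mathbf {K}$ , which follows from $\mathbf {S} =\mathbf {B} +\mathbf {B} ^{\top }$ being symmetric.

```latex
\frac{\partial \hat{g}}{\partial \mathbf{q}}
  = 2\mathbf{K}\mathbf{q} - 2\lambda\mathbf{q} = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{K}\mathbf{q} = \lambda\mathbf{q},
\qquad
g(\mathbf{q}) = \mathbf{q}^{\top}\mathbf{K}\mathbf{q}
  = \lambda\,\mathbf{q}^{\top}\mathbf{q} = \lambda
```

Every unit eigenvector of $\mathbf {K}$ is therefore a stationary point whose gain equals its eigenvalue, so the gain is maximised by the eigenvector of the largest eigenvalue.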

Solution of the characteristic equation

The optimal quaternion can be determined by solving the characteristic equation of $\mathbf {K}$ and constructing the eigenvector for the largest eigenvalue. From the definition of $\mathbf {K}$ , it is possible to rewrite

\mathbf {K} \mathbf {q} =\lambda \mathbf {q}

as a system of two equations

{\begin{aligned}\mathbf {y} &=\left((\lambda +\sigma )\mathbf {I} -\mathbf {S} \right)^{-1}\mathbf {z} \\\lambda &=\sigma +\mathbf {z} ^{\top }\mathbf {y} \end{aligned}}

where $\mathbf {y} =\textstyle {\frac {1}{q}}\mathbf {v}$ is the Rodrigues vector. Substituting $\mathbf {y}$ in the second equation with the first, it is possible to derive an expression of the characteristic equation

\lambda =\sigma +\mathbf {z} ^{\top }\left((\lambda +\sigma )\mathbf {I} -\mathbf {S} \right)^{-1}\mathbf {z} .

Since $\lambda _{\text{max}}=\max g\left(\mathbf {A} \right)$ , it follows that $\lambda _{\text{max}}=1-\min l\left(\mathbf {A} \right)$ and therefore $\lambda _{\text{max}}\approx 1$ whenever the minimum loss is small. The optimal quaternion $\mathbf {q} ^{*}$ can then be constructed by evaluating the Rodrigues vector $\mathbf {y}$ at $\lambda =\lambda _{\text{max}}$

\mathbf {q} ^{*}={\frac {1}{\sqrt {1+\left|\mathbf {y} _{\lambda _{\text{max}}}\right|^{2}}}}(\mathbf {y} _{\lambda _{\text{max}}},1)^{\top }.
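The construction through the Rodrigues vector can be sketched as follows. This is an illustrative example rather than a reference implementation: the helper names and the sample data are hypothetical, the observations are noise-free so that $\lambda _{\text{max}}=1$ can be used directly, and the $3\times 3$ system is solved with Cramer's rule purely for brevity.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def profile(weights, ws, vs):
    """Return S = B + B^T, sigma = tr B and z for weighted observations."""
    B = [[sum(a * w[i] * v[j] for a, w, v in zip(weights, ws, vs))
          for j in range(3)] for i in range(3)]
    S = [[B[i][j] + B[j][i] for j in range(3)] for i in range(3)]
    sigma = B[0][0] + B[1][1] + B[2][2]
    z = [0.0, 0.0, 0.0]
    for a, w, v in zip(weights, ws, vs):
        z = [zi + a * ci for zi, ci in zip(z, cross(w, v))]
    return S, sigma, z

def solve3(M, b):
    """Solve M y = b by Cramer's rule (fails if M is singular)."""
    d = det3(M)
    return [det3([[b[i] if j == k else M[i][j] for j in range(3)]
                  for i in range(3)]) / d for k in range(3)]

def quat_from_rodrigues(S, sigma, z, lam):
    """q* = (y, 1) / sqrt(1 + |y|^2), with y = ((lam + sigma) I - S)^-1 z."""
    M = [[(lam + sigma if i == j else 0.0) - S[i][j] for j in range(3)]
         for i in range(3)]
    y = solve3(M, z)
    norm = math.sqrt(1.0 + sum(c * c for c in y))
    return [y[0] / norm, y[1] / norm, y[2] / norm, 1.0 / norm]

# Noise-free observations generated by a quarter turn about the third axis,
# i.e. w_i = A v_i with A having rows (0, 1, 0), (-1, 0, 0), (0, 0, 1).
vs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
ws = [[0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
S, sigma, z = profile([0.5, 0.5], ws, vs)
q_star = quat_from_rodrigues(S, sigma, z, lam=1.0)
```

For this data the result is the quaternion $(0,0,\sin {\tfrac {\pi }{4}},\cos {\tfrac {\pi }{4}})$ , the quarter turn about the third axis that generated the observations.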

The vector $\mathbf {y}$ is, however, singular for $\theta =\pi$ , where the scalar part $q=\cos {\tfrac {\theta }{2}}$ vanishes. An alternative expression of the solution that does not involve the Rodrigues vector can be constructed using the Cayley–Hamilton theorem. The characteristic equation of the $3\times 3$ matrix $\mathbf {S}$ is

\det \left[\mathbf {S} -\xi \mathbf {I} \right]=-\xi ^{3}+2\sigma \xi ^{2}-k\xi +\Delta =0

where

{\begin{aligned}\sigma &={\frac {1}{2}}\operatorname {tr} {\mathbf {S} }\\k&=\operatorname {tr} \left(\operatorname {adj} \mathbf {S} \right)\\\Delta &=\det \mathbf {S} \end{aligned}}

The Cayley–Hamilton theorem states that any square matrix over a commutative ring satisfies its own characteristic equation, therefore

-\mathbf {S} ^{3}+2\sigma \mathbf {S} ^{2}-k\mathbf {S} +\Delta \mathbf {I} =\mathbf {0}

which allows one to write

\left((\omega +\sigma )\mathbf {I} -\mathbf {S} \right)^{-1}={\frac {\alpha \mathbf {I} +\beta \mathbf {S} +\mathbf {S} ^{2}}{\gamma }}

where

{\begin{aligned}\alpha &=\omega ^{2}-\sigma ^{2}+k\\\beta &=\omega -\sigma \\\gamma &=(\omega +\sigma )\alpha -\Delta \end{aligned}}

and for $\omega =\lambda _{\text{max}}$ this provides a new construction of the optimal vector

{\begin{aligned}\mathbf {y} ^{*}&=\left((\lambda _{\text{max}}+\sigma )\mathbf {I} -\mathbf {S} \right)^{-1}\mathbf {z} \\&={\frac {\alpha \mathbf {I} +\beta \mathbf {S} +\mathbf {S} ^{2}}{\gamma }}\mathbf {z} \end{aligned}}

which gives the conjugate quaternion representation of the optimal rotation as

\mathbf {q} ^{*}={\frac {1}{\sqrt {\gamma ^{2}+\left|\mathbf {x} \right|^{2}}}}(\mathbf {x} ,\gamma )^{\top }

where

\mathbf {x} =\left(\alpha \mathbf {I} +\beta \mathbf {S} +\mathbf {S} ^{2}\right)\mathbf {z} .
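The Cayley–Hamilton form of the solution avoids the explicit matrix inverse. A minimal Python sketch follows; the function name is hypothetical, and $\mathbf {S}$ and $\mathbf {z}$ are hard-coded for noise-free observations of a quarter turn about the third axis with two equally weighted vectors, so that $\omega =\lambda _{\text{max}}=1$ .

```python
import math

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def trace_adj(M):
    """tr(adj M): the sum of the three principal 2x2 minors."""
    return (M[1][1] * M[2][2] - M[1][2] * M[2][1]
            + M[0][0] * M[2][2] - M[0][2] * M[2][0]
            + M[0][0] * M[1][1] - M[0][1] * M[1][0])

def quat_cayley_hamilton(S, z, omega):
    """q* = (x, gamma) / sqrt(gamma^2 + |x|^2), x = (alpha I + beta S + S^2) z."""
    sigma = 0.5 * (S[0][0] + S[1][1] + S[2][2])
    k = trace_adj(S)
    delta = det3(S)
    alpha = omega ** 2 - sigma ** 2 + k
    beta = omega - sigma
    gamma = (omega + sigma) * alpha - delta
    Sz = mat_vec(S, z)
    SSz = mat_vec(S, Sz)
    x = [alpha * z[i] + beta * Sz[i] + SSz[i] for i in range(3)]
    norm = math.sqrt(gamma ** 2 + sum(c * c for c in x))
    return [x[0] / norm, x[1] / norm, x[2] / norm, gamma / norm], gamma

# S = B + B^T and z for v = (1,0,0), (0,0,1) and w = (0,-1,0), (0,0,1),
# each pair weighted by 0.5 (hypothetical, noise-free data).
S = [[0.0, -0.5, 0.0],
     [-0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
z = [0.0, 0.0, 0.5]
q_star, gamma = quat_cayley_hamilton(S, z, omega=1.0)
```

For this data $\alpha =\beta =0.5$ , $\gamma =1$ and $\mathbf {x} =(0,0,1)$ , so the normalised result is again $(0,0,\sin {\tfrac {\pi }{4}},\cos {\tfrac {\pi }{4}})$ .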

The value of $\lambda _{\text{max}}$ can be determined as a numerical solution of the characteristic equation. Substituting the expression obtained for $\left((\omega +\sigma )\mathbf {I} -\mathbf {S} \right)^{-1}$ , with $\omega =\lambda$ , into the previously obtained characteristic equation

\lambda =\sigma +\mathbf {z} ^{\top }\left((\lambda +\sigma )\mathbf {I} -\mathbf {S} \right)^{-1}\mathbf {z}

gives

\lambda ^{4}-(a+b)\lambda ^{2}-c\lambda +(ab+c\sigma -d)=0

where

{\begin{aligned}a&=\sigma ^{2}-k\\b&=\sigma ^{2}+\mathbf {z} ^{\top }\mathbf {z} \\c&=\Delta +\mathbf {z} ^{\top }\mathbf {S} \mathbf {z} \\d&=\mathbf {z} ^{\top }\mathbf {S} ^{2}\mathbf {z} \end{aligned}}

whose largest root can be efficiently approximated with the Newton–Raphson method, taking 1 as the initial guess so that the iteration converges to the largest eigenvalue (using the fact, shown above, that $\lambda _{\text{max}}\approx 1$ when the minimum loss is small).^[1]^[2]
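The Newton–Raphson step can be sketched in Python as follows. The function names, tolerance, iteration cap and sample data are assumptions made for the example. The two observation pairs are deliberately inconsistent (the angle between the two $\mathbf {w} _{i}$ differs from the angle between the two $\mathbf {v} _{i}$ ), so that the largest eigenvalue is strictly smaller than 1.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def profile(weights, ws, vs):
    """Return S = B + B^T and z for weighted observations."""
    B = [[sum(a * w[i] * v[j] for a, w, v in zip(weights, ws, vs))
          for j in range(3)] for i in range(3)]
    S = [[B[i][j] + B[j][i] for j in range(3)] for i in range(3)]
    z = [0.0, 0.0, 0.0]
    for a, w, v in zip(weights, ws, vs):
        z = [zi + a * ci for zi, ci in zip(z, cross(w, v))]
    return S, z

def quartic_coefficients(S, z):
    """Coefficients of lam^4 - (a+b) lam^2 - c lam + (ab + c sigma - d)."""
    sigma = 0.5 * (S[0][0] + S[1][1] + S[2][2])
    k = (S[1][1] * S[2][2] - S[1][2] * S[2][1]
         + S[0][0] * S[2][2] - S[0][2] * S[2][0]
         + S[0][0] * S[1][1] - S[0][1] * S[1][0])  # tr(adj S)
    delta = det3(S)
    Sz = [sum(S[i][j] * z[j] for j in range(3)) for i in range(3)]
    a = sigma ** 2 - k
    b = sigma ** 2 + sum(c * c for c in z)
    c = delta + sum(z[i] * Sz[i] for i in range(3))
    d = sum(s * s for s in Sz)  # z^T S^2 z, since S is symmetric
    return a + b, c, a * b + c * sigma - d

def largest_eigenvalue(S, z, guess=1.0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration on the quartic, started from lam = 1."""
    p, c, r = quartic_coefficients(S, z)
    lam = guess
    for _ in range(max_iter):
        f = lam ** 4 - p * lam ** 2 - c * lam + r
        df = 4.0 * lam ** 3 - 2.0 * p * lam - c
        step = f / df
        lam -= step
        if abs(step) < tol:
            break
    return lam

# Hypothetical, mutually inconsistent unit-vector observations.
vs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
ws = [[0.0, -1.0, 0.0], [0.0, -0.6, 0.8]]
S, z = profile([0.5, 0.5], ws, vs)
lam_max = largest_eigenvalue(S, z)
```

For this data set the quartic reduces to $\lambda ^{4}-\lambda ^{2}+0.09=0$ , whose largest root is $\sqrt {0.9}\approx 0.9487$ ; the iteration started from 1 converges to it in a few steps.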

References

[1] Shuster and Oh (1981)
[2] Markley and Mortari (2000)
[3] Crassidis et al. (2007)
[4] Psiaki (2000)
[5] Wu et al. (2017)
[6] Yun et al. (2008)

Sources

Crassidis, John L; Markley, F Landis; Cheng, Yang (2007). "Survey of nonlinear attitude estimation methods". Journal of Guidance, Control, and Dynamics. 30 (1): 12–28. Bibcode:2007JGCD...30...12C. doi:10.2514/1.22452.
Markley, F Landis; Mortari, Daniele (2000). "Quaternion attitude estimation using vector observations". The Journal of the Astronautical Sciences. Springer. 48 (2): 359–380. Bibcode:2000JAnSc..48..359M. doi:10.1007/BF03546284.
Psiaki, Mark L (2000). "Attitude-determination filtering via extended quaternion estimation". Journal of Guidance, Control, and Dynamics. 23 (2): 206–214. Bibcode:2000JGCD...23..206P. doi:10.2514/2.4540.
Shuster, M.D.; Oh, S.D. (1981). "Three-axis attitude determination from vector observations". Journal of Guidance and Control. 4 (1): 70–77. Bibcode:1981JGCD....4...70S. doi:10.2514/3.19717.
Wu, Jin; Zhou, Zebo; Gao, Bin; Li, Rui; Cheng, Yuhua; Fourati, Hassen (2017). "Fast linear quaternion attitude estimator using vector observations" (PDF). IEEE Transactions on Automation Science and Engineering. IEEE. 15 (1): 307–319. doi:10.1109/TASE.2017.2699221. S2CID 3455346.
Yun, Xiaoping; Bachmann, Eric R; McGhee, Robert B (2008). "A simplified quaternion-based algorithm for orientation estimation from earth gravity and magnetic field measurements". IEEE Transactions on Instrumentation and Measurement. IEEE. 57 (3): 638–650. doi:10.1109/TIM.2007.911646. S2CID 15571138.

External links

"QUEST — AHRS documentation".
University of Colorado Boulder. "QUEST". Kinematics: Describing the Motions of Spacecraft. Coursera.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
