Unscented transform

The unscented transform (UT) is a mathematical function used to estimate the result of applying a given nonlinear transformation to a probability distribution that is characterized only in terms of a finite set of statistics. The most common use of the unscented transform is in the nonlinear projection of mean and covariance estimates in the context of nonlinear extensions of the Kalman filter. Its creator Jeffrey Uhlmann explained that "unscented" was an arbitrary name that he adopted to avoid it being referred to as the “Uhlmann filter.” [1]

Background

Many filtering and control methods represent estimates of the state of a system in the form of a mean vector and an associated error covariance matrix. As an example, the estimated 2-dimensional position of an object of interest might be represented by a mean position vector, $[x, y]$, with an uncertainty given in the form of a 2×2 covariance matrix giving the variance in $x$, the variance in $y$, and the cross covariance between the two. A covariance that is zero implies that there is no uncertainty or error and that the position of the object is exactly what is specified by the mean vector.

The mean and covariance representation only gives the first two moments of an underlying, but otherwise unknown, probability distribution. In the case of a moving object, the unknown probability distribution might represent the uncertainty of the object's position at a given time. The mean and covariance representation of uncertainty is mathematically convenient because any linear transformation $T$ can be applied to a mean vector $m$ and covariance matrix $M$ as $Tm$ and $TMT^{\mathrm{T}}$. This linearity property does not hold for moments beyond the first raw moment (the mean) and the second central moment (the covariance), so it is not generally possible to determine the mean and covariance resulting from a nonlinear transformation because the result depends on all the moments, and only the first two are given.
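The linear case can be checked numerically. The following is a minimal sketch using NumPy, writing the linear map as `T`, the mean as `m`, and the covariance as `M`; the helper name `propagate_linear` and the numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def propagate_linear(T, m, M):
    """Return the mean and covariance of T @ x when x has mean m and covariance M."""
    return T @ m, T @ M @ T.T

# Hypothetical 2-D estimate; the values are illustrative only.
m = np.array([1.0, 2.0])
M = np.array([[4.0, 1.0],
              [1.0, 9.0]])

# A 90-degree rotation used as the linear transformation T.
T = np.array([[0.0, -1.0],
              [1.0,  0.0]])

m_new, M_new = propagate_linear(T, m, M)
print(m_new)  # rotated mean
print(M_new)  # rotated covariance
```

No assumption about the shape of the underlying distribution is needed here, which is exactly what fails to carry over to nonlinear transformations.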

Although the covariance matrix is often treated as being the expected squared error associated with the mean, in practice the matrix is maintained as an upper bound on the actual squared error. Specifically, a mean and covariance estimate $(m, M)$ is conservatively maintained so that the covariance matrix $M$ is greater than or equal to the actual squared error associated with $m$. Mathematically this means that the result of subtracting the expected squared error (which is not usually known) from $M$ is a positive semi-definite or positive-definite matrix. The reason for maintaining a conservative covariance estimate is that most filtering and control algorithms will tend to diverge (fail) if the covariance is underestimated. This is because a spuriously small covariance implies less uncertainty and leads the filter to place more weight (confidence) than is justified in the accuracy of the mean.

Returning to the example above, when the covariance is zero it is trivial to determine the location of the object after it moves according to an arbitrary nonlinear function $f(x, y)$: just apply the function to the mean vector. When the covariance is not zero the transformed mean will not generally be equal to $f(x, y)$, and it is not even possible to determine the mean of the transformed probability distribution from only its prior mean and covariance. Given this indeterminacy, the nonlinearly transformed mean and covariance can only be approximated. The earliest approximation was to linearize the nonlinear function and apply the resulting Jacobian matrix to the given mean and covariance. This is the basis of the extended Kalman filter (EKF), and although it was known to yield poor results in many circumstances, there was no practical alternative for many decades.
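The linearization step can be sketched as follows. This is an illustration rather than any particular EKF implementation: the Jacobian is approximated by central differences, and the helper names, the test function `f`, and the numeric values are all hypothetical.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x (rows: outputs, columns: inputs)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        cols.append((np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2.0 * h))
    return np.column_stack(cols)

def linearized_transform(f, m, M):
    """Approximate the transformed mean as f(m) and the covariance as J M J^T."""
    J = numerical_jacobian(f, m)
    return np.asarray(f(m), dtype=float), J @ M @ J.T

# Hypothetical nonlinear function and estimate, for illustration only.
def f(p):
    x, y = p
    return np.array([x * x, x * y])

m = np.array([1.0, 2.0])
M = np.diag([0.1, 0.2])

m_lin, M_lin = linearized_transform(f, m, M)
print(m_lin)
print(M_lin)
```

For this particular `f`, the exact mean of the first output is the squared mean of `x` plus its variance, 1.0 + 0.1 = 1.1, for any distribution with the given first two moments. The linearized estimate returns 1.0, which shows the bias that arises when the function is simply applied to the mean.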

Motivation for the unscented transform

In 1994 Jeffrey Uhlmann noted that the EKF takes a nonlinear function and partial distribution information (in the form of a mean and covariance estimate) of the state of a system but applies an approximation to the known function rather than to the imprecisely-known probability distribution. He suggested that a better approach would be to use the exact nonlinear function applied to an approximating probability distribution. The motivation for this approach is given in his doctoral dissertation, where the term unscented transform was first defined: [2]

Consider the following intuition: With a fixed number of parameters it should be easier to approximate a given distribution than it is to approximate an arbitrary nonlinear function/transformation. Following this intuition, the goal is to find a parameterization that captures the mean and covariance information while at the same time permitting the direct propagation of the information through an arbitrary set of nonlinear equations. This can be accomplished by generating a discrete distribution having the same first and second (and possibly higher) moments, where each point in the discrete approximation can be directly transformed. The mean and covariance of the transformed ensemble can then be computed as the estimate of the nonlinear transformation of the original distribution. More generally, the application of a given nonlinear transformation to a discrete distribution of points, computed so as to capture a set of known statistics of an unknown distribution, is referred to as an unscented transformation.

In other words, the given mean and covariance information can be exactly encoded in a set of points, referred to as sigma points, which if treated as elements of a discrete probability distribution have mean and covariance equal to the given mean and covariance. This distribution can be propagated exactly by applying the nonlinear function to each point. The mean and covariance of the transformed set of points then represent the desired transformed estimate. The principal advantage of the approach is that the nonlinear function is fully exploited, as opposed to the EKF, which replaces it with a linear one. Eliminating the need for linearization also provides advantages independent of any improvement in estimation quality. One immediate advantage is that the UT can be applied with any given function, whereas linearization may not be possible for functions that are not differentiable. A practical advantage is that the UT can be easier to implement because it avoids the need to derive and implement a linearizing Jacobian matrix.
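The propagation step described above can be sketched in a few lines. The function name `unscented_transform`, the equal default weights, and the sample points are assumptions made for this illustration; the sigma points are taken as given here, since their construction is a separate choice.

```python
import numpy as np

def unscented_transform(f, sigma_points, weights=None):
    """Push each sigma point through f and return the weighted mean and covariance."""
    pts = np.asarray(sigma_points, dtype=float)
    if weights is None:
        w = np.full(len(pts), 1.0 / len(pts))  # equal weights by default
    else:
        w = np.asarray(weights, dtype=float)
    transformed = np.array([f(p) for p in pts])
    mean = w @ transformed
    dev = transformed - mean
    cov = (w[:, None] * dev).T @ dev  # weighted sum of outer products
    return mean, cov

# Four points with zero mean and identity covariance (illustrative choice).
pts = np.sqrt(2.0) * np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

def f(p):
    # Hypothetical nonlinear function: square the first coordinate.
    return np.array([p[0] ** 2, p[1]])

mean_id, cov_id = unscented_transform(lambda p: p, pts)  # identity map
mean_f, cov_f = unscented_transform(f, pts)              # nonlinear map
print(mean_id, cov_id)
print(mean_f, cov_f)
```

With the identity map the points return their own mean and covariance. With the nonlinear `f`, the first component of the transformed mean is 1, the variance of the input, whereas applying `f` to the input mean alone would give 0. No Jacobian is required at any step.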

Sigma points

To compute the unscented transform, one first has to choose a set of sigma points. Since the seminal work of Uhlmann, many different sets of sigma points have been proposed in the literature. A thorough review of these variants can be found in the work of Menegaz et al. [3] In general, $n+1$ sigma points are necessary and sufficient to define a discrete distribution having a given mean and covariance in $n$ dimensions. [2]

A canonical set of sigma points is the symmetric set originally proposed by Uhlmann. Consider the vertices of an equilateral triangle centered on the origin in two dimensions:

$$s_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ 2 \end{bmatrix}, \quad s_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -\sqrt{3} \\ -1 \end{bmatrix}, \quad s_3 = \frac{1}{\sqrt{2}} \begin{bmatrix} \sqrt{3} \\ -1 \end{bmatrix}$$

It can be verified that the above set of points has mean $s = [0, 0]^{\mathrm{T}}$ and covariance $S = I$ (the identity matrix). Given any 2-dimensional mean and covariance, $(x, X)$, the desired sigma points can be obtained by multiplying each point by the matrix square root of $X$ and adding $x$. A similar canonical set of sigma points can be generated in any number of dimensions $n$ by taking the zero vector and the points comprising the rows of the identity matrix, computing the mean of the set of points, subtracting the mean from each point so that the resulting set has a mean of zero, then computing the covariance of the zero-mean set of points and applying the inverse of its matrix square root to each point so that the covariance of the set will be equal to the identity.
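The general construction just described (zero vector plus the rows of the identity, centered and then whitened) can be sketched as below. The function names are hypothetical. The sketch whitens with the inverse symmetric square root of the covariance, computed by eigendecomposition, and uses a Cholesky factor as the matrix square root when mapping to a target covariance; both are implementation choices made here, and the resulting simplex may be rotated relative to other canonical sets.

```python
import numpy as np

def canonical_simplex_points(n):
    """Return n + 1 points in n dimensions with zero mean and identity covariance."""
    pts = np.vstack([np.zeros(n), np.eye(n)])  # zero vector and rows of the identity
    pts -= pts.mean(axis=0)                    # shift to zero mean
    cov = pts.T @ pts / len(pts)               # covariance with equal weights
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # inverse symmetric square root
    return pts @ inv_sqrt                      # whitened points (inv_sqrt is symmetric)

def simplex_sigma_points(mean, cov):
    """Scale and shift the canonical set so that it has the given mean and covariance."""
    mean = np.asarray(mean, dtype=float)
    root = np.linalg.cholesky(np.asarray(cov, dtype=float))  # root @ root.T == cov
    return mean + canonical_simplex_points(len(mean)) @ root.T

# Illustrative check in two dimensions.
s = canonical_simplex_points(2)
x = np.array([1.0, -1.0])
X = np.array([[2.0, 0.6],
              [0.6, 1.0]])
pts = simplex_sigma_points(x, X)
dev = pts - pts.mean(axis=0)
print(s)
print(pts.mean(axis=0))
print(dev.T @ dev / len(pts))
```

In two dimensions the three canonical points all lie at distance $\sqrt{2}$ from the origin, so they form an equilateral triangle.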

Uhlmann showed that it is possible to conveniently generate a symmetric set of sigma points from the columns of $\pm\sqrt{nX}$ and the zero vector, where $X$ is the given covariance matrix and $n$ is its dimension, without having to compute a matrix inverse. This set is computationally efficient to generate and, because the points form a symmetric distribution, it captures the third central moment (the skew) whenever the underlying distribution of the state estimate is known or can be assumed to be symmetric. [2] He also showed that weights, including negative weights, can be used to affect the statistics of the set. Julier also developed and examined techniques for generating sigma points to capture the third moment (the skew) of an arbitrary distribution and the fourth moment (the kurtosis) of a symmetric distribution. [4] [5]
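A minimal sketch of the symmetric construction follows. It takes the mean plus and minus the columns of a square root of $n$ times the covariance, with equal weights on the resulting $2n$ points. The sketch omits the center point (equivalently, gives it zero weight); that choice, the use of a Cholesky factor as the matrix square root, and the name `symmetric_sigma_points` are assumptions of this illustration.

```python
import numpy as np

def symmetric_sigma_points(mean, cov):
    """Return 2n equally weighted points: mean +/- the columns of sqrt(n * cov)."""
    mean = np.asarray(mean, dtype=float)
    n = len(mean)
    root = np.linalg.cholesky(n * np.asarray(cov, dtype=float))  # root @ root.T == n * cov
    offsets = root.T  # each row of root.T is one column of root
    return np.vstack([mean + offsets, mean - offsets])

# Illustrative mean and covariance.
x = np.array([1.0, -1.0])
X = np.array([[2.0, 0.6],
              [0.6, 1.0]])

pts = symmetric_sigma_points(x, X)
dev = pts - pts.mean(axis=0)
print(pts)
print(pts.mean(axis=0))         # recovers x
print(dev.T @ dev / len(pts))   # recovers X
```

Only a matrix square root is needed, not a matrix inverse, and every point has a mirror image about the mean, so all odd central moments of the point set vanish.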

Example

The unscented transform is defined for the application of a given function to any partial characterization of an otherwise unknown distribution, but its most common use is for the case in which only the mean and covariance are given. A common example is the conversion from one coordinate system to another, such as from a Cartesian coordinate frame to polar coordinates. [4]

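Suppose a 2-dimensional mean and covariance estimate, $(m, M)$, is given in Cartesian coordinates, and the transformation function to polar coordinates, $f(x, y) \to [r, \theta]$, is:

$$r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan(y/x)$$

Multiplying each of the canonical simplex sigma points $s_i$ (given above) by $M^{1/2}$ and adding the mean, $m$, gives:

$$m_i = M^{1/2} s_i + m, \qquad i = 1, 2, 3$$

Applying the transformation function to each of the above points gives:

$$m^+_i = f(m_i), \qquad i = 1, 2, 3$$

The mean of these three transformed points is the UT estimate of the mean in polar coordinates:

$$m_{UT} = \frac{1}{3} \sum_{i=1}^{3} m^+_i$$

The UT estimate of the covariance is:

$$M_{UT} = \frac{1}{3} \sum_{i=1}^{3} (m^+_i - m_{UT})^2$$

where each squared term in the sum is a vector outer product, $(m^+_i - m_{UT})(m^+_i - m_{UT})^{\mathrm{T}}$.

This can be compared to the linearized mean and covariance:

$$m_{linear} = f(m), \qquad M_{linear} = \nabla_f M \nabla_f^{\mathrm{T}}$$

where $\nabla_f$ is the Jacobian matrix of $f$ evaluated at $m$.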

The absolute difference between the UT and linearized estimates in this case is relatively small, but in filtering applications the cumulative effect of small errors can lead to unrecoverable divergence of the estimate. The effect of the errors is exacerbated when the covariance is underestimated because this causes the filter to be overconfident in the accuracy of the mean. In the above example it can be seen that the linearized covariance estimate is smaller than that of the UT estimate, suggesting that linearization has likely produced an underestimate of the actual error in its mean.
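The comparison can be reproduced numerically with a short sketch. The mean and covariance below are illustrative values chosen for this sketch, and the three simplex points are taken as the vertices $\frac{1}{\sqrt{2}}[0, 2]$, $\frac{1}{\sqrt{2}}[-\sqrt{3}, -1]$, $\frac{1}{\sqrt{2}}[\sqrt{3}, -1]$ of an equilateral triangle with zero mean and identity covariance, which is an assumption about orientation. The angle is computed with `arctan2`, and the angles are averaged directly, which is only reasonable because all points here lie well inside one quadrant.

```python
import numpy as np

def to_polar(p):
    """Cartesian (x, y) -> polar (r, theta)."""
    x, y = p
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def polar_jacobian(p):
    """Analytic Jacobian of to_polar at p."""
    x, y = p
    r2 = x * x + y * y
    r = np.sqrt(r2)
    return np.array([[x / r, y / r],
                     [-y / r2, x / r2]])

# Illustrative Cartesian estimate (values assumed for this sketch).
m = np.array([12.3, 7.6])
M = np.array([[1.44, 0.0],
              [0.0, 2.89]])

# Canonical simplex points: zero mean, identity covariance, equal weights.
S = np.array([[0.0, 2.0],
              [-np.sqrt(3.0), -1.0],
              [np.sqrt(3.0), -1.0]]) / np.sqrt(2.0)

# Unscented transform with the simplex set.
root = np.linalg.cholesky(M)          # matrix square root of M
sigma = m + S @ root.T                # sigma points with mean m and covariance M
polar = np.array([to_polar(p) for p in sigma])
m_ut = polar.mean(axis=0)
dev = polar - m_ut
M_ut = dev.T @ dev / len(polar)       # average of outer products

# Linearized estimate.
J = polar_jacobian(m)
m_lin = to_polar(m)
M_lin = J @ M @ J.T

print("UT mean:        ", m_ut)
print("linearized mean:", m_lin)
print("UT covariance:\n", M_ut)
print("linearized covariance:\n", M_lin)
```

With these illustrative inputs the range variance from the UT comes out at about 2.0, compared with about 1.8 from linearization, so the linearized estimate is the more confident of the two in the range component.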

In this example there is no way to determine the absolute accuracy of the UT and linearized estimates without ground truth in the form of the actual probability distribution associated with the original estimate and the mean and covariance of that distribution after application of the nonlinear transformation (e.g., as determined analytically or through numerical integration). Such analyses have been performed for coordinate transformations under the assumption of Gaussianity for the underlying distributions, and the UT estimates tend to be significantly more accurate than those obtained from linearization. [6] [7]

Empirical analysis has shown that the use of the minimal simplex set of sigma points is significantly less accurate than the use of the symmetric set of points when the underlying distribution is Gaussian. [7] This suggests that the use of the simplex set in the above example would not be the best choice if the underlying distribution associated with $(m, M)$ is symmetric. Even if the underlying distribution is not symmetric, the simplex set is still likely to be less accurate than the symmetric set because the asymmetry of the simplex set is not matched to the asymmetry of the actual distribution.

Returning to the example, the minimal symmetric set of sigma points can be obtained from the covariance matrix $M$ simply as the mean vector, $m$, plus and minus the columns of $(2M)^{1/2}$:

$$m_{1,2} = m \pm c_1, \qquad m_{3,4} = m \pm c_2$$

where $c_1$ and $c_2$ are the columns of $(2M)^{1/2}$. This construction guarantees that the mean and covariance of the above four sigma points are $(m, M)$, which is directly verifiable. Applying the nonlinear function $f$ to each of the sigma points gives:

$$m^+_i = f(m_i), \qquad i = 1, \dots, 4$$

The mean of these four transformed sigma points is the UT estimate of the mean in polar coordinates:

$$m_{UT} = \frac{1}{4} \sum_{i=1}^{4} m^+_i$$

The UT estimate of the covariance is:

$$M_{UT} = \frac{1}{4} \sum_{i=1}^{4} (m^+_i - m_{UT})^2$$

where each squared term in the sum is a vector outer product.

The difference between the UT and linearized mean estimates gives a measure of the effect of the nonlinearity of the transformation. When the transformation is linear, for instance, the UT and linearized estimates will be identical. This motivates adding the square of this difference (as an outer product) to the UT covariance to guard against underestimating the actual error in the mean. This approach does not improve the accuracy of the mean but can significantly improve the accuracy of a filter over time by reducing the likelihood that the covariance is underestimated. [2]
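The symmetric-set computation and the covariance inflation described above can be sketched together. As before, the numeric mean and covariance are illustrative values assumed for this sketch, a Cholesky factor stands in for the matrix square root, and names such as `M_inflated` are hypothetical.

```python
import numpy as np

def to_polar(p):
    """Cartesian (x, y) -> polar (r, theta)."""
    x, y = p
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

# Illustrative Cartesian estimate (values assumed for this sketch).
m = np.array([12.3, 7.6])
M = np.array([[1.44, 0.0],
              [0.0, 2.89]])

# Minimal symmetric set: mean +/- the columns of sqrt(n * M), here n = 2.
n = len(m)
root = np.linalg.cholesky(n * M)
sigma = np.vstack([m + root.T, m - root.T])   # four equally weighted points

# Unscented transform of the four points.
polar = np.array([to_polar(p) for p in sigma])
m_ut = polar.mean(axis=0)
dev = polar - m_ut
M_ut = dev.T @ dev / len(polar)

# Inflate the UT covariance by the squared difference between the UT mean
# and the linearized mean f(m), taken as an outer product.
m_lin = to_polar(m)
diff = m_ut - m_lin
M_inflated = M_ut + np.outer(diff, diff)

print("UT mean:", m_ut)
print("UT covariance:\n", M_ut)
print("inflated covariance:\n", M_inflated)
```

Because the added term is an outer product of a vector with itself, it is positive semi-definite, so the inflated covariance is never smaller than the UT covariance in any direction.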

Optimality of the unscented transform

Uhlmann noted that given only the mean and covariance of an otherwise unknown probability distribution, the transformation problem is ill-defined because there is an infinite number of possible underlying distributions with the same first two moments. Without any a priori information or assumptions about the characteristics of the underlying distribution, any choice of distribution used to compute the transformed mean and covariance is as reasonable as any other. In other words, there is no choice of distribution with a given mean and covariance that is superior to that provided by the set of sigma points; in this sense the unscented transform is trivially optimal.

This general statement of optimality is of course useless for making any quantitative statements about the performance of the UT, e.g., compared to linearization; consequently he, Julier and others have performed analyses under various assumptions about the characteristics of the distribution and/or the form of the nonlinear transformation function. For example, if the function is differentiable, which is essential for linearization, these analyses validate the expected and empirically-corroborated superiority of the unscented transform. [6] [7]

Applications

The unscented transform can be used to develop a nonlinear generalization of the Kalman filter, known as the unscented Kalman filter (UKF). This filter has largely replaced the EKF in many nonlinear filtering and control applications, including underwater, [8] ground and air navigation, [9] and spacecraft. [10] The unscented transform has also been used as a computational framework for Riemann-Stieltjes optimal control. [11] This computational approach is known as unscented optimal control. [12] [13]

Unscented Kalman Filter

Uhlmann and Simon Julier published several papers showing that the use of the unscented transformation in a Kalman filter, which is referred to as the unscented Kalman filter (UKF), provides significant performance improvements over the EKF in a variety of applications. [14] [4] [6] Julier and Uhlmann published papers using a particular parameterized form of the unscented transform in the context of the UKF which used negative weights to capture assumed distribution information. [14] [6] That form of the UT is susceptible to a variety of numerical errors that the original formulations (the symmetric set originally proposed by Uhlmann) do not suffer. Julier has subsequently described parameterized forms which do not use negative weights and also are not subject to those issues. [15]

References

  1. "First-Hand:The Unscented Transform - Engineering and Technology History Wiki".
  2. Uhlmann, Jeffrey (1995). Dynamic Map Building and Localization: New Theoretical Foundations (Ph.D. thesis). University of Oxford.
  3. Menegaz, Henrique M. T.; Ishihara, João Y.; Borges, Geovany A.; Vargas, Alessandro N. (16 February 2015). "A Systematization of the Unscented Kalman Filter Theory". IEEE Transactions on Automatic Control. 60 (10): 2583–2598. doi:10.1109/TAC.2015.2404511. hdl:20.500.11824/251. S2CID 12606055.
  4. Julier, S.; Uhlmann, J. (1997). "Consistent Debiased Method for Converting Between Polar and Cartesian Coordinate Systems". Proceedings of the 1997 SPIE Conference on Acquisition, Tracking, and Pointing. Vol. 3086. SPIE.
  5. Julier, Simon (1998). "A Skewed Approach to Filtering". The Proceedings of the 12th Intl. Symp. on Aerospace/Defense Sensing, Simulation and Controls. Vol. 3373. SPIE.
  6. Julier, Simon; Uhlmann, Jeffrey (2000). "A New Method for the Nonlinear Transformation of Means and Covariances in Nonlinear Filters". IEEE Transactions on Automatic Control. 45 (3): 477–482. doi:10.1109/9.847726.
  7. Zhang, W.; Liu, M.; Zhao, Z. (2009). "Accuracy Analysis of Unscented Transformation of Several Sampling Strategies". Proc. of the 10th Intl. Conf. on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. ACIS.
  8. Wu, L.; Ma, J.; Tian, J. (2010). "Self-Adaptive Unscented Kalman Filtering for Underwater Gravity Aided Navigation". Proc. of IEEE/ION Plans.
  9. El-Sheimy, N.; Shin, E. H.; Niu, X. (2006). "Kalman Filter Face-Off: Extended vs. Unscented Kalman Filters for Integrated GPS and MEMS Inertial". Inside GNSS: Engineering Solutions for the Global Navigation Satellite System Community. 1 (2).
  10. Crassidis, J.; Markley, F. (2003). "Unscented Filtering for Spacecraft Attitude Estimation". Journal of Guidance, Control, and Dynamics. 26 (4): 536–542. Bibcode:2003JGCD...26..536C. doi:10.2514/2.5102.
  11. Ross, I. Michael; Proulx, Ronald J.; Karpenko, Mark; Gong, Qi (July 2015). "Riemann–Stieltjes Optimal Control Problems for Uncertain Dynamic Systems". Journal of Guidance, Control, and Dynamics. 38 (7): 1251–1263. Bibcode:2015JGCD...38.1251R. doi:10.2514/1.G000505. S2CID 121424228.
  12. Ross, I. M.; Proulx, R. J.; Karpenko, M. (2014). "Unscented Optimal Control for Space Flight". Proceedings of the 24th International Symposium on Space Flight Dynamics (ISSFD), May 5–9, 2014, Laurel, MD. http://issfd.org/ISSFD_2014/ISSFD24_Paper_S12-5_Karpenko.pdf
  13. Ross, I. Michael; Proulx, Ronald J.; Karpenko, Mark (July 2015). "Unscented guidance". 2015 American Control Conference (ACC). pp. 5605–5610. doi:10.1109/ACC.2015.7172217. ISBN 978-1-4799-8684-2. S2CID 28136418.
  14. Julier, S.; Uhlmann, J. (1997). "New Extension of the Kalman Filter to Nonlinear Systems". Proceedings of the 1997 SPIE Conference on Signal Processing, Sensor Fusion, and Target Recognition. Vol. 3068.
  15. Julier, Simon (2002). "The Scaled Unscented Transformation". Proceedings of the American Control Conference. Vol. 6. IEEE.