Ensemble Kalman filter

The ensemble Kalman filter (EnKF) is a recursive filter suitable for problems with a large number of variables, such as discretizations of partial differential equations in geophysical models. The EnKF originated as a version of the Kalman filter for large problems (essentially, the covariance matrix is replaced by the sample covariance), and it is now an important data assimilation component of ensemble forecasting. EnKF is related to the particle filter (in this context, a particle is the same thing as an ensemble member) but the EnKF makes the assumption that all probability distributions involved are Gaussian; when it is applicable, it is much more efficient than the particle filter.

Introduction

The ensemble Kalman filter (EnKF) is a Monte Carlo implementation of the Bayesian update problem: given a probability density function (PDF) of the state of the modeled system (the prior, often called the forecast in geosciences) and the data likelihood, Bayes' theorem is used to obtain the PDF after the data likelihood has been taken into account (the posterior, often called the analysis). This is called a Bayesian update. The Bayesian update is combined with advancing the model in time, incorporating new data from time to time. The original Kalman filter, introduced in 1960, [1] assumes that all PDFs are Gaussian (the Gaussian assumption) and provides algebraic formulas for the change of the mean and the covariance matrix by the Bayesian update, as well as a formula for advancing the mean and covariance in time provided the system is linear. However, maintaining the covariance matrix is not feasible computationally for high-dimensional systems. For this reason, EnKFs were developed. [2] [3] EnKFs represent the distribution of the system state using a collection of state vectors, called an ensemble, and replace the covariance matrix by the sample covariance computed from the ensemble. The ensemble is treated as if it were a random sample, but the ensemble members are really not independent, since each EnKF update couples them. One advantage of EnKFs is that advancing the PDF in time is achieved by simply advancing each member of the ensemble. [4]

Derivation

Kalman filter

Let $\mathbf{x}$ denote the $n$-dimensional state vector of a model, and assume that it has a Gaussian probability distribution with mean $\boldsymbol{\mu}$ and covariance $Q$, i.e., its PDF is

$$p(\mathbf{x}) \propto \exp\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathrm T} Q^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).$$

Here and below, $\propto$ means proportional; a PDF is always scaled so that its integral over the whole space is one. This $p(\mathbf{x})$, called the prior, was evolved in time by running the model and now is to be updated to account for new data. It is natural to assume that the error distribution of the data is known; data have to come with an error estimate, otherwise they are meaningless. Here, the data $\mathbf{d}$ is assumed to have a Gaussian PDF with covariance $R$ and mean $H\mathbf{x}$, where $H$ is the so-called observation matrix. The covariance matrix $R$ describes the estimate of the error of the data; if the random errors in the entries of the data vector $\mathbf{d}$ are independent, $R$ is diagonal and its diagonal entries are the squares of the standard deviation ("error size") of the error of the corresponding entries of the data vector $\mathbf{d}$. The value $H\mathbf{x}$ is what the value of the data would be for the state $\mathbf{x}$ in the absence of data errors. Then the probability density $p(\mathbf{d}\mid\mathbf{x})$ of the data $\mathbf{d}$ conditional on the system state $\mathbf{x}$, called the data likelihood, is

$$p(\mathbf{d}\mid\mathbf{x}) \propto \exp\left(-\tfrac{1}{2}(\mathbf{d}-H\mathbf{x})^{\mathrm T} R^{-1}(\mathbf{d}-H\mathbf{x})\right).$$

The PDF of the state and the data likelihood are combined to give the new probability density of the system state $\mathbf{x}$ conditional on the value of the data $\mathbf{d}$ (the posterior) by the Bayes theorem,

$$p(\mathbf{x}\mid\mathbf{d}) \propto p(\mathbf{d}\mid\mathbf{x})\,p(\mathbf{x}).$$

The data $\mathbf{d}$ is fixed once it is received, so denote the posterior state by $\hat{\mathbf{x}}$ instead of $\mathbf{x}\mid\mathbf{d}$ and the posterior PDF by $p(\hat{\mathbf{x}})$. It can be shown by algebraic manipulations [5] that the posterior PDF is also Gaussian,

$$p(\hat{\mathbf{x}}) \propto \exp\left(-\tfrac{1}{2}(\hat{\mathbf{x}}-\hat{\boldsymbol{\mu}})^{\mathrm T}\hat{Q}^{-1}(\hat{\mathbf{x}}-\hat{\boldsymbol{\mu}})\right),$$

with the posterior mean $\hat{\boldsymbol{\mu}}$ and covariance $\hat{Q}$ given by the Kalman update formulas

$$\hat{\boldsymbol{\mu}} = \boldsymbol{\mu} + K(\mathbf{d}-H\boldsymbol{\mu}),\qquad \hat{Q} = (I-KH)Q,$$

where

$$K = QH^{\mathrm T}\left(HQH^{\mathrm T}+R\right)^{-1}$$

is the so-called Kalman gain matrix.
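
For concreteness, here is a minimal numerical sketch of these update formulas, assuming a NumPy environment; the variable names mirror the notation of this section and are not taken from the cited references.

```python
# Minimal sketch of the Kalman update: prior N(mu, Q), data d ~ N(Hx, R).
import numpy as np

def kalman_update(mu, Q, d, H, R):
    """Return the posterior mean and covariance of the Kalman (Bayesian) update."""
    S = H @ Q @ H.T + R                    # innovation covariance H Q H^T + R
    K = np.linalg.solve(S.T, H @ Q.T).T    # Kalman gain K = Q H^T S^{-1}, via a linear solve
    mu_hat = mu + K @ (d - H @ mu)         # posterior mean
    Q_hat = (np.eye(len(mu)) - K @ H) @ Q  # posterior covariance (I - K H) Q
    return mu_hat, Q_hat
```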

Ensemble Kalman Filter

The EnKF is a Monte Carlo approximation of the Kalman filter, which avoids evolving the covariance matrix of the PDF of the state vector $\mathbf{x}$. Instead, the PDF is represented by an ensemble

$$X = \left[\mathbf{x}_1,\ldots,\mathbf{x}_N\right] = \left[\mathbf{x}_i\right].$$

$X$ is an $n\times N$ matrix whose columns are the ensemble members, and it is called the prior ensemble. Ideally, the ensemble members would form a sample from the prior distribution. However, the ensemble members are not in general independent except in the initial ensemble, since every EnKF step ties them together. They are deemed to be approximately independent, and all calculations proceed as if they actually were independent.

Replicate the data $\mathbf{d}$ into an $m\times N$ matrix

$$D = \left[\mathbf{d}_1,\ldots,\mathbf{d}_N\right] = \left[\mathbf{d}_i\right],\qquad \mathbf{d}_i = \mathbf{d} + \boldsymbol{\epsilon}_i,\qquad \boldsymbol{\epsilon}_i \sim N(0,R),$$

so that each column $\mathbf{d}_i$ consists of the data vector $\mathbf{d}$ plus a random vector from the $m$-dimensional normal distribution $N(0,R)$. If, in addition, the columns of $X$ are a sample from the prior probability distribution, then the columns of

$$\hat{X} = X + K(D - HX)$$

form a sample from the posterior probability distribution. To see this in the scalar case with $H = 1$: Let $x_i = \mu + e_i$, $e_i \sim N(0,\sigma^2)$, and $d_i = d + \epsilon_i$, $\epsilon_i \sim N(0,\eta^2)$. Then

$$\hat{x}_i = x_i + \frac{\sigma^2}{\sigma^2+\eta^2}\left(d_i - x_i\right) = \left(\mu + \frac{\sigma^2}{\sigma^2+\eta^2}(d-\mu)\right) + \left(\frac{\eta^2}{\sigma^2+\eta^2}\,e_i + \frac{\sigma^2}{\sigma^2+\eta^2}\,\epsilon_i\right).$$

The first sum is the posterior mean, and the second sum, in view of the independence of $e_i$ and $\epsilon_i$, has a variance

$$\left(\frac{\eta^2}{\sigma^2+\eta^2}\right)^2 \sigma^2 + \left(\frac{\sigma^2}{\sigma^2+\eta^2}\right)^2 \eta^2 = \frac{\sigma^2\eta^2}{\sigma^2+\eta^2},$$

which is the posterior variance.

The EnKF is now obtained simply by replacing the state covariance $Q$ in the Kalman gain matrix $K = QH^{\mathrm T}(HQH^{\mathrm T}+R)^{-1}$ by the sample covariance $C$ computed from the ensemble members (called the ensemble covariance), [6] that is:

$$K = CH^{\mathrm T}\left(HCH^{\mathrm T}+R\right)^{-1}.$$
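
The scalar argument above is easy to verify numerically. The following minimal sketch (assuming NumPy; the prior and data values are illustrative only) applies the perturbed-observations update to a large scalar ensemble and compares the resulting mean and variance with the posterior values.

```python
# Numerical check of the scalar case with H = 1: the updated ensemble should have
# mean mu + K*(d - mu) and variance sigma^2*eta^2/(sigma^2 + eta^2) for large N.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 2.0          # prior mean and variance
d, eta2 = 3.0, 0.5             # observed value and its error variance
N = 100_000                    # ensemble size, large to keep sampling error small

x = mu + rng.normal(0.0, np.sqrt(sigma2), N)   # prior ensemble
D = d + rng.normal(0.0, np.sqrt(eta2), N)      # perturbed data replicates
K = sigma2 / (sigma2 + eta2)                   # scalar Kalman gain
x_post = x + K * (D - x)                       # EnKF update

print(x_post.mean(), mu + K * (d - mu))                # both approximate the posterior mean
print(x_post.var(), sigma2 * eta2 / (sigma2 + eta2))   # both approximate the posterior variance
```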

Implementation

Basic formulation

Here we follow [7] [8]. Suppose the ensemble matrix $X$ and the data matrix $D$ are as above. The ensemble mean and the covariance are

$$E(X) = \frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_k,\qquad C = \frac{AA^{\mathrm T}}{N-1},$$

where

$$A = X - E(X)\,\mathbf{e}_{1\times N} = X - \frac{1}{N}\left(X\mathbf{e}_{N\times 1}\right)\mathbf{e}_{1\times N},$$

and $\mathbf{e}$ denotes the matrix of all ones of the indicated size.

The posterior ensemble $X^p$ is then given by

$$X^p = X + CH^{\mathrm T}\left(HCH^{\mathrm T}+R\right)^{-1}(D - HX),$$

where the perturbed data matrix $D$ is as above.

Note that since $R$ is a covariance matrix, it is always positive semidefinite and usually positive definite, so the inverse above exists and the formula can be implemented by the Cholesky decomposition. [9] In [7] [8], $R$ is replaced by the sample covariance $\tilde{D}\tilde{D}^{\mathrm T}/(N-1)$ of the data perturbations, where $\tilde{D} = [\boldsymbol{\epsilon}_1,\ldots,\boldsymbol{\epsilon}_N]$, and the inverse is replaced by a pseudoinverse, computed using the singular-value decomposition (SVD).

Since these formulas are matrix operations with dominant Level 3 operations, [10] they are suitable for efficient implementation using software packages such as LAPACK (on serial and shared memory computers) and ScaLAPACK (on distributed memory computers). [9] Instead of computing the inverse of a matrix and multiplying by it, it is much better (several times cheaper and also more accurate) to compute the Cholesky decomposition of the matrix and treat the multiplication by the inverse as solution of a linear system with many simultaneous right-hand sides. [10]
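
A compact sketch of this basic formulation, assuming NumPy and SciPy and following the advice above to use a Cholesky solve rather than an explicit inverse; the function name and interface are illustrative and not taken from [7]-[9].

```python
# Perturbed-observations EnKF update in the basic formulation.
# X: n x N prior ensemble, D: m x N perturbed data, H: m x n observation matrix, R: m x m.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def enkf_update(X, D, H, R):
    """Return the posterior ensemble X^p = X + C H^T (H C H^T + R)^{-1} (D - H X)."""
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)      # anomalies A = X - E(X) e
    HA = H @ A                                 # observed anomalies
    P = (HA @ HA.T) / (N - 1) + R              # H C H^T + R, with C = A A^T / (N - 1)
    cf = cho_factor(P)                         # Cholesky factorization of the SPD matrix P
    Y = cho_solve(cf, D - H @ X)               # Y = P^{-1} (D - H X), solved, not inverted
    return X + (A @ (HA.T @ Y)) / (N - 1)      # C H^T P^{-1} (D - H X) without forming C
```

Note that the gain is never formed explicitly: $CH^{\mathrm T}$ is applied as $A(HA)^{\mathrm T}/(N-1)$, so the $n\times n$ ensemble covariance $C$ is never assembled.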

Observation matrix-free implementation

Since we have replaced the covariance matrix with the ensemble covariance, this leads to a simpler formula where ensemble observations are directly used without explicitly specifying the matrix $H$. More specifically, define a function $h(\mathbf{x})$ of the form

$$h(\mathbf{x}) = H\mathbf{x}.$$

The function $h$ is called the observation function or, in the inverse problems context, the forward operator. The value of $h(\mathbf{x})$ is what the value of the data would be for the state $\mathbf{x}$ assuming the measurement is exact. Then the posterior ensemble can be rewritten as

$$X^p = X + \frac{1}{N-1}A\,(HA)^{\mathrm T}P^{-1}(D - HX),$$

where

$$HA = HX - \frac{1}{N}\left((HX)\mathbf{e}_{N\times 1}\right)\mathbf{e}_{1\times N},$$

and

$$P = \frac{1}{N-1}HA\,(HA)^{\mathrm T} + R,$$

with

$$HX = \left[H\mathbf{x}_1,\ldots,H\mathbf{x}_N\right] = \left[h(\mathbf{x}_1),\ldots,h(\mathbf{x}_N)\right].$$

Consequently, the ensemble update can be computed by evaluating the observation function $h$ on each ensemble member once, and the matrix $H$ does not need to be known explicitly. This formula holds also [9] for an observation function $h(\mathbf{x}) = H\mathbf{x} + \mathbf{f}$ with a fixed offset $\mathbf{f}$, which also does not need to be known explicitly. The above formula has been commonly used for a nonlinear observation function $h$, such as the position of a hurricane vortex. [11] In that case, the observation function is essentially approximated by a linear function from its values at ensemble members.
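
A corresponding observation matrix-free sketch, again assuming NumPy/SciPy: only the observation function h is evaluated, once per ensemble member, and the matrix $H$ never appears. Names are illustrative, not taken from the references.

```python
# Observation matrix-free EnKF update.
# X: n x N prior ensemble, D: m x N perturbed data, h: state vector -> m-vector, R: m x m.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def enkf_update_obs_fn(X, D, h, R):
    """Posterior ensemble using only evaluations of the observation function h."""
    n, N = X.shape
    HX = np.column_stack([h(X[:, i]) for i in range(N)])   # [h(x_1), ..., h(x_N)]
    HA = HX - HX.mean(axis=1, keepdims=True)                # observation anomalies
    A = X - X.mean(axis=1, keepdims=True)                   # state anomalies
    P = (HA @ HA.T) / (N - 1) + R                           # HA (HA)^T/(N-1) + R
    Y = cho_solve(cho_factor(P), D - HX)                    # P^{-1} (D - HX)
    return X + (A @ (HA.T @ Y)) / (N - 1)
```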

Implementation for a large number of data points

For a large number of data points, the multiplication by $P^{-1}$ becomes a bottleneck. The following alternative formula is advantageous when the number of data points is large (such as when assimilating gridded or pixel data) and the data error covariance matrix $R$ is diagonal (which is the case when the data errors are uncorrelated), or cheap to decompose (such as banded due to a limited covariance distance). Using the Sherman–Morrison–Woodbury formula [12]

$$\left(R + UV^{\mathrm T}\right)^{-1} = R^{-1} - R^{-1}U\left(I + V^{\mathrm T}R^{-1}U\right)^{-1}V^{\mathrm T}R^{-1},$$

with

$$U = \frac{1}{N-1}HA,\qquad V = HA,$$

gives

$$P^{-1} = \left(R + \frac{1}{N-1}HA\,(HA)^{\mathrm T}\right)^{-1} = R^{-1}\left[I - \frac{1}{N-1}(HA)\left(I + (HA)^{\mathrm T}R^{-1}\frac{1}{N-1}(HA)\right)^{-1}(HA)^{\mathrm T}R^{-1}\right],$$

which requires only the solution of systems with the matrix $R$ (assumed to be cheap) and of a system of size $N$ with $m$ right-hand sides. See [9] for operation counts.
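
A sketch of how this identity can be applied when $R$ is diagonal, assuming NumPy; here the vector `r` holds the diagonal of $R$, and the function name is illustrative rather than the implementation of [9].

```python
# Apply P^{-1} B with P = R + HA (HA)^T/(N-1) via Sherman-Morrison-Woodbury,
# without ever forming the m x m matrix P. HA: m x N, r: diagonal of R (m-vector), B: m x k.
import numpy as np

def apply_P_inv(HA, r, B):
    N = HA.shape[1]
    Rinv_HA = HA / r[:, None]                      # R^{-1} HA, cheap because R is diagonal
    G = np.eye(N) + (HA.T @ Rinv_HA) / (N - 1)     # I + (HA)^T R^{-1} HA / (N-1), only N x N
    Rinv_B = B / r[:, None]                        # R^{-1} B
    W = np.linalg.solve(G, HA.T @ Rinv_B)          # solve the small N x N system
    return Rinv_B - Rinv_HA @ W / (N - 1)          # Woodbury correction
```

Only an $N\times N$ system is factorized, so the cost grows only linearly with the number of data points $m$.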

Further extensions

The EnKF version described here involves randomization of data. For filters without randomization of data, see [13] [14] [15].

Since the ensemble covariance is rank deficient (there are many more state variables, typically millions, than the ensemble members, typically less than a hundred), it has large terms for pairs of points that are spatially distant. Since in reality the values of physical fields at distant locations are not that much correlated, the covariance matrix is tapered off artificially based on the distance, which gives rise to localized EnKF algorithms. [16] [17] These methods modify the covariance matrix used in the computations and, consequently, the posterior ensemble is no longer made only of linear combinations of the prior ensemble.
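
As an illustration of such tapering, here is a minimal sketch assuming NumPy, one-dimensional grid coordinates, and a simple exponential taper; operational implementations typically use compactly supported correlation functions such as Gaspari–Cohn. [16] [17]

```python
# Distance-based localization: taper the ensemble covariance elementwise so that
# entries for spatially distant pairs of points are damped toward zero.
import numpy as np

def localize(C, coords, L):
    """Return the Schur (elementwise) product of C with exp(-|x_i - x_j| / L)."""
    dist = np.abs(coords[:, None] - coords[None, :])   # pairwise distances between grid points
    return C * np.exp(-dist / L)                       # tapered covariance
```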

For nonlinear problems, the EnKF can create a posterior ensemble with non-physical states. This can be alleviated by regularization, such as penalization of states with large spatial gradients. [6]

For problems with coherent features, such as hurricanes, thunderstorms, firelines, squall lines, and rain fronts, there is a need to adjust the numerical model state by deforming the state in space (its grid) as well as by correcting the state amplitudes additively. In 2007, Ravela et al. introduced the joint position-amplitude adjustment model using ensembles, and systematically derived a sequential approximation which can be applied to both EnKF and other formulations. [18] Their method does not make the assumption that amplitudes and position errors are independent or jointly Gaussian, as others do. The morphing EnKF employs intermediate states, obtained by techniques borrowed from image registration and morphing, instead of linear combinations of states. [19] [20]

Formally, EnKFs rely on the Gaussian assumption. In practice they can also be used for nonlinear problems, where the Gaussian assumption may not be satisfied. Related filters attempting to relax the Gaussian assumption in EnKF while preserving its advantages include filters that fit the state PDF with multiple Gaussian kernels, [21] filters that approximate the state PDF by Gaussian mixtures, [22] a variant of the particle filter with computation of particle weights by density estimation, [20] and a variant of the particle filter with thick tailed data PDF to alleviate particle filter degeneracy. [23]

References

  1. Kalman, R. E. (1960). "A new approach to linear filtering and prediction problems". Journal of Basic Engineering. 82 (1): 35–45. doi:10.1115/1.3662552. S2CID   1242324.
  2. Evensen, G. (1994). "Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics". Journal of Geophysical Research. 99 (C5): 10143–10162. Bibcode:1994JGR....9910143E. doi:10.1029/94JC00572. hdl: 1956/3035 .
  3. Houtekamer, P.; Mitchell, H. L. (1998). "Data assimilation using an ensemble Kalman filter technique". Monthly Weather Review . 126 (3): 796–811. Bibcode:1998MWRv..126..796H. CiteSeerX   10.1.1.3.1706 . doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.
  4. For a survey of EnKF and related data assimilation techniques, see Evensen, G. (2007). Data Assimilation : The Ensemble Kalman Filter. Berlin: Springer. ISBN   978-3-540-38300-0.
  5. Anderson, B. D. O.; Moore, J. B. (1979). Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall. ISBN   978-0-13-638122-8.
  6. Johns, C. J.; Mandel, J. (2008). "A Two-Stage Ensemble Kalman Filter for Smooth Data Assimilation". Environmental and Ecological Statistics. 15 (1): 101–110. CiteSeerX   10.1.1.67.4916 . doi:10.1007/s10651-007-0033-0. S2CID   14820232.
  7. Burgers, G.; van Leeuwen, P. J.; Evensen, G. (1998). "Analysis Scheme in the Ensemble Kalman Filter". Monthly Weather Review. 126 (6): 1719–1724. Bibcode:1998MWRv..126.1719B. CiteSeerX   10.1.1.41.5827 . doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.
  8. Evensen, G. (2003). "The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation". Ocean Dynamics. 53 (4): 343–367. Bibcode:2003OcDyn..53..343E. CiteSeerX   10.1.1.5.6990 . doi:10.1007/s10236-003-0036-9. S2CID   129233333.
  9. Mandel, J. (June 2006). "Efficient Implementation of the Ensemble Kalman Filter" (PDF). Center for Computational Mathematics Reports. University of Colorado at Denver and Health Sciences Center. 231.
  10. Golub, G. H.; Loan, C. F. V. (1989). Matrix Computations (Second ed.). Baltimore: Johns Hopkins Univ. Press. ISBN   978-0-8018-3772-2.
  11. Chen, Y.; Snyder, C. (2007). "Assimilating Vortex Position with an Ensemble Kalman Filter". Monthly Weather Review. 135 (5): 1828–1845. Bibcode:2007MWRv..135.1828C. doi: 10.1175/MWR3351.1 .
  12. Hager, W. W. (1989). "Updating the inverse of a matrix". SIAM Review . 31 (2): 221–239. doi:10.1137/1031049.
  13. Anderson, J. L. (2001). "An ensemble adjustment Kalman filter for data assimilation". Monthly Weather Review. 129 (12): 2884–2903. Bibcode:2001MWRv..129.2884A. CiteSeerX   10.1.1.5.9952 . doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
  14. Evensen, G. (2004). "Sampling strategies and square root analysis schemes for the EnKF". Ocean Dynamics. 54 (6): 539–560. Bibcode:2004OcDyn..54..539E. CiteSeerX   10.1.1.3.6213 . doi:10.1007/s10236-004-0099-2. S2CID   120171951.
  15. Tippett, M. K.; Anderson, J. L.; Bishop, C. H.; Hamill, T. M.; Whitaker, J. S. (2003). "Ensemble square root filters". Monthly Weather Review. 131 (7): 1485–1490. Bibcode:2003MWRv..131.1485T. CiteSeerX   10.1.1.332.775 . doi:10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2.
  16. Anderson, J. L. (2003). "A local least squares framework for ensemble filtering". Monthly Weather Review. 131 (4): 634–642. Bibcode:2003MWRv..131..634A. CiteSeerX   10.1.1.10.6543 . doi:10.1175/1520-0493(2003)131<0634:ALLSFF>2.0.CO;2.
  17. Ott, E.; Hunt, B. R.; Szunyogh, I.; Zimin, A. V.; Kostelich, E. J.; Corazza, M.; Kalnay, E.; Patil, D.; Yorke, J. A. (2004). "A local ensemble Kalman filter for atmospheric data assimilation". Tellus A . 56 (5): 415–428. arXiv: physics/0203058 . Bibcode:2004TellA..56..415O. doi:10.3402/tellusa.v56i5.14462. S2CID   218577557.
  18. Ravela, S.; Emanuel, K.; McLaughlin, D. (2007). "Data Assimilation by Field Alignment". Physica D: Nonlinear Phenomena. 230 (1–2): 127–145. Bibcode:2007PhyD..230..127R. doi:10.1016/j.physd.2006.09.035.
  19. Beezley, J. D.; Mandel, J. (2008). "Morphing ensemble Kalman filters". Tellus A. 60 (1): 131–140. arXiv: 0705.3693 . Bibcode:2008TellA..60..131B. doi:10.1111/j.1600-0870.2007.00275.x. S2CID   1009227.
  20. Mandel, J.; Beezley, J. D. (November 2006). Predictor-corrector and morphing ensemble filters for the assimilation of sparse data into high dimensional nonlinear systems (PDF). 11th Symposium on Integrated Observing and Assimilation Systems for the Atmosphere, Oceans, and Land Surface (IOAS-AOLS), CD-ROM, Paper 4.12, 87th American Meteorological Society Annual Meeting, San Antonio, TX, January 2007. CCM Report 239. University of Colorado at Denver and Health Sciences Center.
  21. Anderson, J. L.; Anderson, S. L. (1999). "A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts". Monthly Weather Review. 127 (12): 2741–2758. Bibcode:1999MWRv..127.2741A. doi:10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.
  22. Bengtsson, T.; Snyder, C.; Nychka, D. (2003). "Toward a nonlinear ensemble filter for high dimensional systems". Journal of Geophysical Research: Atmospheres. 108 (D24): STS 2–1–10. Bibcode:2003JGRD..108.8775B. doi: 10.1029/2002JD002900 .
  23. van Leeuwen, P. (2003). "A variance-minimizing filter for large-scale applications". Monthly Weather Review. 131 (9): 2071–2084. Bibcode:2003MWRv..131.2071V. CiteSeerX   10.1.1.7.3719 . doi:10.1175/1520-0493(2003)131<2071:AVFFLA>2.0.CO;2.