# Kriging

In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on a Gaussian process governed by prior covariances. Under suitable assumptions on the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. [1] Interpolating methods based on other criteria such as smoothness (e.g., smoothing spline) may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.

The theoretical basis for the method was developed by the French mathematician Georges Matheron in 1960, based on the master's thesis of Danie G. Krige, the pioneering plotter of distance-weighted average gold grades at the Witwatersrand reef complex in South Africa. Krige sought to estimate the most likely distribution of gold based on samples from a few boreholes. The English verb is to krige, and the most common noun is kriging; both are often pronounced with a hard "g", following an Anglicized pronunciation of the name "Krige". The word is sometimes capitalized as Kriging in the literature.

Though computationally intensive in its basic formulation, kriging can be scaled to larger problems using various approximation methods.

## Main principles

Kriging predicts the value of a function at a given point by computing a weighted average of the known values of the function in the neighborhood of the point. The method is closely related to regression analysis. Both theories derive a best linear unbiased estimator based on assumptions on covariances, make use of the Gauss–Markov theorem to prove independence of the estimate and error, and use very similar formulae. Even so, they are useful in different frameworks: kriging is made for estimation of a single realization of a random field, while regression models are based on multiple observations of a multivariate data set.

The kriging estimation may also be seen as a spline in a reproducing kernel Hilbert space, with the reproducing kernel given by the covariance function. [2] The difference with the classical kriging approach is provided by the interpretation: while the spline is motivated by a minimum-norm interpolation based on a Hilbert-space structure, kriging is motivated by an expected squared prediction error based on a stochastic model.

Kriging with polynomial trend surfaces is mathematically identical to generalized least squares polynomial curve fitting.

Kriging can also be understood as a form of Bayesian inference. [3] Kriging starts with a prior distribution over functions. This prior takes the form of a Gaussian process: ${\displaystyle N}$ samples from a function will be jointly normally distributed, where the covariance between any two samples is the covariance function (or kernel) of the Gaussian process evaluated at the spatial locations of the two points. A set of values is then observed, each value associated with a spatial location. Now, a new value can be predicted at any new spatial location by combining the Gaussian prior with a Gaussian likelihood function for each of the observed values. The resulting posterior distribution is also Gaussian, with a mean and covariance that can be simply computed from the observed values, their variance, and the kernel matrix derived from the prior.
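
As a sketch of that computation, assume a zero prior mean, a kernel ${\displaystyle k}$, observations ${\displaystyle \mathbf {y} }$ at locations ${\displaystyle x_{1},\ldots ,x_{N}}$ with independent Gaussian noise of variance ${\displaystyle \sigma _{n}^{2}}$, and a new location ${\displaystyle x_{*}}$. These symbols are notation introduced here for illustration. Writing ${\displaystyle K}$ for the ${\displaystyle N\times N}$ matrix with entries ${\displaystyle k(x_{i},x_{j})}$ and ${\displaystyle \mathbf {k} _{*}}$ for the vector with entries ${\displaystyle k(x_{i},x_{*})}$, the standard Gaussian conditioning formulas give:

${\displaystyle \operatorname {E} [Z(x_{*})\mid \mathbf {y} ]=\mathbf {k} _{*}^{T}(K+\sigma _{n}^{2}I)^{-1}\mathbf {y} ,}$
${\displaystyle \operatorname {Var} [Z(x_{*})\mid \mathbf {y} ]=k(x_{*},x_{*})-\mathbf {k} _{*}^{T}(K+\sigma _{n}^{2}I)^{-1}\mathbf {k} _{*}.}$

With ${\displaystyle \sigma _{n}^{2}=0}$ these have the same form as the simple kriging estimate and kriging error given in the Simple kriging section.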

### Geostatistical estimator

In geostatistical models, sampled data are interpreted as the result of a random process. The fact that these models incorporate uncertainty in their conceptualization doesn't mean that the phenomenon – the forest, the aquifer, the mineral deposit – has resulted from a random process, but rather it allows one to build a methodological basis for the spatial inference of quantities in unobserved locations and to quantify the uncertainty associated with the estimator.

A stochastic process is, in the context of this model, simply a way to approach the set of data collected from the samples. The first step in geostatistical modelling is to create a random process that best describes the set of observed data.

A value from location ${\displaystyle x_{1}}$ (generic denomination of a set of geographic coordinates) is interpreted as a realization ${\displaystyle z(x_{1})}$ of the random variable ${\displaystyle Z(x_{1})}$. In the space ${\displaystyle A}$, where the set of samples is dispersed, there are ${\displaystyle N}$ realizations of the random variables ${\displaystyle Z(x_{1}),Z(x_{2}),\ldots ,Z(x_{N})}$, correlated between themselves.

The set of random variables constitutes a random function, of which only one realization is known – the set ${\displaystyle z(x_{i})}$ of observed data. With only one realization of each random variable, it's theoretically impossible to determine any statistical parameter of the individual variables or the function. The proposed solution in the geostatistical formalism consists in assuming various degrees of stationarity in the random function, in order to make the inference of some statistic values possible.

For instance, if one assumes, based on the homogeneity of samples in area ${\displaystyle A}$ where the variable is distributed, the hypothesis that the first moment is stationary (i.e. all random variables have the same mean), then one is assuming that the mean can be estimated by the arithmetic mean of sampled values.

The hypothesis of stationarity related to the second moment is defined in the following way: the correlation between two random variables solely depends on the spatial distance between them and is independent of their location. Thus if ${\displaystyle \mathbf {h} =x_{2}-x_{1}}$ and ${\displaystyle |\mathbf {h} |=h}$, then:

${\displaystyle C{\big (}Z(x_{1}),Z(x_{2}){\big )}=C{\big (}Z(x_{i}),Z(x_{i}+\mathbf {h} ){\big )}=C(h),}$
${\displaystyle \gamma {\big (}Z(x_{1}),Z(x_{2}){\big )}=\gamma {\big (}Z(x_{i}),Z(x_{i}+\mathbf {h} ){\big )}=\gamma (h).}$

For simplicity, we define ${\displaystyle C(x_{i},x_{j})=C{\big (}Z(x_{i}),Z(x_{j}){\big )}}$ and ${\displaystyle \gamma (x_{i},x_{j})=\gamma {\big (}Z(x_{i}),Z(x_{j}){\big )}}$.

This hypothesis allows one to infer those two measures – the variogram and the covariogram:

${\displaystyle \gamma (h)={\frac {1}{2|N(h)|}}\sum _{(i,j)\in N(h)}{\big (}Z(x_{i})-Z(x_{j}){\big )}^{2},}$
${\displaystyle C(h)={\frac {1}{|N(h)|}}\sum _{(i,j)\in N(h)}{\big (}Z(x_{i})-m(h){\big )}{\big (}Z(x_{j})-m(h){\big )},}$

where:

${\displaystyle m(h)={\frac {1}{2|N(h)|}}\sum _{(i,j)\in N(h)}{\big (}Z(x_{i})+Z(x_{j}){\big )}}$;
${\displaystyle N(h)}$ denotes the set of pairs of observations ${\displaystyle i,\;j}$ such that ${\displaystyle |x_{i}-x_{j}|=h}$, and ${\displaystyle |N(h)|}$ is the number of pairs in the set.

In this set, ${\displaystyle (i,\;j)}$ and ${\displaystyle (j,\;i)}$ denote the same element. Generally an "approximate distance" ${\displaystyle h}$ is used, implemented using a certain tolerance.
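
The variogram estimator with an approximate distance can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `empirical_variogram`, the tolerance parameter `tol`, and the toy data are assumptions made for the example.

```python
from itertools import combinations
from math import dist

def empirical_variogram(coords, values, h, tol):
    """Estimate gamma(h) from all pairs whose separation lies within tol of h.

    coords: list of coordinate tuples; values: observed z(x_i).
    Returns None when no pair falls in the lag bin.
    """
    squared_diffs = [
        (values[i] - values[j]) ** 2
        for i, j in combinations(range(len(coords)), 2)  # each unordered pair once
        if abs(dist(coords[i], coords[j]) - h) <= tol
    ]
    if not squared_diffs:
        return None
    return sum(squared_diffs) / (2 * len(squared_diffs))

# Toy 1-D transect with unit spacing (made-up values).
coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
values = [1.0, 2.0, 4.0, 3.0]
gamma_1 = empirical_variogram(coords, values, h=1.0, tol=0.1)  # three pairs at lag 1
gamma_2 = empirical_variogram(coords, values, h=2.0, tol=0.1)  # two pairs at lag 2
```

Using `combinations` visits each unordered pair once, which matches the convention that ${\displaystyle (i,\;j)}$ and ${\displaystyle (j,\;i)}$ denote the same element.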

### Linear estimation

Spatial inference, or estimation, of a quantity ${\displaystyle Z\colon \mathbb {R} ^{n}\to \mathbb {R} }$, at an unobserved location ${\displaystyle x_{0}}$, is calculated from a linear combination of the observed values ${\displaystyle z_{i}=Z(x_{i})}$ and weights ${\displaystyle w_{i}(x_{0}),\;i=1,\ldots ,N}$:

${\displaystyle {\hat {Z}}(x_{0})={\begin{bmatrix}w_{1}&w_{2}&\cdots &w_{N}\end{bmatrix}}\cdot {\begin{bmatrix}z_{1}\\z_{2}\\\vdots \\z_{N}\end{bmatrix}}=\sum _{i=1}^{N}w_{i}(x_{0})\times Z(x_{i}).}$

The weights ${\displaystyle w_{i}}$ are intended to summarize two extremely important procedures in a spatial inference process:

• reflect the structural "proximity" of samples to the estimation location ${\displaystyle x_{0}}$;
• at the same time, they should have a declustering effect, in order to avoid bias caused by possible sample clusters.

When calculating the weights ${\displaystyle w_{i}}$, there are two objectives in the geostatistical formalism: unbiasedness and minimal variance of estimation.

If the cloud of real values ${\displaystyle Z(x_{0})}$ is plotted against the estimated values ${\displaystyle {\hat {Z}}(x_{0})}$, the criterion for global unbiasedness, under intrinsic stationarity or wide-sense stationarity of the field, implies that the mean of the estimates must be equal to the mean of the real values.

The second criterion says that the mean of the squared deviations ${\displaystyle {\big (}{\hat {Z}}(x)-Z(x){\big )}^{2}}$ must be minimal: the more dispersed the cloud of estimated values versus real values, the less precise the estimator.

## Methods

Depending on the stochastic properties of the random field and the various degrees of stationarity assumed, different methods for calculating the weights can be deduced, i.e. different types of kriging apply. Classical methods are:

• Ordinary kriging assumes constant unknown mean only over the search neighborhood of ${\displaystyle x_{0}}$.
• Simple kriging assumes stationarity of the first moment over the entire domain with a known mean: ${\displaystyle E\{Z(x)\}=E\{Z(x_{0})\}=m}$, where ${\displaystyle m}$ is the known mean.
• Universal kriging assumes a general polynomial trend model, such as linear trend model ${\displaystyle \textstyle E\{Z(x)\}=\sum _{k=0}^{p}\beta _{k}f_{k}(x)}$.
• IRFk-kriging assumes ${\displaystyle E\{Z(x)\}}$ to be an unknown polynomial in ${\displaystyle x}$.
• Indicator kriging uses indicator functions instead of the process itself, in order to estimate transition probabilities.
• Multiple-indicator kriging is a version of indicator kriging working with a family of indicators. Initially, MIK showed considerable promise as a new method that could more accurately estimate overall global mineral deposit concentrations or grades. However, these benefits have been outweighed by other inherent problems of practicality in modelling due to the inherently large block sizes used and also the lack of mining scale resolution. Conditional simulation is fast becoming the accepted replacement technique in this case. [citation needed]
• Disjunctive kriging is a nonlinear generalisation of kriging.
• Log-normal kriging interpolates positive data by means of logarithms.
• Latent kriging assumes the various krigings on the latent level (second stage) of the nonlinear mixed-effects model to produce a spatial functional prediction. [4] This technique is useful when analyzing a spatial functional data ${\displaystyle \{(y_{i},x_{i},s_{i})\}_{i=1}^{n}}$, where ${\displaystyle y_{i}=(y_{i1},y_{i2},\cdots ,y_{iT_{i}})^{\top }}$ is a time series data over ${\displaystyle T_{i}}$ period, ${\displaystyle x_{i}=(x_{i1},x_{i2},\cdots ,x_{ip})^{\top }}$ is a vector of ${\displaystyle p}$ covariates, and ${\displaystyle s_{i}=(s_{i1},s_{i2})^{\top }}$ is a spatial location (longitude, latitude) of the ${\displaystyle i}$-th subject.
• Co-kriging denotes the joint kriging of data from multiple sources with a relationship between the different data sources. [5] Co-kriging is also possible in a Bayesian approach. [6] [7]
• Bayesian kriging departs from the optimization of unknown coefficients and hyperparameters, which is understood as a maximum likelihood estimate from the Bayesian perspective. Instead, the coefficients and hyperparameters are estimated from their expectation values. An advantage of Bayesian kriging is that it allows one to quantify the evidence for and the uncertainty of the kriging emulator. [8] If the emulator is employed to propagate uncertainties, the quality of the kriging emulator can be assessed by comparing the emulator uncertainty to the total uncertainty (see also Bayesian Polynomial Chaos). Bayesian kriging can also be mixed with co-kriging. [6] [7]

### Ordinary kriging

The unknown value ${\displaystyle Z(x_{0})}$ is interpreted as a random variable located in ${\displaystyle x_{0}}$, as well as the values of neighboring samples ${\displaystyle Z(x_{i}),\ i=1,\ldots ,N}$. The estimator ${\displaystyle {\hat {Z}}(x_{0})}$ is also interpreted as a random variable located in ${\displaystyle x_{0}}$, a result of the linear combination of variables.

In order to deduce the kriging system for the assumptions of the model, the error made when estimating ${\displaystyle Z(x)}$ at ${\displaystyle x_{0}}$ is defined as:

${\displaystyle \epsilon (x_{0})={\hat {Z}}(x_{0})-Z(x_{0})={\begin{bmatrix}W^{T}&-1\end{bmatrix}}\cdot {\begin{bmatrix}Z(x_{1})&\cdots &Z(x_{N})&Z(x_{0})\end{bmatrix}}^{T}=\sum _{i=1}^{N}w_{i}(x_{0})\times Z(x_{i})-Z(x_{0}).}$

The two quality criteria referred to previously can now be expressed in terms of the mean and variance of the new random variable ${\displaystyle \epsilon (x_{0})}$:

#### Lack of bias

Since the random function is stationary, ${\displaystyle E[Z(x_{i})]=E[Z(x_{0})]=m}$, the following constraint is observed:

${\displaystyle E[\epsilon (x_{0})]=0\Leftrightarrow \sum _{i=1}^{N}w_{i}(x_{0})\times E[Z(x_{i})]-E[Z(x_{0})]=0\Leftrightarrow }$
${\displaystyle \Leftrightarrow m\sum _{i=1}^{N}w_{i}(x_{0})-m=0\Leftrightarrow \sum _{i=1}^{N}w_{i}(x_{0})=1\Leftrightarrow \mathbf {1} ^{T}\cdot W=1.}$

In order to ensure that the model is unbiased, the weights must sum to one.

#### Minimum variance

Two estimators can have ${\displaystyle E[\epsilon (x_{0})]=0}$, but the dispersion around their mean determines the difference between the quality of estimators. To find an estimator with minimum variance, we need to minimize ${\displaystyle E[\epsilon (x_{0})^{2}]}$.

${\displaystyle {\begin{aligned}\operatorname {Var} (\epsilon (x_{0}))&=\operatorname {Var} \left({\begin{bmatrix}W^{T}&-1\end{bmatrix}}\cdot {\begin{bmatrix}Z(x_{1})&\cdots &Z(x_{N})&Z(x_{0})\end{bmatrix}}^{T}\right)\\&={\begin{bmatrix}W^{T}&-1\end{bmatrix}}\cdot \operatorname {Var} \left({\begin{bmatrix}Z(x_{1})&\cdots &Z(x_{N})&Z(x_{0})\end{bmatrix}}^{T}\right)\cdot {\begin{bmatrix}W\\-1\end{bmatrix}}.\end{aligned}}}$

See covariance matrix for a detailed explanation.

${\displaystyle \operatorname {Var} (\epsilon (x_{0}))={\begin{bmatrix}W^{T}&-1\end{bmatrix}}\cdot {\begin{bmatrix}\operatorname {Var} _{x_{i}}&\operatorname {Cov} _{x_{i}x_{0}}\\\operatorname {Cov} _{x_{i}x_{0}}^{T}&\operatorname {Var} _{x_{0}}\end{bmatrix}}\cdot {\begin{bmatrix}W\\-1\end{bmatrix}},}$

where the literals ${\displaystyle \left\{\operatorname {Var} _{x_{i}},\operatorname {Var} _{x_{0}},\operatorname {Cov} _{x_{i}x_{0}}\right\}}$ stand for

${\displaystyle \left\{\operatorname {Var} \left({\begin{bmatrix}Z(x_{1})&\cdots &Z(x_{N})\end{bmatrix}}^{T}\right),\operatorname {Var} {\big (}Z(x_{0}){\big )},\operatorname {Cov} \left({\begin{bmatrix}Z(x_{1})&\cdots &Z(x_{N})\end{bmatrix}}^{T},Z(x_{0})\right)\right\}.}$

Once the covariance model or variogram, ${\displaystyle C(\mathbf {h} )}$ or ${\displaystyle \gamma (\mathbf {h} )}$, has been defined and is valid over the whole field of analysis of ${\displaystyle Z(x)}$, the estimation variance of any estimator can be written as a function of the covariances between the samples and the covariances between the samples and the point to estimate:

${\displaystyle {\begin{cases}\operatorname {Var} {\big (}\epsilon (x_{0}){\big )}=W^{T}\cdot \operatorname {Var} _{x_{i}}\cdot W-\operatorname {Cov} _{x_{i}x_{0}}^{T}\cdot W-W^{T}\cdot \operatorname {Cov} _{x_{i}x_{0}}+\operatorname {Var} _{x_{0}},\\\operatorname {Var} {\big (}\epsilon (x_{0}){\big )}=C(0)+\sum _{i}\sum _{j}w_{i}w_{j}C(x_{i},x_{j})-2\sum _{i}w_{i}C(x_{i},x_{0}).\end{cases}}}$

Some conclusions can be asserted from this expression. The variance of estimation:

• can be quantified for any linear estimator, once the stationarity of the mean and of the spatial covariances, or variograms, is assumed;
• grows when the covariance between the samples and the point to estimate decreases. This means that, when the samples are farther away from ${\displaystyle x_{0}}$, the estimation becomes worse;
• grows with the a priori variance ${\displaystyle C(0)}$ of the variable ${\displaystyle Z(x)}$; when the variable is less dispersed, the variance is lower at any point of the area ${\displaystyle A}$;
• does not depend on the values of the samples, which means that the same spatial configuration (with the same geometrical relations between samples and the point to estimate) always reproduces the same estimation variance in any part of the area ${\displaystyle A}$; this way, the variance does not measure the uncertainty of estimation produced by the local variable.
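
The second form of the variance expression can be evaluated directly for any set of weights. The sketch below assumes an exponential covariance model with made-up sill and range, 1-D sample locations, and equal weights chosen only for illustration; `exp_cov` and `estimation_variance` are hypothetical helper names.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_=2.0):
    """Exponential covariance model C(h); sill and range_ are assumed values."""
    return sill * np.exp(-np.abs(h) / range_)

def estimation_variance(x, x0, w):
    """Var(eps(x0)) = C(0) + sum_ij w_i w_j C(x_i, x_j) - 2 sum_i w_i C(x_i, x0)."""
    C = exp_cov(x[:, None] - x[None, :])   # sample-to-sample covariances
    c0 = exp_cov(x - x0)                   # sample-to-target covariances
    return exp_cov(0.0) + w @ C @ w - 2.0 * w @ c0

x = np.array([0.0, 1.0, 2.0])
w = np.full(3, 1.0 / 3.0)                  # equal weights, chosen for illustration
var_near = estimation_variance(x, 1.0, w)  # target inside the sample cluster
var_far = estimation_variance(x, 5.0, w)   # target far from every sample
```

The function never receives the sample values, only their locations, and `var_far` exceeds `var_near` because the sample-to-target covariances shrink with distance.
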

#### System of equations

${\displaystyle W={\underset {\mathbf {1} ^{T}\cdot W=1}{\operatorname {arg\,min} }}\left(W^{T}\cdot \operatorname {Var} _{x_{i}}\cdot W-\operatorname {Cov} _{x_{i}x_{0}}^{T}\cdot W-W^{T}\cdot \operatorname {Cov} _{x_{i}x_{0}}+\operatorname {Var} _{x_{0}}\right).}$

Solving this optimization problem (see Lagrange multipliers) results in the kriging system:

${\displaystyle {\begin{bmatrix}{\hat {W}}\\\mu \end{bmatrix}}={\begin{bmatrix}\operatorname {Var} _{x_{i}}&\mathbf {1} \\\mathbf {1} ^{T}&0\end{bmatrix}}^{-1}\cdot {\begin{bmatrix}\operatorname {Cov} _{x_{i}x_{0}}\\1\end{bmatrix}}={\begin{bmatrix}\gamma (x_{1},x_{1})&\cdots &\gamma (x_{1},x_{N})&1\\\vdots &\ddots &\vdots &\vdots \\\gamma (x_{N},x_{1})&\cdots &\gamma (x_{N},x_{N})&1\\1&\cdots &1&0\end{bmatrix}}^{-1}{\begin{bmatrix}\gamma (x_{1},x_{0})\\\vdots \\\gamma (x_{N},x_{0})\\1\end{bmatrix}}.}$

The additional parameter ${\displaystyle \mu }$ is a Lagrange multiplier used in the minimization of the kriging error ${\displaystyle \sigma _{k}^{2}(x)}$ to honor the unbiasedness condition.
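
To make the step explicit: setting the gradient of the Lagrangian ${\displaystyle W^{T}\operatorname {Var} _{x_{i}}W-2W^{T}\operatorname {Cov} _{x_{i}x_{0}}+\operatorname {Var} _{x_{0}}+2\mu (\mathbf {1} ^{T}W-1)}$ to zero gives ${\displaystyle \operatorname {Var} _{x_{i}}W+\mu \mathbf {1} =\operatorname {Cov} _{x_{i}x_{0}}}$ together with ${\displaystyle \mathbf {1} ^{T}W=1}$, which is the covariance form of the system above. A minimal sketch of solving it follows, assuming 1-D locations and an exponential covariance model with made-up parameters; `exp_cov` and `ordinary_kriging` are hypothetical names.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_=2.0):
    """Exponential covariance model C(h); sill and range_ are assumed values."""
    return sill * np.exp(-np.abs(h) / range_)

def ordinary_kriging(x, z, x0):
    """Solve the covariance-form kriging system; return (estimate, variance, weights)."""
    n = len(x)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = exp_cov(x[:, None] - x[None, :])  # Var_{x_i} block
    A[:n, n] = 1.0                                # column of ones
    A[n, :n] = 1.0                                # row of ones (unbiasedness constraint)
    c0 = exp_cov(x - x0)                          # Cov_{x_i x_0}
    solution = np.linalg.solve(A, np.append(c0, 1.0))
    w, mu = solution[:n], solution[n]
    variance = exp_cov(0.0) - w @ c0 - mu         # minimized Var(eps(x0))
    return w @ z, variance, w

x = np.array([0.0, 1.0, 3.0])
z = np.array([1.0, 2.0, 0.5])
estimate, variance, weights = ordinary_kriging(x, z, x0=2.0)
```

The variance line follows from substituting ${\displaystyle \operatorname {Var} _{x_{i}}W=\operatorname {Cov} _{x_{i}x_{0}}-\mu \mathbf {1} }$ into the estimation variance. Calling `np.linalg.solve` rather than forming the inverse is a numerical design choice, not part of the method.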

### Simple kriging

Simple kriging is mathematically the simplest, but the least general. [9] It assumes the expectation of the random field is known and relies on a covariance function. However, in most applications neither the expectation nor the covariance is known beforehand.

The practical assumptions for the application of simple kriging are:

• Wide-sense stationarity of the field (variance stationary).
• The expectation is zero everywhere: ${\displaystyle \mu (x)=0}$.
• Known covariance function ${\displaystyle c(x,y)=\operatorname {Cov} {\big (}Z(x),Z(y){\big )}}$.

The covariance function is a crucial design choice, since it stipulates the properties of the Gaussian process and thereby the behaviour of the model. The covariance function encodes information about, for instance, smoothness and periodicity, which is reflected in the estimate produced. A very common covariance function is the squared exponential, which heavily favours smooth function estimates. [10] For this reason, it can produce poor estimates in many real-world applications, especially when the true underlying function contains discontinuities and rapid changes.

#### System of equations

The kriging weights of simple kriging have no unbiasedness condition and are given by the simple kriging equation system:

${\displaystyle {\begin{pmatrix}w_{1}\\\vdots \\w_{n}\end{pmatrix}}={\begin{pmatrix}c(x_{1},x_{1})&\cdots &c(x_{1},x_{n})\\\vdots &\ddots &\vdots \\c(x_{n},x_{1})&\cdots &c(x_{n},x_{n})\end{pmatrix}}^{-1}{\begin{pmatrix}c(x_{1},x_{0})\\\vdots \\c(x_{n},x_{0})\end{pmatrix}}.}$

This is analogous to a linear regression of ${\displaystyle Z(x_{0})}$ on the other ${\displaystyle z_{1},\ldots ,z_{n}}$.

#### Estimation

The interpolation by simple kriging is given by

${\displaystyle {\hat {Z}}(x_{0})={\begin{pmatrix}z_{1}\\\vdots \\z_{n}\end{pmatrix}}'{\begin{pmatrix}c(x_{1},x_{1})&\cdots &c(x_{1},x_{n})\\\vdots &\ddots &\vdots \\c(x_{n},x_{1})&\cdots &c(x_{n},x_{n})\end{pmatrix}}^{-1}{\begin{pmatrix}c(x_{1},x_{0})\\\vdots \\c(x_{n},x_{0})\end{pmatrix}}.}$

The kriging error is given by

${\displaystyle \operatorname {Var} {\big (}{\hat {Z}}(x_{0})-Z(x_{0}){\big )}=\underbrace {c(x_{0},x_{0})} _{\operatorname {Var} {\big (}Z(x_{0}){\big )}}-\underbrace {{\begin{pmatrix}c(x_{1},x_{0})\\\vdots \\c(x_{n},x_{0})\end{pmatrix}}'{\begin{pmatrix}c(x_{1},x_{1})&\cdots &c(x_{1},x_{n})\\\vdots &\ddots &\vdots \\c(x_{n},x_{1})&\cdots &c(x_{n},x_{n})\end{pmatrix}}^{-1}{\begin{pmatrix}c(x_{1},x_{0})\\\vdots \\c(x_{n},x_{0})\end{pmatrix}}} _{\operatorname {Var} {\big (}{\hat {Z}}(x_{0}){\big )}},}$

which leads to the generalised least-squares version of the Gauss–Markov theorem (Chiles & Delfiner 1999, p. 159):

${\displaystyle \operatorname {Var} {\big (}Z(x_{0}){\big )}=\operatorname {Var} {\big (}{\hat {Z}}(x_{0}){\big )}+\operatorname {Var} {\big (}{\hat {Z}}(x_{0})-Z(x_{0}){\big )}.}$
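
The simple kriging estimate, its variance, and the kriging error can be computed together. The sketch below assumes a zero-mean field, 1-D locations, and an exponential covariance function with made-up parameters (the squared exponential mentioned above would work the same way); `exp_cov` and `simple_kriging` are hypothetical names.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_=2.0):
    """Exponential covariance function c(x, y) = C(|x - y|); parameters are assumed."""
    return sill * np.exp(-np.abs(h) / range_)

def simple_kriging(x, z, x0):
    """Zero-mean simple kriging at x0: (estimate, Var(Z_hat(x0)), kriging error)."""
    C = exp_cov(x[:, None] - x[None, :])   # c(x_i, x_j)
    c0 = exp_cov(x - x0)                   # c(x_i, x_0)
    w = np.linalg.solve(C, c0)             # weights w = C^{-1} c0
    estimate = w @ z
    var_estimate = c0 @ w                  # c0' C^{-1} c0
    kriging_error = exp_cov(0.0) - var_estimate
    return estimate, var_estimate, kriging_error

x = np.array([0.0, 1.0, 3.0])
z = np.array([0.3, -0.2, 0.5])
estimate, var_estimate, kriging_error = simple_kriging(x, z, x0=2.0)
```

Here `var_estimate + kriging_error` equals the prior variance `exp_cov(0.0)`, which is the decomposition stated in the last equation.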

### Bayesian kriging

See also Bayesian Polynomial Chaos

### Properties

• The kriging estimation is unbiased: ${\displaystyle E[{\hat {Z}}(x_{i})]=E[Z(x_{i})]}$.
• The kriging estimation honors the actually observed value: ${\displaystyle {\hat {Z}}(x_{i})=Z(x_{i})}$ (assuming no measurement error is incurred).
• The kriging estimation ${\displaystyle {\hat {Z}}(x)}$ is the best linear unbiased estimator of ${\displaystyle Z(x)}$ if the assumptions hold. However (e.g. Cressie 1993): [11]
  • As with any method, if the assumptions do not hold, kriging might be bad.
  • There might be better nonlinear and/or biased methods.
  • No properties are guaranteed when the wrong variogram is used. However, typically still a "good" interpolation is achieved.
  • Best is not necessarily good: e.g. in case of no spatial dependence the kriging interpolation is only as good as the arithmetic mean.
• Kriging provides ${\displaystyle \sigma _{k}^{2}}$ as a measure of precision. However, this measure relies on the correctness of the variogram.
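
The remark about the arithmetic mean can be checked numerically. The sketch below assumes a pure-nugget covariance (unit variance, zero covariance between distinct locations) as the model of no spatial dependence, and solves the ordinary kriging system for it; `ordinary_kriging_weights` and the sample values are hypothetical.

```python
import numpy as np

def ordinary_kriging_weights(C, c0):
    """Solve [[C, 1], [1', 0]] [w; mu] = [c0; 1] for the weights and multiplier."""
    n = len(c0)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = C
    A[n, n] = 0.0
    solution = np.linalg.solve(A, np.append(c0, 1.0))
    return solution[:n], solution[n]

n = 4
C = np.eye(n)      # pure nugget: samples are mutually uncorrelated
c0 = np.zeros(n)   # the target is uncorrelated with every sample
weights, mu = ordinary_kriging_weights(C, c0)

z = np.array([2.0, 4.0, 6.0, 8.0])
estimate = weights @ z   # coincides with the arithmetic mean of z
```

Every weight comes out as ${\displaystyle 1/N}$, so the estimate carries no more information than the sample mean.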

## Applications

Although kriging was developed originally for applications in geostatistics, it is a general method of statistical interpolation and can be applied within any discipline to sampled data from random fields that satisfy the appropriate mathematical assumptions. It can be used where spatially related data has been collected (in 2-D or 3-D) and estimates of "fill-in" data are desired in the locations (spatial gaps) between the actual measurements.

To date kriging has been used in a variety of disciplines, including the following:

### Design and analysis of computer experiments

Another very important and rapidly growing field of application, in engineering, is the interpolation of data coming out as response variables of deterministic computer simulations, [29] e.g. finite element method (FEM) simulations. In this case, kriging is used as a metamodeling tool, i.e. a black-box model built over a designed set of computer experiments. In many practical engineering problems, such as the design of a metal forming process, a single FEM simulation might be several hours or even a few days long. It is therefore more efficient to design and run a limited number of computer simulations, and then use a kriging interpolator to rapidly predict the response in any other design point. Kriging is therefore used very often as a so-called surrogate model, implemented inside optimization routines. [30]
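
The surrogate workflow described above might look like the following sketch. Everything here is illustrative: `expensive_simulation` is a cheap stand-in for a slow solver, the exponential covariance and its range are assumed, and the sample mean of the responses is used as a plug-in for the known mean that simple kriging requires.

```python
import numpy as np

def expensive_simulation(p):
    """Hypothetical stand-in for a slow deterministic simulation."""
    return np.sin(3.0 * p[0]) + p[1] ** 2

def exp_cov(a, b, range_=1.0):
    """Exponential covariance between rows of a (m, d) and rows of b (k, d)."""
    distances = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-distances / range_)

class KrigingSurrogate:
    """Simple kriging interpolator fitted once, then evaluated cheaply."""

    def __init__(self, design, responses):
        self.design = design
        self.mean = responses.mean()   # plug-in mean (an assumption)
        self.alpha = np.linalg.solve(exp_cov(design, design), responses - self.mean)

    def predict(self, points):
        return self.mean + exp_cov(points, self.design) @ self.alpha

# Run the "simulation" only on a small 4 x 4 design of experiments.
grid = np.linspace(0.0, 1.0, 4)
design = np.array([[a, b] for a in grid for b in grid])
responses = np.array([expensive_simulation(p) for p in design])

surrogate = KrigingSurrogate(design, responses)
prediction = surrogate.predict(np.array([[0.4, 0.6]]))  # no new simulation run
```

The linear solve happens once at fit time; each later prediction costs only one covariance evaluation per design point, which is what makes the surrogate usable inside an optimization loop.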

## Related Research Articles

In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by , , , , or .

In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions. This theorem has seen many changes during the formal development of probability theory. Previous versions of the theorem date back to 1811, but in its modern general form, this fundamental result in probability theory was precisely stated as late as 1920, thereby serving as a bridge between classical and modern probability theory.

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

In linear algebra, a Toeplitz matrix or diagonal-constant matrix, named after Otto Toeplitz, is a matrix in which each descending diagonal from left to right is constant. For instance, the following matrix is a Toeplitz matrix:

In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other,, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.

For statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, who was one of the primary developers of its theory.

In probability theory and statistics, a covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector. Any covariance matrix is symmetric and positive semi-definite and its main diagonal contains variances.

In mathematics, a block matrix or a partitioned matrix is a matrix that is interpreted as having been broken into sections called blocks or submatrices. Intuitively, a matrix interpreted as a block matrix can be visualized as the original matrix with a collection of horizontal and vertical lines, which break it up, or partition it, into a collection of smaller matrices. Any matrix may be interpreted as a block matrix in one or more ways, with each interpretation defined by how its rows and columns are partitioned.

In control theory, a state observer or state estimator is a system that provides an estimate of the internal state of a given real system, from measurements of the input and output of the real system. It is typically computer-implemented, and provides the basis of many practical applications.

The Durbin–Wu–Hausman test is a statistical hypothesis test in econometrics named after James Durbin, De-Min Wu, and Jerry A. Hausman. The test evaluates the consistency of an estimator when compared to an alternative, less efficient estimator which is already known to be consistent. It helps one evaluate if a statistical model corresponds to the data.

In probability theory and statistics, covariance is a measure of how much two variables change together, and the covariance function, or kernel, describes the spatial or temporal covariance of a random variable process or field. For a random field or stochastic process Z(x) on a domain D, a covariance function C(xy) gives the covariance of the values of the random field at the two locations x and y:

In probability theory and statistics, a cross-covariance matrix is a matrix whose element in the i, j position is the covariance between the i-th element of a random vector and j-th element of another random vector. A random vector is a random variable with multiple dimensions. Each element of the vector is a scalar random variable. Each element has either a finite number of observed empirical values or a finite or infinite number of potential values. The potential values are specified by a theoretical joint probability distribution. Intuitively, the cross-covariance matrix generalizes the notion of covariance to multiple dimensions.

In probability theory, the family of complex normal distributions, denoted or , characterizes complex random variables whose real and imaginary parts are jointly normal. The complex normal family has three parameters: location parameter μ, covariance matrix , and the relation matrix . The standard complex normal is the univariate distribution with , , and .

In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.

In econometrics, Prais–Winsten estimation is a procedure meant to take care of the serial correlation of type AR(1) in a linear model. Conceived by Sigbert Prais and Christopher Winsten in 1954, it is a modification of Cochrane–Orcutt estimation in the sense that it does not lose the first observation, which leads to more efficiency as a result and makes it a special case of feasible generalized least squares.

In applied statistics and geostatistics, regression-kriging (RK) is a spatial prediction technique that combines a regression of the dependent variable on auxiliary variables with interpolation (kriging) of the regression residuals. It is mathematically equivalent to the interpolation method variously called universal kriging and kriging with external drift, where auxiliary predictors are used directly to solve the kriging weights.

In statistics, the variance function is a smooth function which depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling. It is a main ingredient in the generalized linear model framework and a tool used in non-parametric regression, semiparametric regression and functional data analysis. In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a smooth function.

In probability theory and statistics, complex random variables are a generalization of real-valued random variables to complex numbers, i.e. the possible values a complex random variable may take are complex numbers. Complex random variables can always be considered as pairs of real random variables: their real and imaginary parts. Therefore, the distribution of one complex random variable may be interpreted as the joint distribution of two real random variables.

In probability theory and statistics, a complex random vector is typically a tuple of complex-valued random variables, and generally is a random variable taking values in a vector space over the field of complex numbers. If $Z_1, \ldots, Z_n$ are complex-valued random variables, then the n-tuple $(Z_1, \ldots, Z_n)$ is a complex random vector. Complex random vectors can always be considered as pairs of real random vectors: their real and imaginary parts.

## References

1. Chung, Sang Yong; Venkatramanan, S.; Elzain, Hussam Eldin; Selvam, S.; Prasanna, M. V. (2019). "Supplement of Missing Data in Groundwater-Level Variations of Peak Type Using Geostatistical Methods". GIS and Geostatistical Techniques for Groundwater Science. Elsevier. pp. 33–41. doi:10.1016/b978-0-12-815413-7.00004-3. ISBN 978-0-12-815413-7. S2CID 189989265.
2. Wahba, Grace (1990). Spline Models for Observational Data. 59. SIAM. doi:10.1137/1.9781611970128. ISBN 978-0-89871-244-5.
3. Williams, C. K. I. (1998). "Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond". Learning in Graphical Models. pp. 599–621. doi:10.1007/978-94-011-5014-9_23. ISBN 978-94-010-6104-9.
4. Lee, Se Yoon; Mallick, Bani (2021). "Bayesian Hierarchical Modeling: Application Towards Production Results in the Eagle Ford Shale of South Texas". Sankhya B.
5. Le Gratiet, Loic; Garnier, Josselin (2014). "Recursive Co-Kriging Model for Design of Computer Experiments with Multiple Levels of Fidelity". International Journal for Uncertainty Quantification. 4 (5): 365–386. doi:10.1615/Int.J.UncertaintyQuantification.2014006914. ISSN 2152-5080.
6. Ranftl, Sascha; Melito, Gian Marco; Badeli, Vahid; Reinbacher-Köstinger, Alice; Ellermann, Katrin; von der Linden, Wolfgang (2019-12-09). "On the Diagnosis of Aortic Dissection with Impedance Cardiography: A Bayesian Feasibility Study Framework with Multi-Fidelity Simulation Data". Proceedings. 33 (1): 24. doi:10.3390/proceedings2019033024. ISSN 2504-3900.
7. Ranftl, Sascha; Melito, Gian Marco; Badeli, Vahid; Reinbacher-Köstinger, Alice; Ellermann, Katrin; von der Linden, Wolfgang (2019-12-31). "Bayesian Uncertainty Quantification with Multi-Fidelity Data and Gaussian Processes for Impedance Cardiography of Aortic Dissection". Entropy. 22 (1): 58. doi:10.3390/e22010058. ISSN 1099-4300. PMID 33285833.
8. Ranftl, Sascha; von der Linden, Wolfgang (2021-11-13). "Bayesian Surrogate Analysis and Uncertainty Propagation". Physical Sciences Forum. 3 (1): 6. doi:10.3390/psf2021003006. ISSN 2673-9984.
9. Olea, Ricardo A. (1999). Geostatistics for Engineers and Earth Scientists. Kluwer Academic. ISBN 978-1-4615-5001-3.
10. Rasmussen, Carl Edward; Williams, Christopher K. I. (2005-11-23). Gaussian Processes for Machine Learning. doi:10.7551/mitpress/3206.001.0001. ISBN 9780262256834.
11. Cressie 1993, Chiles & Delfiner 1999, Wackernagel 1995.
12. Bayraktar, Hanefi; Sezer, Turalioglu (2005). "A Kriging-based approach for locating a sampling site—in the assessment of air quality". SERRA. 19 (4): 301–305. doi:10.1007/s00477-005-0234-8. S2CID   122643497.
13. Chiles, J.-P. and P. Delfiner (1999) Geostatistics, Modeling Spatial Uncertainty, Wiley Series in Probability and Statistics.
14. Zimmerman, D. A.; De Marsily, G.; Gotway, C. A.; Marietta, M. G.; Axness, C. L.; Beauheim, R. L.; Bras, R. L.; Carrera, J.; Dagan, G.; Davies, P. B.; Gallegos, D. P.; Galli, A.; Gómez-Hernández, J.; Grindrod, P.; Gutjahr, A. L.; Kitanidis, P. K.; Lavenue, A. M.; McLaughlin, D.; Neuman, S. P.; Ramarao, B. S.; Ravenne, C.; Rubin, Y. (1998). "A comparison of seven geostatistically based inverse approaches to estimate transmissivities for modeling advective transport by groundwater flow". Water Resources Research. 34 (6): 1373–1413. Bibcode:1998WRR....34.1373Z.
15. Tonkin, M. J.; Larson, S. P. (2002). "Kriging Water Levels with a Regional-Linear and Point-Logarithmic Drift". Ground Water. 40 (2): 185–193. doi:10.1111/j.1745-6584.2002.tb02503.x. PMID 11916123.
16. Journel, A. G. and C. J. Huijbregts (1978) Mining Geostatistics, Academic Press, London.
17. Richmond, A. (2003). "Financially Efficient Ore Selections Incorporating Grade Uncertainty". Mathematical Geology. 35 (2): 195–215. doi:10.1023/A:1023239606028. S2CID 116703619.
18. Goovaerts (1997) Geostatistics for Natural Resources Evaluation, OUP. ISBN 0-19-511538-4.
19. Emery, X. (2005). "Simple and Ordinary Multigaussian Kriging for Estimating Recoverable Reserves". Mathematical Geology. 37 (3): 295–319. doi:10.1007/s11004-005-1560-6. S2CID 92993524.
20. Papritz, A.; Stein, A. (2002). "Spatial prediction by linear kriging". Spatial Statistics for Remote Sensing. Remote Sensing and Digital Image Processing. 1. p. 83. doi:10.1007/0-306-47647-9_6. ISBN 0-7923-5978-X.
21. Barris, J. (2008) An expert system for appraisal by the method of comparison. PhD Thesis, UPC, Barcelona.
22. Barris, J. and Garcia Almirall, P. (2010) A density function of the appraisal value, UPC, Barcelona.
23. Oghenekarho Okobiah, Saraju Mohanty, and Elias Kougianos (2013) Geostatistical-Inspired Fast Layout Optimization of a Nano-CMOS Thermal Sensor, IET Circuits, Devices and Systems (CDS), Vol. 7, No. 5, Sep. 2013, pp. 253–262.
24. Koziel, Slawomir (2011). "Accurate modeling of microwave devices using kriging-corrected space mapping surrogates". International Journal of Numerical Modelling: Electronic Networks, Devices and Fields. 25: 1–14. doi:10.1002/jnm.803.
25. Pastorello, Nicola (2014). "The SLUGGS survey: exploring the metallicity gradients of nearby early-type galaxies to large radii". Monthly Notices of the Royal Astronomical Society. 442 (2): 1003–1039. Bibcode:2014MNRAS.442.1003P. doi:10.1093/mnras/stu937. S2CID 119221897.
26. Foster, Caroline; Pastorello, Nicola; Roediger, Joel; Brodie, Jean; Forbes, Duncan; Kartha, Sreeja; Pota, Vincenzo; Romanowsky, Aaron; Spitler, Lee; Strader, Jay; Usher, Christopher; Arnold, Jacob (2016). "The SLUGGS survey: stellar kinematics, kinemetry and trends at large radii in 25 early-type galaxies". Monthly Notices of the Royal Astronomical Society. 457 (1): 147–171. Bibcode:2016MNRAS.457..147F. doi:10.1093/mnras/stv2947. S2CID 53472235.
27. Bellstedt, Sabine; Forbes, Duncan; Foster, Caroline; Romanowsky, Aaron; Brodie, Jean; Pastorello, Nicola; Alabi, Adebusola; Villaume, Alexa (2017). "The SLUGGS survey: using extended stellar kinematics to disentangle the formation histories of low-mass S0 galaxies". Monthly Notices of the Royal Astronomical Society. 467 (4): 4540–4557. Bibcode:2017MNRAS.467.4540B. doi:10.1093/mnras/stx418. S2CID 54521046.
28. Lee, Se Yoon; Mallick, Bani (2021). "Bayesian Hierarchical Modeling: Application Towards Production Results in the Eagle Ford Shale of South Texas". Sankhya B.
29. Sacks, J.; Welch, W. J.; Mitchell, T. J.; Wynn, H. P. (1989). "Design and Analysis of Computer Experiments". Statistical Science. 4 (4): 409–435. JSTOR 2245858.
30. Strano, M. (March 2008). "A technique for FEM optimization under reliability constraint of process variables in sheet metal forming". International Journal of Material Forming. 1 (1): 13–20. doi:10.1007/s12289-008-0001-8. S2CID 136682565.

### Historical references

1. Chilès, Jean-Paul; Desassis, Nicolas (2018). "Fifty Years of Kriging". Handbook of Mathematical Geosciences. Cham: Springer International Publishing. doi:10.1007/978-3-319-78999-6_29. ISBN 978-3-319-78998-9.
2. Agterberg, F. P., Geomathematics, Mathematical Background and Geo-Science Applications, Elsevier Scientific Publishing Company, Amsterdam, 1974.
3. Cressie, N. A. C., The origins of kriging, Mathematical Geology, v. 22, pp. 239–252, 1990.
4. Krige, D. G., A statistical approach to some mine valuations and allied problems at the Witwatersrand, Master's thesis of the University of Witwatersrand, 1951.
5. Link, R. F. and Koch, G. S., Experimental Designs and Trend-Surface Analysis, Geostatistics, a colloquium, Plenum Press, New York, 1970.
6. Matheron, G., "Principles of geostatistics", Economic Geology, 58, pp. 1246–1266, 1963.
7. Matheron, G., "The intrinsic random functions, and their applications", Adv. Appl. Prob., 5, pp. 439–468, 1973.
8. Merriam, D. F. (editor), Geostatistics, a colloquium, Plenum Press, New York, 1970.

### Books

• Abramowitz, M., and Stegun, I. (1972), Handbook of Mathematical Functions, Dover Publications, New York.
• Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC Press, Taylor and Francis Group.
• Chiles, J.-P. and P. Delfiner (1999) Geostatistics, Modeling Spatial Uncertainty, Wiley Series in Probability and Statistics.
• Clark, I., and Harper, W. V., (2000) Practical Geostatistics 2000, Ecosse North America, USA.
• Cressie, N. (1993) Statistics for spatial data, Wiley, New York.
• David, M. (1988) Handbook of Applied Advanced Geostatistical Ore Reserve Estimation, Elsevier Scientific Publishing.
• Deutsch, C. V., and Journel, A. G. (1992), GSLIB – Geostatistical Software Library and User's Guide, Oxford University Press, New York, 338 pp.
• Goovaerts, P. (1997) Geostatistics for Natural Resources Evaluation, Oxford University Press, New York, ISBN 0-19-511538-4.
• Isaaks, E. H., and Srivastava, R. M. (1989), An Introduction to Applied Geostatistics, Oxford University Press, New York, 561 pp.
• Journel, A. G. and C. J. Huijbregts (1978) Mining Geostatistics, Academic Press, London.
• Journel, A. G. (1989), Fundamentals of Geostatistics in Five Lessons, American Geophysical Union, Washington D.C.
• Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. (2007), "Section 3.7.4. Interpolation by Kriging", Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press, ISBN 978-0-521-88068-8.
• Stein, M. L. (1999), Statistical Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York.
• Wackernagel, H. (1995) Multivariate Geostatistics: An Introduction with Applications, Springer, Berlin.