Regression-kriging

Last updated

In applied statistics and geostatistics, regression-kriging (RK) is a spatial prediction technique that combines a regression of the dependent variable on auxiliary variables (such as parameters derived from digital elevation modelling, remote sensing/imagery, and thematic maps) with interpolation (kriging) of the regression residuals. It is mathematically equivalent to the interpolation method variously called universal kriging and kriging with external drift, where auxiliary predictors are used directly to solve the kriging weights. [1]

Contents

BLUP for spatial data

The universal model of spatial variation scheme. The universal model of spatial variation.jpg
The universal model of spatial variation scheme.

Regression-kriging is an implementation of the best linear unbiased predictor (BLUP) for spatial data, i.e. the best linear interpolator assuming the universal model of spatial variation. Matheron (1969) proposed that a value of a target variable at some location can be modeled as a sum of the deterministic and stochastic components: [2]

which he termed universal model of spatial variation. Both deterministic and stochastic components of spatial variation can be modeled separately. By combining the two approaches, we obtain:

where is the fitted deterministic part, is the interpolated residual, are estimated deterministic model coefficients ( is the estimated intercept), are kriging weights determined by the spatial dependence structure of the residual and where is the residual at location . The regression coefficients can be estimated from the sample by some fitting method, e.g. ordinary least squares (OLS) or, optimally, using generalized least squares (GLS): [3]

where is the vector of estimated regression coefficients, is the covariance matrix of the residuals, is a matrix of predictors at the sampling locations and is the vector of measured values of the target variable. The GLS estimation of regression coefficients is, in fact, a special case of the geographically weighted regression. In the case, the weights are determined objectively to account for the spatial auto-correlation between the residuals.

Once the deterministic part of variation has been estimated (regression-part), the residual can be interpolated with kriging and added to the estimated trend. The estimation of the residuals is an iterative process: first the deterministic part of variation is estimated using OLS, then the covariance function of the residuals is used to obtain the GLS coefficients. Next, these are used to re-compute the residuals, from which an updated covariance function is computed, and so on. Although this is by many geostatisticians recommended as the proper procedure, Kitanidis (1994) showed that use of the covariance function derived from the OLS residuals (i.e. a single iteration) is often satisfactory, because it is not different enough from the function derived after several iterations; i.e. it does not affect much the final predictions. Minasny and McBratney (2007) report similar results—it seems that using more higher quality data is more important than to use more sophisticated statistical methods. [4]

In matrix notation, regression-kriging is commonly written as: [5]

where is the predicted value at location , is the vector of predictors and is the vector of kriging weights used to interpolate the residuals. The RK model is considered to be the Best Linear Predictor of spatial data. [5] [6] It has a prediction variance that reflects the position of new locations (extrapolation) in both geographical and feature space:

where is the sill variation and is the vector of covariances of residuals at the unvisited location.

Decision tree for selecting a suitable spatial prediction model. Decision tree for selecting a suitable spatial prediction model.jpg
Decision tree for selecting a suitable spatial prediction model.

Many (geo)statisticians believe that there is only one Best Linear Unbiased Prediction model for spatial data (e.g. regression-kriging), all other techniques such as ordinary kriging, environmental correlation, averaging of values per polygons or inverse distance interpolation can be seen as its special cases. If the residuals show no spatial auto-correlation (pure nugget effect), the regression-kriging converges to pure multiple linear regression, because the covariance matrix () becomes an identity matrix. Likewise, if the target variable shows no correlation with the auxiliary predictors, the regression-kriging model reduces to ordinary kriging model because the deterministic part equals the (global) mean value. Hence, pure kriging and pure regression should be considered as only special cases of regression-kriging (see figure).

RK and UK/KED

The geostatistical literature uses many different terms for what are essentially the same or at least very similar techniques. This confuses the users and distracts them from using the right technique for their mapping projects. In fact, both universal kriging, kriging with external drift, and regression-kriging are basically the same technique.

Matheron (1969) originally termed the technique Le krigeage universel, however, the technique was intended as a generalized case of kriging where the trend is modelled as a function of coordinates. Thus, many authors reserve the term universal kriging (UK) for the case when only the coordinates are used as predictors. If the deterministic part of variation (drift) is defined externally as a linear function of some auxiliary variables, rather than the coordinates, the term kriging with external drift (KED) is preferred (according to Hengl 2007, "About regression-kriging: From equations to case studies"). In the case of UK or KED, the predictions are made as with kriging, with the difference that the covariance matrix of residuals is extended with the auxiliary predictors. However, the drift and residuals can also be estimated separately and then summed. This procedure was suggested by Ahmed et al. (1987) and Odeh et al. (1995) later named it regression-kriging, while Goovaerts (1997) uses the term kriging with a trend model to refer to a family of interpolators, and refers to RK as simple kriging with varying local means. Minasny and McBratney (2007) simply call this technique Empirical Best Linear Unbiased Predictor i.e. E-BLUP. [7] [8] [9] [4]

In the case of KED, predictions at new locations are made by:

for

for or in matrix notation:

where is the target variable, 's are the predictor variables i.e. values at a new location , is the vector of KED weights (), is the number of predictors and is the vector of observations at primary locations. The KED weights are solved using the extended matrices:

where is the vector of solved weights, are the Lagrange multipliers, is the extended covariance matrix of residuals and is the extended vector of covariances at new location.

In the case of KED, the extended covariance matrix of residuals looks like this (Webster and Oliver, 2007; p. 183): [10]

and like this:

Hence, KED looks exactly as ordinary kriging, except the covariance matrix/vector are extended with values of auxiliary predictors.

Although the KED seems, at first glance, to be computationally more straightforward than RK, the parameters of the variogram for KED must also be estimated from regression residuals, thus requiring a separate regression modelling step. This regression should be GLS because of the likely spatial correlation between residuals. Note that many analyst use instead the OLS residuals, which may not be too different from the GLS residuals. However, they are not optimal if there is any spatial correlation, and indeed they may be quite different for clustered sample points or if the number of samples is relatively small ().

A limitation of KED is the instability of the extended matrix in the case that the covariate does not vary smoothly in space. RK has the advantage that it explicitly separates trend estimation from spatial prediction of residuals, allowing the use of arbitrarily-complex forms of regression, rather than the simple linear techniques that can be used with KED. In addition, it allows the separate interpretation of the two interpolated components. The emphasis on regression is important also because fitting of the deterministic part of variation (regression) is often more beneficial for the quality of final maps than fitting of the stochastic part (residuals).

Software to run regression-kriging

Example of a generic framework for spatial prediction of soil variables based on regression-kriging. A generic framework for spatial prediction of soil variables.png
Example of a generic framework for spatial prediction of soil variables based on regression-kriging.

Regression-kriging can be automated e.g. in R statistical computing environment, by using gstat and/or geoR package. Typical inputs/outputs include:

INPUTS:

OUTPUTS:

Application of regression-kriging

Regression-kriging is used in various applied fields, from meteorology, climatology, soil mapping, geological mapping, species distribution modeling and similar. The only requirement for using regression-kriging versus e.g. ordinary kriging is that one or more covariate layers exist, and which are significantly correlated with the feature of interest. Some general applications of regression-kriging are:

Simulations of zinc concentrations derived using a regression-Kriging model. This model uses one continuous (distance to the river) and one categorical (flooding frequency) covariate. Code used to produce these maps is available here. Simulations of zinc using regression-kriging model.png
Simulations of zinc concentrations derived using a regression-Kriging model. This model uses one continuous (distance to the river) and one categorical (flooding frequency) covariate. Code used to produce these maps is available here.

Regression-kriging-based algorithms play more and more important role in geostatistics because the number of possible covariates is increasing every day. [1] For example, DEMs are now available from a number of sources. Detailed and accurate images of topography can now be ordered from remote sensing systems such as SPOT and ASTER; SPOT5 offers the High Resolution Stereoscopic (HRS) scanner, which can be used to produce DEMs at resolutions of up to 5 m. [12] Finer differences in elevation can also be obtained with airborne laser-scanners. The cost of data is either free or dropping in price as technology advances. NASA recorded most of the world's topography in the Shuttle Radar Topographic Mission in 2000. [13] From summer of 2004, these data has been available (e.g. via USGS ftp) for almost whole globe at resolution of about 90 m (for the North American continent at resolution of about 30 m). Likewise, MODIS multispectral images are freely available for download at resolutions of 250 m. A large free repository of Landsat images is also available for download via the Global Land Cover Facility (GLCF).

Related Research Articles

Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture. Geostatistics is applied in varied branches of geography, particularly those involving the spread of diseases (epidemiology), the practice of commerce and military planning (logistics), and the development of efficient spatial networks. Geostatistical algorithms are incorporated in many places, including geographic information systems (GIS).

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.

<span class="mw-page-title-main">Kalman filter</span> Algorithm that estimates unknowns from a series of measurements over time

For statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, who was one of the primary developers of its theory.

<span class="mw-page-title-main">Covariance matrix</span> Measure of covariance of components of a random vector

In probability theory and statistics, a covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector.

<span class="mw-page-title-main">Logistic regression</span> Statistical model for a binary dependent variable

In statistics, the logistic model is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression is estimating the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See § Background and § Definition for formal mathematics, and § Example for a worked example.

<span class="mw-page-title-main">Kriging</span> Method of interpolation

In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables.

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable.

In spatial statistics the theoretical variogram, denoted , is a function describing the degree of spatial dependence of a spatial random field or stochastic process . The semivariogram is half the variogram.

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

In statistics, Cook's distance or Cook's D is a commonly used estimate of the influence of a data point when performing a least-squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to indicate influential data points that are particularly worth checking for validity; or to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977.

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model when there is a certain degree of correlation between the residuals in the regression model. Least squares and weighted least squares may need to be more statistically efficient and prevent misleading inferences. GLS was first described by Alexander Aitken in 1935.

In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. When determining the numerical relationship between two variables of interest, using their correlation coefficient will give misleading results if there is another confounding variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient. This is precisely the motivation for including other right-side variables in a multiple regression; but while multiple regression gives unbiased results for the effect size, it does not give a numerical value of a measure of the strength of the relationship between the two variables of interest.

In statistics, the projection matrix, sometimes also called the influence matrix or hat matrix, maps the vector of response values to the vector of fitted values. It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

In statistics and in machine learning, a linear predictor function is a linear function of a set of coefficients and explanatory variables, whose value is used to predict the outcome of a dependent variable. This sort of function usually comes in linear regression, where the coefficients are called regression coefficients. However, they also occur in various types of linear classifiers, as well as in various other models, such as principal component analysis and factor analysis. In many of these models, the coefficients are referred to as "weights".

Kernel methods are a well-established tool to analyze the relationship between input data and the corresponding output of a function. Kernels encapsulate the properties of functions in a computationally efficient way and allow algorithms to easily swap functions of varying complexity.

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

Vecchia approximation is a Gaussian processes approximation technique originally developed by Aldo Vecchia, a statistician at United States Geological Survey. It is one of the earliest attempts to use Gaussian processes in high-dimensional settings. It has since been extensively generalized giving rise to many contemporary approximations.

In statistics and machine learning, Gaussian process approximation is a computational method that accelerates inference tasks in the context of a Gaussian process model, most commonly likelihood evaluation and prediction. Like approximations of other models, they can often be expressed as additional assumptions imposed on the model, which do not correspond to any actual feature, but which retain its key properties while simplifying calculations. Many of these approximation methods can be expressed in purely linear algebraic or functional analytic terms as matrix or function approximations. Others are purely algorithmic and cannot easily be rephrased as a modification of a statistical model.

References

  1. 1 2 Pebesma, Edzer J (1 July 2006). "The Role of External Variables and GIS Databases in Geostatistical Analysis" (PDF). Transactions in GIS. 10 (4): 615–632. doi:10.1111/j.1467-9671.2006.01015.x. S2CID   22146107.
  2. Matheron, Georges (1969). "Part 1 of Cahiers du Centre de morphologie mathématique de Fontainebleau". Le krigeage universel. École nationale supérieure des mines de Paris.
  3. Cressie, Noel (2012). Statistics for spatio-temporal data. Hoboken, N.J.: Wiley. ISBN   978-0-471-69274-4.
  4. 1 2 Minasny, Budiman; McBratney, Alex B. (31 July 2007). "Spatial prediction of soil properties using EBLUP with the Matérn covariance function". Geoderma. 140 (4): 324–336. Bibcode:2007Geode.140..324M. doi:10.1016/j.geoderma.2007.04.028.
  5. 1 2 Christensen, Ronald (2001). Advanced linear modeling: multivariate, time series, and spatial data; nonparametric regression and response surface maximization (2. ed.). New York, NY [u.a.]: Springer. ISBN   978-0-387-95296-3.
  6. Goldberger, A.S. (1962). "Best Linear Unbiased Prediction in the Generalized Linear Regression Model". Journal of the American Statistical Association. 57 (298): 369–375. doi:10.1080/01621459.1962.10480665. JSTOR   2281645.
  7. Ahmed, Shakeel; De Marsily, Ghislain (1 January 1987). "Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity". Water Resources Research. 23 (9): 1717. Bibcode:1987WRR....23.1717A. doi:10.1029/WR023i009p01717.
  8. Odeh, I.O.A.; McBratney, A.B.; Chittleborough, D.J. (31 July 1995). "Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging". Geoderma. 67 (3–4): 215–226. Bibcode:1995Geode..67..215O. doi:10.1016/0016-7061(95)00007-B.
  9. 1 2 Hengl, Tomislav; Heuvelink, Gerard B.M.; Stein, Alfred (30 April 2004). "A generic framework for spatial prediction of soil variables based on regression-kriging" (PDF). Geoderma. 120 (1–2): 75–93. Bibcode:2004Geode.120...75H. doi:10.1016/j.geoderma.2003.08.018.
  10. Webster, Richard; Oliver, Margaret A. (2007). Geostatistics for environmental scientists (2nd ed.). Chichester: Wiley. ISBN   978-0-470-02858-2.
  11. Hengl, Tomislav; Bajat, Branislav; Blagojević, Dragan; Reuter, Hannes I. (1 December 2008). "Geostatistical modeling of topography using auxiliary maps" (PDF). Computers & Geosciences. 34 (12): 1886–1899. Bibcode:2008CG.....34.1886H. doi:10.1016/j.cageo.2008.01.005.
  12. Toutin, Thierry (30 April 2006). "Generation of DSMs from SPOT-5 in-track HRS and across-track HRG stereo data using spatiotriangulation and autocalibration". ISPRS Journal of Photogrammetry and Remote Sensing. 60 (3): 170–181. Bibcode:2006JPRS...60..170T. doi:10.1016/j.isprsjprs.2006.02.003.
  13. Rabus, Bernhard; Eineder, Michael; Roth, Achim; Bamler, Richard (31 January 2003). "The shuttle radar topography mission—a new class of digital elevation models acquired by spaceborne radar". ISPRS Journal of Photogrammetry and Remote Sensing. 57 (4): 241–262. Bibcode:2003JPRS...57..241R. doi:10.1016/S0924-2716(02)00124-7.

Further reading