# Geostatistics

Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Originally developed to predict probability distributions of ore grades for mining operations, [1] it is now applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture (especially precision farming). It is also applied in several branches of geography, particularly those involving the spread of diseases (epidemiology), the practice of commerce and military planning (logistics), and the development of efficient spatial networks. Geostatistical algorithms are implemented in many software tools, including geographic information systems (GIS) and the R statistical environment.

## Background

Geostatistics is intimately related to interpolation methods, but extends far beyond simple interpolation problems. Geostatistical techniques rely on statistical models that are based on random function (or random variable) theory to model the uncertainty associated with spatial estimation and simulation.

A number of simpler interpolation methods/algorithms, such as inverse distance weighting, bilinear interpolation and nearest-neighbor interpolation, were already well known before geostatistics. [2] Geostatistics goes beyond the interpolation problem by considering the studied phenomenon at unknown locations as a set of correlated random variables.
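For comparison, a minimal sketch of inverse distance weighting is given here. The function name `idw`, the `power` parameter and the sample data are illustrative assumptions, not taken from a particular library. Note that the method returns a single value per location and no measure of uncertainty.

```python
import math

def idw(samples, target, power=2.0):
    """Inverse distance weighting estimate at `target`.

    samples: list of ((x, y), value) pairs; target: (x, y).
    `power` controls how quickly a sample's influence decays with distance.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a datum: return it exactly
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical data: three measurements around the point (0.5, 0.5).
samples = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((0.0, 1.0), 30.0)]
estimate = idw(samples, (0.5, 0.5))  # equidistant samples, so a plain average
```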

Let Z(x) be the value of the variable of interest (e.g. temperature, rainfall, piezometric level, geological facies) at a certain location x. This value is unknown. Although there exists a value at location x that could be measured, geostatistics treats this value as random because it was not measured, or has not been measured yet. However, the randomness of Z(x) is not complete: it is described by a cumulative distribution function (CDF) that depends on the information that is known about the value Z(x):

${\displaystyle F({\mathit {z}},\mathbf {x} )=\operatorname {Prob} \lbrace Z(\mathbf {x} )\leqslant {\mathit {z}}\mid {\text{information}}\rbrace .}$
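As an illustration of how information constrains this CDF, the sketch below assumes that Z(x) and a single measured neighbor form a bivariate Gaussian pair with correlation `rho`. The Gaussian assumption and all names (`conditional_cdf`, `rho`, `mean`, `std`) are hypothetical choices made for this example only.

```python
import math

def conditional_cdf(z, neighbor_value, rho, mean=0.0, std=1.0):
    """Prob{Z(x) <= z | Z(x') = neighbor_value} for a bivariate Gaussian pair.

    rho is the correlation between Z(x) and the neighbor Z(x').
    """
    cond_mean = mean + rho * (neighbor_value - mean)
    cond_std = std * math.sqrt(1.0 - rho ** 2)
    if cond_std == 0.0:
        # Perfect correlation: Z(x) is fully determined by the neighbor.
        return 1.0 if z >= cond_mean else 0.0
    u = (z - cond_mean) / (cond_std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(u))

# A neighbor measured at 2.0 shifts the distribution only if rho is non-zero.
p_uncorrelated = conditional_cdf(0.0, neighbor_value=2.0, rho=0.0)
p_correlated = conditional_cdf(0.0, neighbor_value=2.0, rho=0.9)
```

With `rho = 0` the neighbor carries no information and the CDF is the unconditional one. With a strong correlation, the probability that Z(x) falls far below the neighboring value becomes very small.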

Typically, if the value of Z is known at locations close to x (or in the neighborhood of x) one can constrain the CDF of Z(x) by this neighborhood: if a high spatial continuity is assumed, Z(x) can only have values similar to the ones found in the neighborhood. Conversely, in the absence of spatial continuity Z(x) can take any value. The spatial continuity of the random variables is described by a model of spatial continuity that can be either a parametric function in the case of variogram-based geostatistics, or have a non-parametric form when using other methods such as multiple-point simulation [3] or pseudo-genetic techniques.
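A parametric model of spatial continuity can be as simple as a function of the separation distance. The sketch below uses the spherical variogram model, one commonly used form; the parameter names (`nugget`, `partial_sill`, `range_`) and the numerical values are assumptions chosen for illustration.

```python
def spherical_variogram(h, nugget, partial_sill, range_):
    """Spherical variogram model evaluated at lag distance h >= 0.

    The model is 0 at h = 0, rises from the nugget, and levels off at
    nugget + partial_sill once h reaches the range.
    """
    if h <= 0.0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

# Dissimilarity grows with distance, then plateaus beyond the range.
lags = [0.0, 5.0, 10.0, 25.0]
gammas = [spherical_variogram(h, nugget=0.1, partial_sill=0.9, range_=10.0)
          for h in lags]
```

A long range relative to the data spacing corresponds to high spatial continuity; a model dominated by the nugget corresponds to values that are nearly unrelated even at short distances.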

By applying a single spatial model on an entire domain, one makes the assumption that Z is a stationary process. It means that the same statistical properties are applicable on the entire domain. Several geostatistical methods provide ways of relaxing this stationarity assumption.
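One common way to formalize this assumption is second-order stationarity, given here as an illustration since the text does not specify which form of stationarity is meant. The mean is constant over the domain and the covariance between two locations depends only on their separation vector h (the symbols m and C are notation introduced for this example):

${\displaystyle \operatorname {E} [Z(\mathbf {x} )]=m,\qquad \operatorname {Cov} [Z(\mathbf {x} ),Z(\mathbf {x} +\mathbf {h} )]=C(\mathbf {h} )\quad {\text{for all }}\mathbf {x} .}$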

In this framework, one can distinguish two modeling goals:

1. Estimating the value of Z(x), typically by the expectation, the median or the mode of the distribution described by the CDF F(z,x). This is usually called an estimation problem.
2. Sampling from the entire probability density function f(z,x) by actually considering each possible outcome of it at each location. This is generally done by creating several alternative maps of Z, called realizations. Consider a domain discretized in N grid nodes (or pixels). Each realization is a sample of the complete N-dimensional joint distribution function
${\displaystyle F(\mathbf {z} ,\mathbf {x} )=\operatorname {Prob} \lbrace Z(\mathbf {x} _{1})\leqslant z_{1},Z(\mathbf {x} _{2})\leqslant z_{2},...,Z(\mathbf {x} _{N})\leqslant z_{N}\rbrace .}$
In this approach, the presence of multiple solutions to the interpolation problem is acknowledged. Each realization is considered a possible scenario of what the real variable could be. The associated workflows then operate on an ensemble of realizations, and consequently on an ensemble of predictions, which allows for probabilistic forecasting. For this reason, geostatistics is often used to generate or update spatial models when solving inverse problems. [4] [5]
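The idea of drawing realizations from the N-dimensional joint distribution can be sketched as follows. This is a minimal example under stated assumptions, not a production simulation algorithm: it assumes a zero-mean stationary Gaussian field on a 1-D grid with an exponential covariance, and draws unconditional realizations through a Cholesky factor of the covariance matrix. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_realizations(coords, n_real, sill=1.0, corr_range=10.0, seed=0):
    """Draw unconditional Gaussian realizations at 1-D grid nodes.

    Assumes a zero-mean field with covariance
    C(h) = sill * exp(-h / corr_range); both are illustrative choices.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    h = np.abs(coords[:, None] - coords[None, :])   # N x N matrix of lags
    cov = sill * np.exp(-h / corr_range)
    # A small diagonal jitter keeps the factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n_real))        # independent N(0, 1) draws
    return (chol @ white).T                         # one row per realization

grid = np.arange(50.0)                              # N = 50 grid nodes
realizations = simulate_realizations(grid, n_real=200)

# Ensemble summary: per-node probability that Z exceeds a threshold of 1.0.
prob_exceed = (realizations > 1.0).mean(axis=0)
```

Each row of `realizations` is one alternative map of Z. Statistics computed across rows, such as `prob_exceed`, are the kind of ensemble-based probabilistic prediction described above.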

A number of methods exist for both geostatistical estimation and multiple realizations approaches. Several reference books provide a comprehensive overview of the discipline. [6] [2] [7] [8] [9] [10] [11] [12] [13] [14] [15]

## Methods

### Estimation

#### Kriging

Kriging is a group of geostatistical techniques to interpolate the value of a random field (e.g., the elevation, z, of the landscape as a function of the geographic location) at an unobserved location from observations of its value at nearby locations.
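A minimal sketch of one member of this group, ordinary kriging, is shown below. It assumes a known exponential covariance model and a constant but unknown mean; the function names, the covariance parameters and the sample data are illustrative assumptions rather than the interface of any existing package.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=2.0):
    """Exponential covariance model (an assumed, illustrative choice)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / corr_range)

def ordinary_kriging(xy, values, target, cov=exp_cov):
    """Ordinary kriging estimate and kriging variance at one target location."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # data-data
    d0 = np.linalg.norm(xy - target, axis=-1)                     # data-target
    # Kriging system: covariances bordered by the unbiasedness constraint
    # (weights sum to 1), enforced with a Lagrange multiplier.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = cov(d)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov(d0)
    sol = np.linalg.solve(a, b)
    weights, lagrange = sol[:n], sol[n]
    estimate = weights @ values
    variance = float(cov(0.0)) - weights @ b[:n] - lagrange
    return estimate, variance, weights

# Hypothetical observations of a variable at four locations.
xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
values = [1.0, 2.0, 3.0, 4.0]
est, var, w = ordinary_kriging(xy, values, (0.5, 0.5))
```

Unlike a purely geometric interpolator, the weights come from the covariance model, and the kriging variance gives a measure of the estimation uncertainty that shrinks to zero at observed locations.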

#### Bayesian estimation

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update a probability model as more evidence or information becomes available. Bayesian inference is playing an increasingly important role in geostatistics. [16] Bayesian estimation implements kriging through a spatial process, most commonly a Gaussian process, and updates the process using Bayes' theorem to calculate its posterior. High-dimensional Bayesian geostatistics is treated in [17].
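The update from prior to posterior can be sketched for the Gaussian process case. The example below assumes a zero-mean prior with an exponential covariance and a small observation noise variance, all of which are illustrative choices; a full Bayesian treatment would also place priors on the covariance parameters, which this sketch does not attempt.

```python
import numpy as np

def gp_posterior(x_obs, y_obs, x_new, sill=1.0, corr_range=2.0, noise_var=1e-6):
    """Posterior mean and covariance of a zero-mean Gaussian process in 1-D.

    Prior covariance: k(a, b) = sill * exp(-|a - b| / corr_range) (assumed).
    """
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    def k(a, b):
        return sill * np.exp(-np.abs(a[:, None] - b[None, :]) / corr_range)

    k_oo = k(x_obs, x_obs) + noise_var * np.eye(len(x_obs))  # data covariance
    k_no = k(x_new, x_obs)                                   # new vs. data
    k_nn = k(x_new, x_new)                                   # prior at new points
    # Gaussian conditioning: the prior is updated by the observations.
    mean = k_no @ np.linalg.solve(k_oo, y_obs)
    cov = k_nn - k_no @ np.linalg.solve(k_oo, k_no.T)
    return mean, cov

# Hypothetical data; predict at an observed location and far from all data.
post_mean, post_cov = gp_posterior(
    x_obs=[0.0, 1.0, 3.0], y_obs=[1.0, 0.5, -0.2], x_new=[1.0, 100.0]
)
```

At the observed location the posterior mean reproduces the datum and the posterior variance is close to the noise level. Far from any data the posterior reverts to the prior, with mean near zero and variance near the sill.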

## Notes

1. Krige, Danie G. (1951). "A statistical approach to some basic mine valuation problems on the Witwatersrand". J. of the Chem., Metal. and Mining Soc. of South Africa 52 (6): 119–139
2. Isaaks, E. H. and Srivastava, R. M. (1989), An Introduction to Applied Geostatistics, Oxford University Press, New York, USA.
3. Mariethoz, Gregoire, Caers, Jef (2014). Multiple-point geostatistics: modeling with training images. Wiley-Blackwell, Chichester, UK, 364 p.
4. Hansen, T.M., Journel, A.G., Tarantola, A. and Mosegaard, K. (2006). "Linear inverse Gaussian theory and geostatistics", Geophysics 71
5. Kitanidis, P.K. and Vomvoris, E.G. (1983). "A geostatistical approach to the inverse problem in groundwater modeling (steady state) and one-dimensional simulations", Water Resources Research 19(3):677-690
6. Remy, N., et al. (2009), Applied Geostatistics with SGeMS: A User's Guide, 284 pp., Cambridge University Press, Cambridge.
7. Deutsch, C.V., Journel, A.G, (1997). GSLIB: Geostatistical Software Library and User's Guide (Applied Geostatistics Series), Second Edition, Oxford University Press, 369 pp., http://www.gslib.com/
8. Chilès, J.-P., and P. Delfiner (1999), Geostatistics - Modeling Spatial Uncertainty, John Wiley & Sons, Inc., New York, USA.
9. Lantuéjoul, C. (2002), Geostatistical simulation: Models and algorithms, 232 pp., Springer, Berlin.
10. Journel, A. G. and Huijbregts, C.J. (1978) Mining Geostatistics, Academic Press. ISBN 0-12-391050-1
11. Kitanidis, P.K. (1997) Introduction to Geostatistics: Applications in Hydrogeology, Cambridge University Press.
12. Wackernagel, H. (2003). Multivariate geostatistics, Third edition, Springer-Verlag, Berlin, 387 pp.
13. Pyrcz, M. J. and Deutsch, C.V., (2014). Geostatistical Reservoir Modeling, 2nd Edition, Oxford University Press, 448 pp.
14. Tahmasebi, P., Hezarkhani, A., Sahimi, M., 2012, Multiple-point geostatistical modeling based on the cross-correlation functions, Computational Geosciences, 16(3):779-797.
15. Schnetzler, Manu. "Statios - WinGslib".
16. Banerjee S., Carlin B.P., and Gelfand A.E. (2014). Hierarchical Modeling and Analysis for Spatial Data, Second Edition. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. ISBN 9781439819173
17. Banerjee, Sudipto. High-Dimensional Bayesian Geostatistics. Bayesian Anal. 12 (2017), no. 2, 583–614. doi:10.1214/17-BA1056R. https://projecteuclid.org/euclid.ba/1494921642
