Wartenberg's coefficient

Last updated September 13, 2024

Wartenberg's coefficient is a measure of correlation developed by epidemiologist Daniel Wartenberg.^[1] This coefficient is a multivariate extension of spatial autocorrelation that aims to account for spatial dependence of data while studying their covariance.^[2] A modified version of this statistic is available in the R package adespatial.^[3]

For data $x_{i}$ measured at $N$ spatial sites, Moran's I is a measure of the spatial autocorrelation of the data. By standardizing the observations, $z_{i}=(x_{i}-{\bar {x}})/s$, that is, subtracting the mean ${\bar {x}}$ and dividing by the standard deviation $s$, and by normalising the spatial weight matrix such that $\sum _{ij}w_{ij}=1$, we can write Moran's I as

I=\sum _{ij}w_{ij}z_{i}z_{j}
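As a minimal sketch of this formula, the following NumPy snippet standardizes the data, rescales the weights to sum to one, and evaluates the double sum as a quadratic form. The function name `morans_i`, the four-site line layout, and the use of the population standard deviation (dividing by $N$) are assumptions made for illustration.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I written as sum_ij w_ij z_i z_j (weights rescaled to sum to 1)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = (x - x.mean()) / x.std()   # assumption: population std (ddof=0)
    w = w / w.sum()                # enforce sum_ij w_ij = 1
    return float(z @ w @ z)

# Hypothetical example: four sites on a line, adjacent sites are neighbours.
w_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
x_trend = [1.0, 2.0, 3.0, 4.0]

print(morans_i(x_trend, w_line))  # approximately 0.333 (positive autocorrelation)
```

With the population standard deviation, this agrees with the more familiar form $(N/\sum _{ij}w_{ij})\sum _{ij}w_{ij}(x_{i}-{\bar {x}})(x_{j}-{\bar {x}})/\sum _{i}(x_{i}-{\bar {x}})^{2}$.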

Wartenberg generalized this by letting $z_{i}$ be a vector of $M$ observations at site $i$ and defining

I=Z^{T}WZ

where:

$W$ is the $N\times N$ spatial weight matrix
$Z$ is the $N\times M$ standardized data matrix
$Z^{T}$ is the transpose of $Z$
$I$ is the $M\times M$ spatial correlation matrix.
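The matrix product can be sketched in a few lines of NumPy. The function name `wartenberg_matrix`, the example data, and the choice of the population standard deviation for standardization are assumptions for illustration, not part of the original definition.

```python
import numpy as np

def wartenberg_matrix(X, W):
    """Return the M x M matrix Z^T W Z for an N x M data matrix X."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column (ddof=0)
    W = W / W.sum()                           # weights rescaled to sum to 1
    return Z.T @ W @ Z

# Hypothetical example: N = 4 sites on a line, M = 2 variables.
W_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
X = np.column_stack([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 1.0, 4.0, 3.0]])

I = wartenberg_matrix(X, W_line)
print(I.shape)  # (2, 2)
print(I)        # diagonal: per-variable Moran's I; off-diagonal: cross terms
```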

For two variables $x$ and $y$ the bivariate correlation is

I_{xy}={\frac {\sum _{ij}w_{ij}(x_{i}-{\bar {x}})(y_{j}-{\bar {y}})}{{\sqrt {\sum _{i}(x_{i}-{\bar {x}})^{2}}}{\sqrt {\sum _{i}(y_{i}-{\bar {y}})^{2}}}}}
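The bivariate formula can be evaluated directly, as in the sketch below. The function name `bivariate_i` and the example data are hypothetical. One point worth checking numerically: the formula as written carries no factor of $N$, so it matches the corresponding entry of $Z^{T}WZ$ (computed with weights summing to 1) when the weights passed in sum to $N$, as row-standardized weights do. That reading is an assumption of this sketch rather than something the text states.

```python
import numpy as np

def bivariate_i(x, y, w):
    """Evaluate the bivariate formula with the weights exactly as supplied."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    numerator = dx @ w @ dy                      # sum_ij w_ij dx_i dy_j
    denominator = np.sqrt(dx @ dx) * np.sqrt(dy @ dy)
    return float(numerator / denominator)

# Hypothetical row-standardized weights for four sites on a line
# (each row sums to 1, so the whole matrix sums to N = 4).
W_rs = np.array([[0.0, 1.0, 0.0, 0.0],
                 [0.5, 0.0, 0.5, 0.0],
                 [0.0, 0.5, 0.0, 0.5],
                 [0.0, 0.0, 1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

i_xy = bivariate_i(x, y, W_rs)

# Same quantity via standardized scores and weights rescaled to sum to 1.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
i_xy_matrix_form = float(zx @ (W_rs / W_rs.sum()) @ zy)
print(np.isclose(i_xy, i_xy_matrix_form))  # True
```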

For $M=1$ this reduces to Moran's $I$. For larger values of $M$ the diagonal entries of $I$ are the Moran indices of the individual variables and the off-diagonal entries give the corresponding Wartenberg correlation coefficients. $I$ is an example of a Mantel statistic and so its significance can be evaluated using the Mantel test.^[4]
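A Mantel-style permutation test for a single off-diagonal entry might look like the sketch below: the site labels of the data are shuffled jointly (the same permutation for both variables) while the weight matrix is held fixed, and the observed statistic is compared with the shuffled ones. The function name `permutation_pvalue`, the two-sided comparison, the default of 999 permutations, and the fixed seed are all assumptions for illustration.

```python
import numpy as np

def permutation_pvalue(x, y, w, n_perm=999, seed=0):
    """Return (observed statistic, two-sided permutation p-value)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    w = w / w.sum()
    observed = float(zx @ w @ zy)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(len(zx))   # relabel sites, keep W fixed
        stat = zx[p] @ w @ zy[p]
        if abs(stat) >= abs(observed):
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: ten sites on a line with two smoothly trending variables.
W_chain = np.eye(10, k=1) + np.eye(10, k=-1)
x_sites = np.arange(10.0)
y_sites = x_sites ** 2

observed, p_value = permutation_pvalue(x_sites, y_sites, W_chain)
print(observed, p_value)
```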

Criticisms

Lee^[5] points out some problems with this coefficient, namely:

There is only one factor of $W$ in the numerator, so the comparison is between the raw $x$ data and the spatially averaged $y$ data.
$I_{xy}\neq I_{yx}$ for non-symmetric spatial weight matrices.

He suggests an alternative coefficient which has two factors of $W$ in the numerator and is symmetric for any weight matrix.
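Both points can be checked numerically. The sketch below uses a hypothetical directed-chain weight matrix (each site's only neighbour is the next one) to show that swapping $x$ and $y$ changes the one-factor statistic, and that a statistic with two factors of $W$ does not change. The `two_factor` function only illustrates the symmetry property; it omits the normalization used in Lee's coefficient and should not be read as that coefficient.

```python
import numpy as np

def standardize(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def one_factor(x, y, w):
    """z_x^T W z_y: raw x compared with spatially weighted y."""
    w = np.asarray(w, dtype=float)
    return float(standardize(x) @ (w / w.sum()) @ standardize(y))

def two_factor(x, y, w):
    """(W z_x) . (W z_y): both variables spatially weighted (unnormalized)."""
    w = np.asarray(w, dtype=float)
    return float((w @ standardize(x)) @ (w @ standardize(y)))

# Hypothetical non-symmetric weights: w[i, i+1] = 1, all other entries 0.
W_dir = np.eye(4, k=1)
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]

i_xy = one_factor(x, y, W_dir)
i_yx = one_factor(y, x, W_dir)
print(np.isclose(i_xy, i_yx))                                  # False
print(np.isclose(two_factor(x, y, W_dir), two_factor(y, x, W_dir)))  # True
```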

References

[1] Burger, J; Gochfeld, M (2020). "In Memoriam: Daniel Wartenberg (1952–2020)". Environ Health Perspect. 128 (11): 111601. doi:10.1289/EHP8405. PMC 7641299. PMID 33147071.
[2] Wartenberg, D (1985). "Multivariate spatial correlation: a method for exploratory geographical analysis". Geographical Analysis. 17 (4): 263–283. Bibcode:1985GeoAn..17..263W. doi:10.1111/j.1538-4632.1985.tb00849.x.
[3] "Adespatial: Multivariate Multiscale Spatial Analysis". 18 October 2023.
[4] Dale, Mark R. T.; Fortin, Marie-Josée (2014). Spatial Analysis: A Guide For Ecologists. Cambridge University Press. p. 428. ISBN 978-0-521-14350-9.
[5] Lee, Sang-Il (2001). "Developing a bivariate spatial association measure: an integration of Pearson's r and Moran's I". Journal of Geographical Systems. 3 (4): 369–385. Bibcode:2001JGS.....3..369L. doi:10.1007/s101090100064.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
