Geary's C


Geary's C is a measure of spatial autocorrelation that attempts to determine if observations of the same variable are spatially autocorrelated globally (rather than at the neighborhood level). Spatial autocorrelation is more complex than autocorrelation because the correlation is multi-dimensional and bi-directional.


Global Geary's C


Geary's C is defined as

C = \frac{(N-1) \sum_{i} \sum_{j} w_{ij} (x_i - x_j)^2}{2W \sum_{i} (x_i - \bar{x})^2}

where N is the number of spatial units indexed by i and j; x is the variable of interest; \bar{x} is the mean of x; w_{ij} is the entry in row i and column j of the spatial weights matrix, with zeroes on the diagonal (i.e., w_{ii} = 0); and W is the sum of all w_{ij}.
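
For concreteness, the definition can be evaluated directly from a dense weights matrix. The sketch below is a minimal NumPy illustration of the formula above, not a reference implementation; the function name geary_c and the small chain-of-sites example are choices made here purely for exposition.

```python
import numpy as np

def geary_c(x, w):
    """Global Geary's C for observations x and spatial weights matrix w.

    x : 1-D array with one value per spatial unit.
    w : (N, N) array of spatial weights with zeroes on the diagonal.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    big_w = w.sum()                              # W: sum of all weights
    diff_sq = (x[:, None] - x[None, :]) ** 2     # (x_i - x_j)^2 for all pairs
    numerator = (n - 1) * (w * diff_sq).sum()
    denominator = 2.0 * big_w * ((x - x.mean()) ** 2).sum()
    return numerator / denominator

# Toy example: a chain of 8 sites where each site's neighbours are the adjacent sites.
w = np.zeros((8, 8))
for i in range(7):
    w[i, i + 1] = w[i + 1, i] = 1.0

alternating = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
gradient = np.arange(8, dtype=float)
print(geary_c(alternating, w))   # 1.75   (> 1: negative spatial autocorrelation)
print(geary_c(gradient, w))      # ~0.083 (< 1: positive spatial autocorrelation)
```

The two printed values illustrate the interpretation discussed below: an alternating pattern gives a value well above 1, while a smooth gradient gives a value well below 1.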

Figure (Geary example.png): Geary's C statistic computed for different spatial patterns, using 'rook' neighbours for each grid cell, setting w_{ij} = 1 for neighbours j of i and then row-normalizing the weight matrix. Top left: a pattern giving C > 1, indicating anti-correlation. Top right: a spatial gradient giving C < 1, indicating correlation. Bottom left: random data giving C ~ 1, indicating no correlation. Bottom right: a spreading pattern with positive autocorrelation.

The value of Geary's C lies between 0 and some unspecified value greater than 1. Values significantly lower than 1 demonstrate increasing positive spatial autocorrelation, whilst values significantly higher than 1 illustrate increasing negative spatial autocorrelation.

Geary's C is inversely related to Moran's I, but it is not identical. While Moran's I and Geary's C are both measures of global spatial autocorrelation, they are slightly different. Geary's C uses the sum of squared differences between neighbouring values whereas Moran's I uses standardized spatial covariance. By using squared differences, Geary's C is less sensitive to linear associations and may pick up autocorrelation where Moran's I may not. [1]
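
For comparison, Moran's I under the same notation is

I = \frac{N}{W} \cdot \frac{\sum_{i} \sum_{j} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}

so Moran's I aggregates cross-products of deviations from the mean, while Geary's C (above) aggregates squared pairwise differences (x_i - x_j)^2.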

Geary's C is also known as Geary's contiguity ratio or simply Geary's ratio. [2]

This statistic was developed by Roy C. Geary. [3]

Local Geary's C

Like Moran's I, Geary's C can be decomposed into a sum of Local Indicators of Spatial Association (LISA) statistics. LISA statistics can be used to find local clusters through significance testing, though because a large number of tests must be performed (one per sampling area) this approach suffers from the multiple comparisons problem. As noted by Anselin, [4] this means the analysis of the local Geary statistic is aimed at identifying interesting points which should then be subject to further investigation. This is therefore a type of exploratory data analysis.

A local version of C is given by [5]

c_i = \sum_{j} w_{ij} (x_i - x_j)^2

where

m_2 = \frac{\sum_{i} (x_i - \bar{x})^2}{N - 1}

then,

C = \frac{\sum_{i} c_i}{2 W m_2}
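Computed directly, the local values sum back to the global statistic through the relation above. The sketch below is a minimal NumPy illustration under the same dense-weights convention as the earlier global example; the helper names local_geary and global_from_local are illustrative only.

```python
import numpy as np

def local_geary(x, w):
    """Local Geary statistics c_i = sum_j w_ij * (x_i - x_j)^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    diff_sq = (x[:, None] - x[None, :]) ** 2
    return (w * diff_sq).sum(axis=1)

def global_from_local(x, w):
    """Recover the global C as sum_i c_i / (2 * W * m_2)."""
    x = np.asarray(x, dtype=float)
    m2 = ((x - x.mean()) ** 2).sum() / (x.size - 1)   # m_2 as defined above
    return local_geary(x, w).sum() / (2.0 * np.asarray(w, dtype=float).sum() * m2)
```

On the chain-of-sites data from the earlier sketch, global_from_local returns the same values as the direct computation of C, which is the decomposition property that makes the local statistics usable as LISAs.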
Local Geary's C can be calculated in GeoDa and PySAL. [6]
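
A hedged sketch of what this might look like with PySAL's esda package follows; the Geary and Geary_Local classes and the attribute names shown here are taken from recent esda releases and should be checked against the manual cited in the sources, and the 10 x 10 lattice with random data is purely illustrative.

```python
import numpy as np
from libpysal.weights import lat2W
from esda.geary import Geary
from esda.geary_local import Geary_Local

# Rook-contiguity weights on a 10 x 10 lattice, with made-up data.
w = lat2W(10, 10, rook=True)
y = np.random.default_rng(0).normal(size=100)

g = Geary(y, w, permutations=999)          # global Geary's C
print(g.C, g.p_sim)                        # statistic and permutation pseudo p-value

lg = Geary_Local(connectivity=w).fit(y)    # local Geary statistics
print(lg.localG[:5], lg.p_sim[:5])         # first few local values and p-values
```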


Sources

  1. Anselin, Luc (April 2019). "A Local Indicator of Multivariate Spatial Association: Extending Geary's c". Geographical Analysis. 51 (2): 133–150. doi:10.1111/gean.12164.
  2. Jeffers, J. N. R. (1973). "A Basic Subroutine for Geary's Contiguity Ratio". Journal of the Royal Statistical Society, Series D. 22 (4). Wiley: 299–302. doi:10.2307/2986827. JSTOR 2986827.
  3. Geary, R. C. (1954). "The Contiguity Ratio and Statistical Mapping". The Incorporated Statistician. 5 (3): 115–145. doi:10.2307/2986645. JSTOR 2986645.
  4. "Local Spatial Autocorrelation (2)".
  5. Anselin, L. (2019). "A local indicator of multivariate spatial association: extending Geary's C". Geographical Analysis. 51 (2): 133–150. doi:10.1111/gean.12164.
  6. "ESDA: Exploratory Spatial Data Analysis — esda v2.6.0 Manual".


Related Research Articles

Autocorrelation – Correlation of a signal with a time-shifted copy of itself, as a function of shift

Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

The weighted arithmetic mean is similar to an ordinary arithmetic mean, except that instead of each of the data points contributing equally to the final average, some data points contribute more than others. The notion of weighted mean plays a role in descriptive statistics and also occurs in a more general form in several other areas of mathematics.

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.

Kriging – Method of interpolation

In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.

In statistics, G-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-squared tests were previously recommended.

Inverse distance weighting – Type of deterministic method for multivariate interpolation

Inverse distance weighting (IDW) is a type of deterministic method for multivariate interpolation with a known scattered set of points. The assigned values to unknown points are calculated with a weighted average of the values available at the known points. This method can also be used to create spatial weights matrices in spatial autocorrelation analyses.

Correlogram – Image of correlation statistics

In the analysis of data, a correlogram is a chart of correlation statistics. For example, in time series analysis, a plot of the sample autocorrelations versus the time lags is an autocorrelogram. If cross-correlation is plotted, the result is called a cross-correlogram.

In statistics, a random effects model, also called a variance components model, is a statistical model where the model parameters are random variables. It is a kind of hierarchical linear model, which assumes that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy. A random effects model is a special case of a mixed model.

In statistics, the inverse Wishart distribution, also called the inverted Wishart distribution, is a probability distribution defined on real-valued positive-definite matrices. In Bayesian statistics it is used as the conjugate prior for the covariance matrix of a multivariate normal distribution.

An index of qualitative variation (IQV) is a measure of statistical dispersion in nominal distributions. Examples include the variation ratio or the information entropy.

GeoDa – Free geovisualization and analysis software

GeoDa is a free software package that conducts spatial data analysis, geovisualization, spatial autocorrelation and spatial modeling.

Indicators of spatial association are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. For instance, if we are studying cancer rates among census tracts in a given city, local clusters in the rates mean that there are areas that have higher or lower rates than would be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.

Moran's I – Measure of spatial autocorrelation

In statistics, Moran's I is a measure of spatial autocorrelation developed by Patrick Alfred Pierce Moran. Spatial autocorrelation is characterized by a correlation in a signal among nearby locations in space. Spatial autocorrelation is more complex than one-dimensional autocorrelation because spatial correlation is multi-dimensional and multi-directional.

Maximum cut – Problem of finding a maximum cut in a graph

In a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets S and T, such that the number of edges between S and T is as large as possible. Finding such a cut is known as the max-cut problem.

The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is a multi-criteria decision analysis method, which was originally developed by Ching-Lai Hwang and Yoon in 1981 with further developments by Yoon in 1987, and Hwang, Lai and Liu in 1993. TOPSIS is based on the concept that the chosen alternative should have the shortest geometric distance from the positive ideal solution (PIS) and the longest geometric distance from the negative ideal solution (NIS). A dedicated book in the fuzzy context was published in 2021.

Getis–Ord statistics, also known as Gi*, are used in spatial analysis to measure local and global spatial autocorrelation. Developed by statisticians Arthur Getis and J. Keith Ord, they are commonly used for Hot Spot Analysis to identify where features with high or low values are spatially clustered in a statistically significant way. Getis–Ord statistics are available in a number of software libraries such as CrimeStat, GeoDa, ArcGIS, PySAL and R.

Wartenberg's coefficient is a measure of correlation developed by epidemiologist Daniel Wartenberg. This coefficient is a multivariate extension of spatial autocorrelation that aims to account for spatial dependence of data while studying their covariance. A modified version of this statistic is available in the R package adespatial.

Lee's L is a bivariate spatial correlation coefficient which measures the association between two sets of observations made at the same spatial sites. Standard measures of association such as the Pearson correlation coefficient do not account for the spatial dimension of data, in particular they are vulnerable to inflation due to spatial autocorrelation. Lee's L is available in numerous spatial analysis software libraries including spdep and PySAL and has been applied in diverse applications such as studying air pollution, viticulture and housing rent.


Join count statistics are a method of spatial analysis used to assess the degree of association, in particular the autocorrelation, of categorical variables distributed over a spatial map. They were originally introduced by Australian statistician P. A. P. Moran. Join count statistics have found widespread use in econometrics, remote sensing and ecology. Join count statistics can be computed in a number of software packages including PASSaGE, GeoDA, PySAL and spdep.

The concept of a spatial weight is used in spatial analysis to describe neighbor relations between regions on a map. If location j is a neighbor of location i, then w_{ij} = 1, otherwise w_{ij} = 0. Usually we do not consider a site to be a neighbor of itself, so w_{ii} = 0. These coefficients are encoded in the spatial weight matrix W.