Lee's L

Last updated

Lee's L is a bivariate spatial correlation coefficient which measures the association between two sets of observations made at the same spatial sites. Standard measures of association such as the Pearson correlation coefficient do not account for the spatial dimension of data, in particular they are vulnerable to inflation due to spatial autocorrelation. Lee's L is available in numerous spatial analysis software libraries including spdep [1] and PySAL [2] (where it is called Spatial_Pearson) and has been applied in diverse applications such as studying air pollution, [3] viticulture [4] and housing rent. [5]

For spatial data and measured at locations connected with the spatial weight matrix first define the spatially lagged vector

with a similar definition for . Then Lee's L [6] is defined as

where are the mean values of . When the spatial weight matrix is row normalized, such that , the first factor is 1.

Lee also defines the spatial smoothing scalar

to measure the spatial autocorrelation of a variable.

It is shown by Lee [6] that the above definition is equivalent to

Where is the Pearson correlation coefficient

This means Lee's L is equivalent to the Pearson correlation of the spatially lagged data, multiplied by a measure of each data set's spatial autocorrelation.

Related Research Articles

<span class="mw-page-title-main">Correlation</span> Statistical concept

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the so-called demand curve.

<span class="mw-page-title-main">Pearson correlation coefficient</span> Measure of linear correlation

In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1.

<span class="mw-page-title-main">Spearman's rank correlation coefficient</span> Nonparametric measure of rank correlation

In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman and often denoted by the Greek letter (rho) or as , is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function.

In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population, or a sample-based estimate of that quantity. It can refer to the value of a statistic calculated from a sample of data, the value of a parameter for a hypothetical population, or to the equation that operationalizes how statistics or parameters lead to the effect size value. Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event happening. Effect sizes complement statistical hypothesis testing, and play an important role in power analyses, sample size planning, and in meta-analyses. The cluster of data-analysis methods concerning effect sizes is referred to as estimation statistics.

<span class="mw-page-title-main">Fisher transformation</span> Statistical transformation

In statistics, the Fisher transformation of a Pearson correlation coefficient is its inverse hyperbolic tangent (artanh). When the sample correlation coefficient r is near 1 or -1, its distribution is highly skewed, which makes it difficult to estimate confidence intervals and apply tests of significance for the population correlation coefficient ρ. The Fisher transformation solves this problem by yielding a variable whose distribution is approximately normally distributed, with a variance that is stable over different values of r.

<span class="mw-page-title-main">Large eddy simulation</span> Mathematical model for turbulence

Large eddy simulation (LES) is a mathematical model for turbulence used in computational fluid dynamics. It was initially proposed in 1963 by Joseph Smagorinsky to simulate atmospheric air currents, and first explored by Deardorff (1970). LES is currently applied in a wide variety of engineering applications, including combustion, acoustics, and simulations of the atmospheric boundary layer.

In statistics, a rank correlation is any of several statistics that measure an ordinal association—the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them. For example, two common nonparametric methods of significance that use rank correlation are the Mann–Whitney U test and the Wilcoxon signed-rank test.

<span class="mw-page-title-main">Correlogram</span> Image of correlation statistics

In the analysis of data, a correlogram is a chart of correlation statistics. For example, in time series analysis, a plot of the sample autocorrelations versus is an autocorrelogram. If cross-correlation is plotted, the result is called a cross-correlogram.

In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient, is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical dependence based on the τ coefficient. It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938, though Gustav Fechner had proposed a similar measure in the context of time series in 1897.

In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. When determining the numerical relationship between two variables of interest, using their correlation coefficient will give misleading results if there is another confounding variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient. This is precisely the motivation for including other right-side variables in a multiple regression; but while multiple regression gives unbiased results for the effect size, it does not give a numerical value of a measure of the strength of the relationship between the two variables of interest.

<span class="mw-page-title-main">Intraclass correlation</span> Descriptive statistic

In statistics, the intraclass correlation, or the intraclass correlation coefficient (ICC), is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures, it operates on data structured as groups rather than data structured as paired observations.

In statistics, the concordance correlation coefficient measures the agreement between two variables, e.g., to evaluate reproducibility or for inter-rater reliability.

Geary's C is a measure of spatial autocorrelation that attempts to determine if observations of the same variable are spatially autocorrelated globally. Spatial autocorrelation is more complex than autocorrelation because the correlation is multi-dimensional and bi-directional.

Morans <i>I</i> Measure of spatial autocorrelation

In statistics, Moran's I is a measure of spatial autocorrelation developed by Patrick Alfred Pierce Moran. Spatial autocorrelation is characterized by a correlation in a signal among nearby locations in space. Spatial autocorrelation is more complex than one-dimensional autocorrelation because spatial correlation is multi-dimensional and multi-directional.

In statistics and machine learning, lasso is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. The lasso method assumes that the coefficients of the linear model are sparse, meaning that few of them are non-zero. It was originally introduced in geophysics, and later by Robert Tibshirani, who coined the term.

In statistics, Cramér's V is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.

In statistics, Tschuprow's T is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables. It was published by Alexander Tschuprow in 1939.

Getis–Ord statistics, also known as Gi*, are used in spatial analysis to measure the local and global spatial autocorrelation. Developed by statisticians Arthur Getis and J. Keith Ord they are commonly used for Hot Spot Analysis to identify where features with high or low values are spatially clustered in a statistically significant way. Getis-Ord statistics are available in a number of software libraries such as CrimeStat, GeoDa, ArcGIS, PySAL and R.

Wartenberg's coefficient is a measure of correlation developed by epidemiologist Daniel Wartenberg. This coefficient is a multivariate extension of spatial autocorrelation that aims to account for spatial dependence of data while studying their covariance. A modified version of this statistic is available in the R package adespatial.


Join count statistics are a method of spatial analysis used to assess the degree of association, in particular the autocorrelation, of categorical variables distributed over a spatial map. They were originally introduced by Australian statistician P. A. P. Moran. Join count statistics have found widespread use in econometrics, remote sensing and ecology. Join count statistics can be computed in a number of software packages including PASSaGE, GeoDA, PySAL and spdep.

References

  1. "Lee's L test for spatial autocorrelation — lee.test".
  2. "API reference — esda v0.1.dev1+ga296c39 Manual".
  3. Yang D, Ye C, Wang X, Lu D, Xu J, Yang H (2018). "Global distribution and evolvement of urbanization and PM2. 5 (1998–2015)". Atmospheric Environment. 182: 171–178. doi:10.1016/j.atmosenv.2018.03.053.
  4. Lu, Yonglong; Yang, Yifu; Sun, Bin; Yuan, Jingjing; Yu, Minzhao; Stenseth, Nils Chr.; Bullock, James M.; Obersteiner, Michael (2020). "Spatial variation in biodiversity loss across China under multiple environmental stressors". Science Advances. 6 (47). doi:10.1126/sciadv.abd0952. PMC   7679164 . PMID   33219032.
  5. Hu, Lirong; He, Shenjing; Han, Zixuan; Xiao, He; Su, Shiliang; Weng, Min; Cai, Zhongliang (2019). "Monitoring housing rental prices based on social media:An integrated approach of machine-learning algorithms and hedonic modeling to inform equitable housing policies". Land Use Policy. 82: 657–673. Bibcode:2019LUPol..82..657H. doi:10.1016/j.landusepol.2018.12.030.
  6. 1 2 Lee, Sang-Il (2001). "Developing a bivariate spatial association measure: an integration of Pearson's r and Moran's I.". Journal of Geographical Systems. 3 (4): 369–385. Bibcode:2001JGS.....3..369L. doi:10.1007/s101090100064.