Distance correlation

In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.

Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.
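The permutation test described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the R energy package's implementation, and the function names are ours:

```python
import numpy as np

def distance_matrix(z):
    # Pairwise Euclidean distances; accepts shape (n,) or (n, d).
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    return np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))

def double_center(d):
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov2(x, y):
    # Squared sample distance covariance: mean of elementwise products
    # of the two double-centered distance matrices.
    a = double_center(distance_matrix(x))
    b = double_center(distance_matrix(y))
    return (a * b).mean()

def dcov_permutation_test(x, y, n_perm=199, seed=0):
    # Shuffle y relative to x to simulate the null of independence.
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = dcov2(x, y)
    exceed = sum(dcov2(x, y[rng.permutation(len(y))]) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)  # permutation p-value
```

For strongly dependent data the observed statistic dominates the shuffled ones and the p-value is close to 1/(n_perm + 1); for independent data it is roughly uniform on (0, 1].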

Figure: Several sets of (x, y) points, with the distance correlation coefficient of x and y for each set. Compare to the analogous graph for Pearson correlation.

Background

The classical measure of dependence, the Pearson correlation coefficient, [1] is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson's correlation, namely that it can easily be zero for dependent variables: zero correlation (uncorrelatedness) does not imply independence, whereas zero distance correlation does. The first results on distance correlation were published in 2007 and 2009. [2] [3] It was proved that distance covariance is the same as the Brownian covariance. [3] These measures are examples of energy distances.

The distance correlation is derived from a number of other quantities that are used in its specification, specifically: distance variance, distance standard deviation, and distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.

Definitions

Distance covariance

Let us start with the definition of the sample distance covariance. Let (X_k, Y_k), k = 1, 2, ..., n be a statistical sample from a pair of real-valued or vector-valued random variables (X, Y). First, compute the n by n distance matrices (a_{j,k}) and (b_{j,k}) containing all pairwise distances

a_{j,k} = \|X_j - X_k\|, \qquad b_{j,k} = \|Y_j - Y_k\|, \qquad j, k = 1, 2, \ldots, n,

where \|\cdot\| denotes Euclidean norm. Then take all doubly centered distances

A_{j,k} := a_{j,k} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}, \qquad B_{j,k} := b_{j,k} - \bar{b}_{j\cdot} - \bar{b}_{\cdot k} + \bar{b}_{\cdot\cdot},

where \bar{a}_{j\cdot} is the j-th row mean, \bar{a}_{\cdot k} is the k-th column mean, and \bar{a}_{\cdot\cdot} is the grand mean of the distance matrix of the X sample. The notation is similar for the b values. (In the matrices of centered distances (A_{j,k}) and (B_{j,k}) all rows and all columns sum to zero.) The squared sample distance covariance (a scalar) is simply the arithmetic average of the products A_{j,k} B_{j,k}:

\operatorname{dCov}^2_n(X, Y) := \frac{1}{n^2} \sum_{j=1}^n \sum_{k=1}^n A_{j,k}\, B_{j,k}.
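This recipe, namely pairwise distances, double centering, and averaging the elementwise products, translates directly into NumPy. The helper names below are ours, a sketch rather than any library's API:

```python
import numpy as np

def pairwise_distances(x):
    # a[j, k] = |x_j - x_k| for a 1-D sample; the vector case is the
    # same with Euclidean norms in place of absolute values.
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :])

def doubly_center(a):
    # A[j, k] = a[j, k] - (j-th row mean) - (k-th column mean) + grand mean
    return (a
            - a.mean(axis=1, keepdims=True)
            - a.mean(axis=0, keepdims=True)
            + a.mean())

def sample_dcov2(x, y):
    # Squared sample distance covariance: the arithmetic average of the
    # products A[j, k] * B[j, k].
    A = doubly_center(pairwise_distances(x))
    B = doubly_center(pairwise_distances(y))
    return (A * B).mean()
```

Every row and column of the centered matrices sums to zero, and sample_dcov2(x, x) reduces to the squared sample distance variance.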

The statistic T_n = n\,\operatorname{dCov}^2_n(X, Y) determines a consistent multivariate test of independence of random vectors in arbitrary dimensions. For an implementation see the dcov.test function in the energy package for R. [4]

The population value of distance covariance can be defined along the same lines. Let X be a random variable that takes values in a p-dimensional Euclidean space with probability distribution μ and let Y be a random variable that takes values in a q-dimensional Euclidean space with probability distribution ν, and suppose that X and Y have finite expectations. Write

a_\mu(x) := \operatorname{E}\|x - X\|, \qquad D(\mu) := \operatorname{E}\, a_\mu(X), \qquad d_\mu(x, x') := \|x - x'\| - a_\mu(x) - a_\mu(x') + D(\mu),

and define d_\nu analogously for Y. Finally, define the population value of squared distance covariance of X and Y as

\operatorname{dCov}^2(X, Y) := \operatorname{E}\big[ d_\mu(X, X')\, d_\nu(Y, Y') \big],

where (X', Y') is an independent and identically distributed copy of (X, Y).

One can show that this is equivalent to the following definition:

\operatorname{dCov}^2(X, Y) := \operatorname{E}\|X - X'\| \|Y - Y'\| + \operatorname{E}\|X - X'\|\, \operatorname{E}\|Y - Y'\| - 2 \operatorname{E}\|X - X'\| \|Y - Y''\|,

where E denotes expected value. The primed random variables (X', Y') and (X'', Y'') denote independent and identically distributed (iid) copies of (X, Y). [5] Distance covariance can be expressed in terms of the classical Pearson covariance, cov, as follows:

\operatorname{dCov}^2(X, Y) = \operatorname{cov}(\|X - X'\|, \|Y - Y'\|) - 2 \operatorname{cov}(\|X - X'\|, \|Y - Y''\|).

This identity shows that the distance covariance is not the same as the covariance of distances, cov(\|X - X'\|, \|Y - Y'\|), which can be zero even if X and Y are not independent.

Alternatively, the distance covariance can be defined as the weighted L² norm of the distance between the joint characteristic function of the random variables and the product of their marginal characteristic functions: [6]

\operatorname{dCov}^2(X, Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{\left| \varphi_{X,Y}(s, t) - \varphi_X(s)\, \varphi_Y(t) \right|^2}{|s|_p^{1+p}\, |t|_q^{1+q}} \, dt \, ds,

where \varphi_{X,Y}(s, t), \varphi_X(s), and \varphi_Y(t) are the characteristic functions of (X, Y), X, and Y, respectively, p and q denote the Euclidean dimensions of X and Y, and thus of s and t, and c_p, c_q are constants. The weight function (c_p c_q |s|_p^{1+p} |t|_q^{1+q})^{-1} is chosen to produce a scale-equivariant and rotation-invariant measure that does not go to zero for dependent variables. [6] [7] One interpretation of the characteristic function definition is that the variables e^{isX} and e^{itY} are cyclic representations of X and Y with different periods given by s and t, and the expression \varphi_{X,Y}(s, t) - \varphi_X(s)\varphi_Y(t) in the numerator is simply the classical covariance of e^{isX} and e^{itY}. The characteristic function definition clearly shows that \operatorname{dCov}^2(X, Y) = 0 if and only if X and Y are independent.

Distance variance and distance standard deviation

The distance variance is a special case of distance covariance when the two variables are identical. The population value of distance variance is the square root of

\operatorname{dVar}^2(X) := \operatorname{E}\|X - X'\|^2 + \operatorname{E}^2\|X - X'\| - 2 \operatorname{E}\|X - X'\| \|X - X''\|,

where X, X', X'' are independent and identically distributed random variables, E denotes the expected value, and \operatorname{E}^2[f(X)] = (\operatorname{E}[f(X)])^2 for a function f, e.g., \operatorname{E}^2\|X - X'\| = (\operatorname{E}\|X - X'\|)^2.

The sample distance variance is the square root of

\operatorname{dVar}^2_n(X) := \operatorname{dCov}^2_n(X, X) = \frac{1}{n^2} \sum_{j,k} A_{j,k}^2,

which is a relative of Corrado Gini's mean difference introduced in 1912 (but Gini did not work with centered distances). [8]

The distance standard deviation is the square root of the distance variance.

Distance correlation

The distance correlation [2] [3] of two random variables is obtained by dividing their distance covariance by the product of their distance standard deviations. The distance correlation is the square root of

\operatorname{dCor}^2(X, Y) = \frac{\operatorname{dCov}^2(X, Y)}{\sqrt{\operatorname{dVar}^2(X)\, \operatorname{dVar}^2(Y)}},

and the sample distance correlation is defined by substituting the sample distance covariance and distance variances for the population coefficients above.

For easy computation of sample distance correlation see the dcor function in the energy package for R. [4]
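In Python, an analogous computation can be written directly from the definition. The following is a sketch with our own helper names, not the energy package's dcor itself; it also demonstrates that distance correlation detects a nonlinear dependence that Pearson correlation misses:

```python
import numpy as np

def _dcenter(z):
    # Double-centered Euclidean distance matrix of a sample.
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcor(x, y):
    # Sample distance correlation: sample distance covariance divided by
    # the square root of the product of the sample distance variances.
    A, B = _dcenter(x), _dcenter(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2  # dependent on x, yet linearly uncorrelated with it
print(abs(np.corrcoef(x, y)[0, 1]) < 1e-12)  # True: Pearson sees nothing
print(dcor(x, y) > 0.3)                      # True: dCor detects the dependence
```

Identical samples give dcor(x, x) = 1, the maximum value of the coefficient.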

Properties

Distance correlation

  1. 0 ≤ dCor_n(X, Y) ≤ 1 and 0 ≤ dCor(X, Y) ≤ 1; this is in contrast to Pearson's correlation, which can be negative.
  2. dCor(X, Y) = 0 if and only if X and Y are independent.
  3. dCor_n(X, Y) = 1 implies that the dimensions of the linear subspaces spanned by the X and Y samples respectively are almost surely equal, and if we assume that these subspaces are equal, then in this subspace Y = A + b C X for some vector A, scalar b, and orthonormal matrix C.

Distance covariance

  1. dCov_n(X, Y) ≥ 0 and dCov(X, Y) ≥ 0;
  2. dCov²(a₁ + b₁ C₁ X, a₂ + b₂ C₂ Y) = |b₁ b₂| dCov²(X, Y) for all constant vectors a₁, a₂, scalars b₁, b₂, and orthonormal matrices C₁, C₂.
  3. If the random vectors (X₁, Y₁) and (X₂, Y₂) are independent then
        dCov(X₁ + X₂, Y₁ + Y₂) ≤ dCov(X₁, Y₁) + dCov(X₂, Y₂).
    Equality holds if and only if X₁ and Y₁ are both constants, or X₂ and Y₂ are both constants, or X₁, X₂, Y₁, Y₂ are mutually independent.
  4. dCov(X, Y) = 0 if and only if X and Y are independent.

This last property is the most important effect of working with centered distances.

The statistic dCov²_n(X, Y) is a biased estimator of dCov²(X, Y). Under independence of X and Y [9]

\operatorname{E}\big[\operatorname{dCov}^2_n(X, Y)\big] = \frac{n-1}{n^2}\, \operatorname{E}\|X - X'\|\, \operatorname{E}\|Y - Y'\|.
An unbiased estimator of dCov²(X, Y) is given by Székely and Rizzo. [10]
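The unbiased estimator replaces double centering with so-called U-centering. The sketch below is our reading of the Székely–Rizzo (2014) construction, so treat the normalization details as assumptions to check against the paper: row and column means are rescaled by n − 2, the grand mean by (n − 1)(n − 2), the diagonal is zeroed, and the inner product is divided by n(n − 3):

```python
import numpy as np

def u_center(d):
    # U-centered distance matrix (requires n > 3): adjusted row/column
    # means use n - 2, the adjusted grand mean uses (n - 1)(n - 2), and
    # the diagonal is set to zero.  [assumed form, per our reading]
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    grand = d.sum() / ((n - 1) * (n - 2))
    u = d - row - col + grand
    np.fill_diagonal(u, 0.0)
    return u

def dcov2_unbiased(x, y):
    # Inner product of the U-centered matrices, normalized by n(n - 3);
    # unlike the plain estimator it can take small negative values.
    def dist(z):
        z = np.asarray(z, dtype=float)
        if z.ndim == 1:
            z = z[:, None]
        return np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
    A, B = u_center(dist(x)), u_center(dist(y))
    n = A.shape[0]
    return (A * B).sum() / (n * (n - 3))
```

As with double centering, every row (and column) of a U-centered matrix sums to zero.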

Distance variance

  1. dVar(X) = 0 if and only if X = E[X] almost surely.
  2. dVar_n(X) = 0 if and only if every sample observation is identical.
  3. dVar(A + b C X) = |b| dVar(X) for all constant vectors A, scalars b, and orthonormal matrices C.
  4. If X and Y are independent then dVar(X + Y) ≤ dVar(X) + dVar(Y).

Equality holds in (4) if and only if one of the random variables X or Y is a constant.

Generalization

Distance covariance can be generalized to include powers of Euclidean distance. Define

\operatorname{dCov}^2(X, Y; \alpha) := \operatorname{E}\|X - X'\|^\alpha \|Y - Y'\|^\alpha + \operatorname{E}\|X - X'\|^\alpha\, \operatorname{E}\|Y - Y'\|^\alpha - 2 \operatorname{E}\|X - X'\|^\alpha \|Y - Y''\|^\alpha.

Then for every 0 < α < 2, X and Y are independent if and only if dCov(X, Y; α) = 0. It is important to note that this characterization does not hold for exponent α = 2; in this case, for bivariate (X, Y), dCor(X, Y; 2) is a deterministic function of the Pearson correlation. [2] If a^α_{j,k} and b^α_{j,k} are the α-powers of the corresponding distances, 0 < α ≤ 2, then the sample distance covariance can be defined as the nonnegative number for which

\operatorname{dCov}^2_n(X, Y; \alpha) := \frac{1}{n^2} \sum_{j,k} A^\alpha_{j,k}\, B^\alpha_{j,k},

where A^α_{j,k} and B^α_{j,k} denote the doubly centered α-power distance matrices.
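The α-power sample coefficient can be sketched as follows (our own helper, built on doubly centered α-powers of the pairwise distances); at α = 2 in the bivariate case it collapses to the absolute value of the sample Pearson correlation:

```python
import numpy as np

def dcor_alpha(x, y, alpha=1.0):
    # Sample distance correlation built from alpha-powers of the
    # pairwise Euclidean distances, 0 < alpha <= 2.
    def centered(z):
        z = np.asarray(z, dtype=float)
        if z.ndim == 1:
            z = z[:, None]
        d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)) ** alpha
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0
```

With α = 2 the doubly centered matrix reduces to −2(x_j − x̄)(x_k − x̄), which is why only the linear (Pearson) part of the dependence survives.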

One can extend dCov to metric-space-valued random variables X and Y: If X has law μ in a metric space with metric d, then define a_\mu(x) := \operatorname{E}[d(x, X)], D(\mu) := \operatorname{E}[a_\mu(X)] (provided a_\mu(X) is finite, i.e., X has finite first moment), and d_\mu(x, x') := d(x, x') - a_\mu(x) - a_\mu(x') + D(\mu). Then if Y has law ν (in a possibly different metric space with finite first moment), define

\operatorname{dCov}^2(X, Y) := \operatorname{E}\big[ d_\mu(X, X')\, d_\nu(Y, Y') \big].

This is non-negative for all such X, Y iff both metric spaces have negative type. [11] Here, a metric space (M, d) has negative type if (M, d^{1/2}) is isometric to a subset of a Hilbert space. [12] If both metric spaces have strong negative type, then dCov²(X, Y) = 0 iff X and Y are independent. [11]

Alternative definition of distance covariance

The original distance covariance has been defined as the square root of dCov²(X, Y), rather than the squared coefficient itself. dCov(X, Y) has the property that it is the energy distance between the joint distribution of X, Y and the product of its marginals. Under this definition, however, the distance variance, rather than the distance standard deviation, is measured in the same units as the distances.

Alternately, one could define distance covariance to be the square of the energy distance. In this case, the distance standard deviation of X is measured in the same units as distance, and there exists an unbiased estimator for the population distance covariance. [10]

Under these alternate definitions, the distance correlation is also defined as the square dCor²(X, Y), rather than the square root.

Alternative formulation: Brownian covariance

Brownian covariance is motivated by generalization of the notion of covariance to stochastic processes. The square of the covariance of random variables X and Y can be written in the following form:

\operatorname{cov}^2(X, Y) = \operatorname{E}\big[ (X - \operatorname{E}X)(X' - \operatorname{E}X')(Y - \operatorname{E}Y)(Y' - \operatorname{E}Y') \big],

where E denotes the expected value and the prime denotes independent and identically distributed copies. We need the following generalization of this formula. If U(s), V(t) are arbitrary random processes defined for all real s and t, then define the U-centered version of X by

X_U := U(X) - \operatorname{E}_X\big[ U(X) \mid U \big]

whenever the subtracted conditional expected value exists, and denote by Y_V the V-centered version of Y. [3] [13] [14] The (U, V) covariance of (X, Y) is defined as the nonnegative number whose square is

\operatorname{cov}^2_{U,V}(X, Y) := \operatorname{E}\big[ X_U X'_U Y_V Y'_V \big]

whenever the right-hand side is nonnegative and finite. The most important example is when U and V are two-sided independent Brownian motions / Wiener processes with expectation zero and covariance |s| + |t| − |s − t| = 2 min(s, t) (for nonnegative s, t only). (This is twice the covariance of the standard Wiener process; here the factor 2 simplifies the computations.) In this case the (U, V) covariance is called Brownian covariance and is denoted by Cov_W(X, Y).

There is a surprising coincidence: the Brownian covariance is the same as the distance covariance:

\operatorname{Cov}_W(X, Y) = \operatorname{dCov}(X, Y),

and thus Brownian correlation is the same as distance correlation.

On the other hand, if we replace the Brownian motion with the deterministic identity function id, then Cov_id(X, Y) is simply the absolute value of the classical Pearson covariance,

\operatorname{Cov}_{\mathrm{id}}(X, Y) = \left| \operatorname{cov}(X, Y) \right|.
Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert-Schmidt Independence Criterion or HSIC) can also detect linear and nonlinear interactions. Both distance correlation and kernel-based metrics can be used in methods such as canonical correlation analysis and independent component analysis to yield stronger statistical power.

Notes

  1. Pearson 1895a, 1895b.
  2. Székely, Rizzo & Bakirov 2007.
  3. Székely & Rizzo 2009a.
  4. Rizzo & Székely 2021.
  5. Székely & Rizzo 2014, p. 11.
  6. Székely & Rizzo 2009a, p. 1249, Theorem 7, (3.7).
  7. Székely & Rizzo 2012.
  8. Gini 1912.
  9. Székely & Rizzo 2009b.
  10. Székely & Rizzo 2014.
  11. Lyons 2014.
  12. Klebanov 2005, p. [ page needed ].
  13. Bickel & Xu 2009.
  14. Kosorok 2009.
