Partial correlation

In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. When determining the numerical relationship between two variables of interest, using their correlation coefficient will give misleading results if there is another confounding variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient. This is precisely the motivation for including other right-side variables in a multiple regression; but while multiple regression gives unbiased results for the effect size, it does not give a numerical value of a measure of the strength of the relationship between the two variables of interest.

For example, given economic data on the consumption, income, and wealth of various individuals, consider the relationship between consumption and income. Failing to control for wealth when computing a correlation coefficient between consumption and income would give a misleading result, since income might be numerically related to wealth which in turn might be numerically related to consumption; a measured correlation between consumption and income might actually be contaminated by these other correlations. The use of a partial correlation avoids this problem.

Like the correlation coefficient, the partial correlation coefficient takes on a value in the range from –1 to 1. The value –1 conveys a perfect negative correlation controlling for some variables (that is, an exact linear relationship in which higher values of one variable are associated with lower values of the other); the value 1 conveys a perfect positive linear relationship, and the value 0 conveys that there is no linear relationship.

The partial correlation coincides with the conditional correlation if the random variables are jointly distributed as the multivariate normal, other elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial, or Dirichlet distribution, but not in general otherwise. [1]

Formal definition

Formally, the partial correlation between X and Y given a set of n controlling variables Z = {Z1, Z2, ..., Zn}, written ρXY·Z, is the correlation between the residuals eX and eY resulting from the linear regression of X with Z and of Y with Z, respectively. The first-order partial correlation (i.e., when n = 1) is the difference between a correlation and the product of the removable correlations divided by the product of the coefficients of alienation of the removable correlations. The coefficient of alienation, and its relation with joint variance through correlation are available in Guilford (1973, pp. 344–345). [2]

Computation

Using linear regression

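A simple way to compute the sample partial correlation for some data is to solve the two associated linear regression problems and calculate the correlation between the residuals. Let X and Y be random variables taking real values, and let Z be an n-dimensional vector-valued random variable. Let x_i, y_i and z_i denote the ith of N i.i.d. observations from some joint probability distribution over real random variables X, Y, and Z, with z_i having been augmented with a 1 to allow for a constant term in the regression. Solving the linear regression problem amounts to finding (n+1)-dimensional regression coefficient vectors $\mathbf{w}_X^*$ and $\mathbf{w}_Y^*$ such that

$$\mathbf{w}_X^* = \arg\min_{\mathbf{w}} \left\{ \sum_{i=1}^{N} \left( x_i - \langle \mathbf{w}, \mathbf{z}_i \rangle \right)^2 \right\}$$

$$\mathbf{w}_Y^* = \arg\min_{\mathbf{w}} \left\{ \sum_{i=1}^{N} \left( y_i - \langle \mathbf{w}, \mathbf{z}_i \rangle \right)^2 \right\}$$

where $N$ is the number of observations, and $\langle \mathbf{w}, \mathbf{z}_i \rangle$ is the scalar product between the vectors $\mathbf{w}$ and $\mathbf{z}_i$.

The residuals are then

$$e_{X,i} = x_i - \langle \mathbf{w}_X^*, \mathbf{z}_i \rangle$$

$$e_{Y,i} = y_i - \langle \mathbf{w}_Y^*, \mathbf{z}_i \rangle$$

and the sample partial correlation is then given by the usual formula for sample correlation, but between these new derived values:

$$\hat{\rho}_{XY \cdot \mathbf{Z}} = \frac{N \sum_{i=1}^{N} e_{X,i} e_{Y,i} - \sum_{i=1}^{N} e_{X,i} \sum_{i=1}^{N} e_{Y,i}}{\sqrt{N \sum_{i=1}^{N} e_{X,i}^2 - \left( \sum_{i=1}^{N} e_{X,i} \right)^2} \, \sqrt{N \sum_{i=1}^{N} e_{Y,i}^2 - \left( \sum_{i=1}^{N} e_{Y,i} \right)^2}} = \frac{N \sum_{i=1}^{N} e_{X,i} e_{Y,i}}{\sqrt{N \sum_{i=1}^{N} e_{X,i}^2} \, \sqrt{N \sum_{i=1}^{N} e_{Y,i}^2}}$$

In the first expression the three terms after minus signs all equal 0 since each contains the sum of residuals from an ordinary least squares regression.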

Example

Consider the following data on three variables, X, Y, and Z:

X    Y    Z
2    1    0
4    2    0
15   3    1
20   4    1

Computing the Pearson correlation coefficient between variables X and Y results in approximately 0.970, while computing the partial correlation between X and Y, using the formula given above, gives a partial correlation of 0.919. The computations were done using R with the following code.

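    > # Data for the three variables
    > X <- c(2, 4, 15, 20)
    > Y <- c(1, 2, 3, 4)
    > Z <- c(0, 0, 1, 1)
    > # Regress X and Y on Z and keep the residuals
    > mm1 <- lm(X ~ Z)
    > res1 <- mm1$residuals
    > mm2 <- lm(Y ~ Z)
    > res2 <- mm2$residuals
    > # Partial correlation: correlation between the two residual vectors
    > cor(res1, res2)
    [1] 0.919145
    > # Ordinary Pearson correlation, for comparison
    > cor(X, Y)
    [1] 0.9695016
    > generalCorr::parcorMany(cbind(X, Y, Z))
         nami namj partij   partji rijMrji
    [1,] "X"  "Y"  "0.8844" "1"    "-0.1156"
    [2,] "X"  "Z"  "0.1581" "1"    "-0.8419"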

The lower part of the above code reports the generalized nonlinear partial correlation coefficient between X and Y, after removing the nonlinear effect of Z, as 0.8844. Likewise, the generalized partial correlation coefficient between X and Z, after removing the nonlinear effect of Y, is 0.1581. See the R package generalCorr and its vignettes for details. Simulation and other details are in Vinod (2017), "Generalized correlation and kernel causality with applications in development economics", Communications in Statistics - Simulation and Computation, vol. 46, pp. 4513–4534, available online 29 Dec 2015, https://doi.org/10.1080/03610918.2015.1122048.
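The residual-based computation in this example can also be sketched in Python using only the standard library. This is an illustrative sketch, not a reference implementation: the helper names `solve_linear_system`, `ols_residuals` and `partial_correlation` are hypothetical, and the regressions are solved through the normal equations, which is adequate for small, well-conditioned problems only.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations apply to both.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def ols_residuals(target, controls):
    """Residuals from regressing `target` on the control columns plus a constant."""
    # Augment each observation z_i with a leading 1 for the intercept.
    rows = [[1.0] + [float(col[i]) for col in controls] for i in range(len(target))]
    k = len(rows[0])
    # Normal equations: (Z^T Z) w = Z^T target.
    ztz = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    zty = [sum(r[p] * t for r, t in zip(rows, target)) for p in range(k)]
    w = solve_linear_system(ztz, zty)
    return [t - sum(wi * ri for wi, ri in zip(w, r)) for r, t in zip(rows, target)]


def partial_correlation(x, y, controls):
    """Sample partial correlation of x and y given the control columns."""
    ex = ols_residuals(x, controls)
    ey = ols_residuals(y, controls)
    # OLS residuals with an intercept sum to zero, so no mean subtraction is needed.
    num = sum(a * b for a, b in zip(ex, ey))
    den = math.sqrt(sum(a * a for a in ex) * sum(b * b for b in ey))
    return num / den


X = [2, 4, 15, 20]
Y = [1, 2, 3, 4]
Z = [0, 0, 1, 1]
print(round(partial_correlation(X, Y, [Z]), 6))  # 0.919145
```

Passing several columns in `controls` handles more than one controlling variable under the same assumptions.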

Using recursive formula

It can be computationally expensive to solve the linear regression problems. Actually, the nth-order partial correlation (i.e., with |Z| = n) can be easily computed from three (n - 1)th-order partial correlations. The zeroth-order partial correlation ρXY·Ø is defined to be the regular correlation coefficient ρXY.

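It holds, for any $Z_0 \in \mathbf{Z}$, that [3]

$$\rho_{XY \cdot \mathbf{Z}} = \frac{\rho_{XY \cdot \mathbf{Z} \setminus \{Z_0\}} - \rho_{X Z_0 \cdot \mathbf{Z} \setminus \{Z_0\}} \, \rho_{Z_0 Y \cdot \mathbf{Z} \setminus \{Z_0\}}}{\sqrt{1 - \rho_{X Z_0 \cdot \mathbf{Z} \setminus \{Z_0\}}^2} \, \sqrt{1 - \rho_{Z_0 Y \cdot \mathbf{Z} \setminus \{Z_0\}}^2}}$$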

Naïvely implementing this computation as a recursive algorithm yields an exponential time complexity. However, this computation has the overlapping subproblems property, such that using dynamic programming or simply caching the results of the recursive calls yields a complexity of $\mathcal{O}(n^3)$.

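Note that in the case where Z is a single variable, this reduces to:

$$\rho_{XY \cdot Z} = \frac{\rho_{XY} - \rho_{XZ} \, \rho_{ZY}}{\sqrt{1 - \rho_{XZ}^2} \, \sqrt{1 - \rho_{ZY}^2}}$$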
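A minimal sketch of the cached recursion follows, assuming the pairwise correlations are already available as a square matrix. The names `make_partial_corr`, `pcorr` and `pearson` are hypothetical, and the choice of which controlling variable to remove at each step (here the smallest index) is arbitrary. Each step applies the rule that the partial correlation given Z equals (r_XY − r_XZ0 · r_Z0Y) / sqrt((1 − r_XZ0²)(1 − r_Z0Y²)), where all three terms are partial correlations given Z without Z0.

```python
import math
from functools import lru_cache


def make_partial_corr(corr):
    """Return a cached function pcorr(i, j, given) over a correlation matrix.

    `corr` is a square list of lists of pairwise correlations; `given` is a
    frozenset holding the indices of the controlling variables.
    """

    @lru_cache(maxsize=None)
    def pcorr(i, j, given):
        if not given:
            # Zeroth order: the ordinary correlation coefficient.
            return corr[i][j]
        # Pick any controlling variable Z0 and recurse on the remaining set.
        z0 = min(given)
        rest = given - {z0}
        r_ij = pcorr(i, j, rest)
        r_iz = pcorr(i, z0, rest)
        r_zj = pcorr(z0, j, rest)
        return (r_ij - r_iz * r_zj) / math.sqrt((1 - r_iz ** 2) * (1 - r_zj ** 2))

    return pcorr


def pearson(a, b):
    """Ordinary sample correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den


# Columns X, Y, Z from the worked example in this article.
data = [[2, 4, 15, 20], [1, 2, 3, 4], [0, 0, 1, 1]]
corr = [[pearson(a, b) for b in data] for a in data]
pcorr = make_partial_corr(corr)
print(round(pcorr(0, 1, frozenset({2})), 6))  # 0.919145
```

Because the conditioning set is a `frozenset`, repeated subproblems hit the cache instead of being recomputed.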

Using matrix inversion

The partial correlation can also be written in terms of the joint precision matrix. Consider a set of random variables, $\mathbf{V} = \{X_1, \dots, X_n\}$, of cardinality n. We want the partial correlation between two variables $X_i$ and $X_j$ given all others, i.e., $\mathbf{V} \setminus \{X_i, X_j\}$. Suppose the (joint/full) covariance matrix $\Sigma = (\sigma_{ij})$ is positive definite and therefore invertible. If the precision matrix is defined as $\Omega = (p_{ij}) = \Sigma^{-1}$, then

$$\rho_{X_i X_j \cdot \mathbf{V} \setminus \{X_i, X_j\}} = -\frac{p_{ij}}{\sqrt{p_{ii} \, p_{jj}}} \tag{1}$$

Computing this requires $\Sigma^{-1}$, the inverse of the covariance matrix $\Sigma$, which runs in $\mathcal{O}(n^3)$ time (using the sample covariance matrix to obtain a sample partial correlation). Note that only a single matrix inversion is required to give all the partial correlations between pairs of variables in $\mathbf{V}$.
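As an illustration, the precision-matrix route can be sketched in plain Python: form the sample covariance matrix, invert it once, and read every pairwise partial correlation off the inverse as −p_ij / sqrt(p_ii · p_jj). The helper names `invert`, `sample_covariance` and `partial_correlations` are hypothetical, and the Gauss-Jordan inversion is only a stand-in for a proper linear-algebra routine.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity; the right half becomes the inverse.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sample_covariance(columns):
    """Sample covariance matrix of a list of equally long data columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((a[k] - ma) * (b[k] - mb) for k in range(n)) / (n - 1)
             for b, mb in zip(columns, means)]
            for a, ma in zip(columns, means)]


def partial_correlations(columns):
    """Matrix of partial correlations of each pair given all other columns."""
    p = invert(sample_covariance(columns))
    d = len(p)
    return [[1.0 if i == j else -p[i][j] / math.sqrt(p[i][i] * p[j][j])
             for j in range(d)] for i in range(d)]


# Columns X, Y, Z from the worked example in this article.
data = [[2, 4, 15, 20], [1, 2, 3, 4], [0, 0, 1, 1]]
print(round(partial_correlations(data)[0][1], 6))  # 0.919145
```

The entry at row 0, column 1 is the partial correlation of X and Y given Z; the other off-diagonal entries come from the same single inversion.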

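To prove Equation (1), return to the previous notation (i.e. $X, Y, \mathbf{Z} \leftrightarrow X_i, X_j, \mathbf{V} \setminus \{X_i, X_j\}$) and start with the definition of partial correlation: ρXY·Z is the correlation between the residuals eX and eY resulting from the linear regression of X with Z and of Y with Z, respectively.

First, suppose $\beta, \gamma$ are the coefficients for linear regression fit; that is,

$$\beta = \arg\min_{\beta} \operatorname{E}\left[ \left( X - \beta^{\mathsf T} \mathbf{Z} \right)^2 \right], \qquad \gamma = \arg\min_{\gamma} \operatorname{E}\left[ \left( Y - \gamma^{\mathsf T} \mathbf{Z} \right)^2 \right]$$

Write the joint covariance matrix for the vector $(X, Y, \mathbf{Z}^{\mathsf T})^{\mathsf T}$ as

$$\Sigma = \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} & \Sigma_{X\mathbf{Z}} \\ \Sigma_{YX} & \Sigma_{YY} & \Sigma_{Y\mathbf{Z}} \\ \Sigma_{\mathbf{Z}X} & \Sigma_{\mathbf{Z}Y} & \Sigma_{\mathbf{Z}\mathbf{Z}} \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$$

where

$$C_{11} = \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}, \qquad C_{12} = \begin{bmatrix} \Sigma_{X\mathbf{Z}} \\ \Sigma_{Y\mathbf{Z}} \end{bmatrix}, \qquad C_{21} = \begin{bmatrix} \Sigma_{\mathbf{Z}X} & \Sigma_{\mathbf{Z}Y} \end{bmatrix}, \qquad C_{22} = \Sigma_{\mathbf{Z}\mathbf{Z}}$$

Then the standard formula for linear regression gives

$$\beta = \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \Sigma_{\mathbf{Z}X}, \qquad \gamma = \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \Sigma_{\mathbf{Z}Y}$$

Hence, the residuals can be written as

$$R_X = X - \beta^{\mathsf T} \mathbf{Z} = X - \Sigma_{X\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \mathbf{Z}, \qquad R_Y = Y - \gamma^{\mathsf T} \mathbf{Z} = Y - \Sigma_{Y\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \mathbf{Z}$$

Note that $R_X$ and $R_Y$ have expectation zero because of the inclusion of an intercept term in $\mathbf{Z}$. Computing the covariance now gives

$$\begin{aligned} \operatorname{Cov}(R_X, R_Y) &= \operatorname{E}[R_X R_Y] \\ &= \operatorname{E}\left[ \left( X - \Sigma_{X\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \mathbf{Z} \right) \left( Y - \Sigma_{Y\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \mathbf{Z} \right)^{\mathsf T} \right] \\ &= \Sigma_{XY} - \Sigma_{X\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \Sigma_{\mathbf{Z}Y} - \Sigma_{X\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \Sigma_{\mathbf{Z}Y} + \Sigma_{X\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \Sigma_{\mathbf{Z}\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \Sigma_{\mathbf{Z}Y} \\ &= \Sigma_{XY} - \Sigma_{X\mathbf{Z}} \Sigma_{\mathbf{Z}\mathbf{Z}}^{-1} \Sigma_{\mathbf{Z}Y} \end{aligned} \tag{2}$$

Next, write the precision matrix $\Omega = \Sigma^{-1}$ in a similar block form:

$$\Omega = \begin{bmatrix} p_{XX} & p_{XY} & p_{X\mathbf{Z}} \\ p_{YX} & p_{YY} & p_{Y\mathbf{Z}} \\ p_{\mathbf{Z}X} & p_{\mathbf{Z}Y} & p_{\mathbf{Z}\mathbf{Z}} \end{bmatrix} = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$$

Then, by Schur's formula for block-matrix inversion,

$$P_{11}^{-1} = C_{11} - C_{12} C_{22}^{-1} C_{21}$$

The entries of the right-hand-side matrix are precisely the covariances previously computed in (2), giving

$$P_{11}^{-1} = \begin{bmatrix} \operatorname{Cov}(R_X, R_X) & \operatorname{Cov}(R_X, R_Y) \\ \operatorname{Cov}(R_Y, R_X) & \operatorname{Cov}(R_Y, R_Y) \end{bmatrix}$$

Using the formula for the inverse of a 2×2 matrix gives

$$P_{11} = \begin{bmatrix} p_{XX} & p_{XY} \\ p_{YX} & p_{YY} \end{bmatrix} = \frac{1}{\det P_{11}^{-1}} \begin{bmatrix} \operatorname{Cov}(R_Y, R_Y) & -\operatorname{Cov}(R_X, R_Y) \\ -\operatorname{Cov}(R_Y, R_X) & \operatorname{Cov}(R_X, R_X) \end{bmatrix}$$

So indeed, the partial correlation is

$$\rho_{XY \cdot \mathbf{Z}} = \frac{\operatorname{Cov}(R_X, R_Y)}{\sqrt{\operatorname{Cov}(R_X, R_X) \operatorname{Cov}(R_Y, R_Y)}} = -\frac{p_{XY}}{\sqrt{p_{XX} \, p_{YY}}}$$

as claimed in (1).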

Interpretation

[Figure: Geometrical interpretation of partial correlation for the case of N = 3 observations and thus a 2-dimensional hyperplane]

Geometrical

Let three variables X, Y, Z (where Z is the "control" or "extra variable") be chosen from a joint probability distribution over n variables V. Further, let v_i, 1 ≤ i ≤ N, be N n-dimensional i.i.d. observations taken from the joint probability distribution over V. The geometrical interpretation comes from considering the N-dimensional vectors x (formed by the successive values of X over the observations), y (formed by the values of Y), and z (formed by the values of Z).

It can be shown that the residuals eX,i coming from the linear regression of X on Z, if also considered as an N-dimensional vector eX (denoted rX in the accompanying graph), have a zero scalar product with the vector z generated by Z. This means that the residuals vector lies on an (N–1)-dimensional hyperplane Sz that is perpendicular to z.

The same also applies to the residuals eY,i generating a vector eY. The desired partial correlation is then the cosine of the angle φ between the projections eX and eY of x and y, respectively, onto the hyperplane perpendicular to z. [4] :ch. 7
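The projection view can be checked numerically with a short sketch. It assumes the data vectors are mean-centered first (an assumption added here so that no separate intercept is needed), and the helper names `dot`, `center` and `project_off` are hypothetical. The data are the four observations from the worked example in this article.

```python
import math


def dot(u, v):
    """Scalar product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def center(v):
    """Subtract the mean so the vector is orthogonal to the all-ones vector."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def project_off(v, z):
    """Component of v lying in the hyperplane perpendicular to z."""
    coef = dot(v, z) / dot(z, z)
    return [a - coef * b for a, b in zip(v, z)]


x = center([2, 4, 15, 20])
y = center([1, 2, 3, 4])
z = center([0, 0, 1, 1])

ex = project_off(x, z)
ey = project_off(y, z)

# Cosine of the angle between the two projections.
cos_phi = dot(ex, ey) / math.sqrt(dot(ex, ex) * dot(ey, ey))
print(round(cos_phi, 6))  # 0.919145
```

Both projected vectors have zero scalar product with z, and the cosine of the angle between them matches the regression-based partial correlation for the same data.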

As conditional independence test

With the assumption that all involved variables are multivariate Gaussian, the partial correlation ρXY·Z is zero if and only if X is conditionally independent from Y given Z. [1] This property does not hold in the general case.

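To test if a sample partial correlation $\hat{\rho}_{XY \cdot \mathbf{Z}}$ implies that the true population partial correlation differs from 0, Fisher's z-transform of the partial correlation can be used:

$$z(\hat{\rho}_{XY \cdot \mathbf{Z}}) = \frac{1}{2} \ln \left( \frac{1 + \hat{\rho}_{XY \cdot \mathbf{Z}}}{1 - \hat{\rho}_{XY \cdot \mathbf{Z}}} \right)$$

The null hypothesis is $H_0 : \rho_{XY \cdot \mathbf{Z}} = 0$, to be tested against the two-tail alternative $H_A : \rho_{XY \cdot \mathbf{Z}} \neq 0$. $H_0$ can be rejected if

$$\sqrt{N - |\mathbf{Z}| - 3} \cdot \left| z(\hat{\rho}_{XY \cdot \mathbf{Z}}) \right| > \Phi^{-1}(1 - \alpha / 2)$$

where $\Phi$ is the cumulative distribution function of a Gaussian distribution with zero mean and unit standard deviation, $\alpha$ is the significance level of $H_0$, and $N$ is the sample size. This z-transform is approximate, and the actual distribution of the sample (partial) correlation coefficient is not straightforward. However, an exact t-test based on a combination of the partial regression coefficient, the partial correlation coefficient, and the partial variances is available. [5]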
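A small sketch of the approximate test follows. The function name `fisher_z_test` is hypothetical and the input values in the usage lines are made-up numbers chosen only to exercise the code. The rejection rule coded here is sqrt(N − |Z| − 3) · |z| > Φ⁻¹(1 − α/2), with z the inverse hyperbolic tangent of the sample partial correlation.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n_obs, n_controls, alpha=0.05):
    """Approximate two-sided test of H0: partial correlation equals 0.

    r          -- sample partial correlation
    n_obs      -- sample size N
    n_controls -- number of controlling variables |Z|
    Returns (test statistic, critical value, reject H0?).
    """
    dof = n_obs - n_controls - 3
    if dof <= 0:
        raise ValueError("need n_obs > n_controls + 3")
    # Fisher's z-transform: 0.5 * ln((1 + r) / (1 - r)) == atanh(r).
    z = math.atanh(r)
    statistic = math.sqrt(dof) * abs(z)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return statistic, critical, statistic > critical


# Hypothetical inputs: r = 0.5 from 50 observations with one control variable.
stat, crit, reject = fisher_z_test(0.5, 50, 1)
print(round(stat, 4), round(crit, 4), reject)
```

The guard on `dof` matters in practice: a sample of four observations with one controlling variable gives N − |Z| − 3 = 0, so the approximation cannot be applied to it.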

The distribution of the sample partial correlation was described by Fisher. [6]

Semipartial correlation (part correlation)

The semipartial (or part) correlation statistic is similar to the partial correlation statistic; both compare variations of two variables after certain factors are controlled for. However, to calculate the semipartial correlation, one holds the third variable constant for either X or Y but not both; whereas for the partial correlation, one holds the third variable constant for both. [7] The semipartial correlation compares the unique variation of one variable (having removed variation associated with the Z variable(s)) with the unfiltered variation of the other, while the partial correlation compares the unique variation of one variable to the unique variation of the other.

The semipartial correlation can be viewed as more practically relevant "because it is scaled to (i.e., relative to) the total variability in the dependent (response) variable." [8] Conversely, it is less theoretically useful because it is less precise about the role of the unique contribution of the independent variable.

The absolute value of the semipartial correlation of X with Y is always less than or equal to that of the partial correlation of X with Y. The reason is this: Suppose the correlation of X with Z has been removed from X, giving the residual vector eX. In computing the semipartial correlation, Y still contains both unique variance and variance due to its association with Z. But eX, being uncorrelated with Z, can only explain some of the unique part of the variance of Y and not the part related to Z. In contrast, with the partial correlation, only eY (the part of the variance of Y that is unrelated to Z) is to be explained, so there is less variance of the type that eX cannot explain.
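The inequality can be illustrated with a short sketch for a single controlling variable, using the four observations from the worked example in this article. The helper names `residuals` and `pearson` are hypothetical. The semipartial correlation is taken as the correlation between the residual of X and the unadjusted Y, and the partial correlation as the correlation between both residuals.

```python
import math


def residuals(v, z):
    """Residuals of a simple linear regression of v on z, with an intercept."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = (sum((a - mv) * (b - mz) for a, b in zip(v, z))
             / sum((b - mz) ** 2 for b in z))
    return [(a - mv) - slope * (b - mz) for a, b in zip(v, z)]


def pearson(a, b):
    """Ordinary sample correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den


X = [2, 4, 15, 20]
Y = [1, 2, 3, 4]
Z = [0, 0, 1, 1]

ex = residuals(X, Z)  # X with the linear effect of Z removed
ey = residuals(Y, Z)  # Y with the linear effect of Z removed

semipartial = pearson(ex, Y)  # Z removed from X only
partial = pearson(ex, ey)     # Z removed from both X and Y

print(round(semipartial, 6), round(partial, 6))  # 0.411054 0.919145
```

Here the semipartial value is much smaller than the partial one because most of the variance of Y is associated with Z, which the residual of X cannot explain.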

Use in time series analysis

In time series analysis, the partial autocorrelation function (sometimes "partial correlation function") of a time series $X_t$ is defined, for lag $h$, as

$$\varphi(h) = \rho_{X_0 X_h \cdot \{X_1, \dots, X_{h-1}\}}$$

This function is used to determine the appropriate lag length for an autoregression.
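As a worked illustration, assume a stationary series with autocorrelation function $\rho(h)$, so that the correlation between two values depends only on their lag, and take the lag-$h$ partial autocorrelation $\varphi(h)$ to be the partial correlation between $X_0$ and $X_h$ given the intermediate values. Both assumptions are added here for the example. Applying the first-order partial correlation formula with $X_1$ as the single controlling variable gives the lag-2 value:

```latex
\varphi(1) = \rho(1), \qquad
\varphi(2) = \rho_{X_0 X_2 \cdot X_1}
           = \frac{\rho_{X_0 X_2} - \rho_{X_0 X_1}\,\rho_{X_1 X_2}}
                  {\sqrt{1 - \rho_{X_0 X_1}^2}\,\sqrt{1 - \rho_{X_1 X_2}^2}}
           = \frac{\rho(2) - \rho(1)^2}{1 - \rho(1)^2}.
```

For a first-order autoregression with $\rho(h) = a^h$, the numerator is $a^2 - a^2 = 0$, so $\varphi(2) = 0$. This is consistent with using the function to choose a lag length: the values beyond the true order of the autoregression vanish.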


References

  1. Baba, Kunihiro; Ritei Shibata; Masaaki Sibuya (2004). "Partial correlation and conditional correlation as measures of conditional independence". Australian and New Zealand Journal of Statistics. 46 (4): 657–664. doi:10.1111/j.1467-842X.2004.00360.x. S2CID 123130024.
  2. Guilford J. P., Fruchter B. (1973). Fundamental statistics in psychology and education. Tokyo: McGraw-Hill Kogakusha, LTD.
  3. Kim, Seongho (November 2015). "ppcor: An R Package for a Fast Calculation to Semi-partial Correlation Coefficients". Communications for Statistical Applications and Methods. 22 (6): 665–674. doi:10.5351/CSAM.2015.22.6.665. ISSN 2287-7843. PMC 4681537. PMID 26688802.
  4. Rummel, R. J. (1976). "Understanding Correlation".
  5. Kendall MG, Stuart A. (1973) The Advanced Theory of Statistics, Volume 2 (3rd Edition), ISBN 0-85264-215-6, Section 27.22
  6. Fisher, R.A. (1924). "The distribution of the partial correlation coefficient". Metron. 3 (3–4): 329–332.
  7. "Partial and Semipartial Correlation". Archived from the original on 6 February 2014.
  8. StatSoft, Inc. (2010). "Semi-Partial (or Part) Correlation", Electronic Statistics Textbook. Tulsa, OK: StatSoft, accessed January 15, 2011.