# Partial correlation

Last updated

In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. If we are interested in finding to what extent there is a numerical relationship between two variables of interest, using their correlation coefficient will give misleading results if there is another, confounding, variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient. This is precisely the motivation for including other right-side variables in a multiple regression; but while multiple regression gives unbiased results for the effect size, it does not give a numerical value of a measure of the strength of the relationship between the two variables of interest.

## Contents

For example, if we have economic data on the consumption, income, and wealth of various individuals and we wish to see if there is a relationship between consumption and income, failing to control for wealth when computing a correlation coefficient between consumption and income would give a misleading result, since income might be numerically related to wealth which in turn might be numerically related to consumption; a measured correlation between consumption and income might actually be contaminated by these other correlations. The use of a partial correlation avoids this problem.

Like the correlation coefficient, the partial correlation coefficient takes on a value in the range from –1 to 1. The value –1 conveys a perfect negative correlation controlling for some variables (that is, an exact linear relationship in which higher values of one variable are associated with lower values of the other); the value 1 conveys a perfect positive linear relationship, and the value 0 conveys that there is no linear relationship.

The partial correlation coincides with the conditional correlation if the random variables are jointly distributed as the multivariate normal, other elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial or Dirichlet distribution, but not in general otherwise. 

## Formal definition

Formally, the partial correlation between X and Y given a set of n controlling variables Z = {Z1, Z2, ..., Zn}, written ρXY·Z, is the correlation between the residuals eX and eY resulting from the linear regression of X with Z and of Y with Z, respectively. The first-order partial correlation (i.e., when n = 1) is the difference between a correlation and the product of the removable correlations divided by the product of the coefficients of alienation of the removable correlations. The coefficient of alienation, and its relation with joint variance through correlation are available in Guilford (1973, pp. 344–345). 

## Computation

### Using linear regression

A simple way to compute the sample partial correlation for some data is to solve the two associated linear regression problems, get the residuals, and calculate the correlation between the residuals. Let X and Y be, as above, random variables taking real values, and let Z be the n-dimensional vector-valued random variable. We write xi, yi and zi to denote the ith of N i.i.d. observations from some joint probability distribution over real random variables X, Y and Z, with zi having been augmented with a 1 to allow for a constant term in the regression. Solving the linear regression problem amounts to finding (n+1)-dimensional regression coefficient vectors $\mathbf {w} _{X}^{*}$ and $\mathbf {w} _{Y}^{*}$ such that

$\mathbf {w} _{X}^{*}=\arg \min _{\mathbf {w} }\left\{\sum _{i=1}^{N}(x_{i}-\langle \mathbf {w} ,\mathbf {z} _{i}\rangle )^{2}\right\}$ $\mathbf {w} _{Y}^{*}=\arg \min _{\mathbf {w} }\left\{\sum _{i=1}^{N}(y_{i}-\langle \mathbf {w} ,\mathbf {z} _{i}\rangle )^{2}\right\}$ with N being the number of observations and $\langle \mathbf {w} ,\mathbf {v} \rangle$ the scalar product between the vectors w and v.

The residuals are then

$e_{X,i}=x_{i}-\langle \mathbf {w} _{X}^{*},\mathbf {z} _{i}\rangle$ $e_{Y,i}=y_{i}-\langle \mathbf {w} _{Y}^{*},\mathbf {z} _{i}\rangle$ and the sample partial correlation is then given by the usual formula for sample correlation, but between these new derived values:

${\hat {\rho }}_{XY\cdot \mathbf {Z} }={\frac {N\sum _{i=1}^{N}e_{X,i}e_{Y,i}-\sum _{i=1}^{N}e_{X,i}\sum _{i=1}^{N}e_{Y,i}}{{\sqrt {N\sum _{i=1}^{N}e_{X,i}^{2}-\left(\sum _{i=1}^{N}e_{X,i}\right)^{2}}}~{\sqrt {N\sum _{i=1}^{N}e_{Y,i}^{2}-\left(\sum _{i=1}^{N}e_{Y,i}\right)^{2}}}}}$ $={\frac {N\sum _{i=1}^{N}e_{X,i}e_{Y,i}}{{\sqrt {N\sum _{i=1}^{N}e_{X,i}^{2}}}~{\sqrt {N\sum _{i=1}^{N}e_{Y,i}^{2}}}}}.$ In the first expression the three terms after minus signs all equal 0 since each contains the sum of residuals from an ordinary least squares regression.

#### Example

Suppose we have the following data on three variables, X, Y, and Z:

XYZ
210
420
1531
2041

If we compute the Pearson correlation coefficient between variables X and Y, the result is approximately 0.970, while if we compute the partial correlation between X and Y, using the formula given above, we find a partial correlation of 0.919. The computations were done using R with the following code.

> X=c(2,4,15,20)> Y=c(1,2,3,4)> Z=c(0,0,1,1)> mm1=lm(X~Z)> res1=mm1$residuals> mm2=lm(Y~Z)> res2=mm2$residuals> cor(res1,res2) 0.919145> cor(X,Y) 0.9695016> generalCorr::parcorMany(cbind(X,Y,Z))     nami namj partij   partji rijMrji  [1,] "X"  "Y"  "0.8844" "1"    "-0.1156"[2,] "X"  "Z"  "0.1581" "1"    "-0.8419"

The lower part of the above code reports generalized nonlinear partial correlation coefficient between X and Y after removing the nonlinear effect of Z to be 0.8844. Also the generalized partial correlation coefficient between X and Z after removing the nonlinear effect of Y to be 0.1581. See the R package generalCorr' and its vignettes for details. Simulation and other details are in Vinod (2017) "Generalized correlation and kernel causality with applications in development economics," Communications in Statistics - Simulation and Computation, vol. 46, [4513, 4534], available online: 29 Dec 2015, URL https://doi.org/10.1080/03610918.2015.1122048.

### Using recursive formula

It can be computationally expensive to solve the linear regression problems. Actually, the nth-order partial correlation (i.e., with |Z| = n) can be easily computed from three (n - 1)th-order partial correlations. The zeroth-order partial correlation ρXY·Ø is defined to be the regular correlation coefficient ρXY.

It holds, for any $Z_{0}\in \mathbf {Z} ,$ that[ citation needed ]

$\rho _{XY\cdot \mathbf {Z} }={\frac {\rho _{XY\cdot \mathbf {Z} \setminus \{Z_{0}\}}-\rho _{XZ_{0}\cdot \mathbf {Z} \setminus \{Z_{0}\}}\rho _{Z_{0}Y\cdot \mathbf {Z} \setminus \{Z_{0}\}}}{{\sqrt {1-\rho _{XZ_{0}\cdot \mathbf {Z} \setminus \{Z_{0}\}}^{2}}}{\sqrt {1-\rho _{Z_{0}Y\cdot \mathbf {Z} \setminus \{Z_{0}\}}^{2}}}}}.$ Naïvely implementing this computation as a recursive algorithm yields an exponential time complexity. However, this computation has the overlapping subproblems property, such that using dynamic programming or simply caching the results of the recursive calls yields a complexity of ${\mathcal {O}}(n^{3})$ .

Note in the case where Z is a single variable, this reduces to:[ citation needed ]

$\rho _{XY\cdot Z}={\frac {\rho _{XY}-\rho _{XZ}\rho _{ZY}}{{\sqrt {1-\rho _{XZ}^{2}}}{\sqrt {1-\rho _{ZY}^{2}}}}}$ ### Using matrix inversion

In ${\mathcal {O}}(n^{3})$ time, another approach allows all partial correlations to be computed between any two variables Xi and Xj of a set V of cardinality n, given all others, i.e., $\mathbf {V} \setminus \{X_{i},X_{j}\}$ , if the covariance matrix Ω = (ρXiXj), is positive definite and therefore invertible. If we define the precision matrix P = (pij ) = Ω−1, we have:

$\rho _{X_{i}X_{j}\cdot \mathbf {V} \setminus \{X_{i},X_{j}\}}=-{\frac {p_{ij}}{\sqrt {p_{ii}p_{jj}}}}.$ ## Interpretation Geometrical interpretation of partial correlation for the case of N = 3 observations and thus a 2-dimensional hyperplane

### Geometrical

Let three variables X, Y, Z (where Z is the "control" or "extra variable") be chosen from a joint probability distribution over n variables V. Further let vi, 1 ≤ iN, be Nn-dimensional i.i.d. observations taken from the joint probability distribution over V. We then consider the N-dimensional vectors x (formed by the successive values of X over the observations), y (formed by the values of Y) and z (formed by the values of Z).

It can be shown that the residuals eX,i coming from the linear regression of X on Z, if also considered as an N-dimensional vector eX (denoted rX in the accompanying graph), have a zero scalar product with the vector z generated by Z. This means that the residuals vector lies on an (N–1)-dimensional hyperplane Sz that is perpendicular to z.

The same also applies to the residuals eY,i generating a vector eY. The desired partial correlation is then the cosine of the angle φ between the projections eX and eY of x and y, respectively, onto the hyperplane perpendicular to z.  :ch. 7

### As conditional independence test

With the assumption that all involved variables are multivariate Gaussian, the partial correlation ρXY·Z is zero if and only if X is conditionally independent from Y given Z.  This property does not hold in the general case.

To test if a sample partial correlation ${\hat {\rho }}_{XY\cdot \mathbf {Z} }$ implies that the true population partial correlation differs from 0, Fisher's z-transform of the partial correlation can be used:

$z({\hat {\rho }}_{XY\cdot \mathbf {Z} })={\frac {1}{2}}\ln \left({\frac {1+{\hat {\rho }}_{XY\cdot \mathbf {Z} }}{1-{\hat {\rho }}_{XY\cdot \mathbf {Z} }}}\right).$ The null hypothesis is $H_{0}:\rho _{XY\cdot \mathbf {Z} }=0$ , to be tested against the two-tail alternative $H_{A}:\rho _{XY\cdot \mathbf {Z} }\neq 0$ . We reject H0 with significance level α if:

${\sqrt {N-|\mathbf {Z} |-3}}\cdot |z({\hat {\rho }}_{XY\cdot \mathbf {Z} })|>\Phi ^{-1}(1-\alpha /2),$ where Φ(·) is the cumulative distribution function of a Gaussian distribution with zero mean and unit standard deviation, and N is the sample size. This z-transform is approximate and that the actual distribution of the sample (partial) correlation coefficient is not straightforward. However, an exact t-test based on a combination of the partial regression coefficient, the partial correlation coefficient and the partial variances is available. 

The distribution of the sample partial correlation was described by Fisher. 

## Semipartial correlation (part correlation)

The semipartial (or part) correlation statistic is similar to the partial correlation statistic. Both compare variations of two variables after certain factors are controlled for, but to calculate the semipartial correlation one holds the third variable constant for either X or Y but not both, whereas for the partial correlation one holds the third variable constant for both.  The semipartial correlation compares the unique variation of one variable (having removed variation associated with the Z variable(s)), with the unfiltered variation of the other, while the partial correlation compares the unique variation of one variable to the unique variation of the other.

The semipartial (or part) correlation can be viewed as more practically relevant "because it is scaled to (i.e., relative to) the total variability in the dependent (response) variable."  Conversely, it is less theoretically useful because it is less precise about the role of the unique contribution of the independent variable.

The absolute value of the semipartial correlation of X with Y is always less than or equal to that of the partial correlation of X with Y. The reason is this: Suppose the correlation of X with Z has been removed from X, giving the residual vector ex . In computing the semipartial correlation, Y still contains both unique variance and variance due to its association with Z. But ex , being uncorrelated with Z, can only explain some of the unique part of the variance of Y and not the part related to Z. In contrast, with the partial correlation, only ey (the part of the variance of Y that is unrelated to Z) is to be explained, so there is less variance of the type that ex cannot explain.

## Use in time series analysis

In time series analysis, the partial autocorrelation function (sometimes "partial correlation function") of a time series is defined, for lag h, as

$\varphi (h)=\rho _{X_{0}X_{h}\,\cdot \,\{X_{1},\,\dots \,,X_{h-1}\}}.$ This function is used to determine the appropriate lag length for an autoregression.

## Related Research Articles In physics, the Navier–Stokes equations are certain partial differential equations which describe the motion of viscous fluid substances, named after French engineer and physicist Claude-Louis Navier and Anglo-Irish physicist and mathematician George Gabriel Stokes. They were developed over several decades of progressively building the theories, from 1822 (Navier) to 1842–1850 (Stokes). In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it normally refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the so-called demand curve.

In mathematics, the Laplace operator or Laplacian is a differential operator given by the divergence of the gradient of a scalar function on Euclidean space. It is usually denoted by the symbols , , or . In a Cartesian coordinate system, the Laplacian is given by the sum of second partial derivatives of the function with respect to each independent variable. In other coordinate systems, such as cylindrical and spherical coordinates, the Laplacian also has a useful form. Informally, the Laplacian Δf (p) of a function f at a point p measures by how much the average value of f over small spheres or balls centered at p deviates from f (p). In statistics, the Pearson correlation coefficient ― also known as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), the bivariate correlation, or colloquially simply as the correlation coefficient ― is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationship or correlation. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1.

In the calculus of variations, a field of mathematical analysis, the functional derivative relates a change in a functional to a change in a function on which the functional depends. In fluid dynamics, the Euler equations are a set of quasilinear partial differential equations governing adiabatic and inviscid flow. They are named after Leonhard Euler. In particular, they correspond to the Navier–Stokes equations with zero viscosity and zero thermal conductivity.

In statistics, propagation of uncertainty is the effect of variables' uncertainties on the uncertainty of a function based on them. When the variables are the values of experimental measurements they have uncertainties due to measurement limitations which propagate due to the combination of variables in the function. In statistics, the Fisher transformation of a Pearson correlation coefficient is its inverse hyperbolic tangent (artanh). When the sample correlation coefficients r is significant, its distribution is highly skewed, which makes it difficult to estimate confidence intervals and apply tests of significance for the population correlation coefficient ρ. The Fisher transformation solves this problem by yielding a variable whose distribution is approximately normally distributed, with a variance that is stable over different values of r. In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models.

Weighted least squares (WLS), also known as weighted linear regression, is a generalization of ordinary least squares and linear regression in which knowledge of the variance of observations is incorporated into the regression. WLS is also a specialization of generalized least squares. In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor. In mathematics, a multiple integral is a definite integral of a function of several real variables, for instance, f(x, y) or f(x, y, z). Integrals of a function of two variables over a region in are called double integrals, and integrals of a function of three variables over a region in are called triple integrals. For multiple integrals of a single-variable function, see the Cauchy formula for repeated integration.

In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small sample distribution of this ratio was derived by John von Neumann. Durbin and Watson applied this statistic to the residuals from least squares regressions, and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first order autoregressive process. Note that the distribution of this test statistic does not depend on the estimated regression coefficients and the variance of the errors.

In statistics, Bayesian multivariate linear regression is a Bayesian approach to multivariate linear regression, i.e. linear regression where the predicted outcome is a vector of correlated random variables rather than a single scalar random variable. A more general treatment of this approach can be found in the article MMSE estimator.

The intent of this article is to highlight the important points of the derivation of the Navier–Stokes equations as well as its application and formulation for different families of fluids.

The Cauchy momentum equation is a vector partial differential equation put forth by Cauchy that describes the non-relativistic momentum transport in any continuum.

A product distribution is a probability distribution constructed as the distribution of the product of random variables having two other known distributions. Given two statistically independent random variables X and Y, the distribution of the random variable Z that is formed as the product

In econometrics, Prais–Winsten estimation is a procedure meant to take care of the serial correlation of type AR(1) in a linear model. Conceived by Sigbert Prais and Christopher Winsten in 1954, it is a modification of Cochrane–Orcutt estimation in the sense that it does not lose the first observation, which leads to more efficiency as a result and makes it a special case of feasible generalized least squares.

The Mehler kernel is a complex-valued function found to be the propagator of the quantum harmonic oscillator.

Given a probit model y=1[y* > 0] where y* = x1 β + zδ + u, and u ~ N(0,1), without losing generality, z can be represented as z = x1 θ1 + x2 θ2 + v. When u is correlated with v, there will be an issue of endogeneity. This can be caused by omitted variables and measurement errors. There are also many cases where z is partially determined by y and endogeneity issue arises. For instance, in a model evaluating the effect of different patient features on their choice of whether going to hospital, y is the choice and z is the amount of the medicine a respondent took, then it is very intuitive that more often the respondent goes to hospital, it is more likely that she took more medicine, hence endogeneity issue arises. When there are endogenous explanatory variables, the estimator generated by usual estimation procedure will be inconsistent, then the corresponding estimated Average Partial Effect (APE) will be inconsistent, too.

1. Baba, Kunihiro; Ritei Shibata; Masaaki Sibuya (2004). "Partial correlation and conditional correlation as measures of conditional independence". Australian and New Zealand Journal of Statistics . 46 (4): 657–664. doi:10.1111/j.1467-842X.2004.00360.x.
2. Guilford J. P., Fruchter B. (1973). Fundamental statistics in psychology and education. Tokyo: McGraw-Hill Kogakusha, LTD.
3. Rummel, R. J. (1976). "Understanding Correlation".
4. Kendall MG, Stuart A. (1973) The Advanced Theory of Statistics, Volume 2 (3rd Edition), ISBN   0-85264-215-6, Section 27.22
5. Fisher, R.A. (1924). "The distribution of the partial correlation coefficient". Metron . 3 (3–4): 329–332.
6. https://web.archive.org/web/20140206182503/http://luna.cas.usf.edu/~mbrannic/files/regression/Partial.html. Archived from the original on 2014-02-06.{{cite web}}: Missing or empty |title=` (help)
7. StatSoft, Inc. (2010). "Semi-Partial (or Part) Correlation", Electronic Statistics Textbook. Tulsa, OK: StatSoft, accessed January 15, 2011.