In probability theory and statistics, **partial correlation** measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. If we are interested in finding to what extent there is a numerical relationship between two variables of interest, using their correlation coefficient will give misleading results if there is another, confounding, variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient. This is precisely the motivation for including other right-side variables in a multiple regression; but while multiple regression gives unbiased results for the effect size, it does not give a numerical value of a measure of the strength of the relationship between the two variables of interest.

For example, if we have economic data on the consumption, income, and wealth of various individuals and we wish to see if there is a relationship between consumption and income, failing to control for wealth when computing a correlation coefficient between consumption and income would give a misleading result, since income might be numerically related to wealth which in turn might be numerically related to consumption; a measured correlation between consumption and income might actually be contaminated by these other correlations. The use of a partial correlation avoids this problem.

Like the correlation coefficient, the partial correlation coefficient takes on a value in the range from –1 to 1. The value –1 conveys a perfect negative correlation controlling for some variables (that is, an exact linear relationship in which higher values of one variable are associated with lower values of the other); the value 1 conveys a perfect positive linear relationship, and the value 0 conveys that there is no linear relationship.

The partial correlation coincides with the conditional correlation if the random variables are jointly distributed as the multivariate normal, other elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial or Dirichlet distribution, but not in general otherwise.^{ [1] }

Formally, the partial correlation between *X* and *Y* given a set of *n* controlling variables **Z** = {*Z*_{1}, *Z*_{2}, ..., *Z*_{n}}, written *ρ*_{XY·Z}, is the correlation between the residuals *e*_{X} and *e*_{Y} resulting from the linear regression of *X* with **Z** and of *Y* with **Z**, respectively. The first-order partial correlation (i.e., when *n* = 1) is the difference between a correlation and the product of the removable correlations divided by the product of the coefficients of alienation of the removable correlations. The coefficient of alienation, and its relation with joint variance through correlation are available in Guilford (1973, pp. 344–345).^{ [2] }

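A simple way to compute the sample partial correlation for some data is to solve the two associated linear regression problems, get the residuals, and calculate the correlation between the residuals. Let *X* and *Y* be, as above, random variables taking real values, and let **Z** be the *n*-dimensional vector-valued random variable. We write *x*_{i}, *y*_{i} and **z**_{i} to denote the *i*th of *N* i.i.d. observations from some joint probability distribution over the real random variables *X*, *Y* and **Z**, with **z**_{i} augmented with a 1 to allow for a constant term in the regression. Solving the linear regression problem amounts to finding (*n*+1)-dimensional regression coefficient vectors $\mathbf{w}_X^*$ and $\mathbf{w}_Y^*$ such that

$$\mathbf{w}_X^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} \left(x_i - \langle \mathbf{w}, \mathbf{z}_i \rangle\right)^2$$

$$\mathbf{w}_Y^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} \left(y_i - \langle \mathbf{w}, \mathbf{z}_i \rangle\right)^2$$

with *N* being the number of observations and $\langle \mathbf{w}, \mathbf{v} \rangle$ the scalar product between the vectors **w** and **v**.

The residuals are then

$$e_{X,i} = x_i - \langle \mathbf{w}_X^*, \mathbf{z}_i \rangle, \qquad e_{Y,i} = y_i - \langle \mathbf{w}_Y^*, \mathbf{z}_i \rangle$$

and the sample **partial** correlation is then given by the usual formula for sample correlation, but between these new *derived* values:

$$\hat{\rho}_{XY\cdot\mathbf{Z}} = \frac{N\sum_{i=1}^{N} e_{X,i}e_{Y,i} - \sum_{i=1}^{N} e_{X,i}\sum_{i=1}^{N} e_{Y,i}}{\sqrt{N\sum_{i=1}^{N} e_{X,i}^2 - \left(\sum_{i=1}^{N} e_{X,i}\right)^2}\;\sqrt{N\sum_{i=1}^{N} e_{Y,i}^2 - \left(\sum_{i=1}^{N} e_{Y,i}\right)^2}} = \frac{N\sum_{i=1}^{N} e_{X,i}e_{Y,i}}{\sqrt{N\sum_{i=1}^{N} e_{X,i}^2}\;\sqrt{N\sum_{i=1}^{N} e_{Y,i}^2}}$$

In the first expression the three terms after minus signs all equal 0 since each contains the sum of residuals from an ordinary least squares regression.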

Suppose we have the following data on three variables, *X*, *Y*, and *Z*:

| X | Y | Z |
|---|---|---|
| 2 | 1 | 0 |
| 4 | 2 | 0 |
| 15 | 3 | 1 |
| 20 | 4 | 1 |

If we compute the Pearson correlation coefficient between variables *X* and *Y*, the result is approximately 0.970, while if we compute the partial correlation between *X* and *Y*, using the formula given above, we find a partial correlation of 0.919. The computations were done using R with the following code.

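```r
X <- c(2, 4, 15, 20)
Y <- c(1, 2, 3, 4)
Z <- c(0, 0, 1, 1)

# Residuals of X and of Y after regressing each on Z
res1 <- lm(X ~ Z)$residuals
res2 <- lm(Y ~ Z)$residuals

cor(res1, res2)  # partial correlation of X and Y given Z
#> [1] 0.919145
cor(X, Y)        # ordinary Pearson correlation
#> [1] 0.9695016

# Generalized (nonlinear) partial correlations from the generalCorr package
generalCorr::parcorMany(cbind(X, Y, Z))
#>      nami namj partij   partji rijMrji
#> [1,] "X"  "Y"  "0.8844" "1"    "-0.1156"
#> [2,] "X"  "Z"  "0.1581" "1"    "-0.8419"
```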

The lower part of the output above reports the generalized nonlinear partial correlation coefficient between *X* and *Y*, after removing the nonlinear effect of *Z*, as 0.8844. Likewise, the generalized partial correlation coefficient between *X* and *Z*, after removing the nonlinear effect of *Y*, is 0.1581. See the R package `generalCorr` and its vignettes for details. Simulation and other details are in Vinod (2017), "Generalized correlation and kernel causality with applications in development economics", *Communications in Statistics - Simulation and Computation*, vol. 46, pp. 4513–4534 (available online 29 Dec 2015), https://doi.org/10.1080/03610918.2015.1122048.

It can be computationally expensive to solve the linear regression problems. Actually, the *n*th-order partial correlation (i.e., with |**Z**| = *n*) can be easily computed from three (*n* - 1)th-order partial correlations. The zeroth-order partial correlation *ρ*_{XY·Ø} is defined to be the regular correlation coefficient *ρ*_{XY}.

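It holds, for any *Z*_{0} ∈ **Z**, that

$$\rho_{XY\cdot\mathbf{Z}} = \frac{\rho_{XY\cdot\mathbf{Z}\setminus\{Z_0\}} - \rho_{XZ_0\cdot\mathbf{Z}\setminus\{Z_0\}}\,\rho_{Z_0Y\cdot\mathbf{Z}\setminus\{Z_0\}}}{\sqrt{1-\rho_{XZ_0\cdot\mathbf{Z}\setminus\{Z_0\}}^2}\;\sqrt{1-\rho_{Z_0Y\cdot\mathbf{Z}\setminus\{Z_0\}}^2}}$$

Naïvely implementing this computation as a recursive algorithm yields an exponential time complexity. However, this computation has the overlapping subproblems property, such that using dynamic programming or simply caching the results of the recursive calls yields a complexity of $\mathcal{O}(n^3)$.

Note that in the case where **Z** is a single variable *Z*, this reduces to:

$$\rho_{XY\cdot Z} = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{ZY}}{\sqrt{1-\rho_{XZ}^2}\;\sqrt{1-\rho_{ZY}^2}}$$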
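The recursion on lower-order partial correlations, with cached intermediate results, can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the helper names `pearson` and `make_partial_corr`, and the dictionary layout of the data, are assumptions made for the example. The data are the four observations of *X*, *Y* and *Z* used in the worked example.

```python
from functools import lru_cache
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)


def make_partial_corr(data):
    """Build a partial-correlation function over `data`, a dict mapping
    variable names to lists of observations (a hypothetical layout)."""

    @lru_cache(maxsize=None)  # caching avoids recomputing shared subproblems
    def rho(x, y, controls):
        if not controls:  # zeroth order: the ordinary correlation
            return pearson(data[x], data[y])
        z0 = min(controls)  # any element may be removed; min() is deterministic
        rest = controls - {z0}
        r_xy = rho(x, y, rest)
        r_xz = rho(x, z0, rest)
        r_zy = rho(z0, y, rest)
        return (r_xy - r_xz * r_zy) / sqrt((1 - r_xz ** 2) * (1 - r_zy ** 2))

    return lambda x, y, controls=(): rho(x, y, frozenset(controls))


data = {"X": [2, 4, 15, 20], "Y": [1, 2, 3, 4], "Z": [0, 0, 1, 1]}
partial_corr = make_partial_corr(data)
print(round(partial_corr("X", "Y", ["Z"]), 6))  # 0.919145
```

With a single control variable the function evaluates exactly the first-order formula, and the result agrees with the value obtained from the regression residuals.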

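In $\mathcal{O}(n^3)$ time, another approach allows *all* partial correlations to be computed between any two variables *X*_{i} and *X*_{j} of a set **V** of cardinality *n*, given all others, i.e. **V** \ {*X*_{i}, *X*_{j}}, if the correlation matrix Ω = (*ρ*_{X_i X_j}) is positive definite and therefore invertible. If we define the precision matrix P = (*p*_{ij}) = Ω^{−1}, we have:

$$\rho_{X_iX_j\cdot\mathbf{V}\setminus\{X_i,X_j\}} = -\frac{p_{ij}}{\sqrt{p_{ii}\,p_{jj}}}$$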
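The matrix-inversion route, in which the partial correlation of two variables given all the others is the negated, rescaled off-diagonal entry of the inverse covariance (precision) matrix, can be sketched in Python with the standard library only. The function names below are assumptions for illustration, and the small Gauss-Jordan inverter is a stand-in for whatever linear algebra routine would be used in practice. The sample covariance matrix is inverted here; inverting the correlation matrix gives the same partial correlations, because the rescaling cancels.

```python
from math import sqrt


def covariance_matrix(columns):
    """Sample covariance matrix of a list of equal-length data columns."""
    n_obs = len(columns[0])
    means = [sum(c) / n_obs for c in columns]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n_obs - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_corr_matrix(columns):
    """Entry (i, j) is the partial correlation of columns i and j given all others."""
    prec = invert(covariance_matrix(columns))
    n = len(prec)
    return [[1.0 if i == j else -prec[i][j] / sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


X, Y, Z = [2, 4, 15, 20], [1, 2, 3, 4], [0, 0, 1, 1]
pc = partial_corr_matrix([X, Y, Z])
print(round(pc[0][1], 6))  # partial correlation of X and Y given Z: 0.919145
```

A single inversion yields every pairwise partial correlation at once, which is the appeal of this approach when many pairs are of interest.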

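Let three variables *X*, *Y*, *Z* (where *Z* is the "control" or "extra variable") be chosen from a joint probability distribution over *n* variables **V**. Further, let **v**_{i}, 1 ≤ *i* ≤ *N*, be *N* *n*-dimensional i.i.d. observations taken from the joint probability distribution over **V**. We then consider the *N*-dimensional vectors **x** (formed by the successive values of *X* over the observations), **y** (formed by the values of *Y*) and **z** (formed by the values of *Z*).

It can be shown that the residuals *e*_{X,i} coming from the linear regression of *X* on *Z*, if also considered as an *N*-dimensional vector **e**_{X}, have a zero scalar product with the vector **z** generated by *Z*. This means that the residuals vector lies on an (*N*–1)-dimensional hyperplane *S*_{**z**} that is perpendicular to **z**.

The same also applies to the residuals *e*_{Y,i} generating a vector **e**_{Y}. The desired partial correlation is then the cosine of the angle *φ* between the projections **e**_{X} and **e**_{Y} of **x** and **y**, respectively, onto the hyperplane perpendicular to **z**.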
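The geometric reading can be checked numerically: the residual vectors are orthogonal to **z**, and the cosine of the angle between them equals the partial correlation. The following Python sketch uses the four observations from the worked example. The helper names `ols_residuals` and `dot` are assumptions for illustration, and the regression is taken to include an intercept, which is why the residuals have zero mean and the cosine coincides with a correlation.

```python
from math import sqrt


def ols_residuals(target, control):
    """Residuals of a least-squares regression of `target` on one `control`
    variable plus an intercept."""
    n = len(target)
    mean_t = sum(target) / n
    mean_c = sum(control) / n
    slope = (sum((c - mean_c) * (t - mean_t) for c, t in zip(control, target))
             / sum((c - mean_c) ** 2 for c in control))
    return [(t - mean_t) - slope * (c - mean_c) for t, c in zip(target, control)]


def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


x, y, z = [2, 4, 15, 20], [1, 2, 3, 4], [0, 0, 1, 1]
e_x = ols_residuals(x, z)
e_y = ols_residuals(y, z)

# Both residual vectors are perpendicular to z.
print(dot(e_x, z), dot(e_y, z))

# Cosine of the angle between the residual vectors.
cos_phi = dot(e_x, e_y) / sqrt(dot(e_x, e_x) * dot(e_y, e_y))
print(round(cos_phi, 6))  # 0.919145
```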

With the assumption that all involved variables are multivariate Gaussian, the partial correlation *ρ*_{XY·Z} is zero if and only if *X* is conditionally independent from *Y* given **Z**.^{ [1] } This property does not hold in the general case.

To test if a sample partial correlation $\hat{\rho}_{XY\cdot\mathbf{Z}}$ implies that the true population partial correlation differs from 0, Fisher's *z*-transform of the partial correlation can be used:

$$z(\hat{\rho}_{XY\cdot\mathbf{Z}}) = \frac{1}{2}\ln\left(\frac{1+\hat{\rho}_{XY\cdot\mathbf{Z}}}{1-\hat{\rho}_{XY\cdot\mathbf{Z}}}\right)$$

The null hypothesis is $H_0: \rho_{XY\cdot\mathbf{Z}} = 0$, to be tested against the two-tail alternative $H_A: \rho_{XY\cdot\mathbf{Z}} \neq 0$. We reject *H*_{0} with significance level *α* if:

$$\sqrt{N - |\mathbf{Z}| - 3}\,\cdot\,\left|z(\hat{\rho}_{XY\cdot\mathbf{Z}})\right| > \Phi^{-1}(1 - \alpha/2)$$

where Φ(·) is the cumulative distribution function of a Gaussian distribution with zero mean and unit standard deviation, and *N* is the sample size. This *z*-transform is approximate, and the actual distribution of the sample (partial) correlation coefficient is not straightforward. However, an exact t-test based on a combination of the partial regression coefficient, the partial correlation coefficient and the partial variances is available.^{ [4] }
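The test based on Fisher's *z*-transform (the inverse hyperbolic tangent of the sample partial correlation, scaled by the square root of *N* − |**Z**| − 3 and compared with a standard normal quantile) can be sketched in Python as below. The function name `partial_corr_z_test` and the input values in the usage line are hypothetical, chosen only to illustrate the calculation. The four-observation example with one control variable cannot be used here, since it gives *N* − |**Z**| − 3 = 0.

```python
from math import atanh, sqrt
from statistics import NormalDist


def partial_corr_z_test(r, n_obs, n_controls, alpha=0.05):
    """Approximate two-sided test of H0: partial correlation = 0.

    r          -- sample partial correlation
    n_obs      -- sample size N
    n_controls -- number of controlling variables |Z|
    Returns (test statistic, critical value, reject H0?).
    """
    dof = n_obs - n_controls - 3
    if dof <= 0:
        raise ValueError("need N - |Z| - 3 > 0 for the approximation")
    z = atanh(r)  # Fisher z-transform: 0.5 * ln((1 + r) / (1 - r))
    statistic = sqrt(dof) * abs(z)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return statistic, critical, statistic > critical


# Hypothetical inputs: r = 0.4 from N = 50 observations with one control.
statistic, critical, reject = partial_corr_z_test(0.4, 50, 1)
print(round(statistic, 3), round(critical, 3), reject)
```

Because the normal approximation is only asymptotic, a sketch like this is best treated as a rough screen when *N* is small.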

The distribution of the sample partial correlation was described by Fisher.^{ [5] }

The semipartial (or part) correlation statistic is similar to the partial correlation statistic. Both compare variations of two variables after certain factors are controlled for, but to calculate the semipartial correlation one holds the third variable constant for either *X* or *Y* but not both, whereas for the partial correlation one holds the third variable constant for both.^{ [6] } The semipartial correlation compares the unique variation of one variable (having removed variation associated with the *Z* variable(s)), with the unfiltered variation of the other, while the partial correlation compares the unique variation of one variable to the unique variation of the other.

The semipartial (or part) correlation can be viewed as more practically relevant "because it is scaled to (i.e., relative to) the total variability in the dependent (response) variable." ^{ [7] } Conversely, it is less theoretically useful because it is less precise about the role of the unique contribution of the independent variable.

The absolute value of the semipartial correlation of *X* with *Y* is always less than or equal to that of the partial correlation of *X* with *Y*. The reason is this: suppose the correlation of *X* with *Z* has been removed from *X*, giving the residual vector *e*_{x}. In computing the semipartial correlation, *Y* still contains both unique variance and variance due to its association with *Z*. But *e*_{x}, being uncorrelated with *Z*, can only explain some of the unique part of the variance of *Y* and not the part related to *Z*. In contrast, with the partial correlation, only *e*_{y} (the part of the variance of *Y* that is unrelated to *Z*) is to be explained, so there is less variance of the type that *e*_{x} cannot explain.
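For a single control variable, the relationship between the two statistics can be made concrete with the standard first-order identities: the semipartial correlation (with *Z* removed from *X* only) is (*r*_{XY} − *r*_{XZ} *r*_{YZ}) / √(1 − *r*_{XZ}²), and dividing it further by √(1 − *r*_{YZ}²) gives the partial correlation. The Python sketch below applies these to the data of the worked example. The function names are assumptions for illustration, and removing *Z* from *X* rather than from *Y* is a choice made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)


def semipartial_and_partial(x, y, z):
    """Return (semipartial, partial) correlation of x and y for one control z.

    The semipartial value removes z from x only; the partial value removes
    z from both x and y.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    numerator = r_xy - r_xz * r_yz
    semipartial = numerator / sqrt(1 - r_xz ** 2)
    partial = semipartial / sqrt(1 - r_yz ** 2)  # divisor <= 1, so |partial| >= |semipartial|
    return semipartial, partial


X, Y, Z = [2, 4, 15, 20], [1, 2, 3, 4], [0, 0, 1, 1]
sr, pr = semipartial_and_partial(X, Y, Z)
print(round(sr, 3), round(pr, 3))  # 0.411 0.919
```

The extra divisor √(1 − *r*_{YZ}²) never exceeds 1, which is the algebraic counterpart of the variance argument given above.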

In time series analysis, the partial autocorrelation function (sometimes "partial correlation function") of a time series is defined, for lag *h*, as

$$\phi(h) = \rho_{X_0X_h\,\cdot\,\{X_1,\,\dots,\,X_{h-1}\}}$$

This function is used to determine the appropriate lag length for an autoregression.
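One common way to estimate the partial autocorrelation at successive lags from the sample autocorrelations is the Durbin–Levinson recursion, sketched below in Python. It is named here as a substitute computational route, not as the method used in the text, and it assumes a stationary series; the function names and the short series are hypothetical and serve only as an illustration.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations rho(0), ..., rho(max_lag)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag via Durbin-Levinson."""
    rho = sample_acf(series, max_lag)
    phi_prev = []   # coefficients phi_{h-1,1}, ..., phi_{h-1,h-1}
    values = [1.0]  # lag 0 is 1 by convention
    for h in range(1, max_lag + 1):
        num = rho[h] - sum(phi_prev[j] * rho[h - 1 - j] for j in range(h - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(h - 1))
        phi_hh = num / den  # lag-h partial autocorrelation
        phi_prev = [phi_prev[j] - phi_hh * phi_prev[h - 2 - j]
                    for j in range(h - 1)] + [phi_hh]
        values.append(phi_hh)
    return values


# Hypothetical short series, for illustration only.
series = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 3.0, 4.5, 5.0]
print([round(v, 3) for v in pacf(series, 3)])
```

At lag 2 the recursion reduces to (ρ(2) − ρ(1)²) / (1 − ρ(1)²), which is the first-order partial correlation formula applied to *X*_{0} and *X*_{2} with *X*_{1} as the control. When choosing the order of an autoregression, one typically looks for the lag beyond which these values are close to zero.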

1. Baba, Kunihiro; Ritei Shibata; Masaaki Sibuya (2004). "Partial correlation and conditional correlation as measures of conditional independence". *Australian and New Zealand Journal of Statistics*. **46** (4): 657–664. doi:10.1111/j.1467-842X.2004.00360.x.
2. Guilford J. P., Fruchter B. (1973). *Fundamental statistics in psychology and education*. Tokyo: McGraw-Hill Kogakusha, LTD.
3. Rummel, R. J. (1976). "Understanding Correlation".
4. Kendall MG, Stuart A. (1973). *The Advanced Theory of Statistics*, Volume 2 (3rd Edition), ISBN 0-85264-215-6, Section 27.22.
5. Fisher, R.A. (1924). "The distribution of the partial correlation coefficient". *Metron*. **3** (3–4): 329–332.
6. https://web.archive.org/web/20140206182503/http://luna.cas.usf.edu/~mbrannic/files/regression/Partial.html. Archived from the original on 2014-02-06.
7. StatSoft, Inc. (2010). "Semi-Partial (or Part) Correlation", Electronic Statistics Textbook. Tulsa, OK: StatSoft, accessed January 15, 2011.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
