Law of total covariance

Last updated April 27, 2024

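In probability theory, the law of total covariance,[1] covariance decomposition formula, or conditional covariance formula states that if X, Y, and Z are random variables on the same probability space, and the covariance of X and Y is finite, then

\operatorname {cov} (X,Y)=\operatorname {E} {\big [}\operatorname {cov} (X,Y\mid Z){\big ]}+\operatorname {cov} {\big (}\operatorname {E} [X\mid Z],\operatorname {E} [Y\mid Z]{\big )}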

Note: The conditional expected values E(X | Z) and E(Y | Z) are random variables whose values depend on the value of Z. The conditional expected value of X given the event Z = z is a function of z: if we write E(X | Z = z) = g(z), then the random variable E(X | Z) is g(Z). Similar comments apply to the conditional covariance.
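The distinction between the function g(z) and the random variable g(Z) can be made concrete with a small discrete example. The sketch below is only an illustration: the joint distribution `pmf_xz` and the helper names are hypothetical and chosen for convenience. It computes g(z) = E(X | Z = z) for each z, and then treats g(Z) as a random variable that takes the value g(z) with probability P(Z = z).

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Z): keys are (x, z), values are probabilities.
pmf_xz = {
    (0, 0): Fraction(1, 4),
    (2, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (5, 1): Fraction(3, 8),
}

# Marginal distribution of Z.
p_z = {}
for (_, z), p in pmf_xz.items():
    p_z[z] = p_z.get(z, 0) + p


def g(z):
    """g(z) = E(X | Z = z): x averaged against P(X = x | Z = z)."""
    return sum(x * p for (x, zz), p in pmf_xz.items() if zz == z) / p_z[z]


# E(X | Z) is the random variable g(Z): value g(z) with probability P(Z = z).
cond_exp_dist = {z: (g(z), p) for z, p in p_z.items()}

# Averaging g(Z) over Z recovers E(X) (law of total expectation).
e_x = sum(x * p for (x, _), p in pmf_xz.items())
e_gz = sum(value * p for value, p in cond_exp_dist.values())

print(cond_exp_dist)
print(e_x, e_gz)
```

Exact rational arithmetic (`fractions.Fraction`) is used here so that the equality E(g(Z)) = E(X) holds exactly rather than up to floating-point error.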

Proof

The law of total covariance can be proved using the law of total expectation: First,

\operatorname {cov} (X,Y)=\operatorname {E} [XY]-\operatorname {E} [X]\operatorname {E} [Y]

from a simple standard identity on covariances. Then we apply the law of total expectation by conditioning on the random variable Z:

=\operatorname {E} {\big [}\operatorname {E} [XY\mid Z]{\big ]}-\operatorname {E} {\big [}\operatorname {E} [X\mid Z]{\big ]}\operatorname {E} {\big [}\operatorname {E} [Y\mid Z]{\big ]}

Now we rewrite the term inside the first expectation using the definition of conditional covariance, cov(X, Y | Z) = E[XY | Z] − E[X | Z] E[Y | Z]:

=\operatorname {E} \!{\big [}\operatorname {cov} (X,Y\mid Z)+\operatorname {E} [X\mid Z]\operatorname {E} [Y\mid Z]{\big ]}-\operatorname {E} {\big [}\operatorname {E} [X\mid Z]{\big ]}\operatorname {E} {\big [}\operatorname {E} [Y\mid Z]{\big ]}

Since the expectation of a sum is the sum of the expectations, we can regroup the terms:

=\operatorname {E} \!{\big [}\operatorname {cov} (X,Y\mid Z){\big ]}+\operatorname {E} {\big [}\operatorname {E} [X\mid Z]\operatorname {E} [Y\mid Z]{\big ]}-\operatorname {E} {\big [}\operatorname {E} [X\mid Z]{\big ]}\operatorname {E} {\big [}\operatorname {E} [Y\mid Z]{\big ]}

Finally, we recognize the final two terms as the covariance of the conditional expectations E[X | Z] and E[Y | Z]:

=\operatorname {E} {\big [}\operatorname {cov} (X,Y\mid Z){\big ]}+\operatorname {cov} {\big (}\operatorname {E} [X\mid Z],\operatorname {E} [Y\mid Z]{\big )}
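The identity cov(X, Y) = E[cov(X, Y | Z)] + cov(E[X | Z], E[Y | Z]) can also be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical discrete joint distribution `pmf` of (X, Y, Z); the function names are illustrative and not taken from any library. It computes the covariance directly and then as the sum of the two terms of the decomposition.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y, Z): keys are (x, y, z), values are probabilities.
pmf = {
    (0, 0, 0): Fraction(1, 4),
    (1, 2, 0): Fraction(1, 4),
    (2, 1, 1): Fraction(1, 8),
    (4, 5, 1): Fraction(3, 8),
}


def expect(f, dist):
    """Expected value of f(x, y, z) under a pmf keyed by (x, y, z)."""
    return sum(f(*key) * p for key, p in dist.items())


def cov(dist):
    """cov(X, Y) = E[XY] - E[X] E[Y] under dist."""
    e_xy = expect(lambda x, y, z: x * y, dist)
    e_x = expect(lambda x, y, z: x, dist)
    e_y = expect(lambda x, y, z: y, dist)
    return e_xy - e_x * e_y


def total_covariance_terms(dist):
    """Return (E[cov(X, Y | Z)], cov(E[X | Z], E[Y | Z]))."""
    p_z = {}
    for (_, _, z), p in dist.items():
        p_z[z] = p_z.get(z, 0) + p

    expected_cond_cov = 0
    cond_means = {}  # pmf of the pair (E[X | Z], E[Y | Z]), keyed with z to stay distinct
    for z, pz in p_z.items():
        # Conditional pmf of (X, Y) given Z = z.
        cond = {k: p / pz for k, p in dist.items() if k[2] == z}
        expected_cond_cov += pz * cov(cond)
        m_x = expect(lambda x, y, zz: x, cond)
        m_y = expect(lambda x, y, zz: y, cond)
        cond_means[(m_x, m_y, z)] = pz

    return expected_cond_cov, cov(cond_means)


within, between = total_covariance_terms(pmf)
print(cov(pmf), within, between, within + between)
```

In this example the first term measures how X and Y vary together within each value of Z, and the second measures how their conditional means vary together across values of Z; their sum matches the directly computed covariance.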

Notes and references

[1] Matthew R. Rudary, On Predictive Linear Gaussian Models, ProQuest, 2009, page 121.
[2] Sheldon M. Ross, A First Course in Probability, sixth edition, Prentice Hall, 2002, page 392.


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
