Conditional variance

Last updated June 04, 2024

In probability theory and statistics, a conditional variance is the variance of a random variable given the value(s) of one or more other variables. Particularly in econometrics, the conditional variance is also known as the scedastic function or skedastic function.^[1] Conditional variances are important parts of autoregressive conditional heteroskedasticity (ARCH) models.

Definition

The conditional variance of a random variable Y given another random variable X is

$\operatorname {Var} (Y\mid X)=\operatorname {E} {\Big (}{\big (}Y-\operatorname {E} (Y\mid X){\big )}^{2}\;{\Big |}\;X{\Big )}.$

The conditional variance tells us how much variance is left if we use $\operatorname {E} (Y\mid X)$ to "predict" Y. Here, as usual, $\operatorname {E} (Y\mid X)$ stands for the conditional expectation of Y given X, which, we may recall, is itself a random variable (a function of X, determined up to probability one). As a result, $\operatorname {Var} (Y\mid X)$ is also a random variable (and a function of X).

Explanation, relation to least-squares

Recall that variance is the expected squared deviation between a random variable (say, Y) and its expected value. The expected value can be thought of as a reasonable prediction of the outcomes of the random experiment (in particular, the expected value is the best constant prediction when predictions are assessed by expected squared prediction error). Thus, one interpretation of variance is that it gives the smallest expected squared prediction error attainable by a constant prediction. If we know another random variable (X) that can be used to predict Y, we can potentially use it to reduce the expected squared error. As it turns out, the best prediction of Y given X is the conditional expectation. In particular, for any measurable $f:\mathbb {R} \to \mathbb {R}$,

${\begin{aligned}\operatorname {E} [(Y-f(X))^{2}]&=\operatorname {E} [(Y-\operatorname {E} (Y|X)\,\,+\,\,\operatorname {E} (Y|X)-f(X))^{2}]\\&=\operatorname {E} [\operatorname {E} \{(Y-\operatorname {E} (Y|X)\,\,+\,\,\operatorname {E} (Y|X)-f(X))^{2}|X\}]\\&=\operatorname {E} [\operatorname {Var} (Y|X)]+\operatorname {E} [(\operatorname {E} (Y|X)-f(X))^{2}]\,.\end{aligned}}$

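Choosing $f(X)=\operatorname {E} (Y|X)$ makes the second, nonnegative term zero, which proves the claim. Here, the second equality uses the law of total expectation. The third equality follows by expanding the square inside the conditional expectation: the cross term vanishes because $\operatorname {E} (Y|X)-f(X)$ is a function of X and $\operatorname {E} (Y-\operatorname {E} (Y|X)\mid X)=0$. We also see that the expected conditional variance of Y given X is the irreducible error of predicting Y from X alone.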
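The decomposition can be checked exactly on a small discrete example. The sketch below is only an illustration under stated assumptions: the joint pmf `joint_pmf` and the competing predictor `linear_guess` are hypothetical and do not come from the article. It uses exact rational arithmetic, so the identity holds without rounding error.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), chosen only for illustration.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (0, 2): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (1, 3): Fraction(1, 8),
    (1, 5): Fraction(1, 4),
}


def expectation(g):
    """Return E[g(X, Y)] under joint_pmf."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())


def cond_mean(x):
    """Return E(Y | X = x)."""
    p_x = sum(p for (xi, _), p in joint_pmf.items() if xi == x)
    return sum(y * p for (xi, y), p in joint_pmf.items() if xi == x) / p_x


def mse(f):
    """Return the expected squared prediction error E[(Y - f(X))^2]."""
    return expectation(lambda x, y: (y - f(x)) ** 2)


def linear_guess(x):
    """An arbitrary competing predictor (hypothetical)."""
    return 2 * x + 1


# E[Var(Y | X)]: the error left when predicting with the conditional mean.
irreducible = mse(cond_mean)
# E[(E(Y | X) - f(X))^2]: the extra error paid for using f instead.
excess = expectation(lambda x, y: (cond_mean(x) - linear_guess(x)) ** 2)

print(irreducible, excess, mse(linear_guess))
```

In this example the error of `linear_guess` equals `irreducible + excess`, and no predictor that depends only on X can get below `irreducible`.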

Special cases, variations

Conditioning on discrete random variables

When X takes on countably many values $S=\{x_{1},x_{2},\dots \}$ with positive probability, i.e., it is a discrete random variable, we can introduce $\operatorname {Var} (Y|X=x)$, the conditional variance of Y given that X=x, for any x from S as follows:

$\operatorname {Var} (Y|X=x)=\operatorname {E} ((Y-\operatorname {E} (Y\mid X=x))^{2}\mid X=x)=\operatorname {E} (Y^{2}|X=x)-{\big (}\operatorname {E} (Y|X=x){\big )}^{2},$

where, recall, $\operatorname {E} (Z\mid X=x)$ is the conditional expectation of Z given that X=x, which is well-defined for $x\in S$. An alternative notation for $\operatorname {Var} (Y|X=x)$ is $\operatorname {Var} _{Y\mid X}(Y|x)$.

Note that here $\operatorname {Var} (Y|X=x)$ is a constant for each possible value of x; in particular, it is not a random variable.

The connection of this definition to $\operatorname {Var} (Y|X)$ is as follows: Let S be as above and define the function $v:S\to \mathbb {R}$ as $v(x)=\operatorname {Var} (Y|X=x)$ . Then, $v(X)=\operatorname {Var} (Y|X)$ almost surely.
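For a discrete X, the function v can be tabulated directly from a joint probability mass function. The following is a minimal sketch, assuming a hypothetical joint pmf stored as a dictionary (`joint_pmf` and `conditional_variance` are illustrative names, not from the article); it uses the second form of the formula above, $\operatorname {E} (Y^{2}|X=x)-\operatorname {E} (Y|X=x)^{2}$.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), chosen only for illustration.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (0, 2): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (1, 3): Fraction(1, 8),
    (1, 5): Fraction(1, 4),
}


def conditional_variance(joint, x):
    """Return Var(Y | X = x) for a joint pmf given as {(x, y): probability}."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_x == 0:
        raise ValueError(f"P(X = {x}) is zero, so Var(Y | X = {x}) is undefined")
    # Conditional moments E(Y | X = x) and E(Y^2 | X = x).
    mean = sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x
    second_moment = sum(y * y * p for (xi, y), p in joint.items() if xi == x) / p_x
    return second_moment - mean ** 2


# v maps each x in S to the constant Var(Y | X = x).
# Evaluating v at the random value X gives the random variable Var(Y | X).
v = {x: conditional_variance(joint_pmf, x) for x in sorted({x for x, _ in joint_pmf})}
print(v)
```

Each entry of `v` is a fixed number; randomness enters only when `v` is evaluated at the random value of X.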

Definition using conditional distributions

The conditional variance of Y given X=x can also be defined more generally using the conditional distribution of Y given X (which exists here, as both X and Y are real-valued).

In particular, letting $P_{Y|X}$ be the (regular) conditional distribution of Y given X, i.e., $P_{Y|X}:{\mathcal {B}}\times \mathbb {R} \to [0,1]$ (the intention is that $P_{Y|X}(U,x)=P(Y\in U|X=x)$ almost surely over the support of X), we can define

$\operatorname {Var} (Y|X=x)=\int \left(y-\int y'P_{Y|X}(dy'|x)\right)^{2}P_{Y|X}(dy|x).$

This can, of course, be specialized to the case where Y is itself discrete (replacing the integrals with sums), and to the case where the conditional density of Y given X=x with respect to some underlying distribution exists.
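As a worked illustration of the density case, suppose (purely as an assumption for this example) that for x > 0 the conditional distribution of Y given X=x is uniform on [0, x], so that the conditional density is $1/x$ for $0\leq y\leq x$. The inner integral in the definition is then the conditional mean,

$\int y'\,P_{Y|X}(dy'|x)=\int _{0}^{x}{\frac {y'}{x}}\,dy'={\frac {x}{2}},$

and the outer integral gives

$\operatorname {Var} (Y|X=x)=\int _{0}^{x}\left(y-{\frac {x}{2}}\right)^{2}{\frac {1}{x}}\,dy={\frac {1}{x}}\cdot {\frac {x^{3}}{12}}={\frac {x^{2}}{12}}.$

In this hypothetical model $\operatorname {Var} (Y|X)=X^{2}/12$, which makes explicit that the conditional variance is a function of X.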

Components of variance

The law of total variance says

$\operatorname {Var} (Y)=\operatorname {E} (\operatorname {Var} (Y\mid X))+\operatorname {Var} (\operatorname {E} (Y\mid X)).$

In words: the variance of Y is the sum of the expected conditional variance of Y given X and the variance of the conditional expectation of Y given X. The first term captures the variation left after "using X to predict Y", while the second term captures the variation of the prediction $\operatorname {E} (Y\mid X)$ itself, which arises from the randomness of X.
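The two components can be computed separately and compared with the total variance. The sketch below assumes a hypothetical two-stage model (the marginal pmf `p_x` and the conditional pmfs `p_y_given_x` are illustrative, not from the article) and uses exact fractions so that the identity can be checked without rounding.

```python
from fractions import Fraction

# Hypothetical model: marginal pmf of X and conditional pmfs of Y given X = x.
p_x = {0: Fraction(2, 3), 1: Fraction(1, 3)}
p_y_given_x = {
    0: {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)},
    1: {4: Fraction(1, 2), 8: Fraction(1, 2)},
}


def mean(pmf):
    """Mean of a pmf given as {value: probability}."""
    return sum(value * p for value, p in pmf.items())


def variance(pmf):
    """Variance of a pmf given as {value: probability}."""
    m = mean(pmf)
    return sum((value - m) ** 2 * p for value, p in pmf.items())


# Marginal pmf of Y: sum P(X = x) * P(Y = y | X = x) over x.
p_y = {}
for x, px in p_x.items():
    for y, py in p_y_given_x[x].items():
        p_y[y] = p_y.get(y, 0) + px * py

# E(Var(Y | X)): average the conditional variances over the distribution of X.
expected_cond_var = sum(px * variance(p_y_given_x[x]) for x, px in p_x.items())

# Var(E(Y | X)): variance of the random variable that maps x to E(Y | X = x).
# The conditional means are distinct here, so they can serve as dictionary keys.
cond_mean_pmf = {mean(p_y_given_x[x]): px for x, px in p_x.items()}
var_of_cond_mean = variance(cond_mean_pmf)

total = variance(p_y)
print(total, expected_cond_var, var_of_cond_mean)
```

Here the within-group term `expected_cond_var` and the between-group term `var_of_cond_mean` add up to `total` exactly.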

References

[1] Spanos, Aris (1999). "Conditioning and regression". Probability Theory and Statistical Inference. New York: Cambridge University Press. pp. 339–356 [p. 342]. ISBN 0-521-42408-9.