Variance of the mean and predicted responses

Last updated March 18, 2023

In regression, mean response (or expected response) and predicted response, also known as mean outcome (or expected outcome) and predicted outcome, are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The values of these two responses are the same, but their calculated variances are different. The concept is a generalization of the distinction between the standard error of the mean and the sample standard deviation.

Background

In straight line fitting, the model is

y_{i}=\alpha +\beta x_{i}+\varepsilon _{i}\,

where $y_{i}$ is the response variable, $x_{i}$ is the explanatory variable, ε_i is the random error, and $\alpha$ and $\beta$ are parameters. The mean, and predicted, response value for a given explanatory value, x_d, is given by

{\hat {y}}_{d}={\hat {\alpha }}+{\hat {\beta }}x_{d},

while the actual response would be

y_{d}=\alpha +\beta x_{d}+\varepsilon _{d}\,

Expressions for the values and variances of ${\hat {\alpha }}$ and ${\hat {\beta }}$ are given in linear regression.

Variance of the mean response

Since the data in this context is defined to be (x, y) pairs for every observation, the mean response at a given value of x, say x_d, is an estimate of the mean of the y values in the population at the x value of x_d, that is ${\hat {E}}(y\mid x_{d})\equiv {\hat {y}}_{d}\!$ . The variance of the mean response is given by

\operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)=\operatorname {Var} \left({\hat {\alpha }}\right)+\left(\operatorname {Var} {\hat {\beta }}\right)x_{d}^{2}+2x_{d}\operatorname {Cov} \left({\hat {\alpha }},{\hat {\beta }}\right).

This expression can be simplified to

\operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)=\sigma ^{2}\left({\frac {1}{m}}+{\frac {\left(x_{d}-{\bar {x}}\right)^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right),

where m is the number of data points.

To demonstrate this simplification, one can make use of the identity

\sum (x_{i}-{\bar {x}})^{2}=\sum x_{i}^{2}-{\frac {1}{m}}\left(\sum x_{i}\right)^{2}.

Variance of the predicted response

The predicted response distribution is the predicted distribution of the residuals at the given point x_d. So the variance is given by

{\begin{aligned}\operatorname {Var} \left(y_{d}-\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)&=\operatorname {Var} (y_{d})+\operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)-2\operatorname {Cov} \left(y_{d},\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)\\&=\operatorname {Var} (y_{d})+\operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right).\end{aligned}}

The second line follows from the fact that $\operatorname {Cov} \left(y_{d},\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)$ is zero because the new prediction point is independent of the data used to fit the model. Additionally, the term $\operatorname {Var} \left({\hat {\alpha }}+{\hat {\beta }}x_{d}\right)$ was calculated earlier for the mean response.

Since $\operatorname {Var} (y_{d})=\sigma ^{2}$ (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

{\begin{aligned}\operatorname {Var} \left(y_{d}-\left[{\hat {\alpha }}+{\hat {\beta }}x_{d}\right]\right)&=\sigma ^{2}+\sigma ^{2}\left({\frac {1}{m}}+{\frac {\left(x_{d}-{\bar {x}}\right)^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right)\\[4pt]&=\sigma ^{2}\left(1+{\frac {1}{m}}+{\frac {(x_{d}-{\bar {x}})^{2}}{\sum (x_{i}-{\bar {x}})^{2}}}\right).\end{aligned}}

Confidence intervals

The $100(1-\alpha )\%$ confidence intervals are computed as $y_{d}\pm t_{{\frac {\alpha }{2}},m-n-1}{\sqrt {\operatorname {Var} }}$ . Thus, the confidence interval for predicted response is wider than the interval for mean response. This is expected intuitively – the variance of the population of $y$ values does not shrink when one samples from it, because the random variable ε_i does not decrease, but the variance of the mean of the $y$ does shrink with increased sampling, because the variance in ${\hat {\alpha }}$ and ${\hat {\beta }}$ decrease, so the mean response (predicted response value) becomes closer to $\alpha +\beta x_{d}$ .

This is analogous to the difference between the variance of a population and the variance of the sample mean of a population: the variance of a population is a parameter and does not change, but the variance of the sample mean decreases with increased samples.

General linear regression

The general linear model can be written as

y_{i}=\sum _{j=1}^{n}X_{ij}\beta _{j}+\varepsilon _{i}\,

Therefore, since $y_{d}=\sum _{j=1}^{n}X_{dj}{\hat {\beta }}_{j}$ the general expression for the variance of the mean response is

\operatorname {Var} \left(\sum _{j=1}^{n}X_{dj}{\hat {\beta }}_{j}\right)=\sum _{i=1}^{n}\sum _{j=1}^{n}X_{di}S_{ij}X_{dj},

where S is the covariance matrix of the parameters, given by

\mathbf {S} =\sigma ^{2}\left(\mathbf {X^{\mathsf {T}}X} \right)^{-1}.

Related Research Articles

In mathematics, the harmonic mean is one of several kinds of average, and in particular, one of the Pythagorean means. It is sometimes appropriate for situations when the average rate is desired.

In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by $,,,, or .$

In probability theory, the central limit theorem (CLT) establishes that, in many situations, for identically distributed independent samples, the standardized sample mean tends towards the standard normal distribution even if the original variables themselves are not normally distributed.

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use:

With a shape parameter $and a scale parameter .$
With a shape parameter $and an inverse scale parameter, called a rate parameter.$

In probability theory, a compound Poisson distribution is the probability distribution of the sum of a number of independent identically-distributed random variables, where the number of terms to be added is itself a Poisson-distributed variable. The result can be either a continuous or a discrete distribution.

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable.

In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.

In statistics, the delta method is a result concerning the approximate probability distribution for a function of an asymptotically normal statistical estimator from knowledge of the limiting variance of that estimator.

In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. When determining the numerical relationship between two variables of interest, using their correlation coefficient will give misleading results if there is another confounding variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient. This is precisely the motivation for including other right-side variables in a multiple regression; but while multiple regression gives unbiased results for the effect size, it does not give a numerical value of a measure of the strength of the relationship between the two variables of interest.

A ratio distribution is a probability distribution constructed as the distribution of the ratio of random variables having two other known distributions. Given two random variables X and Y, the distribution of the random variable Z that is formed as the ratio Z = X/Y is a ratio distribution.

<span class="mw-page-title-main">Truncated normal distribution</span> Type of probability distribution

In probability and statistics, the truncated normal distribution is the probability distribution derived from that of a normally distributed random variable by bounding the random variable from either below or above. The truncated normal distribution has wide applications in statistics and econometrics.

In probability theory and statistics, the half-normal distribution is a special case of the folded normal distribution.

Experimental uncertainty analysis is a technique that analyses a derived quantity, based on the uncertainties in the experimentally measured quantities that are used in some form of mathematical relationship ("model") to calculate that derived quantity. The model used to convert the measurements into the derived quantity is usually based on fundamental principles of a science or engineering discipline.

The purpose of this page is to provide supplementary materials for the ordinary least squares article, reducing the load of the main article with mathematics and improving its accessibility, while at the same time retaining the completeness of exposition.

In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.

In statistics, the variance function is a smooth function which depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling. It is a main ingredient in the generalized linear model framework and a tool used in non-parametric regression, semiparametric regression and functional data analysis. In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a smooth function.

The generalized functional linear model (GFLM) is an extension of the generalized linear model (GLM) that allows one to regress univariate responses of various types on functional predictors, which are mostly random trajectories generated by a square-integrable stochastic processes. Similarly to GLM, a link function relates the expected value of the response variable to a linear predictor, which in case of GFLM is obtained by forming the scalar product of the random predictor function $with a smooth parameter function . Functional Linear Regression, Functional Poisson Regression and Functional Binomial Regression, with the important Functional Logistic Regression included, are special cases of GFLM. Applications of GFLM include classification and discrimination of stochastic processes and functional data.$

References

Draper, N. R.; Smith, H. (1998). Applied Regression Analysis (3rd ed.). John Wiley. ISBN 0-471-17082-8.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.