# Homoscedasticity

In statistics, a sequence (or a vector) of random variables is homoscedastic [1] if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used. [2]

Assuming a variable is homoscedastic when in reality it is heteroscedastic results in unbiased but inefficient point estimates and in biased estimates of standard errors, and may result in overestimating the goodness of fit as measured by the Pearson coefficient.

## Assumptions of a regression model

A standard assumption in a linear regression, ${\displaystyle y_{i}=X_{i}\beta +\epsilon _{i},i=1,\ldots ,N,}$ is that the variance of the disturbance term ${\displaystyle \epsilon _{i}}$ is the same across observations, and in particular does not depend on the values of the explanatory variables ${\displaystyle X_{i}.}$ [3] This is one of the assumptions under which the Gauss–Markov theorem applies and ordinary least squares (OLS) gives the best linear unbiased estimator ("BLUE"). Homoscedasticity is not required for the coefficient estimates to be unbiased, consistent, and asymptotically normal, but it is required for OLS to be efficient. [4] It is also required for the standard errors of the estimates to be unbiased and consistent, so it is required for accurate hypothesis testing, e.g. for a t-test of whether a coefficient is significantly different from zero.
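A small Monte Carlo experiment can illustrate these claims. The sketch below is not from the cited sources: the sample size, the coefficient values, the uniform regressor, and the error standard deviation proportional to $x^2$ are all illustrative assumptions, and `simulate_slope` is a hypothetical helper name.

```python
import numpy as np


def simulate_slope(n_reps=2000, n=200, beta=2.0, seed=0):
    """Repeatedly fit OLS to data whose error spread grows with x.

    Returns (mean slope estimate, empirical SD of the slope estimates,
    mean conventional standard error of the slope).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)        # regressor, held fixed across replications
    X = np.column_stack([np.ones(n), x])      # design matrix with an intercept
    XtX_inv = np.linalg.inv(X.T @ X)

    slopes = np.empty(n_reps)
    conventional_se = np.empty(n_reps)
    for r in range(n_reps):
        # Assumed form of heteroscedasticity: sd(eps_i) = 0.1 * x_i**2
        eps = rng.normal(size=n) * 0.1 * x**2
        y = 1.0 + beta * x + eps
        b = XtX_inv @ (X.T @ y)               # OLS coefficients
        resid = y - X @ b
        s2 = resid @ resid / (n - 2)          # usual error-variance estimate
        slopes[r] = b[1]
        conventional_se[r] = np.sqrt(s2 * XtX_inv[1, 1])
    return slopes.mean(), slopes.std(ddof=1), conventional_se.mean()


mean_slope, empirical_sd, mean_se = simulate_slope()
print(f"mean slope estimate:   {mean_slope:.3f}")
print(f"empirical SD of slope: {empirical_sd:.3f}")
print(f"mean conventional SE:  {mean_se:.3f}")
```

Under these assumptions the average slope estimate should stay close to the true value of 2, while the conventional standard error tends to understate the actual spread of the estimates across replications.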

A more formal way to state the assumption of homoskedasticity is that the diagonal entries of the variance-covariance matrix of ${\displaystyle \epsilon }$ must all be the same number: ${\displaystyle E\epsilon _{i}\epsilon _{i}=\sigma ^{2}}$, where ${\displaystyle \sigma ^{2}}$ is the same for all i. [5] Note that this still allows the off-diagonal entries, the covariances ${\displaystyle E\epsilon _{i}\epsilon _{j}}$, to be nonzero, which is a separate violation of the Gauss–Markov assumptions known as serial correlation.

## Examples

The matrices below are covariances of the disturbance, with entries ${\displaystyle E\epsilon _{i}\epsilon _{j}}$, when there are just three observations across time. The disturbance in matrix A is homoskedastic; this is the simple case where OLS is the best linear unbiased estimator. The disturbances in matrices B and C are heteroskedastic. In matrix B, the variance is time-varying, increasing steadily across time; in matrix C, the variance depends on the value of x. The disturbance in matrix D is homoskedastic because the diagonal variances are constant, even though the off-diagonal covariances are non-zero and ordinary least squares is inefficient for a different reason: serial correlation.

${\displaystyle A=\sigma ^{2}{\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\\\end{bmatrix}}\;\;\;\;\;\;\;B=\sigma ^{2}{\begin{bmatrix}1&0&0\\0&2&0\\0&0&3\\\end{bmatrix}}\;\;\;\;\;\;\;C=\sigma ^{2}{\begin{bmatrix}x_{1}&0&0\\0&x_{2}&0\\0&0&x_{3}\\\end{bmatrix}}\;\;\;\;\;\;\;D=\sigma ^{2}{\begin{bmatrix}1&\rho &\rho ^{2}\\\rho &1&\rho \\\rho ^{2}&\rho &1\\\end{bmatrix}}}$
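The four cases can also be checked numerically. The following is a minimal sketch, assuming illustrative values for $\sigma^2$, $\rho$ and $x$; the helper names `is_homoscedastic` and `has_cross_covariance` are hypothetical and not taken from any library.

```python
import numpy as np


def is_homoscedastic(cov, tol=1e-12):
    """True if all diagonal entries (variances) of cov are equal."""
    d = np.diag(cov)
    return bool(np.allclose(d, d[0], rtol=0.0, atol=tol))


def has_cross_covariance(cov, tol=1e-12):
    """True if any off-diagonal entry (covariance) is non-zero."""
    off = cov - np.diag(np.diag(cov))
    return bool(np.any(np.abs(off) > tol))


sigma2, rho = 1.0, 0.5            # illustrative values
x = np.array([1.0, 2.0, 4.0])     # illustrative regressor values for matrix C

A = sigma2 * np.eye(3)
B = sigma2 * np.diag([1.0, 2.0, 3.0])
C = sigma2 * np.diag(x)
idx = np.arange(3)
D = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])   # entries rho^|i-j|

for name, M in zip("ABCD", (A, B, C, D)):
    print(name, is_homoscedastic(M), has_cross_covariance(M))
```

Only A and D pass the constant-diagonal check, and only D has non-zero off-diagonal entries, which matches the classification given in the text.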

If y is consumption, x is income, and ${\displaystyle \epsilon }$ represents the whims of the consumer, and we are estimating ${\displaystyle y_{i}=\beta x_{i}+\epsilon _{i},}$ then if richer consumers' whims affect their spending more in absolute dollars, we might have ${\displaystyle \operatorname {Var} (\epsilon _{i})=x_{i}\sigma ^{2},}$ rising with income, as in matrix C above. [5]

## Testing

Residuals can be tested for homoscedasticity using the Breusch–Pagan test, [6] which performs an auxiliary regression of the squared residuals on the independent variables. The explained sum of squares from this auxiliary regression is divided by two to give the test statistic, which is compared with a chi-squared distribution whose degrees of freedom equal the number of independent variables. [7] The null hypothesis of this chi-squared test is homoscedasticity, and the alternative hypothesis would indicate heteroscedasticity. Since the Breusch–Pagan test is sensitive to departures from normality or small sample sizes, the Koenker–Bassett or 'generalized Breusch–Pagan' test is commonly used instead. [8] From the auxiliary regression, it retains the R-squared value, which is multiplied by the sample size to give the test statistic; this is compared with a chi-squared distribution with the same degrees of freedom. Although it is not necessary for the Koenker–Bassett test, the Breusch–Pagan test requires that the squared residuals first be divided by the residual sum of squares divided by the sample size. [8] Testing for groupwise heteroscedasticity requires the Goldfeld–Quandt test.
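The two statistics described above can be sketched with NumPy alone. This is an illustrative implementation under the description in this section, not a reference implementation; the function names `_fitted` and `heteroscedasticity_tests` and the simulated data are assumptions made for the example.

```python
import numpy as np


def _fitted(X, y):
    """Fitted values from an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


def heteroscedasticity_tests(y, x):
    """Return (Breusch-Pagan statistic, Koenker-Bassett statistic, degrees of freedom).

    x holds the independent variables, without a constant column.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    n, k = x.shape
    X = np.column_stack([np.ones(n), x])

    e2 = (y - _fitted(X, y)) ** 2             # squared OLS residuals

    # Breusch-Pagan: scale squared residuals by RSS / n, then take ESS / 2
    g = e2 / (e2.sum() / n)
    ess = np.sum((_fitted(X, g) - g.mean()) ** 2)
    bp_stat = ess / 2.0

    # Koenker-Bassett: n * R^2 from regressing the unscaled squared residuals
    rss_aux = np.sum((e2 - _fitted(X, e2)) ** 2)
    tss_aux = np.sum((e2 - e2.mean()) ** 2)
    koenker_stat = n * (1.0 - rss_aux / tss_aux)

    return bp_stat, koenker_stat, k


# Simulated example (assumed data): error sd proportional to x
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500) * x
bp, kb, df = heteroscedasticity_tests(y, x)
print(f"Breusch-Pagan: {bp:.2f}, Koenker-Bassett: {kb:.2f}, df: {df}")
```

Each statistic is compared with a chi-squared critical value for the returned degrees of freedom; with one independent variable the 5% critical value is about 3.84. If SciPy is available, a p-value could be obtained with `scipy.stats.chi2.sf(stat, df)`.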

## Homoscedastic distributions

Two or more normal distributions, ${\displaystyle N(\mu _{i},\Sigma _{i})}$, are homoscedastic if they share a common covariance (or correlation) matrix, ${\displaystyle \Sigma _{i}=\Sigma _{j},\ \forall i,j}$. Homoscedastic distributions are especially useful to derive statistical pattern recognition and machine learning algorithms. One popular example of an algorithm that assumes homoscedasticity is Fisher's linear discriminant analysis.
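To show how the shared-covariance assumption enters such an algorithm, the sketch below fits a two-class linear discriminant using a single pooled covariance estimate. The class means, the covariance matrix, the equal-prior threshold, and the names `fit_lda` and `predict` are illustrative assumptions rather than details from the source.

```python
import numpy as np


def fit_lda(X0, X1):
    """Two-class linear discriminant with a pooled (shared) covariance estimate."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Homoscedasticity assumption: one covariance matrix serves both classes
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, mu1 - mu0)         # discriminant direction
    c = w @ (mu0 + mu1) / 2.0                 # threshold, assuming equal priors
    return w, c


def predict(X, w, c):
    """Assign class 1 where the linear score exceeds the threshold."""
    return (X @ w > c).astype(int)


rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])    # common covariance (illustrative)
X0 = rng.multivariate_normal([0.0, 0.0], Sigma, size=500)
X1 = rng.multivariate_normal([3.0, 3.0], Sigma, size=500)

w, c = fit_lda(X0, X1)
accuracy = (np.mean(predict(X0, w, c) == 0) + np.mean(predict(X1, w, c) == 1)) / 2
print(f"training accuracy: {accuracy:.3f}")
```

Because both classes share one covariance matrix, the decision rule reduces to a single linear score compared against a threshold; with class-specific covariances the boundary would in general be quadratic.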

The concept of homoscedasticity can be applied to distributions on spheres. [9]

## References

1. "Definition of HOMOSCEDASTICITY".
2. For the Greek etymology of the term, see McCulloch, J. Huston (1985). "On Heteros*edasticity". Econometrica. 53 (2): 483. JSTOR 1911250.
3. Peter Kennedy, A Guide to Econometrics, 5th edition, p. 137.
4. Achen, Christopher H.; Shively, W. Phillips (1995), Cross-Level Inference, University of Chicago Press, pp. 47–48, ISBN 9780226002194.
5. Peter Kennedy, A Guide to Econometrics, 5th edition, p. 136.
6. Breusch, T. S.; Pagan, A. R. (1979). "A Simple Test for Heteroscedasticity and Random Coefficient Variation". Econometrica. 47 (5): 1287–1294. doi:10.2307/1911963. ISSN 0012-9682. JSTOR 1911963.
7. Ullah, Muhammad Imdad (2012-07-26). "Breusch Pagan Test for Heteroscedasticity". Basic Statistics and Data Analysis. Retrieved 2020-11-28.
8. Pryce, Gwilym. "Heteroscedasticity: Testing and Correcting in SPSS" (PDF). pp. 12–18. Archived (PDF) from the original on 2017-03-27. Retrieved 26 March 2017.
9. Hamsici, Onur C.; Martinez, Aleix M. (2007). "Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification". Journal of Machine Learning Research. 8: 1583–1623.