White test

The White test is a statistical test that establishes whether the variance of the errors in a regression model is constant: that is, it tests for homoskedasticity.

This test, and an estimator for heteroscedasticity-consistent standard errors, were proposed by Halbert White in 1980. [1] These methods have become widely used, making this paper one of the most cited articles in economics. [2]

In cases where the White test statistic is statistically significant, heteroskedasticity is not necessarily the cause; the problem could instead be a specification error. In other words, the White test can be a test of heteroskedasticity, of specification error, or of both. If no cross-product terms are introduced in the White test procedure, then it is a test of pure heteroskedasticity. If cross-products are introduced, then it is a test of both heteroskedasticity and specification bias.

Testing constant variance

To test for constant variance one undertakes an auxiliary regression analysis: this regresses the squared residuals from the original regression model onto a set of regressors that contain the original regressors along with their squares and cross-products. [3] One then inspects the R2. The Lagrange multiplier (LM) test statistic is the product of the R2 value and sample size:

$$\mathrm{LM} = nR^2$$

This follows a chi-squared distribution with degrees of freedom equal to P − 1, where P is the number of estimated parameters (in the auxiliary regression).

The logic of the test is as follows. First, the squared residuals from the original model serve as a proxy for the variance of the error term at each observation. (The error term is assumed to have a mean of zero, and the variance of a zero-mean random variable is just the expectation of its square.) The independent variables in the auxiliary regression account for the possibility that the error variance depends on the values of the original regressors in some way (linear or quadratic). If the error term in the original model is in fact homoskedastic (has a constant variance) then the coefficients in the auxiliary regression (besides the constant) should be statistically indistinguishable from zero and the R2 should be "small". Conversely, a "large" R2 (scaled by the sample size so that it follows the chi-squared distribution) counts against the hypothesis of homoskedasticity.
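
The procedure can be sketched in a few lines of Python with statsmodels and SciPy. This is a minimal illustration, not a definitive implementation: the data are simulated (with the error variance deliberately tied to one regressor), and the variable names are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Simulated heteroskedastic errors: the variance grows with x1**2.
y = 1.0 + 2.0 * x1 - x2 + rng.normal(size=n) * np.sqrt(1 + 2 * x1**2)

# Original regression: keep the residuals.
X = sm.add_constant(np.column_stack([x1, x2]))
resid = sm.OLS(y, X).fit().resid

# Auxiliary regression: squared residuals on levels, squares, cross-products.
Z = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1 * x2]))
aux = sm.OLS(resid**2, Z).fit()

lm = n * aux.rsquared        # LM statistic: sample size times R-squared
df = Z.shape[1] - 1          # P - 1, excluding the constant
p_value = stats.chi2.sf(lm, df)
print(f"LM = {lm:.2f}, df = {df}, p-value = {p_value:.4f}")
```

With this data-generating process the p-value is small, so the null of homoskedasticity is rejected, as expected.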

An alternative to the White test is the Breusch–Pagan test, which is designed to detect only linear forms of heteroskedasticity. Under certain conditions, and with a modification of one of the tests, the two can be shown to be algebraically equivalent. [4]

If homoskedasticity is rejected, one can use heteroskedasticity-consistent standard errors.
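
Continuing the sketch above, switching statsmodels to a heteroskedasticity-consistent covariance takes a single argument; "HC0" is White's original estimator, while HC1–HC3 are common finite-sample refinements.

```python
# Reuses y and X from the sketch above; HC0 is White's original estimator.
robust = sm.OLS(y, X).fit(cov_type="HC0")
print(robust.bse)  # heteroskedasticity-consistent standard errors
```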

Software implementations

Implementations of the White test are available in several statistical packages, including the skedastic package for R, [5] the statsmodels library for Python, [6] and Stata's postestimation tools for regress. [7]
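
For example, statsmodels exposes the test directly; the call below reuses resid and X from the earlier sketch (the design matrix must include the constant):

```python
from statsmodels.stats.diagnostic import het_white

# resid and X as defined in the earlier sketch.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(f"LM = {lm_stat:.2f}, p = {lm_pvalue:.4f}")
```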

Related Research Articles

In statistics, the score test assesses constraints on statistical parameters based on the gradient of the likelihood function—known as the score—evaluated at the hypothesized parameter value under the null hypothesis. Intuitively, if the restricted estimator is near the maximum of the likelihood function, the score should not differ from zero by more than sampling error. While the finite sample distributions of score tests are generally unknown, they have an asymptotic χ2-distribution under the null hypothesis as first proved by C. R. Rao in 1948, a fact that can be used to determine statistical significance.

In statistics, homogeneity and its opposite, heterogeneity, arise in describing the properties of a dataset, or several datasets. They relate to the validity of the often convenient assumption that the statistical properties of any one part of an overall dataset are the same as any other part. In meta-analysis, which combines the data from several studies, homogeneity measures the differences or similarities between the several studies.

In econometrics, the seemingly unrelated regressions (SUR) or seemingly unrelated regression equations (SURE) model, proposed by Arnold Zellner in 1962, is a generalization of a linear regression model that consists of several regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. Each equation is a valid linear regression on its own and can be estimated separately, which is why the system is called seemingly unrelated, although some authors suggest that the term seemingly related would be more appropriate, since the error terms are assumed to be correlated across the equations.

In statistics, the Breusch–Pagan test, developed in 1979 by Trevor Breusch and Adrian Pagan, is used to test for heteroskedasticity in a linear regression model. It was independently suggested with some extension by R. Dennis Cook and Sanford Weisberg in 1983. Derived from the Lagrange multiplier test principle, it tests whether the variance of the errors from a regression is dependent on the values of the independent variables. In that case, heteroskedasticity is present.
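
A minimal, self-contained sketch of the Breusch–Pagan test using statsmodels; the data are simulated, with the error variance made to depend linearly on |x| to match the linear form the test targets.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.normal(size=500)
# Error variance depends linearly on |x| (the case Breusch-Pagan detects).
y = 2.0 + 3.0 * x + rng.normal(size=500) * np.sqrt(1 + np.abs(x))

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm, lm_pvalue, fvalue, f_pvalue = het_breuschpagan(resid, X)
print(f"LM = {lm:.2f}, p = {lm_pvalue:.4f}")
```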

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when the residuals in the regression model are correlated. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small sample distribution of this ratio was derived by John von Neumann. Durbin and Watson applied this statistic to the residuals from least squares regressions, and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first order autoregressive process. Note that the distribution of this test statistic does not depend on the estimated regression coefficients or on the variance of the errors.
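
A short, self-contained sketch with simulated AR(1) errors (illustrative parameters only): values of the statistic near 2 suggest no lag-1 autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):          # AR(1) errors with rho = 0.6
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(f"DW = {durbin_watson(resid):.3f}")  # well below 2 for this process
```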

The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedasticity-robust standard errors or Eicker–Huber–White standard errors, the latter name recognizing the contributions of Friedhelm Eicker, Peter J. Huber, and Halbert White.

In econometrics and statistics, a structural break is an unexpected change over time in the parameters of regression models, which can lead to large forecasting errors and unreliability of the model in general. This issue was popularised by David Hendry, who argued that a lack of stability of coefficients frequently caused forecast failure, and that one must therefore routinely test for structural stability. Structural stability, i.e. the time-invariance of regression coefficients, is a central issue in all applications of linear regression models.

In statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation process can involve analyzing the goodness of fit of the regression, analyzing whether the regression residuals are random, and checking whether the model's predictive performance deteriorates substantially when applied to data that were not used in model estimation.

In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints. Although generalized estimating equations are sometimes believed to be robust in every respect even with a misspecified working-correlation matrix, they are robust only in the sense that consistency is preserved under such misspecification.

In statistics, the Breusch–Godfrey test is used to assess the validity of some of the modelling assumptions inherent in applying regression-like models to observed data series. In particular, it tests for the presence of serial correlation that has not been included in a proposed model structure and which, if present, would mean that incorrect conclusions would be drawn from other tests or that sub-optimal estimates of model parameters would be obtained.

In statistics, the Goldfeld–Quandt test checks for heteroscedasticity in regression analyses. It does this by dividing a dataset into two parts or groups, and hence the test is sometimes called a two-group test. The Goldfeld–Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both a parametric and nonparametric test are described in the paper, but the term "Goldfeld–Quandt test" is usually associated only with the former.

A Newey–West estimator is used in statistics and econometrics to provide an estimate of the covariance matrix of the parameters of a regression-type model where the standard assumptions of regression analysis do not apply. It was devised by Whitney K. Newey and Kenneth D. West in 1987, although there are a number of later variants. The estimator is used to try to overcome autocorrelation and heteroskedasticity in the error terms in the models, often for regressions applied to time series data. The abbreviation "HAC", sometimes used for the estimator, stands for "heteroskedasticity and autocorrelation consistent". A number of HAC estimators have been described in the literature, and the term does not refer uniquely to Newey–West. One version of the Newey–West estimator requires the user to specify a bandwidth and makes use of the Bartlett kernel, familiar from kernel density estimation.
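
A minimal sketch of Newey–West (HAC) standard errors in statsmodels, on simulated autocorrelated and heteroskedastic errors; the maxlags value is the user-chosen bandwidth, set to 4 here purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):          # autocorrelated, heteroskedastic errors
    e[t] = 0.5 * e[t - 1] + rng.normal() * (1 + 0.5 * abs(x[t]))
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac.bse)                 # HAC (Newey-West) standard errors
```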

Halbert Lynn White Jr. was the Chancellor’s Associates Distinguished Professor of Economics at the University of California, San Diego, and a Fellow of the Econometric Society and the American Academy of Arts and Sciences.

Trevor Stanley Breusch is an Australian econometrician and was until his retirement Professor of Econometrics and Deputy Director of Crawford School of Public Policy at the Australian National University. He is noted for the Breusch–Pagan test from the paper "A simple test for heteroscedasticity and random coefficient variation". Another contribution to econometrics is the serial correlation Lagrange multiplier test, often called Breusch–Godfrey test after Breusch and Leslie G. Godfrey, which can be used to identify autocorrelation in the errors of a regression model.

Anil K. Bera is an Indian-American econometrician. He is Professor of Economics at University of Illinois at Urbana–Champaign's Department of Economics. He is most noted for his work with Carlos Jarque on the Jarque–Bera test.

In econometrics, the Park test is a test for heteroscedasticity. The test is based on the method proposed by Rolla Edward Park for estimating linear regression parameters in the presence of heteroscedastic error terms.

In statistics, the Glejser test for heteroscedasticity, developed in 1969 by Herbert Glejser, regresses the residuals on the explanatory variable that is thought to be related to the heteroscedastic variance. After the test was found not to be asymptotically valid under asymmetric disturbances, similar improvements were suggested independently by Im, and by Machado and Santos Silva.

In statistics, a sequence of random variables is homoscedastic if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as heterogeneity of variance. The spellings homoskedasticity and heteroskedasticity are also frequently used. Assuming a variable is homoscedastic when in reality it is heteroscedastic results in unbiased but inefficient point estimates and in biased estimates of standard errors, and may result in overestimating the goodness of fit as measured by the Pearson coefficient.

References

  1. White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity". Econometrica. 48 (4): 817–838. CiteSeerX 10.1.1.11.7646. doi:10.2307/1912934. JSTOR 1912934. MR 0575027.
  2. Kim, E.H.; Morse, A.; Zingales, L. (2006). "What Has Mattered to Economics since 1970" (PDF). Journal of Economic Perspectives. 20 (4): 189–202. doi:10.1257/jep.20.4.189.
  3. Verbeek, Marno (2008). A Guide to Modern Econometrics (Third ed.). Wiley. pp. 99–100. ISBN 978-0-470-51769-7.
  4. Waldman, Donald M. (1983). "A note on algebraic equivalence of White's test and a variation of the Godfrey/Breusch-Pagan test for heteroscedasticity". Economics Letters. 13 (2–3): 197–200. doi:10.1016/0165-1765(83)90085-X.
  5. "skedastic: Heteroskedasticity Diagnostics for Linear Regression Models". CRAN.
  6. "statsmodels v0.12.1".
  7. Stata. "regress postestimation — Postestimation tools for regress" (PDF).
