Portmanteau test

A portmanteau test is a type of statistical hypothesis test in which the null hypothesis is well specified, but the alternative hypothesis is more loosely specified. Tests constructed in this context can have the property of being at least moderately powerful against a wide range of departures from the null hypothesis. Thus, in applied statistics, a portmanteau test provides a reasonable way of proceeding as a general check of a model's match to a dataset where there are many different ways in which the model may depart from the underlying data generating process. Use of such tests avoids having to be very specific about the particular type of departure being tested.

Examples

In time series analysis, two well-known versions of a portmanteau test are available for testing for autocorrelation in the residuals of a model: each tests whether any of a group of autocorrelations of the residual time series are different from zero. The first is the Ljung–Box test, [1] an improved version of the Box–Pierce test, [2] devised at essentially the same time; a seemingly trivial simplification used in the Box–Pierce statistic (and omitted in the improved test) was found to have a deleterious effect on the test's finite-sample performance. [1] These portmanteau tests are useful in working with ARIMA models.
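
As a rough illustration, the sketch below fits an ARIMA model to simulated data and applies both statistics to its residuals, using Python's statsmodels (the AR(1) data, the model order, and the choice of 10 lags are assumptions made for the example):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulate an AR(1) series as a stand-in for real data.
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.6 * y[t - 1] + rng.normal()

# Fit an ARIMA model, then test the residuals for remaining autocorrelation.
res = sm.tsa.ARIMA(y, order=(1, 0, 0)).fit()
# boxpierce=True also reports the older Box-Pierce statistic for comparison.
print(acorr_ljungbox(res.resid, lags=[10], boxpierce=True))
```

A small p-value would suggest the residuals are still autocorrelated, i.e. the fitted model has not captured all of the serial dependence.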

In the context of regression analysis, including regression analysis with time series structures, a portmanteau test has been devised, [3] which allows a general test to be made for the possibility that a range of types of nonlinear transformations of combinations of the explanatory variables should have been included in addition to a selected model structure.
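
The Castle–Hendry test itself is not part of the common statistics libraries; as a loosely related substitute, Ramsey's RESET test (available in recent statsmodels versions) also checks whether omitted nonlinear transformations of the regressors would improve a fitted linear model. A minimal sketch, with illustrative data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(1)
x = rng.normal(size=(300, 2))
# The true relationship contains a square the linear model omits.
y = 1.0 + x[:, 0] + 0.5 * x[:, 1] ** 2 + rng.normal(size=300)

res = sm.OLS(y, sm.add_constant(x)).fit()
# RESET: do powers of the fitted values add explanatory power?
print(linear_reset(res, power=2, use_f=True))
```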

Related Research Articles

Statistics

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Ronald Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.
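
A minimal sketch of the model-comparison use, testing a restricted against a full least-squares model with statsmodels (the data and the nesting are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=100)

# Restricted model (first regressor only) vs. full model (both regressors).
restricted = sm.OLS(y, sm.add_constant(x[:, [0]])).fit()
full = sm.OLS(y, sm.add_constant(x)).fit()
f_value, p_value, df_diff = full.compare_f_test(restricted)
print(f_value, p_value, df_diff)
```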

Linear trend estimation is a statistical technique to aid interpretation of data. When a series of measurements of a process are treated as, for example, a sequence or time series, trend estimation can be used to make and justify statements about tendencies in the data, by relating the measurements to the times at which they occurred. The fitted model can then be used to describe the behaviour of the observed data, without explaining it.
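
A minimal sketch of such a fit, regressing simulated measurements on their time index with statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(120)
y = 0.3 * t + rng.normal(scale=5.0, size=120)  # upward trend plus noise

# Regress the series on time; the slope estimates the trend per period.
res = sm.OLS(y, sm.add_constant(t)).fit()
print(res.params)         # [intercept, trend slope]
print(res.conf_int()[1])  # confidence interval for the slope
```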

In econometrics, the autoregressive conditional heteroskedasticity (ARCH) model is a statistical model for time series data that describes the variance of the current error term or innovation as a function of the actual sizes of the previous time periods' error terms; often the variance is related to the squares of the previous innovations. The ARCH model is appropriate when the error variance in a time series follows an autoregressive (AR) model; if an autoregressive moving average (ARMA) model is assumed for the error variance, the model is a generalized autoregressive conditional heteroskedasticity (GARCH) model.
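
A minimal sketch using the third-party arch package, fitting a GARCH(1,1) model to simulated returns (the package choice, the data, and the orders are assumptions for the example):

```python
import numpy as np
from arch import arch_model  # third-party package: pip install arch

rng = np.random.default_rng(4)
returns = rng.normal(scale=1.0, size=500)  # stand-in for real return data

# GARCH(1,1): today's variance depends on yesterday's squared shock
# and on yesterday's variance.
res = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
print(res.summary())
```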

In statistics, the term "error" arises in two ways. Firstly, it arises in the context of decision making, where the probability of error may be considered as being the probability of making a wrong decision and which would have a different value for each type of error. Secondly, it arises in the context of statistical modelling where the model's predicted value may be in error regarding the observed outcome and where the term probability of error may refer to the probabilities of various amounts of error occurring.

Cointegration is a statistical property of a collection (X1, X2, ..., Xk) of time series variables. First, all of the series must be integrated of order d. Next, if a linear combination of this collection is integrated of order less than d, then the collection is said to be co-integrated. Formally, if (X, Y, Z) are each integrated of order d, and there exist coefficients a, b, c such that aX + bY + cZ is integrated of order less than d, then X, Y, and Z are cointegrated. Cointegration has become an important property in contemporary time series analysis. Time series often have trends—either deterministic or stochastic. In an influential paper, Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series have stochastic trends.
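
A minimal sketch of a two-series check, using the Engle–Granger cointegration test from statsmodels on series that share a stochastic trend by construction:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(5)
trend = np.cumsum(rng.normal(size=500))  # a common stochastic trend
x = trend + rng.normal(size=500)
y = 2.0 * trend + rng.normal(size=500)

# Null hypothesis: no cointegration between x and y.
t_stat, p_value, crit = coint(x, y)
print(t_stat, p_value)
```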

In time series analysis, the Box–Jenkins method, named after the statisticians George Box and Gwilym Jenkins, applies autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models to find the best fit of a time-series model to past values of a time series.

In statistics, the Dickey–Fuller test tests the null hypothesis that a unit root is present in an autoregressive time series model. The alternative hypothesis is different depending on which version of the test is used, but is usually stationarity or trend-stationarity. The test is named after the statisticians David Dickey and Wayne Fuller, who developed it in 1979.
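
A minimal sketch of the augmented variant in statsmodels, applied to a simulated random walk (for which the unit-root null should not be rejected):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(size=300))  # random walk: unit root present

# Null hypothesis: a unit root is present.
adf_stat, p_value, usedlag, nobs, crit, icbest = adfuller(y)
print(adf_stat, p_value, crit)
```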

The Chow test, proposed by econometrician Gregory Chow in 1960, is a test of whether the true coefficients in two linear regressions on different data sets are equal. In econometrics, it is most commonly used in time series analysis to test for the presence of a structural break at a period which can be assumed to be known a priori. In program evaluation, the Chow test is often used to determine whether the independent variables have different impacts on different subgroups of the population.
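
statsmodels has no built-in Chow test; the sketch below computes the textbook F-statistic directly, assuming the break point is known a priori (the data and break location are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def chow_test(y, X, break_idx):
    """F-test that the coefficients are equal before and after break_idx."""
    k = X.shape[1]
    rss_pooled = sm.OLS(y, X).fit().ssr
    rss_1 = sm.OLS(y[:break_idx], X[:break_idx]).fit().ssr
    rss_2 = sm.OLS(y[break_idx:], X[break_idx:]).fit().ssr
    f_stat = ((rss_pooled - (rss_1 + rss_2)) / k) / (
        (rss_1 + rss_2) / (len(y) - 2 * k)
    )
    p_value = stats.f.sf(f_stat, k, len(y) - 2 * k)
    return f_stat, p_value

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=200))
# The slope shifts from 1.0 to 3.0 halfway through the sample.
y = np.where(np.arange(200) < 100, 1.0, 3.0) * X[:, 1] + rng.normal(size=200)
print(chow_test(y, X, break_idx=100))
```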

In statistics, the Breusch–Pagan test, developed in 1979 by Trevor Breusch and Adrian Pagan, is used to test for heteroskedasticity in a linear regression model. It was independently suggested with some extension by R. Dennis Cook and Sanford Weisberg in 1983. Derived from the Lagrange multiplier test principle, it tests whether the variance of the errors from a regression is dependent on the values of the independent variables. In that case, heteroskedasticity is present.
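
A minimal sketch with statsmodels' het_breuschpagan, applied to residuals whose variance grows with the regressor by construction:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(8)
x = rng.uniform(1.0, 5.0, size=300)
# The error variance increases with x: heteroskedastic by construction.
y = 2.0 + x + rng.normal(scale=x, size=300)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
# Null hypothesis: homoskedastic errors.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(lm_stat, lm_pvalue)
```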

In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small sample distribution of this ratio was derived by John von Neumann. Durbin and Watson applied this statistic to the residuals from least squares regressions, and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first order autoregressive process. Note that the distribution of this test statistic does not depend on the estimated regression coefficients and the variance of the errors.
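
A minimal sketch of the statistic via statsmodels; values near 2 indicate no lag-1 autocorrelation, values toward 0 indicate positive autocorrelation (the AR(1) errors here are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(9)
x = rng.normal(size=200)
# AR(1) errors induce positive serial correlation in the residuals.
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))  # expected to fall well below 2 here
```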

Galton's problem, named after Sir Francis Galton, is the problem of drawing inferences from cross-cultural data, due to the statistical phenomenon now called autocorrelation. The problem is now recognized as a general one that applies to all nonexperimental studies and to experimental design as well. It is most simply described as the problem of external dependencies in making statistical estimates when the elements sampled are not statistically independent. Asking two people in the same household whether they watch TV, for example, does not give you statistically independent answers. The sample size, n, for independent observations in this case is one, not two. Once proper adjustments are made that deal with external dependencies, then the axioms of probability theory concerning statistical independence will apply. These axioms are important for deriving measures of variance, for example, or tests of statistical significance.

The Ljung–Box test is a type of statistical test of whether any of a group of autocorrelations of a time series are different from zero. Instead of testing randomness at each distinct lag, it tests the "overall" randomness based on a number of lags, and is therefore a portmanteau test.

The Sargan–Hansen test or Sargan's test is a statistical test used for testing over-identifying restrictions in a statistical model. It was proposed by John Denis Sargan in 1958, and several variants were derived by him in 1975. Lars Peter Hansen re-worked through the derivations and showed that it can be extended to general non-linear GMM in a time series context.

In statistics, the Breusch–Godfrey test is used to assess the validity of some of the modelling assumptions inherent in applying regression-like models to observed data series. In particular, it tests for the presence of serial correlation that has not been included in a proposed model structure and which, if present, would mean that incorrect conclusions would be drawn from other tests or that sub-optimal estimates of model parameters would be obtained.
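
A minimal sketch with statsmodels' acorr_breusch_godfrey, which takes a fitted OLS result (the four-lag choice and the simulated AR(1) errors are assumptions for the example):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(10)
x = rng.normal(size=200)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.5 * e[t - 1] + rng.normal()  # serially correlated errors
y = 1.0 + x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
# Null hypothesis: no serial correlation up to the chosen lag order.
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=4)
print(lm_stat, lm_pvalue)
```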

Goldfeld–Quandt test

In statistics, the Goldfeld–Quandt test checks for homoscedasticity in regression analyses. It does this by dividing a dataset into two parts or groups, and hence the test is sometimes called a two-group test. The Goldfeld–Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both a parametric and nonparametric test are described in the paper, but the term "Goldfeld–Quandt test" is usually associated only with the former.
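
A minimal sketch of the parametric version with statsmodels' het_goldfeldquandt; the data are sorted by the regressor so that the two groups differ in variance by construction:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(11)
x = np.sort(rng.uniform(1.0, 5.0, size=300))
y = 2.0 + x + rng.normal(scale=x, size=300)  # variance grows with x

X = sm.add_constant(x)
# Null hypothesis: equal error variance in the two groups.
f_stat, p_value, ordering = het_goldfeldquandt(y, X)
print(f_stat, p_value)
```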

A Newey–West estimator is used in statistics and econometrics to provide an estimate of the covariance matrix of the parameters of a regression-type model when this model is applied in situations where the standard assumptions of regression analysis do not apply. It was devised by Whitney K. Newey and Kenneth D. West in 1987, although there are a number of later variants. The estimator is used to try to overcome autocorrelation and heteroskedasticity in the error terms in the models, often for regressions applied to time series data. The abbreviation "HAC," sometimes used for the estimator, stands for "heteroskedasticity and autocorrelation consistent."
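
A minimal sketch in statsmodels, which exposes the estimator through the cov_type argument at fit time (the lag truncation of 4 is an illustrative choice):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
x = rng.normal(size=200)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.5 * e[t - 1] + rng.normal()  # autocorrelated errors
y = 1.0 + x + e

# Point estimates match plain OLS; only the covariance matrix changes.
res = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res.bse)  # Newey-West (HAC) standard errors
```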

In statistics, the Phillips–Perron test is a unit root test. That is, it is used in time series analysis to test the null hypothesis that a time series is integrated of order 1. It builds on the Dickey–Fuller test of the null hypothesis ρ = 1 in Δy_t = (ρ − 1)y_{t−1} + u_t, where Δ is the first difference operator. Like the augmented Dickey–Fuller test, the Phillips–Perron test addresses the issue that the process generating data for y_t might have a higher order of autocorrelation than is admitted in the test equation—making y_{t−1} endogenous and thus invalidating the Dickey–Fuller t-test. Whilst the augmented Dickey–Fuller test addresses this issue by introducing lags of Δy_t as regressors in the test equation, the Phillips–Perron test makes a non-parametric correction to the t-test statistic. The test is robust with respect to unspecified autocorrelation and heteroscedasticity in the disturbance process of the test equation.
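
A minimal sketch using the third-party arch package, applied to a simulated random walk (for which the unit-root null should not be rejected):

```python
import numpy as np
from arch.unitroot import PhillipsPerron  # third-party: pip install arch

rng = np.random.default_rng(13)
y = np.cumsum(rng.normal(size=400))  # random walk: unit root present

# Null hypothesis: the series contains a unit root.
pp = PhillipsPerron(y)
print(pp.stat, pp.pvalue)
```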

Homoscedasticity and heteroscedasticity

In statistics, a sequence of random variables is homoscedastic if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used.

References

  1. Ljung, G. M.; Box, G. E. P. (1978). "On a measure of lack of fit in time series models". Biometrika. 65 (2): 297–303. doi:10.1093/biomet/65.2.297.
  2. Box, G. E. P.; Pierce, D. A. (1970). "Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models". Journal of the American Statistical Association. 65 (332): 1509–1526. doi:10.1080/01621459.1970.10481180. JSTOR 2284333.
  3. Castle, Jennifer L.; Hendry, David F. (2010). "A Low-Dimension Portmanteau Test for Non-linearity". Journal of Econometrics. 158 (2): 231–245. doi:10.1016/j.jeconom.2010.01.006.