In econometrics and statistics, a structural break is an unexpected change over time in the parameters of regression models, which can lead to large forecasting errors and unreliability of the model in general. [1] [2] [3] This issue was popularised by David Hendry, who argued that lack of stability of coefficients frequently caused forecast failure, and that structural stability should therefore be tested routinely. Structural stability, i.e. the time-invariance of regression coefficients, is a central issue in all applications of linear regression models. [4]
For linear regression models, the Chow test is often used to test for a single break in mean at a known time period K for K ∈ [1, T]. [5] [6] This test assesses whether the coefficients in a regression model are the same for periods [1, 2, ..., K] and [K + 1, ..., T]. [6]
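As an illustration, the Chow test can be run in R with the strucchange package (one of several possible implementations; the series and break point below are simulated, not taken from the article):

library(strucchange)

# Simulated series of length T = 200 with a mean shift at K = 100
set.seed(1)
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 1))

# Chow test for a single break at the a priori known period K = 100
sctest(y ~ 1, type = "Chow", point = 100)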
Other challenges occur where there are:
Case 1: a known number of breaks in mean with unknown break points;
Case 2: an unknown number of breaks in mean with unknown break points;
Case 3: breaks in variance.
The Chow test is not applicable in these situations, since it only applies to models with a known breakpoint and where the error variance remains constant before and after the break. [7] [5] [6] Bayesian methods exist to address these difficult cases via Markov chain Monte Carlo inference. [8] [9]
In general, the CUSUM (cumulative sum) and CUSUM-sq (CUSUM squared) tests can be used to test the constancy of the coefficients in a model. The bounds test can also be used. [6] [10] For cases 1 and 2, the sup-Wald (i.e., the supremum of a set of Wald statistics), sup-LM (i.e., the supremum of a set of Lagrange multiplier statistics), and sup-LR (i.e., the supremum of a set of likelihood ratio statistics) tests developed by Andrews (1993, 2003) may be used to test for parameter instability when the number and location of structural breaks are unknown. [11] [12] These tests were shown to be superior to the CUSUM test in terms of statistical power, [11] and are the most commonly used tests for the detection of structural change involving an unknown number of breaks in mean with unknown break points. [4] The sup-Wald, sup-LM, and sup-LR tests are asymptotic in general (i.e., the asymptotic critical values for these tests are applicable for sample size n as n → ∞), [11] and involve the assumption of homoskedasticity across break points for finite samples; [4] however, an exact test with the sup-Wald statistic may be obtained for a linear regression model with a fixed number of regressors and independent and identically distributed (IID) normal errors. [11] A method developed by Bai and Perron (2003) also allows for the detection of multiple structural breaks from data. [13]
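A minimal sketch in R, again using the strucchange package on simulated data (the package provides the sup-F flavour of the Andrews-type tests and a Bai–Perron style multiple-break procedure; all data and names below are illustrative):

library(strucchange)

set.seed(1)
y <- c(rnorm(80, mean = 0), rnorm(70, mean = 2), rnorm(50, mean = -1))  # two breaks in mean

# OLS-based CUSUM test of coefficient constancy
plot(efp(y ~ 1, type = "OLS-CUSUM"))

# sup-F test: supremum of F statistics over candidate break points (Andrews 1993)
fs <- Fstats(y ~ 1)
sctest(fs, type = "supF")

# Bai-Perron style estimation of the number and location of multiple breaks
bp <- breakpoints(y ~ 1)
summary(bp)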
The MZ test developed by Maasoumi, Zaman, and Ahmed (2010) allows for the simultaneous detection of one or more breaks in both mean and variance at a known break point. [4] [14] The sup-MZ test developed by Ahmed, Haider, and Zaman (2016) is a generalization of the MZ test which allows for the detection of breaks in mean and variance at an unknown break point. [4]
For a cointegration model, the Gregory–Hansen test (1996) can be used for one unknown structural break, [15] the Hatemi–J test (2006) can be used for two unknown breaks [16] and the Maki (2012) test allows for multiple structural breaks.
There are many statistical packages that can be used to find structural breaks, including R, [17] GAUSS, and Stata. For example, a list of R packages for time series data is summarized at the changepoint detection section of the Time Series Analysis Task View, [18] including both classical and Bayesian methods. [19] [9]
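For instance, the changepoint package (one of those listed in that Task View section) searches for an unknown number of breaks in mean and variance; the data below are simulated for illustration:

library(changepoint)

set.seed(1)
y <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 3, sd = 2))

# PELT search for breaks in both mean and variance
fit <- cpt.meanvar(y, method = "PELT")
cpts(fit)   # estimated break point(s)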
Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference." An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships." Jan Tinbergen is one of the two founding fathers of econometrics. The other, Ragnar Frisch, also coined the term in the sense in which it is used today.
In statistics, the likelihood-ratio test assesses the goodness of fit of two competing statistical models, specifically one found by maximization over the entire parameter space and another found after imposing some constraint, based on the ratio of their likelihoods. If the constraint is supported by the observed data, the two likelihoods should not differ by more than sampling error. Thus the likelihood-ratio test tests whether this ratio is significantly different from one, or equivalently whether its natural logarithm is significantly different from zero.
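In symbols: if Λ is the ratio of the constrained maximized likelihood to the unconstrained one, the test statistic is −2 ln Λ, which under standard regularity conditions is asymptotically χ2-distributed with degrees of freedom equal to the number of constraints imposed.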
In statistics, point estimation involves the use of sample data to calculate a single value which is to serve as a "best guess" or "best estimate" of an unknown population parameter. More formally, it is the application of a point estimator to the data to obtain a point estimate.
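For example, the sample mean x̄ = (1/n) Σ xᵢ is a point estimator of the population mean μ, and the value of x̄ computed from a particular sample is the corresponding point estimate.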
An F-test is any statistical test used to compare the variances of two samples or the ratio of variances between multiple samples. The test statistic, a random variable F, has an F-distribution under the null hypothesis and the customary assumptions about the error term (ε). It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor in honour of Ronald Fisher, who initially developed the statistic as the variance ratio in the 1920s.
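In the simplest two-sample case, the statistic is F = s₁²/s₂², the ratio of the two sample variances, which is compared against an F-distribution with (n₁ − 1, n₂ − 1) degrees of freedom.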
In statistics, a nuisance parameter is any parameter which is unspecified but which must be accounted for in the hypothesis testing of the parameters which are of interest.
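For example, when testing hypotheses about the mean μ of a normal distribution N(μ, σ²), the variance σ² is a nuisance parameter: it is not of direct interest, but it must be estimated or eliminated for the test on μ to be valid.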
In the design of experiments, optimal experimental designs are a class of experimental designs that are optimal with respect to some statistical criterion. The creation of this field of statistics has been credited to Danish statistician Kirstine Smith.
In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
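In the standard case it is computed as R2 = 1 − SSres/SStot, where SSres is the residual sum of squares and SStot is the total sum of squares about the mean of the dependent variable.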
In statistics, the Wald test assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the precision of the estimate. Intuitively, the larger this weighted distance, the less likely it is that the constraint is true. While the finite-sample distributions of Wald tests are generally unknown, the test statistic has an asymptotic χ2-distribution under the null hypothesis, a fact that can be used to determine statistical significance.
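For a single parameter θ with estimate θ̂ and hypothesized value θ₀, the statistic takes the form W = (θ̂ − θ₀)² / var(θ̂), which is asymptotically χ2-distributed with one degree of freedom under the null hypothesis.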
In statistics, homogeneity and its opposite, heterogeneity, arise in describing the properties of a dataset, or several datasets. They relate to the validity of the often convenient assumption that the statistical properties of any one part of an overall dataset are the same as those of any other part. In meta-analysis, which combines the data from several studies, homogeneity measures the differences or similarities between the studies.
The Chow test, proposed by econometrician Gregory Chow in 1960, is a test of whether the true coefficients in two linear regressions on different data sets are equal. In econometrics, it is most commonly used in time series analysis to test for the presence of a structural break at a period which can be assumed to be known a priori. In program evaluation, the Chow test is often used to determine whether the independent variables have different impacts on different subgroups of the population.
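Concretely, with S_p the residual sum of squares from the pooled regression, S₁ and S₂ those from the two subsamples, k estimated parameters, and subsample sizes N₁ and N₂, the Chow statistic is [(S_p − (S₁ + S₂))/k] / [(S₁ + S₂)/(N₁ + N₂ − 2k)], which follows an F(k, N₁ + N₂ − 2k) distribution under the null hypothesis of equal coefficients.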
Multilevel models are statistical models of parameters that vary at more than one level. An example could be a model of student performance that contains measures for individual students as well as measures for classrooms within which the students are grouped. These models can be seen as generalizations of linear models, although they can also extend to non-linear models. These models became much more popular after sufficient computing power and software became available.
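A minimal two-level (random-intercept) example for student i in classroom j is y_ij = β₀ + u_j + ε_ij, where u_j is a classroom-level random effect and ε_ij an individual-level error, each typically assumed normal with mean zero.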
The White test is a statistical test that establishes whether the variance of the errors in a regression model is constant, that is, whether there is homoskedasticity.
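In outline, the test regresses the squared OLS residuals on the original regressors, their squares, and their cross-products; under homoskedasticity, n·R2 from this auxiliary regression is asymptotically χ2-distributed with degrees of freedom equal to the number of auxiliary regressors (excluding the constant).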
In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small-sample distribution of this ratio was derived by John von Neumann. Durbin and Watson applied this statistic to the residuals from least squares regressions and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first-order autoregressive process. Note that the distribution of this test statistic does not depend on the estimated regression coefficients or the variance of the errors.
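The statistic is d = Σ (e_t − e_{t−1})² / Σ e_t², summing over the residuals e_t from the regression; since d ≈ 2(1 − r), where r is the lag-1 sample autocorrelation of the residuals, values near 2 are consistent with no first-order autocorrelation.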
In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints. Although GEEs are sometimes believed to be robust under any choice of working-correlation matrix, they are robust only in the sense that parameter estimates remain consistent when that matrix is misspecified.
Bayesian econometrics is a branch of econometrics which applies Bayesian principles to economic modelling. Bayesianism is based on a degree-of-belief interpretation of probability, as opposed to a relative-frequency interpretation.
Anil K. Bera is an Indian-American econometrician. He is Professor of Economics at University of Illinois at Urbana–Champaign's Department of Economics. He is most noted for his work with Carlos Jarque on the Jarque–Bera test.
In statistics, linear regression is a statistical model which estimates the linear relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable. If the explanatory variables are measured with error then errors-in-variables models are required, also known as measurement error models.
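In matrix form the model is y = Xβ + ε, where y is the vector of responses, X the design matrix of explanatory variables, β the coefficient vector, and ε the error term; ordinary least squares estimates it as β̂ = (XᵀX)⁻¹Xᵀy.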
Spike-and-slab regression is a type of Bayesian linear regression in which a particular hierarchical prior distribution for the regression coefficients is chosen such that only a subset of the possible regressors is retained. The technique is particularly useful when the number of possible predictors is larger than the number of observations. The idea of the spike-and-slab model was originally proposed by Mitchell & Beauchamp (1988). The approach was further significantly developed by Madigan & Raftery (1994) and George & McCulloch (1997). A recent and important contribution to this literature is Ishwaran & Rao (2005).
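A common formulation gives each coefficient β_j the mixture prior β_j | γ_j ~ (1 − γ_j)·δ₀ + γ_j·N(0, τ²), where δ₀ is a point mass at zero (the "spike"), the normal component is the diffuse "slab", and the binary indicator γ_j determines whether regressor j is retained.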
In statistics, a sequence of random variables is homoscedastic if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as heterogeneity of variance. The spellings homoskedasticity and heteroskedasticity are also frequently used. Assuming a variable is homoscedastic when in reality it is heteroscedastic results in unbiased but inefficient point estimates and in biased estimates of standard errors, and may result in overestimating the goodness of fit as measured by the Pearson coefficient.
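Formally, a sequence {X_i} is homoscedastic if Var(X_i) = σ² < ∞ for every i; in the regression setting the corresponding assumption is Var(ε_i) = σ² for all observations.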
Structural changes and model stability in panel data are of general concern in empirical economics and finance research. Model parameters are assumed to be stable over time if there is no reason to believe otherwise. It is well known that various economic and political events can cause structural breaks in financial data. ... In both the statistics and econometrics literature we can find many papers related to the detection of changes and structural breaks.
The hypothesis of structural stability, i.e. that the regression coefficients do not change over time, is central to all applications of linear regression models.
An important assumption made in using the Chow test is that the disturbance variance is the same in both (or all) regressions. ...
6.4.4 TESTS OF STRUCTURAL BREAK WITH UNEQUAL VARIANCES ...
In a small or moderately sized sample, the Wald test has the unfortunate property that the probability of a type I error is persistently larger than the critical level we use to carry it out. (That is, we shall too frequently reject the null hypothesis that the parameters are the same in the subsamples.) We should be using a larger critical value. Ohtani and Kobayashi (1986) have devised a "bounds" test that gives a partial remedy for the problem.