Mixed-data sampling

Econometric models involving data sampled at different frequencies are of general interest. Mixed-data sampling (MIDAS) is an econometric regression framework developed by Eric Ghysels with several co-authors. There is now a substantial literature on MIDAS regressions and their applications, including Ghysels, Santa-Clara and Valkanov (2006),[1] Ghysels, Sinko and Valkanov,[2] Andreou, Ghysels and Kourtellos (2010)[3] and Andreou, Ghysels and Kourtellos (2013).[4]

MIDAS Regressions

A MIDAS regression is a direct forecasting tool that relates future low-frequency data to current and lagged high-frequency indicators, yielding a different forecasting model for each forecast horizon. It can flexibly handle data sampled at different frequencies and provides a direct forecast of the low-frequency variable. Because each individual high-frequency observation enters the regression directly, it avoids both the loss of potentially useful information and the misspecification that temporal aggregation can introduce.

A simple regression example has the independent variable appearing at a higher frequency than the dependent variable:

$$y_t = \beta_0 + \beta_1 B(L^{1/m};\theta)\, x_t^{(m)} + \varepsilon_t^{(m)}$$

where $y_t$ is the dependent variable, $x_t^{(m)}$ is the regressor, $m$ denotes the frequency ratio (for instance, if $y_t$ is yearly then $x_t^{(m)}$ is quarterly), $\varepsilon_t^{(m)}$ is the disturbance, and $B(L^{1/m};\theta)$ is a lag distribution, for instance the Beta function or the Almon lag. For example, $B(L^{1/m};\theta) = \sum_{k=0}^{K} B(k;\theta) L^{k/m}$, where the fractional lag operator satisfies $L^{k/m} x_t^{(m)} = x_{t-k/m}^{(m)}$.
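As a concrete illustration, the following sketch fits such a regression with an exponential Almon lag (one common choice of lag distribution, alongside the Beta lag) by nonlinear least squares. It is a minimal sketch on simulated data, assuming quarterly $y_t$ and twelve monthly lags of the regressor; the function names and parameter values are hypothetical and not taken from any MIDAS package.

```python
import numpy as np
from scipy.optimize import least_squares

def exp_almon_weights(theta1, theta2, n_lags):
    """Exponential Almon lag weights, normalized to sum to one."""
    j = np.arange(n_lags)
    w = np.exp(theta1 * j + theta2 * j**2)
    return w / w.sum()

def midas_residuals(params, y, X_hf):
    """Residuals of y_t = b0 + b1 * sum_k w_k(theta) * x_{t-k/m}."""
    b0, b1, theta1, theta2 = params
    w = exp_almon_weights(theta1, theta2, X_hf.shape[1])
    return y - (b0 + b1 * (X_hf @ w))

# Simulated data: 80 quarterly observations of y, each row of X_hf
# holding the 12 most recent monthly lags of x (newest first).
rng = np.random.default_rng(0)
n_obs, n_lags = 80, 12
X_hf = rng.standard_normal((n_obs, n_lags))
y = 0.5 + 2.0 * (X_hf @ exp_almon_weights(0.1, -0.05, n_lags)) \
    + 0.1 * rng.standard_normal(n_obs)

# Nonlinear least squares over (b0, b1, theta1, theta2).
fit = least_squares(midas_residuals, x0=[0.0, 1.0, 0.0, 0.0], args=(y, X_hf))
b0_hat, b1_hat, t1_hat, t2_hat = fit.x
print("intercept and slope:", b0_hat, b1_hat)
print("fitted lag weights:", exp_almon_weights(t1_hat, t2_hat, n_lags))
```

Note how the two $\theta$ parameters pin down the entire profile of twelve lag weights; this parsimony, relative to estimating twelve unrestricted coefficients, is the central appeal of the MIDAS lag distribution.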

MIDAS regressions can in some cases be viewed as substitutes for the Kalman filter applied in the context of mixed frequency data. Bai, Ghysels and Wright (2013)[5] examine the relationship between MIDAS regressions and Kalman filter state space models applied to mixed frequency data. In general, the latter involve a system of equations, whereas MIDAS regressions involve a single (reduced-form) equation. As a consequence, MIDAS regressions might be less efficient, but also less prone to specification errors. In cases where the MIDAS regression is only an approximation, the approximation errors tend to be small.

Machine Learning MIDAS Regressions

MIDAS can also be used for machine learning time series and panel data nowcasting.[6][7] The machine learning MIDAS regressions involve Legendre polynomials. High-dimensional mixed frequency time series regressions involve certain data structures that, once taken into account, should improve the performance of unrestricted estimators in small samples. These structures are represented by groups covering lagged dependent variables and groups of lags for a single (high-frequency) covariate. To that end, the machine learning MIDAS approach exploits the sparse-group LASSO (sg-LASSO) regularization, which conveniently accommodates such structures.[8] The attractive feature of the sg-LASSO estimator is that it allows one to effectively combine approximately sparse and dense signals.
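The construction below sketches the Legendre-polynomial device on simulated data. It is illustrative only: shifted Legendre polynomials over the lag window collapse each covariate's block of high-frequency lags into a few basis coefficients, and since scikit-learn provides no sparse-group LASSO, a plain LASSO stands in for the sg-LASSO penalty; all names and values are assumptions, not the midasml implementation.

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import Lasso

def legendre_dictionary(n_lags, degree):
    """(n_lags x (degree + 1)) matrix of Legendre polynomials evaluated
    on an equispaced grid over the lag window, rescaled to [-1, 1]."""
    grid = np.linspace(0.0, 1.0, n_lags)
    return legendre.legvander(2.0 * grid - 1.0, degree)

rng = np.random.default_rng(1)
n_obs, n_lags, degree = 200, 12, 3
X_hf = rng.standard_normal((n_obs, n_lags))   # high-frequency lags per obs
P = legendre_dictionary(n_lags, degree)

# Each covariate's n_lags columns collapse to degree + 1 columns; in the
# sg-LASSO these degree + 1 coefficients would form one penalized group.
Z = X_hf @ P

gamma_true = np.array([1.0, -0.5, 0.0, 0.0])  # sparse in the Legendre basis
y = Z @ gamma_true + 0.1 * rng.standard_normal(n_obs)

model = Lasso(alpha=0.05).fit(Z, y)
print("basis coefficients:", model.coef_)
print("implied lag-weight profile:", (P @ model.coef_).round(3))
```

In the sg-LASSO itself, the penalty mixes an elementwise $\ell_1$ norm with a group-level norm over each covariate's basis coefficients, which is what lets the estimator adapt between approximately sparse and dense signals.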

Software packages

Several software packages feature MIDAS regressions and related econometric methods. These include:

- MIDAS Matlab Toolbox [9]
- midasr, R package [10]
- midasml, R package for high-dimensional mixed frequency data [11]
- EViews [12]
- Python [13]
- Julia [14]

References

  1. Ghysels, Eric; Santa-Clara, Pedro; Valkanov, Rossen (2006). "Predicting Volatility: How to Get Most Out of Returns Data Sampled at Different Frequencies". Journal of Econometrics, 131, 59–95.
  2. Ghysels, Eric; Sinko, Arthur; Valkanov, Rossen (2006). "MIDAS Regressions: Further Results and New Directions". Econometric Reviews, 26, 53–90.
  3. Andreou, Elena; Ghysels, Eric; Kourtellos, Andros (2010). "Regression Models with Mixed Sampling Frequencies". Journal of Econometrics, 158, 246–261.
  4. Andreou, Elena; Ghysels, Eric; Kourtellos, Andros (2013). "Should macroeconomic forecasters use daily financial data and how?". Journal of Business and Economic Statistics, 31, 240–251.
  5. Bai, Jennie; Ghysels, Eric; Wright, Jonathan (2013). "State Space Models and MIDAS Regressions". Econometric Reviews, 32, 779–813.
  6. Babii, Andrii; Ghysels, Eric; Striaukas, Jonas (2022). "Machine Learning Time Series Regressions With an Application to Nowcasting". Journal of Business & Economic Statistics, 40(3), 1094–1106. arXiv:2005.14057. doi:10.1080/07350015.2021.1899933.
  7. Babii, Andrii; Ball, Ryan T.; Ghysels, Eric; Striaukas, Jonas (2022). "Machine learning panel data regressions with heavy-tailed dependent data: Theory and application". Journal of Econometrics, 105315. arXiv:2008.03600. doi:10.1016/j.jeconom.2022.07.001.
  8. Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R. (2013). "A Sparse-Group LASSO". Journal of Computational and Graphical Statistics, 22(2), 231–245.
  9. "MIDAS Matlab Toolbox", maintained by Hang Qian.
  10. "midasr: Mixed Data Sampling Regression", maintained by Virmantas Kvedaras and Vaidotas Zemlys-Balevicius. 23 February 2021.
  11. "midasml: Estimation and Prediction Methods for High-Dimensional Mixed Frequency Time Series Data", maintained by Jonas Striaukas. 29 April 2022.
  12. "EViews 9.5 MIDAS Forecasting Demonstration".
  13. "MIDAS Python code". GitHub.
  14. "MIDAS Julia". GitHub.
