Autoregressive integrated moving average

Last updated

In time series analysis used in statistics and econometrics, autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA) models are generalizations of the autoregressive moving average (ARMA) model to non-stationary series and periodic variation, respectively. All these models are fitted to time series in order to better understand it and predict future values. The purpose of these generalizations is to fit the data as well as possible. Specifically, ARMA assumes that the series is stationary, that is, its expected value is constant in time. If instead the series has a trend (but a constant variance/autocovariance), the trend is removed by "differencing" [1] , leaving a stationary series. This operation generalizes ARMA and corresponds to the "integrated" part of ARIMA. Analogously, periodic variation is removed by "seasonal differencing". [2]

Contents

Components

As in ARMA, the "autoregressive" (AR) part of ARIMA indicates that the evolving variable of interest is regressed on its prior values. The "moving average" (MA) part indicates that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past. [3] The "integrated" (I) part indicates that the data values have been replaced with the difference between each value and the previous value.

According to Wold's decomposition theorem, [4] [5] [6] the ARMA model is sufficient to describe a regular (a.k.a. purely nondeterministic [6] ) wide-sense stationary time series, so we are motivated to make such a non-stationary time series stationary, e.g., by using differencing, before we can use ARMA. [7]

If the time series contains a predictable sub-process (a.k.a. pure sine or complex-valued exponential process [5] ), the predictable component is treated as a non-zero-mean but periodic (i.e., seasonal) component in the ARIMA framework that it is eliminated by the seasonal differencing.

Mathematical formulation

Non-seasonal ARIMA models are usually denoted ARIMA(p, d, q) where parameters p, d, q are non-negative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p, d, q)(P, D, Q)m, where the uppercase P, D, Q are the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model and m is the number of periods in each season. [8] [2] When two of the parameters are 0, the model may be referred to based on the non-zero parameter, dropping "AR", "I" or "MA" from the acronym. For example, is AR(1), is I(1), and is MA(1).

Given time series data Xt where t is an integer index and the Xt are real numbers, an model is given by

or equivalently by

where is the lag operator, the are the parameters of the autoregressive part of the model, the are the parameters of the moving average part and the are error terms. The error terms are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.

If the polynomial has a unit root (a factor ) of multiplicity d, then it can be rewritten as:

An ARIMA(p, d, q) process expresses this polynomial factorisation property with p = p'−d, and is given by:

and so is special case of an ARMA(p+d, q) process having the autoregressive polynomial with d unit roots. (This is why no process that is accurately described by an ARIMA model with d > 0 is wide-sense stationary.)

The above can be generalized as follows.

This defines an ARIMA(p, d, q) process with drift.

Other special forms

The explicit identification of the factorization of the autoregression polynomial into factors as above can be extended to other cases, firstly to apply to the moving average polynomial and secondly to include other special factors. For example, having a factor in a model is one way of including a non-stationary seasonality of period s into the model; this factor has the effect of re-expressing the data as changes from s periods ago. Another example is the factor , which includes a (non-stationary) seasonality of period 2.[ clarification needed ] The effect of the first type of factor is to allow each season's value to drift separately over time, whereas with the second type values for adjacent seasons move together.[ clarification needed ]

Identification and specification of appropriate factors in an ARIMA model can be an important step in modeling as it can allow a reduction in the overall number of parameters to be estimated while allowing the imposition on the model of types of behavior that logic and experience suggest should be there.

Differencing

A stationary time series's properties do not change. Specifically, for a wide-sense stationary time series, the mean and the variance/autocovariance are constant over time. Differencing in statistics is a transformation applied to a non-stationary time-series in order to make it stationary in the mean sense (that is, to remove the non-constant trend), but it does not affect the non-stationarity of the variance or autocovariance. Likewise, seasonal differencing is applied to a seasonal time-series to remove the seasonal component.

From the perspective of signal processing, especially the Fourier spectral analysis theory, the trend is a low-frequency part in the spectrum of a series, while the season is a periodic-frequency part. Therefore, differencing is a high-pass (that is, low-stop) filter and the seasonal-differencing is a comb filter to suppress respectively the low-frequency trend and the periodic-frequency season in the spectrum domain (rather than directly in the time domain). [7]

To difference the data, we compute the difference between consecutive observations. Mathematically, this is shown as

It may be necessary to difference the data a second time to obtain a stationary time series, which is referred to as second-order differencing:

Seasonal differencing involves computing the difference between an observation and the corresponding observation in the previous season e.g a year. This is shown as:

The differenced data are then used for the estimation of an ARMA model.

Examples

Some well-known special cases arise naturally or are mathematically equivalent to other popular forecasting models. For example:

Choosing the order

The order p and q can be determined using the sample autocorrelation function (ACF), partial autocorrelation function (PACF), and/or extended autocorrelation function (EACF) method. [10]

Other alternative methods include AIC, BIC, etc. [10] To determine the order of a non-seasonal ARIMA model, a useful criterion is the Akaike information criterion (AIC). It is written as

where L is the likelihood of the data, p is the order of the autoregressive part and q is the order of the moving average part. The k represents the intercept of the ARIMA model. For AIC, if k = 1 then there is an intercept in the ARIMA model (c ≠ 0) and if k = 0 then there is no intercept in the ARIMA model (c = 0).

The corrected AIC for ARIMA models can be written as

The Bayesian Information Criterion (BIC) can be written as

The objective is to minimize the AIC, AICc or BIC values for a good model. The lower the value of one of these criteria for a range of models being investigated, the better the model will suit the data. The AIC and the BIC are used for two completely different purposes. While the AIC tries to approximate models towards the reality of the situation, the BIC attempts to find the perfect fit. The BIC approach is often criticized as there never is a perfect fit to real-life complex data; however, it is still a useful method for selection as it penalizes models more heavily for having more parameters than the AIC would.

AICc can only be used to compare ARIMA models with the same orders of differencing. For ARIMAs with different orders of differencing, RMSE can be used for model comparison.

Estimation of coefficients

Forecasts using ARIMA models

The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:

while the second is wide-sense stationary:

Now forecasts can be made for the process , using a generalization of the method of autoregressive forecasting.

Forecast intervals

The forecast intervals (confidence intervals for forecasts) for ARIMA models are based on assumptions that the residuals are uncorrelated and normally distributed. If either of these assumptions does not hold, then the forecast intervals may be incorrect. For this reason, researchers plot the ACF and histogram of the residuals to check the assumptions before producing forecast intervals.

95% forecast interval: , where is the variance of .

For , for all ARIMA models regardless of parameters and orders.

For ARIMA(0,0,q),

[ citation needed ]

In general, forecast intervals from ARIMA models will increase as the forecast horizon increases.

Variations and extensions

A number of variations on the ARIMA model are commonly employed. If multiple time series are used then the can be thought of as vectors and a VARIMA model may be appropriate. Sometimes a seasonal effect is suspected in the model; in that case, it is generally considered better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model. [11] If the time-series is suspected to exhibit long-range dependence, then the d parameter may be allowed to have non-integer values in an autoregressive fractionally integrated moving average model, which is also called a Fractional ARIMA (FARIMA or ARFIMA) model.

Software implementations

Various packages that apply methodology like Box–Jenkins parameter optimization are available to find the right parameters for the ARIMA model.

See also

Related Research Articles

In statistics, the term linear model refers to any model which assumes linearity in the system. The most common occurrence is in connection with regression models and the term is often taken as synonymous with linear regression model. However, the term is also used in time series analysis with a different meaning. In each case, the designation "linear" is used to identify a subclass of models for which substantial reduction in the complexity of the related statistical theory is possible.

Forecasting is the process of making predictions based on past and present data. Later these can be compared (resolved) against what happens. For example, a company might estimate their revenue in the next year, then compare it against the actual results creating a variance actual analysis. Prediction is a similar but more general term. Forecasting might refer to specific formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgmental methods or the process of prediction and resolution itself. Usage can vary between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.

In the calculus of variations and classical mechanics, the Euler–Lagrange equations are a system of second-order ordinary differential equations whose solutions are stationary points of the given action functional. The equations were discovered in the 1750s by Swiss mathematician Leonhard Euler and Italian mathematician Joseph-Louis Lagrange.

<span class="mw-page-title-main">Path integral formulation</span> Formulation of quantum mechanics

The path integral formulation is a description in quantum mechanics that generalizes the stationary action principle of classical mechanics. It replaces the classical notion of a single, unique classical trajectory for a system with a sum, or functional integral, over an infinity of quantum-mechanically possible trajectories to compute a quantum amplitude.

In econometrics, the autoregressive conditional heteroskedasticity (ARCH) model is a statistical model for time series data that describes the variance of the current error term or innovation as a function of the actual sizes of the previous time periods' error terms; often the variance is related to the squares of the previous innovations. The ARCH model is appropriate when the error variance in a time series follows an autoregressive (AR) model; if an autoregressive moving average (ARMA) model is assumed for the error variance, the model is a generalized autoregressive conditional heteroskedasticity (GARCH) model.

The Akaike information criterion (AIC) is an estimator of prediction error and thereby relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.

In the statistical analysis of time series, autoregressive–moving-average (ARMA) models are a way to describe of a (weakly) stationary stochastic process using autoregression (AR) and a moving average (MA), each with a polynomial. They are a tool for understanding a series and predicting future values. AR involves regressing the variable on its own lagged (i.e., past) values. MA involves modeling the error as a linear combination of error terms occurring contemporaneously and at various times in the past. The model is usually denoted ARMA(p, q), where p is the order of AR and q is the order of MA.

In statistics, econometrics, and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it can be used to describe certain time-varying processes in nature, economics, behavior, etc. The autoregressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term ; thus the model is in the form of a stochastic difference equation which should not be confused with a differential equation. Together with the moving-average (MA) model, it is a special case and key component of the more general autoregressive–moving-average (ARMA) and autoregressive integrated moving average (ARIMA) models of time series, which have a more complicated stochastic structure; it is also a special case of the vector autoregressive model (VAR), which consists of a system of more than one interlocking stochastic difference equation in more than one evolving random variable.

In time series analysis, the lag operator (L) or backshift operator (B) operates on an element of a time series to produce the previous element. For example, given some time series

<span class="mw-page-title-main">Granger causality</span> Statistical hypothesis test for forecasting

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality". Using the term "causality" alone is a misnomer, as Granger-causality is better described as "precedence", or, as Granger himself later claimed in 1977, "temporally related". Rather than testing whether Xcauses Y, the Granger causality tests whether X forecastsY.

In probability theory and statistics, a unit root is a feature of some stochastic processes that can cause problems in statistical inference involving time series models. A linear stochastic process has a unit root if 1 is a root of the process's characteristic equation. Such a process is non-stationary but does not always have a trend.

In time series analysis, the Box–Jenkins method, named after the statisticians George Box and Gwilym Jenkins, applies autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models to find the best fit of a time-series model to past values of a time series.

In statistics the mean squared prediction error (MSPE), also known as mean squared error of the predictions, of a smoothing, curve fitting, or regression procedure is the expected value of the squared prediction errors (PE), the square difference between the fitted values implied by the predictive function and the values of the (unobservable) true value g. It is an inverse measure of the explanatory power of and can be used in the process of cross-validation of an estimated model. Knowledge of g would be required in order to calculate the MSPE exactly; in practice, MSPE is estimated.

Cochrane–Orcutt estimation is a procedure in econometrics, which adjusts a linear model for serial correlation in the error term. Developed in the 1940s, it is named after statisticians Donald Cochrane and Guy Orcutt.

In statistics, Wold's decomposition or the Wold representation theorem, named after Herman Wold, says that every covariance-stationary time series can be written as the sum of two time series, one deterministic and one stochastic.

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

The Ljung–Box test is a type of statistical test of whether any of a group of autocorrelations of a time series are different from zero. Instead of testing randomness at each distinct lag, it tests the "overall" randomness based on a number of lags, and is therefore a portmanteau test.

In statistics, autoregressive fractionally integrated moving average models are time series models that generalize ARIMA (autoregressive integrated moving average) models by allowing non-integer values of the differencing parameter. These models are useful in modeling time series with long memory—that is, in which deviations from the long-run mean decay more slowly than an exponential decay. The acronyms "ARFIMA" or "FARIMA" are often used, although it is also conventional to simply extend the "ARIMA(p, d, q)" notation for models, by simply allowing the order of differencing, d, to take fractional values. Fractional differencing and the ARFIMA model were introduced in the early 1980s by Clive Granger, Roselyne Joyeux, and Jonathan Hosking.

In time series analysis, the moving-average model, also known as moving-average process, is a common approach for modeling univariate time series. The moving-average model specifies that the output variable is cross-correlated with a non-identical to itself random-variable.

An error correction model (ECM) belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run common stochastic trend, also known as cointegration. ECMs are a theoretically-driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that last-period's deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.

References

  1. For further information on Stationarity and Differencing see https://www.otexts.org/fpp/8/1
  2. 1 2 Hyndman, Rob J; Athanasopoulos, George. 8.9 Seasonal ARIMA models. oTexts. Retrieved 19 May 2015.{{cite book}}: |website= ignored (help)
  3. Box, George E. P. (2015). Time Series Analysis: Forecasting and Control. WILEY. ISBN   978-1-118-67502-1.
  4. Hamilton, James (1994). Time Series Analysis. Princeton University Press. ISBN   9780691042893.
  5. 1 2 Papoulis, Athanasios (2002). Probability, Random Variables, and Stochastic processes. Tata McGraw-Hill Education.
  6. 1 2 Triacca, Umberto (19 Feb 2021). "The Wold Decomposition Theorem" (PDF). Archived (PDF) from the original on 2016-03-27.
  7. 1 2 Wang, Shixiong; Li, Chongshou; Lim, Andrew (2019-12-18). "Why Are the ARIMA and SARIMA not Sufficient". arXiv: 1904.07632 [stat.AP].
  8. "Notation for ARIMA Models". Time Series Forecasting System. SAS Institute. Retrieved 19 May 2015.
  9. 1 2 "Introduction to ARIMA models". people.duke.edu. Retrieved 2016-06-05.
  10. 1 2 Missouri State University. "Model Specification, Time Series Analysis" (PDF).
  11. Swain, S; et al. (2018). "Development of an ARIMA Model for Monthly Rainfall Forecasting over Khordha District, Odisha, India". Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing. Vol. 708. pp. 325–331). doi:10.1007/978-981-10-8636-6_34. ISBN   978-981-10-8635-9.{{cite book}}: |journal= ignored (help)
  12. TimeModels.jl www.github.com
  13. ARIMA in NCSS,
  14. Automatic ARMA in NCSS,
  15. Autocorrelations and Partial Autocorrelations in NCSS
  16. 8.7 ARIMA modelling in R | OTexts . Retrieved 2016-05-12.{{cite book}}: |website= ignored (help)
  17. "Box Jenkins model". SAP. Retrieved 8 March 2013.

Further reading