Decomposition of time series

The decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns. [1] There are two principal types of decomposition, which are outlined below.

Decomposition based on rates of change

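This is an important technique for all types of time series analysis, especially for seasonal adjustment. [2] It seeks to construct, from an observed time series, a number of component series (that could be used to reconstruct the original by additions or multiplications), each of which has a certain characteristic or type of behavior. For example, time series are usually decomposed into:

  - $T_t$, the trend component at time t, which reflects the long-term progression of the series;
  - $C_t$, the cyclical component at time t, which reflects repeated but non-periodic fluctuations;
  - $S_t$, the seasonal component at time t, which reflects seasonal variation;
  - $I_t$, the irregular component (or "noise") at time t, which describes random, irregular influences.

Hence a time series using an additive model can be thought of as

$y_t = T_t + C_t + S_t + I_t,$

whereas a multiplicative model would be

$y_t = T_t \times C_t \times S_t \times I_t.$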

An additive model is appropriate when the variations around the trend do not vary with the level of the time series, whereas a multiplicative model is appropriate when those variations are proportional to the level of the time series. [3]
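The difference between the two models can be illustrated with a classical moving-average decomposition. The following is a minimal sketch using only the Python standard library, not a reference implementation: the function names `centered_moving_average` and `classical_decompose`, the quarterly example data, and the choice of a centered moving average as the trend estimator are all assumptions made for illustration. The trend and cyclical components are estimated together as a single trend-cycle series.

```python
from statistics import mean


def centered_moving_average(y, period):
    """Trend-cycle estimate; None where the window does not fit."""
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        if period % 2:
            # Odd period: plain centered window of `period` points.
            trend[t] = sum(window) / period
        else:
            # Even period: period + 1 points, half weight on both ends.
            total = sum(window) - 0.5 * (window[0] + window[-1])
            trend[t] = total / period
    return trend


def classical_decompose(y, period, model="additive"):
    """Split y into (trend, seasonal, irregular) lists of equal length."""
    mult = model == "multiplicative"
    trend = centered_moving_average(y, period)
    # Detrend: divide for the multiplicative model, subtract for the additive one.
    detrended = [
        None if tr is None else (v / tr if mult else v - tr)
        for v, tr in zip(y, trend)
    ]
    # Average the detrended values at each position within the cycle.
    raw = []
    for k in range(period):
        raw.append(mean(d for d in detrended[k::period] if d is not None))
    # Normalize: effects average 1 (multiplicative) or 0 (additive).
    centre = mean(raw)
    effects = [s / centre if mult else s - centre for s in raw]
    seasonal = [effects[t % period] for t in range(len(y))]
    # Whatever is left after removing trend and seasonal is the irregular part.
    irregular = [
        None if tr is None else (v / (tr * s) if mult else v - tr - s)
        for v, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, irregular


# Hypothetical quarterly data: seasonal swings grow with the level of the series.
pattern = [0.8, 1.0, 1.3, 0.9]
y = [(100 + 2 * t) * pattern[t % 4] for t in range(24)]
trend, seasonal, irregular = classical_decompose(y, period=4, model="multiplicative")
print([round(s, 2) for s in seasonal[:4]])
```

Because the seasonal swings in this example scale with the level, the multiplicative model is the natural choice; passing `model="additive"` to the same function would instead suit a series whose swings stay roughly constant in size.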

Sometimes the trend and cyclical components are grouped into one, called the trend-cycle component. The trend-cycle component is often referred to simply as the "trend" component, even though it may contain cyclical behavior. [3] For example, a seasonal decomposition of time series by Loess (STL) [4] splits a time series into seasonal, trend and irregular components using loess and plots the components separately; any cyclical component present in the data is included in the "trend" component plot.

Decomposition based on predictability

The theory of time series analysis makes use of the idea of decomposing a time series into deterministic and non-deterministic components (or predictable and unpredictable components). [2] See Wold's theorem and Wold decomposition.
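As a brief illustration of this idea, the Wold decomposition of a covariance-stationary series is commonly written in the following form. The symbols $b_j$, $\varepsilon_t$ and $\eta_t$ are notation chosen here and do not appear in the text above.

$$
y_t = \sum_{j=0}^{\infty} b_j \, \varepsilon_{t-j} + \eta_t, \qquad b_0 = 1, \quad \sum_{j=0}^{\infty} b_j^2 < \infty
$$

Here $\varepsilon_t$ is an uncorrelated (white noise) innovation sequence, so the infinite sum is the non-deterministic part, while $\eta_t$ is the deterministic part, which can be predicted perfectly from its own past and is uncorrelated with the innovations.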

Examples

An example of using multiplicative decomposition in biohydrogen production forecast.

Kendall shows an example of a decomposition into smooth, seasonal and irregular factors for a set of data containing values of the monthly aircraft miles flown by UK airlines. [6]

In policy analysis, forecasts of future biofuel production are key inputs to decision-making, and statistical time series models have recently been developed to forecast renewable energy sources. A multiplicative decomposition method was designed to forecast future production of biohydrogen. The optimum length of the moving average (seasonal length) and the start point, where the averages are placed, were chosen based on the best agreement between the forecast and actual values. [5]
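The idea of choosing a seasonal length by comparing forecasts with actual values can be sketched as a small holdout search. This is not the procedure of the cited study. It is a simplified illustration under stated assumptions: the helper names (`seasonal_indices`, `forecast`, `best_period`), the use of mean absolute error as the agreement measure, the flat last-cycle level with no trend extrapolation, and the example data are all hypothetical, and the search over the start point is omitted.

```python
from statistics import mean


def seasonal_indices(train, period):
    """Multiplicative indices: mean ratio of each value to its cycle average."""
    n_cycles = len(train) // period
    ratios = [[] for _ in range(period)]
    for c in range(n_cycles):
        cycle = train[c * period : (c + 1) * period]
        level = mean(cycle)
        for k, v in enumerate(cycle):
            ratios[k].append(v / level)
    return [mean(r) for r in ratios]


def forecast(train, period, horizon):
    """Forecast = level of the last full cycle x seasonal index (no trend)."""
    idx = seasonal_indices(train, period)
    n_cycles = len(train) // period
    level = mean(train[(n_cycles - 1) * period : n_cycles * period])
    # Cycles are counted from index 0, so position in cycle is t % period.
    return [level * idx[(len(train) + h) % period] for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def best_period(series, candidates, holdout):
    """Return the candidate period with the lowest holdout error, plus all scores."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {m: mae(test, forecast(train, m, holdout)) for m in candidates}
    return min(scores, key=scores.get), scores


# Hypothetical series that repeats every 4 observations.
series = [40, 50, 65, 45] * 10
period, scores = best_period(series, candidates=[3, 4, 5, 6], holdout=8)
print(period, {m: round(s, 2) for m, s in scores.items()})
```

Holding out the last few observations keeps the comparison honest: each candidate length is scored only on values it did not see when the seasonal indices were estimated.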

Software

An example of statistical software for this type of decomposition is the program BV4.1, which is based on the Berlin procedure. The R statistical software also includes many packages and functions for time series decomposition, such as seasonal, [7] stl, stlplus, [8] and bfast. Bayesian methods are also available; one example is the BEAST method in the Rbeast package, [9] which is available for R, Matlab, and Python.

References

  1. "6.1 Time series components | OTexts". www.otexts.org. Retrieved 2016-05-14.
  2. Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. New York: Oxford University Press. ISBN 0-19-920613-9.
  3. "6.1 Time series components | OTexts". www.otexts.org. Retrieved 2016-05-18.
  4. "6.5 STL decomposition | OTexts". www.otexts.org. Retrieved 2016-05-18.
  5. Asadi, Nooshin; Karimi Alavijeh, Masih; Zilouei, Hamid (2016). "Development of a mathematical methodology to investigate biohydrogen production from regional and national agricultural crop residues: A case study of Iran". International Journal of Hydrogen Energy. doi:10.1016/j.ijhydene.2016.10.021.
  6. Kendall, M. G. (1976). Time-Series (Second ed.). Charles Griffin. (Fig. 5.1). ISBN 0-85264-241-5.
  7. Sax, Christoph. "seasonal: R Interface to X-13-ARIMA-SEATS".
  8. Hafen, Ryan. "stlplus: Enhanced Seasonal Decomposition of Time Series by Loess".
  9. Li, Yang; Zhao, Kaiguang; Hu, Tongxi; Zhang, Xuesong. "BEAST: A Bayesian Ensemble Algorithm for Change-Point Detection and Time Series Decomposition".
