Decomposition of time series

The decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns. [1] There are two principal types of decomposition, which are outlined below.

Decomposition based on rates of change

This is an important technique for all types of time series analysis, especially for seasonal adjustment. [2] It seeks to construct, from an observed time series, a number of component series (that could be used to reconstruct the original by addition or multiplication) where each of these has a certain characteristic or type of behavior. For example, time series are usually decomposed into:

- T_t, the trend component at time t, which reflects the long-term progression of the series (secular variation). A trend exists when there is a persistent increasing or decreasing direction in the data; it does not have to be linear.
- C_t, the cyclical component at time t, which reflects repeated but non-periodic fluctuations. The duration of these fluctuations depends on the nature of the time series.
- S_t, the seasonal component at time t, reflecting seasonality (seasonal variation). A seasonal pattern exists when a time series is influenced by seasonal factors occurring over a fixed and known period, such as the quarter of the year, the month, or the day of the week.
- I_t, the irregular component (or "noise") at time t, which describes random, irregular influences. It represents the residuals of the time series after the other components have been removed.

Hence a time series using an additive model can be thought of as

    y_t = T_t + C_t + S_t + I_t,

whereas a multiplicative model would be

    y_t = T_t × C_t × S_t × I_t.

An additive model would be used when the variations around the trend do not vary with the level of the time series, whereas a multiplicative model would be appropriate if the variations are proportional to the level of the time series. [3]
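As a rough illustration, the following Python sketch applies a classical decomposition to a synthetic monthly series with statsmodels' seasonal_decompose, once under the additive model and once under the multiplicative model; the data and the period of 12 are illustrative assumptions, not taken from any source above.

```python
# A minimal sketch of additive vs. multiplicative decomposition using
# statsmodels' classical seasonal_decompose; the series is synthetic.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
t = np.arange(120)                       # ten years of monthly data
trend = 50 + 0.5 * t                     # slowly rising trend
seasonal = 10 * np.sin(2 * np.pi * t / 12)
noise = rng.normal(0, 2, size=t.size)

index = pd.date_range("2010-01", periods=t.size, freq="MS")
additive_series = pd.Series(trend + seasonal + noise, index=index)

# Additive: variation around the trend does not grow with the level.
add_result = seasonal_decompose(additive_series, model="additive", period=12)

# Multiplicative: seasonal swings scale with the level of the series.
mult_series = pd.Series(trend * (1 + 0.1 * np.sin(2 * np.pi * t / 12)), index=index)
mult_result = seasonal_decompose(mult_series, model="multiplicative", period=12)

print(add_result.trend.dropna().head())   # T_t estimates
print(mult_result.seasonal.head())        # multiplicative seasonal indices
```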

Sometimes the trend and cyclical components are grouped into one, called the trend-cycle component. The trend-cycle component can just be referred to as the "trend" component, even though it may contain cyclical behavior. [3] For example, a seasonal decomposition of time series by Loess (STL) [4] decomposes a time series into seasonal, trend and irregular components using loess and plots the components separately, whereby the cyclical component (if present in the data) is included in the "trend" component plot.
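A minimal sketch of an STL decomposition using the STL class from statsmodels, on synthetic data; the period of 12 is an assumed monthly seasonality.

```python
# A brief sketch of STL decomposition; any cyclical behaviour in the data
# ends up folded into the "trend" component, as noted above.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(1)
index = pd.date_range("2015-01", periods=96, freq="MS")
values = (20 + 0.3 * np.arange(96)
          + 5 * np.sin(2 * np.pi * np.arange(96) / 12)
          + rng.normal(0, 1, 96))
series = pd.Series(values, index=index)

result = STL(series, period=12).fit()    # loess-based decomposition
# result.trend, result.seasonal and result.resid hold the three components;
# result.plot() draws them in separate panels.
print(result.seasonal.head())
```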

Decomposition based on predictability

The theory of time series analysis makes use of the idea of decomposing a time series into deterministic and non-deterministic components (or predictable and unpredictable components). [2] See Wold's theorem and Wold decomposition.
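As an illustration only (not Wold's construction itself), the following sketch builds a covariance-stationary series as the sum of a deterministic sinusoid and a stochastic moving-average term, the two kinds of component this decomposition separates.

```python
# Illustrative sketch: a perfectly predictable (deterministic) part plus an
# unpredictable MA(1) part, combined into one stationary series.
import numpy as np

rng = np.random.default_rng(2)
n = 500
t = np.arange(n)
deterministic = 2 * np.cos(2 * np.pi * t / 24)   # predictable component

eps = rng.normal(size=n + 1)
stochastic = eps[1:] + 0.6 * eps[:-1]            # MA(1): unpredictable part

x = deterministic + stochastic                    # the observed series
```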

Examples

Kendall shows an example of a decomposition into smooth, seasonal and irregular factors for a set of data containing values of the monthly aircraft miles flown by UK airlines. [5]

In policy analysis, forecasts of future biofuel production provide key data for making better decisions. Statistical time series models have recently been developed to forecast renewable energy sources, and a multiplicative decomposition method was designed to forecast future production of biohydrogen. The optimum length of the moving average (seasonal length) and the start point, where the averages are placed, were chosen based on the best agreement between the forecast and the actual values. [6]

Figure: An example of using multiplicative decomposition in a biohydrogen production forecast.
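The sketch below shows one plausible form such a multiplicative-decomposition forecast can take: a moving average of assumed seasonal length m estimates the trend, per-season ratios give seasonal indices, and an extrapolated trend is re-scaled by those indices. The function, data and parameter choices are illustrative assumptions, not the procedure of the cited study.

```python
# A rough sketch of a multiplicative-decomposition forecast.
import numpy as np

def multiplicative_forecast(y, m, horizon):
    # Moving average of length m as the trend estimate.
    trend = np.convolve(y, np.ones(m) / m, mode="valid")
    offset = (m - 1) // 2
    # Seasonal index: average ratio of observation to trend per season.
    ratios = y[offset:offset + len(trend)] / trend
    seasonal = np.array([ratios[i::m].mean() for i in range(m)])
    seasonal /= seasonal.mean()                    # normalise the indices
    # Extrapolate the trend linearly and re-apply the seasonal index.
    coef = np.polyfit(np.arange(len(trend)), trend, 1)
    future_t = np.arange(len(trend), len(trend) + horizon)
    future_trend = np.polyval(coef, future_t)
    return future_trend * seasonal[future_t % m]

y = 100 + 10 * np.sin(2 * np.pi * np.arange(48) / 12) + np.arange(48)
print(multiplicative_forecast(y, m=12, horizon=12))
```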

Software

An example of statistical software for this type of decomposition is the program BV4.1 that is based on the Berlin procedure.

Related Research Articles

Principal component analysis: conversion of observations of possibly correlated variables into values of fewer, uncorrelated variables

The principal components of a collection of points in a real coordinate space are a sequence of unit vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to the first i−1 vectors. Here, a best-fitting line is defined as one that minimizes the average squared distance from the points to the line. These directions constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest.
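A small sketch of PCA computed directly via the singular value decomposition, on random illustrative data: centering, SVD, and a change of basis keeping only the first two components.

```python
# PCA via SVD: the right-singular vectors of the centered data matrix are
# the principal directions.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]      # introduce correlation

Xc = X - X.mean(axis=0)                       # center each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                               # orthonormal principal directions
scores = Xc @ Vt.T                            # data expressed in the new basis

reduced = scores[:, :2]                       # keep the first two components
```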

Forecasting is the process of making predictions based on past and present data. Later these can be compared (resolved) against what happens. For example, a company might estimate their revenue in the next year, then compare it against the actual results. Prediction is a similar, but more general term. Forecasting might refer to specific formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgmental methods or the process of prediction and resolution itself. Usage can differ between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.

In mathematics and statistics, a stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance also do not change over time. To get an intuition of stationarity, one can imagine a frictionless pendulum. It swings back and forth in an oscillatory motion, yet the amplitude and frequency remain constant. Although the pendulum is moving, the process is stationary as its "statistics" are constant. However, if a force were to be applied to the pendulum, either the frequency or amplitude would change, thus making the process non-stationary.
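As a hedged illustration, the sketch below contrasts a stationary AR(1) process with a non-stationary random walk and applies the augmented Dickey–Fuller test from statsmodels, one common check for stationarity.

```python
# Stationary AR(1) vs. random walk, checked with the ADF unit-root test.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
eps = rng.normal(size=1000)

ar1 = np.zeros(1000)                  # stationary: mean-reverting
for i in range(1, 1000):
    ar1[i] = 0.5 * ar1[i - 1] + eps[i]

walk = np.cumsum(eps)                 # non-stationary: variance grows

print("AR(1) p-value:", adfuller(ar1)[1])    # small p: reject a unit root
print("walk  p-value:", adfuller(walk)[1])   # large p: cannot reject
```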

Time series: sequence of data points over time

In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

Linear trend estimation is a statistical technique to aid interpretation of data. When a series of measurements of a process are treated as, for example, a sequence or time series, trend estimation can be used to make and justify statements about tendencies in the data, by relating the measurements to the times at which they occurred. The fitted model can then be used to describe the behaviour of the observed data, without explaining it.
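A minimal sketch of linear trend estimation, fitting an ordinary least-squares line to observations indexed by time with numpy.polyfit; the data are synthetic.

```python
# Ordinary least-squares line relating measurements to their times.
import numpy as np

t = np.arange(100)
y = 3.0 + 0.25 * t + np.random.default_rng(5).normal(0, 2, 100)

slope, intercept = np.polyfit(t, y, 1)       # degree-1 fit: y ≈ a·t + b
fitted_trend = slope * t + intercept
print(f"estimated slope {slope:.3f}, intercept {intercept:.3f}")
```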

The Hodrick–Prescott filter is a mathematical tool used in macroeconomics, especially in real business cycle theory, to remove the cyclical component of a time series from raw data. It is used to obtain a smoothed-curve representation of a time series, one that is more sensitive to long-term than to short-term fluctuations. The adjustment of the sensitivity of the trend to short-term fluctuations is achieved by modifying a multiplier λ. The filter was popularized in the field of economics in the 1990s by economists Robert J. Hodrick and Nobel Memorial Prize winner Edward C. Prescott. However, it was first proposed much earlier by E. T. Whittaker in 1923.
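A short sketch of applying the Hodrick–Prescott filter via statsmodels; the series is synthetic, and lamb=1600 is the multiplier conventionally used for quarterly data.

```python
# Split a trending series into a smooth trend and a cyclical component.
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

rng = np.random.default_rng(6)
gdp_like = np.cumsum(rng.normal(0.5, 1.0, 200))   # synthetic trending series

cycle, trend = hpfilter(gdp_like, lamb=1600)      # returns (cycle, trend)
```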

Partial least squares regression is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a linear regression model by projecting the predicted variables and the observable variables to a new space. Because both the X and Y data are projected to new spaces, the PLS family of methods are known as bilinear factor models. Partial least squares discriminant analysis (PLS-DA) is a variant used when the response Y is categorical.
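A compact sketch of partial least squares regression using scikit-learn's PLSRegression; X and Y are random stand-ins, and the choice of two components is an illustrative assumption.

```python
# PLS regression: project X and Y to a shared low-dimensional space.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 6))
Y = X[:, :2] @ np.array([[1.5], [-0.7]]) + rng.normal(0, 0.1, (100, 1))

pls = PLSRegression(n_components=2).fit(X, Y)
predictions = pls.predict(X)
```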

X-13ARIMA-SEATS, successor to X-12-ARIMA and X-11, is a set of statistical methods for seasonal adjustment and other descriptive analysis of time series data that are implemented in the U.S. Census Bureau's software package. These methods are or have been used by Statistics Canada, Australian Bureau of Statistics, and the statistical offices of many other countries.

In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). ARIMA models are applied in some cases where data show evidence of non-stationarity in the mean, where an initial differencing step can be applied one or more times to eliminate the non-stationarity of the mean function. When a time series exhibits seasonality, seasonal differencing can be applied to eliminate the seasonal component. Since the ARMA model, according to Wold's decomposition theorem, is theoretically sufficient to describe a regular wide-sense stationary time series, we are motivated to make a non-stationary time series stationary, e.g., by differencing, before we can use the ARMA model. Note that if the time series contains a predictable sub-process, the predictable component is treated as a non-zero-mean but periodic component in the ARIMA framework, so that it is eliminated by the seasonal differencing.
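A hedged sketch of fitting an ARIMA model with statsmodels, where the integration order d = 1 applies one differencing step to remove a non-stationary mean as described above; the (1, 1, 1) order and the simulated series are illustrative assumptions.

```python
# ARIMA(1, 1, 1) on a series with a stochastic trend.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
y = np.cumsum(rng.normal(0.2, 1.0, 300))     # non-stationary mean

model = ARIMA(y, order=(1, 1, 1))            # (p, d, q): AR 1, diff 1, MA 1
result = model.fit()
forecast = result.forecast(steps=12)         # predict twelve steps ahead
```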

Exponential smoothing is a rule of thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Exponential smoothing is often used for analysis of time-series data.
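The recursion can be written out directly, as in this minimal sketch: each smoothed value is a weighted average of the newest observation and the previous smoothed value, so the weights on past observations decay exponentially. The smoothing constant alpha = 0.3 is an arbitrary illustrative choice.

```python
# Simple exponential smoothing: s_t = alpha*x_t + (1 - alpha)*s_{t-1}.
import numpy as np

def exponential_smoothing(x, alpha):
    s = np.empty_like(x, dtype=float)
    s[0] = x[0]                               # initialise with the first value
    for i in range(1, len(x)):
        s[i] = alpha * x[i] + (1 - alpha) * s[i - 1]
    return s

data = np.array([3.0, 5.0, 4.0, 6.0, 7.0, 5.0])
print(exponential_smoothing(data, alpha=0.3))
```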

Functional data analysis (FDA) is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function. The physical continuum over which these functions are defined is often time, but may also be spatial location, wavelength, probability, etc. Intrinsically, functional data are infinite dimensional. The high intrinsic dimensionality of these data brings challenges for theory as well as computation, where these challenges vary with how the functional data were sampled. However, the high or infinite dimensional structure of the data is a rich source of information and there are many interesting challenges for research and data analysis.

Vector autoregression (VAR) is a statistical model used to capture the relationship between multiple quantities as they change over time. VAR is a type of stochastic process model. VAR models generalize the single-variable (univariate) autoregressive model by allowing for multivariate time series. VAR models are often used in economics and the natural sciences.
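A brief sketch of a two-variable VAR fitted with statsmodels; the bivariate series is simulated for illustration, and the lag order of 2 is an assumption.

```python
# Two-variable VAR: each series depends on lags of both series.
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(9)
n = 300
data = np.zeros((n, 2))
for i in range(1, n):
    data[i, 0] = 0.5 * data[i - 1, 0] + 0.2 * data[i - 1, 1] + rng.normal()
    data[i, 1] = 0.3 * data[i - 1, 0] + 0.4 * data[i - 1, 1] + rng.normal()

results = VAR(data).fit(2)                    # fixed lag order of 2
print(results.forecast(data[-2:], steps=5))   # five-step-ahead forecast
```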

In probability theory, stochastic drift is the change of the average value of a stochastic (random) process. A related concept is the drift rate, which is the rate at which the average changes. For example, a process that counts the number of heads in a series of fair coin tosses has a drift rate of 1/2 per toss. This is in contrast to the random fluctuations about this average value. The stochastic mean of that coin-toss process is 1/2 and the drift rate of the stochastic mean is 0, assuming 1 = heads and 0 = tails.
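A tiny simulation of the coin-toss example above: the running count of heads drifts upward at an average rate of about 1/2 per toss.

```python
# Drift rate of a head-counting process, with 1 = heads and 0 = tails.
import numpy as np

rng = np.random.default_rng(10)
tosses = rng.integers(0, 2, size=10_000)

running_count = np.cumsum(tosses)            # drifts upward at ~0.5 per toss
print(running_count[-1] / tosses.size)       # close to the drift rate 1/2
```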

In statistics, Wold's decomposition or the Wold representation theorem, named after Herman Wold, says that every covariance-stationary time series can be written as the sum of two time series, one deterministic and one stochastic.

Seasonal adjustment or deseasonalization is a statistical method for removing the seasonal component of a time series. It is usually done when wanting to analyse the trend, and cyclical deviations from trend, of a time series independently of the seasonal components. Many economic phenomena have seasonal cycles, such as agricultural production, and consumer consumption. It is necessary to adjust for this component in order to understand underlying trends in the economy, so official statistics are often adjusted to remove seasonal components. Typically, seasonally adjusted data is reported for unemployment rates to reveal the underlying trends and cycles in labor markets.
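A small sketch of seasonal adjustment under an assumed additive model: estimate the seasonal component with a classical decomposition and subtract it from the observed series. The monthly data are synthetic.

```python
# Seasonal adjustment: remove the estimated seasonal component.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(11)
index = pd.date_range("2012-01", periods=120, freq="MS")
series = pd.Series(100 + 0.2 * np.arange(120)
                   + 8 * np.sin(2 * np.pi * np.arange(120) / 12)
                   + rng.normal(0, 1, 120), index=index)

decomp = seasonal_decompose(series, model="additive", period=12)
seasonally_adjusted = series - decomp.seasonal   # trend + irregular remain
```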

In statistics, mean absolute error (MAE) is a measure of errors between paired observations expressing the same phenomenon. Examples of Y versus X include comparisons of predicted versus observed, subsequent time versus initial time, and one technique of measurement versus an alternative technique of measurement. MAE is calculated as the average of the absolute errors: MAE = (1/n) Σ |y_i − x_i|, where y_i is the prediction and x_i the true value.
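A one-line numeric check of the formula, with made-up paired values:

```python
# MAE as the mean of absolute differences between paired observations.
import numpy as np

predicted = np.array([2.5, 0.0, 2.1, 7.8])
observed = np.array([3.0, -0.5, 2.0, 8.0])
mae = np.mean(np.abs(predicted - observed))   # (1/n) Σ |y_i − x_i|
print(mae)
```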

Singular spectrum analysis: nonparametric spectral estimation method

In time series analysis, singular spectrum analysis (SSA) is a nonparametric spectral estimation method. It combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing. Its roots lie in the classical Karhunen (1946)–Loève spectral decomposition of time series and random fields and in the Mañé (1981)–Takens (1981) embedding theorem. SSA can be an aid in the decomposition of time series into a sum of components, each having a meaningful interpretation. The name "singular spectrum analysis" relates to the spectrum of eigenvalues in a singular value decomposition of a covariance matrix, and not directly to a frequency domain decomposition.
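A condensed sketch of basic SSA on a synthetic series: embed the series in a Hankel trajectory matrix, take its SVD, and reconstruct additive components by diagonal averaging. The window length L = 40 and the number of components k = 2 are illustrative assumptions.

```python
# Basic SSA: embedding, SVD, and diagonal averaging.
import numpy as np

def ssa_components(x, L, k):
    n = len(x)
    K = n - L + 1
    # Trajectory (Hankel) matrix: columns are lagged windows of the series.
    X = np.column_stack([x[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = []
    for j in range(k):
        Xj = s[j] * np.outer(U[:, j], Vt[j])   # rank-one piece of X
        # Diagonal averaging turns the matrix back into a series.
        comp = np.array([np.mean(Xj[::-1].diagonal(i - L + 1))
                         for i in range(n)])
        components.append(comp)
    return components                           # each has length n

t = np.arange(200)
x = np.sin(2 * np.pi * t / 20) + 0.1 * np.random.default_rng(12).normal(size=200)
parts = ssa_components(x, L=40, k=2)            # leading SSA components
```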

In time series data, seasonality is the presence of variations that occur at specific regular intervals less than a year, such as weekly, monthly, or quarterly. Seasonality may be caused by various factors, such as weather, vacation, and holidays and consists of periodic, repetitive, and generally regular and predictable patterns in the levels of a time series.

An error correction model (ECM) belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run common stochastic trend, also known as cointegration. ECMs are a theoretically-driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that last-period's deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.
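A rough sketch of the Engle–Granger two-step approach, one common way to estimate an ECM: an OLS regression gives the long-run relation, and the differenced series is then regressed on the lagged equilibrium error. The simulated cointegrated pair is an illustrative assumption.

```python
# Engle–Granger two-step estimation of an error correction model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 300
x = np.cumsum(rng.normal(size=n))             # common stochastic trend
y = 2.0 * x + rng.normal(0, 0.5, n)           # cointegrated with x

# Step 1: long-run relation y_t = a + b*x_t + u_t.
long_run = sm.OLS(y, sm.add_constant(x)).fit()
error = long_run.resid

# Step 2: short-run dynamics with the lagged error as the correction term.
dy, dx = np.diff(y), np.diff(x)
ecm = sm.OLS(dy, sm.add_constant(np.column_stack([dx, error[:-1]]))).fit()
print(ecm.params)     # coefficient on error[:-1] is the adjustment speed
```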

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

References

  1. "6.1 Time series components | OTexts". www.otexts.org. Retrieved 2016-05-14.
  2. Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. New York: Oxford University Press. ISBN 0-19-920613-9.
  3. "6.1 Time series components | OTexts". www.otexts.org. Retrieved 2016-05-18.
  4. "6.5 STL decomposition | OTexts". www.otexts.org. Retrieved 2016-05-18.
  5. Kendall, M. G. (1976). Time-Series (Second ed.). Charles Griffin. Fig. 5.1. ISBN 0-85264-241-5.
  6. Asadi, Nooshin; Karimi Alavijeh, Masih; Zilouei, Hamid (2016). "Development of a mathematical methodology to investigate biohydrogen production from regional and national agricultural crop residues: A case study of Iran". International Journal of Hydrogen Energy. doi:10.1016/j.ijhydene.2016.10.021.
