Seasonality

In time series data, seasonality is the presence of variations that occur at specific regular intervals of less than a year, such as weekly, monthly, or quarterly. Seasonality may be caused by various factors, such as weather, vacations, and holidays, [1] and consists of periodic, repetitive, and generally regular and predictable patterns in the levels [2] of a time series.

Seasonal fluctuations in a time series can be contrasted with cyclical patterns. The latter occur when the data exhibit rises and falls that are not of a fixed period. Such non-seasonal fluctuations are usually due to economic conditions and are often related to the "business cycle"; their period usually extends beyond a single year, and the fluctuations usually last at least two years. [3]

Organisations facing seasonal variations, such as ice-cream vendors, are often interested in knowing their performance relative to the normal seasonal variation. Seasonal variations in the labour market can be attributed to school leavers entering the job market upon completing their schooling. These regular changes are of less interest to those who study employment data than the variations that occur due to the underlying state of the economy; their focus is on how unemployment in the workforce has changed once the regular seasonal variations are set aside. [3]

It is necessary for organisations to identify and measure seasonal variations within their market to help them plan for the future. This can prepare them for temporary increases or decreases in labour requirements and inventory as demand for their product or service fluctuates over certain periods. This may require training, periodic maintenance, and so forth, which can be organised in advance. Apart from these considerations, organisations need to know whether the variation they have experienced has been more or less than the expected amount, beyond what the usual seasonal variations account for.

Motivation

There are several main reasons for studying seasonal variation:

  • The description of the seasonal effect provides a better understanding of the impact this component has upon a particular series.
  • After establishing the seasonal pattern, methods can be implemented to eliminate it from the time-series to study the effect of other components such as cyclical and irregular variations. This elimination of the seasonal effect is referred to as de-seasonalizing or seasonal adjustment of data.
  • To use past patterns of seasonal variation to contribute to forecasting and the prediction of future trends, as in climate normals.

Detection

The following graphical techniques can be used to detect seasonality:

  • a run sequence plot
  • a seasonal plot
  • a seasonal subseries plot
  • a box plot
  • an autocorrelation plot (ACF)

An effective way to find periodicity, including seasonality, in any regular series of data is to first remove any overall trend and then inspect the remaining series for time periodicity. [5]

The run sequence plot is a recommended first step for analyzing any time series. Although seasonality can sometimes be indicated by this plot, seasonality is shown more clearly by the seasonal subseries plot or the box plot. The seasonal subseries plot does an excellent job of showing both the seasonal differences (between group patterns) and also the within-group patterns. The box plot shows the seasonal difference (between group patterns) quite well, but it does not show within group patterns. However, for large data sets, the box plot is usually easier to read than the seasonal subseries plot.

The seasonal plot, seasonal subseries plot, and box plot all assume that the seasonal periods are known. In most cases, the analyst will, in fact, know this. For example, for monthly data the period is 12, since there are 12 months in a year. However, if the period is not known, the autocorrelation plot can help. If there is significant seasonality, the autocorrelation plot should show spikes at lags equal to the period. For example, for monthly data, if there is a seasonality effect, we would expect to see significant peaks at lags 12, 24, 36, and so on (although the intensity may decrease the further out we go).

An autocorrelation plot (ACF) can be used to identify seasonality because it measures the correlation between each value of the series and the value a fixed lag behind it. Lags at which the two are strongly related produce pronounced spikes, and spikes recurring at multiples of a common lag indicate seasonality with that period.

[Figure: an ACF (autocorrelation) plot of Australian beer consumption data.]
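To make this concrete, the following is a minimal sketch in Python using synthetic monthly data; the series and the small acf helper are illustrative assumptions rather than any particular library's API:

```python
import numpy as np

# Synthetic monthly series with an annual (period-12) seasonal pattern.
rng = np.random.default_rng(0)
n, period = 240, 12
t = np.arange(n)
y = 10 + 2 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.5, n)

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series x."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

r = acf(y, 36)
# Pronounced spikes at lags 12, 24, 36 point to an annual seasonal pattern.
print({lag: round(r[lag - 1], 2) for lag in (6, 12, 24, 36)})
```

On data with a strong trend, detrend first (as noted above), since trend inflates autocorrelations at all lags and can mask the seasonal spikes.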

Semiregular cyclic variations might be dealt with by spectral density estimation.
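As a brief sketch of that idea, the following assumes SciPy and the same kind of synthetic monthly series; the peak of the periodogram reveals the dominant period:

```python
import numpy as np
from scipy.signal import periodogram

# Synthetic monthly series with a 12-month cycle (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(240)
y = 10 + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 240)

freqs, power = periodogram(y, fs=1.0)    # fs = 1 sample per month
peak = freqs[1:][power[1:].argmax()]     # skip the zero-frequency bin
print(f"peak at {peak:.3f} cycles/month -> period {1 / peak:.1f} months")
```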

Calculation

Seasonal variation is measured in terms of an index, called a seasonal index. It is an average that can be used to compare an actual observation with what it would be if there were no seasonal variation. An index value is attached to each period of the time series within a year. This implies that if monthly data are considered, there are 12 separate seasonal indices, one for each month. The following methods use seasonal indices to measure seasonal variations in time-series data; of these, the ratio-to-moving-average method is described in detail below.

  • Method of simple averages
  • Ratio to trend method
  • Ratio-to-moving-average method
  • Link relatives method

Ratio-to-moving-average method

The measurement of seasonal variation by the ratio-to-moving-average method provides an index of the degree of seasonal variation in a time series. The index is based on a mean of 100, with the degree of seasonality measured by deviations from that base. For example, suppose that for hotel rentals in a winter resort the winter-quarter index is 124. The value 124 indicates that 124 percent of the average quarterly rentals occur in winter. If the hotel management records 1436 rentals for the whole of the previous year, then the average quarterly rental is 359 (= 1436/4). Since the winter-quarter index is 124, the number of winter rentals is estimated as follows:

359 × (124/100) ≈ 445

Here, 359 is the average quarterly rental, 124 is the winter-quarter index, and 445 is the seasonalized winter-quarter rental.

This method is also called the percentage moving average method. In this method, the original data values in the time-series are expressed as percentages of moving averages. The steps and the tabulations are given below.

  1. Find the centered 12-month (or 4-quarter) moving averages of the original data values in the time series.
  2. Express each original data value of the time series as a percentage of the corresponding centered moving-average value obtained in step (1). In other words, in a multiplicative time-series model, we get (Original data values) / (Trend values) × 100 = (T×C×S×I) / (T×C) × 100 = (S×I) × 100.
    This implies that the ratio to moving average represents the seasonal and irregular components.
  3. Arrange these percentages according to months or quarters of the given years, and find the averages over all months or quarters of the given years.
  4. If the sum of these indices is not 1200 (or 400 for quarterly figures), multiply them by a correction factor = 1200 / (sum of monthly indices). Otherwise, the 12 monthly averages are themselves taken as the seasonal indices.

Worked example

Let us calculate the seasonal index by the ratio-to-moving-average method from the following data:

Sample Data

Year    Q1    Q2    Q3    Q4
1996    75    60    54    59
1997    86    65    63    80
1998    90    72    66    85
1999   100    78    72    93

Calculations of the 4-quarter moving averages and the ratios to moving average are shown in the table below.

Moving Averages

Year  Qtr  Y     4-Qtr     4-Qtr    2-Figure  Centered     Ratio to Moving
                 Moving    Moving   Moving    Moving       Average (%)
                 Total     Average  Total     Average (T)  = (Y/T) × 100
1996  1    75
1996  2    60
                 248       62.00
1996  3    54                       126.75    63.375        85.21
                 259       64.75
1996  4    59                       130.75    65.375        90.25
                 264       66.00
1997  1    86                       134.25    67.125       128.12
                 273       68.25
1997  2    65                       141.75    70.875        91.71
                 294       73.50
1997  3    63                       148.00    74.000        85.13
                 298       74.50
1997  4    80                       150.75    75.375       106.14
                 305       76.25
1998  1    90                       153.25    76.625       117.45
                 308       77.00
1998  2    72                       155.25    77.625        92.75
                 313       78.25
1998  3    66                       159.00    79.500        83.02
                 323       80.75
1998  4    85                       163.00    81.500       104.29
                 329       82.25
1999  1    100                      166.00    83.000       120.48
                 335       83.75
1999  2    78                       169.50    84.750        92.03
                 343       85.75
1999  3    72
1999  4    93
Calculation of Seasonal Index

Year                        Q1        Q2        Q3        Q4        Total
1996                        –         –         85.21     90.25
1997                        128.12    91.71     85.13     106.14
1998                        117.45    92.75     83.02     104.29
1999                        120.48    92.03     –         –
Total                       366.05    276.49    253.36    300.68
Seasonal Average            122.01    92.16     84.45     100.23    398.85
Adjusted Seasonal Average   122.36    92.43     84.69     100.52    400.00

The total of the seasonal averages is 398.85, so the corresponding correction factor is 400/398.85 = 1.00288. Each seasonal average is multiplied by the correction factor 1.00288 to obtain the adjusted seasonal indices shown in the table above.
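The steps above can be scripted directly. The following is a minimal Python sketch, using NumPy with the quarterly data from the worked example (array names and layout are illustrative choices); it reproduces the adjusted seasonal indices up to small rounding differences:

```python
import numpy as np

# Quarterly observations from the sample data above (1996-1999).
y = np.array([75, 60, 54, 59,     # 1996
              86, 65, 63, 80,     # 1997
              90, 72, 66, 85,     # 1998
              100, 78, 72, 93])   # 1999
period = 4

# Step 1: centered moving average (the trend estimate T): average two
# adjacent 4-quarter moving averages.
ma4 = np.convolve(y, np.ones(period) / period, mode="valid")
trend = (ma4[:-1] + ma4[1:]) / 2           # aligned with y[2] .. y[-3]

# Step 2: ratio to moving average, as a percentage (S x I). The first
# and last period//2 observations have no centered average.
ratios = y[period // 2 : -(period // 2)] / trend * 100

# Step 3: average the ratios by quarter.
quarters = np.arange(period // 2, len(y) - period // 2) % period
seasonal_avg = np.array([ratios[quarters == q].mean() for q in range(period)])

# Step 4: rescale so the indices sum to 400 (100 per quarter).
indices = seasonal_avg * (100 * period / seasonal_avg.sum())
print(np.round(indices, 2))   # ~ [122.37  92.43  84.69 100.51]
```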

1. In an additive time-series model, the seasonal component is estimated as:

S = Y – (T + C + I )

where

S : Seasonal values
Y : Actual data values of the time-series
T : Trend values
C : Cyclical values
I : Irregular values.

2. In a multiplicative time-series model, the seasonal component is expressed in terms of a ratio and a percentage:

Seasonal effect = (T × S × C × I) / (T × C × I) × 100 = Y / (T × C × I) × 100

However, in practice the time series is detrended to arrive at S × C × I. This is done by dividing both sides of Y = T × S × C × I by the trend values T, so that Y / T = S × C × I.

3. The deseasonalized time-series data will have only trend (T), cyclical (C), and irregular (I) components, expressed as:

  • Multiplicative model: Y / S × 100 = (T × S × C × I) / S × 100 = (T × C × I) × 100

Modeling

A completely regular cyclic variation in a time series might be dealt with in time series analysis by using a sinusoidal model with one or more sinusoids whose period-lengths may be known or unknown depending on the context. A less completely regular cyclic variation might be dealt with by using a special form of an ARIMA model which can be structured so as to treat cyclic variations semi-explicitly. Such models represent cyclostationary processes.

Another method of modelling periodic seasonality is the use of pairs of Fourier terms. As in the sinusoidal model, Fourier terms added to a regression model use sine and cosine terms to represent seasonality. However, the seasonality of such a regression is represented as the sum of several sine and cosine terms, instead of the single sine or cosine term of a sinusoidal model. Any periodic function can be approximated arbitrarily well by including enough Fourier terms.

The difference between a sinusoidal model and a regression with Fourier terms can be summarized as below:

Sinusoidal model:

Y_t = C + α sin(2πωt + φ) + ε_t

Regression with Fourier terms:

Y_t = a + Σ_{k=1}^{K} [ α_k sin(2πkt/m) + β_k cos(2πkt/m) ] + ε_t

where ω is the frequency (the reciprocal of the period length), φ is the phase, m is the seasonal period, and K is the number of sine-cosine pairs.
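As an illustration, here is a hedged sketch in Python of a regression with Fourier terms, fitted by ordinary least squares on a synthetic monthly series; the data, the period m = 12, and the choice K = 2 are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 12, 120, 2                 # period, sample size, Fourier pairs
t = np.arange(n)

# Synthetic series: linear trend + annual seasonality + noise.
y = 0.05 * t + 3 * np.sin(2 * np.pi * t / m) + rng.normal(0, 0.5, n)

# Design matrix: intercept, linear trend, and K sine-cosine pairs.
cols = [np.ones(n), t]
for k in range(1, K + 1):
    cols.append(np.sin(2 * np.pi * k * t / m))
    cols.append(np.cos(2 * np.pi * k * t / m))
X = np.column_stack(cols)

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted seasonal component is the sum of the Fourier terms.
seasonal = X[:, 2:] @ beta[2:]
print("sin(2*pi*t/12) coefficient:", round(beta[2], 2))            # close to 3
print("seasonal amplitude:", round(float(np.ptp(seasonal)) / 2, 2))  # ~ 3
```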

Seasonal adjustment

Seasonal adjustment or deseasonalization is any method for removing the seasonal component of a time series. The resulting seasonally adjusted data are used, for example, when analyzing or reporting non-seasonal trends over durations rather longer than the seasonal period. An appropriate method for seasonal adjustment is chosen on the basis of a particular view taken of the decomposition of the time series into components designated with names such as "trend", "cyclic", "seasonal", and "irregular", including how these interact with each other. For example, such components might act additively or multiplicatively. Thus, if a seasonal component acts additively, the adjustment method has two stages:

  • estimate the seasonal component; and
  • subtract the estimated seasonal component from the original series, leaving the seasonally adjusted series Y − S.

If the model is multiplicative, the magnitude of the seasonal fluctuations will vary with the level of the series, which is more likely to occur with economic series. [3] When taking seasonality into account, the seasonally adjusted multiplicative decomposition can be written as

Y / S = T × C × I

whereby the original time series is divided by the estimated seasonal component.

The multiplicative model can be transformed into an additive model by taking the log of the time series: [3]

Multiplicative decomposition: Y = T × S × C × I
Taking the log of the time series: log Y = log T + log S + log C + log I

One particular implementation of seasonal adjustment is provided by X-12-ARIMA.
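As a small sketch of multiplicative adjustment (a classical decomposition, not X-12-ARIMA itself), the following assumes a recent version of the statsmodels library and a synthetic monthly series; the original series is divided by the estimated seasonal component:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series whose seasonal swings grow with the level,
# i.e., a multiplicative structure (illustrative data only).
idx = pd.date_range("2015-01", periods=48, freq="MS")
t = np.arange(48)
rng = np.random.default_rng(1)
y = pd.Series((100 + t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
              * rng.normal(1, 0.01, 48), index=idx)

result = seasonal_decompose(y, model="multiplicative", period=12)

# Seasonal adjustment: divide the original series by the seasonal component.
adjusted = y / result.seasonal
print(adjusted.head())
```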

In regression analysis

In regression analysis such as ordinary least squares, with a seasonally varying dependent variable being influenced by one or more independent variables, the seasonality can be accounted for and measured by including n-1 dummy variables, one for each of the seasons except for an arbitrarily chosen reference season, where n is the number of seasons (e.g., 4 in the case of meteorological seasons, 12 in the case of months, etc.). Each dummy variable is set to 1 if the data point is drawn from the dummy's specified season and 0 otherwise. Then the predicted value of the dependent variable for the reference season is computed from the rest of the regression, while for any other season it is computed using the rest of the regression and by inserting the value 1 for the dummy variable for that season.
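A minimal sketch of this setup, assuming quarterly data with Q1 as the arbitrarily chosen reference season (so n − 1 = 3 dummy variables); the synthetic data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs = 80
quarter = np.arange(n_obs) % 4                   # 0 = Q1 (reference)
x = rng.normal(size=n_obs)                       # one ordinary regressor
true_effect = np.array([0.0, 1.5, -0.8, 2.0])    # seasonal shifts vs. Q1
y = 5 + 2 * x + true_effect[quarter] + rng.normal(0, 0.3, n_obs)

# Design matrix: intercept, x, and one dummy per non-reference season.
dummies = (quarter[:, None] == np.array([1, 2, 3])).astype(float)
X = np.column_stack([np.ones(n_obs), x, dummies])

# Ordinary least squares; the dummy coefficients estimate each season's
# shift in the dependent variable relative to the reference season.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Q2-Q4 effects vs. Q1:", np.round(beta[2:], 2))   # ~ [1.5 -0.8 2.0]
```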

It is important to distinguish seasonal patterns from related patterns. A seasonal pattern occurs when a time series is affected by the season or the time of the year (annual, semiannual, quarterly, and so on). A cyclic pattern, or simply a cycle, occurs when the data exhibit rises and falls over other periods, i.e., much longer (e.g., decadal) or much shorter (e.g., weekly) than seasonal. Quasiperiodicity is a more general, irregular periodicity.


References

  1. "Seasonality". |title=Influencing Factors|
  2. "Archived copy". Archived from the original on 2015-05-18. Retrieved 2015-05-13.{{cite web}}: CS1 maint: archived copy as title (link)
  3. 1 2 3 4 5 6.1 Time series components - OTexts.
  4. 2.1 Graphics - OTexts.
  5. "time series - What method can be used to detect seasonality in data?". Cross Validated.

Further reading

This article incorporates public domain material from the NIST/SEMATECH e-Handbook of Statistical Methods, National Institute of Standards and Technology.