Demand forecasting, also known as demand planning and sales forecasting (DP&SF), [1] involves predicting the quantity of goods and services that consumers or business customers will demand at a future point in time. [2] More specifically, demand forecasting methods use predictive analytics to estimate customer demand in light of key economic conditions. This is an important tool in optimizing business profitability through efficient supply chain management. Demand forecasting methods are divided into two major categories: qualitative and quantitative methods.
Demand forecasting may be used in resource allocation, inventory management, assessing future capacity requirements, or making decisions on whether to enter a new market. [3]
Demand forecasting plays an important role for businesses in different industries, particularly with regard to mitigating the risks associated with particular business activities. However, demand forecasting is known to be a challenging task for businesses due to the intricacies of the analysis involved, particularly quantitative analysis. [4] Nevertheless, understanding customer needs is an indispensable part of any industry, enabling business activities to be carried out efficiently and to respond more appropriately to market needs. Businesses that forecast demand effectively can accrue several benefits, including, but not limited to, waste reduction, optimized allocation of resources, and potentially large increases in sales and revenue.
Some of the reasons why businesses require demand forecasting include:
There are various statistical and econometric analyses used to forecast demand. [9] Forecasting demand can be broken down into a seven-stage process; the seven stages are described below:
The first step to forecast demand is to determine a set of objectives or information to derive different business strategies. These objectives are based on a set of hypotheses that usually come from a mixture of economic theory or previous empirical studies. For example, a manager may wish to find what the optimal price and production amount would be for a new product, based on how demand elasticity affected past company sales.
There are many different econometric models which differ depending on the analysis that managers wish to perform. The type of model that is chosen to forecast demand depends on many different aspects, such as the type of data obtained or the number of observations. [10] In this stage it is important to define the type of variables that will be used to forecast demand. Regression analysis is the main statistical method for forecasting. There are many different types of regression analysis, but fundamentally they provide an analysis of how one or multiple variables affect the dependent variable being measured. An example of a model for forecasting demand is M. Roodman's (1986) demand forecasting regression model for measuring the effects of seasonality on a measured variable. [11] The model was based on a linear regression model and is used to measure linear trends based on seasonal cycles and their effects on demand, e.g., the seasonal demand for a product based on sales in summer and winter.
The linear regression model is described as:

$Y = a + bX + e$

where $Y$ is the dependent variable, $a$ is the intercept, $b$ is the slope coefficient, $X$ is the independent variable, and $e$ is the error term.
M. Roodman's demand forecasting model is based on linear regression and is described as:

$Y_t = a_q + b_q t + e_t, \quad t \in D_q$

where $D_q$ is defined as the set of all $t$-indices for quarter $q$. This equation gives the process that generates the data for all periods $t$ that fall in quarter $q$.
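To make this concrete, here is a minimal Python sketch that fits a separate trend line over each quarter's index set $D_q$; the sales figures are hypothetical, and np.polyfit stands in for the least-squares estimation:

```python
import numpy as np

# Hypothetical quarterly sales figures for three years, indexed t = 1..12.
t = np.arange(1, 13)
sales = np.array([10, 14, 9, 7, 12, 16, 11, 8, 13, 18, 12, 9], dtype=float)
quarter = (t - 1) % 4 + 1  # quarter q associated with each period t

# Fit Y_t = a_q + b_q * t separately over each index set D_q.
for q in range(1, 5):
    in_q = quarter == q  # membership in D_q
    b_q, a_q = np.polyfit(t[in_q], sales[in_q], deg=1)
    print(f"Q{q}: intercept a_q = {a_q:.2f}, slope b_q = {b_q:.3f}")
```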
Once the type of model is specified in stage 2, the data and the method of collecting the data must be specified. The model must be specified first in order to determine which variables need to be collected. Conversely, when deciding on the desired forecasting model, the available data, or the methods for collecting data, need to be considered in order to formulate the correct model. Time series data and cross-sectional data are the two kinds of data that may be gathered. Time series data are based on historical observations taken sequentially in time. These observations are used to derive relevant statistics, characteristics, and insight from the data. [12] The data points collected in a time series may be sales, prices, manufacturing costs, and their corresponding time intervals, i.e., weekly, monthly, quarterly, annually, or any other regular interval. Cross-sectional data refers to observations collected at a single point in time across multiple entities, such as individuals, firms, industries, or areas; for example, sales for each firm in an industry during quarter 1. Each such observation encapsulates a variety of underlying factors that produced the final data point. These underlying factors may not be observable or feasible to determine, but cross-sectional data can be a practical method for adding precision to the demand forecast model. [13] The data may be sourced from the firm's records, commercial or private agencies, or official sources.
Once the model and data are obtained, the parameter values can be estimated to determine the effects the independent variables have on the dependent variable in focus. Using the linear regression model as an example of estimating parameters, the following steps are taken:
Linear regression formula:

$\hat{Y}_t = \hat{a} + \hat{b} X_t$

The first step is to find the line that minimizes the sum of the squares of the differences between the observed values of the dependent variable and the fitted values from the line. [9] This is expressed as finding the $\hat{a}$ and $\hat{b}$ which minimize

$S(\hat{a}, \hat{b}) = \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2 = \sum_{t=1}^{n} (Y_t - \hat{a} - \hat{b} X_t)^2$

where $\hat{Y}_t$ is the fitted value from the regression line. The intercept $\hat{a}$ and slope $\hat{b}$ are obtained by partially differentiating $S$ with respect to both $\hat{a}$ and $\hat{b}$, setting both expressions equal to zero, and solving them simultaneously. This yields:

$\hat{b} = \frac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}, \qquad \hat{a} = \bar{Y} - \hat{b}\bar{X}.$
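A minimal Python sketch of these closed-form estimates, using hypothetical price and quantity data:

```python
import numpy as np

# Hypothetical observations: X is the independent variable (e.g. price),
# Y is the dependent variable (e.g. quantity demanded).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([7.2, 6.1, 5.3, 4.2, 3.1])

# Closed-form least-squares estimates from the derivation above.
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
print(f"fitted line: Y_hat = {a:.3f} + {b:.3f} * X")
```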
Calculating demand forecast accuracy is the process of determining the accuracy of forecasts made regarding customer demand for a product. [14] [15] Understanding and predicting customer demand is vital to manufacturers and distributors to avoid stock-outs and to maintain adequate inventory levels. While forecasts are never perfect, they are necessary to prepare for actual demand. In order to maintain an optimized inventory and effective supply chain, accurate demand forecasts are imperative.
Forecast accuracy in the supply chain is typically measured using the Mean Absolute Percent Error, or MAPE. Statistically, MAPE is defined as the average of the absolute percentage errors:

$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$

where $A_t$ is the actual value and $F_t$ the forecast.
Most practitioners, however, define and use the MAPE as the Mean Absolute Deviation divided by Average Sales, which is just a volume-weighted MAPE, also referred to as the MAD/Mean ratio. This is the same as dividing the sum of the absolute deviations by the total sales of all products:

$\text{WAPE} = \frac{\sum_{t=1}^{n} |A_t - F_t|}{\sum_{t=1}^{n} A_t}$

This calculation, where $A$ is the actual value and $F$ the forecast, is also known as WAPE, or the Weighted Absolute Percent Error.
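As a sketch, both measures can be computed in a few lines of Python (the function names and sales figures are hypothetical):

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percent Error (undefined when any actual value is zero)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(a - f) / a)

def wape(actual, forecast):
    """Weighted Absolute Percent Error, i.e. the MAD/Mean ratio."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sum(np.abs(a - f)) / np.sum(a)

actual = [100, 80, 120, 90]     # hypothetical actual sales
forecast = [110, 70, 115, 100]  # hypothetical forecasts
print(f"MAPE = {mape(actual, forecast):.1%}, WAPE = {wape(actual, forecast):.1%}")
```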
Another interesting option is the weighted MAPE, whose advantage is that errors can be weighted, for example by volume or value. The remaining problem is that for seasonal products the measure produces an undefined result when sales = 0, and it is not symmetrical: the error can be much larger when actual sales are higher than the forecast than when they are lower. The symmetric Mean Absolute Percentage Error, or sMAPE, is used to correct this.
Finally, for intermittent demand patterns, none of the above are particularly useful. In this situation, a business may consider MASE (Mean Absolute Scaled Error) as a key performance indicator. However, this calculation is challenging to use because it is not as intuitive as the above-mentioned metrics. [16] Another metric to consider, especially when demand patterns are intermittent or lumpy, is SPEC (Stock-keeping-oriented Prediction Error Costs). [17] The idea behind this metric is to compare predicted and actual demand by computing the theoretical costs incurred over the forecast horizon. It assumes that predicted demand higher than actual demand results in stock-keeping costs, whereas predicted demand lower than actual demand results in opportunity costs. SPEC also takes into account temporal shifts (prediction before or after actual demand) and cost-related aspects, and allows demand forecasts to be compared on business grounds as well.
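A minimal Python sketch of sMAPE and MASE on hypothetical intermittent data; note that sMAPE definitions vary, and MASE is usually scaled by the naive forecast error computed on an in-sample training period, whereas the same series is reused here for simplicity:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE; this variant scales |A - F| by the mean of |A| and |F|."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(a - f) / ((np.abs(a) + np.abs(f)) / 2))

def mase(actual, forecast):
    """Mean Absolute Scaled Error: MAE relative to the one-step naive forecast."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    naive_mae = np.mean(np.abs(a[1:] - a[:-1]))  # error of forecasting A_t with A_{t-1}
    return np.mean(np.abs(a - f)) / naive_mae

actual = [0, 5, 0, 0, 7, 3]    # hypothetical intermittent demand
forecast = [1, 4, 1, 1, 6, 2]
print(f"sMAPE = {smape(actual, forecast):.3f}, MASE = {mase(actual, forecast):.3f}")
```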
The forecast error needs to be calculated using actual sales as a base. Several forecast error measures are used, namely the Mean Percent Error, Root Mean Squared Error, Tracking Signal, and Forecast Bias.
Once the model has been determined, it is used to test the theory or hypothesis stated in the first stage. The results should describe what the analysis set out to achieve and determine whether the theory or hypothesis is true or false. In relation to the example provided in the first stage, the model should show the relationship between demand elasticity in the market and its correlation with past company sales. This should enable managers to make informed decisions regarding the optimal price and production levels for the new product.
The final step is to then forecast demand based on the data set and model created. In order to forecast demand, estimations of a chosen variable are used to determine the effects it has on demand. Regarding the estimation of the chosen variable, a regression model can be used or both qualitative and quantitative assessments can be implemented. Examples of qualitative and quantitative assessments are:
Other related statistical methods and concepts include:
The method of least squares is a parameter estimation method in regression analysis based on minimizing the sum of the squares of the residuals made in the results of each individual equation.
In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.
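Stated compactly, and assuming the usual notation $y = X\beta + \varepsilon$, the theorem's conditions on the errors are:

```latex
\mathbb{E}[\varepsilon_i] = 0, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2 < \infty \ \text{(homoskedasticity)}, \qquad
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ \text{for } i \neq j.
```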
In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model that tries to find the line of best fit for a two-dimensional data set. It differs from the simple linear regression in that it accounts for errors in observations on both the x- and the y- axis. It is a special case of total least squares, which allows for any number of predictors and a more complicated error structure.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more error-free independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.
Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering. Also known as Tikhonov regularization, named for Andrey Tikhonov, it is a method of regularization of ill-posed problems. It is particularly useful to mitigate the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters. In general, the method provides improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias.
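A minimal numpy sketch of the closed-form ridge estimate $(X^\top X + \lambda I)^{-1} X^\top y$ on hypothetical, nearly collinear data (the regularization strength lam is an arbitrary choice here):

```python
import numpy as np

# Hypothetical design matrix with two nearly collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + rng.normal(scale=0.01, size=50)])
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.1, size=50)

lam = 0.1  # regularization strength; a tuning choice
# Ridge estimate: beta = (X'X + lam * I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print("ridge coefficients:", beta_ridge)
```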
In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models.
In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
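In terms of the residual and total sums of squares, a standard formulation is:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```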
In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.
In statistics, simple linear regression (SLR) is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.
In econometrics, the seemingly unrelated regressions (SUR) or seemingly unrelated regression equations (SURE) model, proposed by Arnold Zellner in 1962, is a generalization of a linear regression model that consists of several regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. Each equation is a valid linear regression on its own and can be estimated separately, which is why the system is called seemingly unrelated, although some authors suggest that the term seemingly related would be more appropriate, since the error terms are assumed to be correlated across the equations.
The mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of prediction accuracy of a forecasting method in statistics. It usually expresses the accuracy as a ratio defined by the formula:

$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$

where $A_t$ is the actual value and $F_t$ is the forecast value.
In statistics, a tobit model is any of a class of regression models in which the observed range of the dependent variable is censored in some way. The term was coined by Arthur Goldberger in reference to James Tobin, who developed the model in 1958 to mitigate the problem of zero-inflated data for observations of household expenditure on durable goods. Because Tobin's method can be easily extended to handle truncated and other non-randomly selected samples, some authors adopt a broader definition of the tobit model that includes these cases.
In statistics, semiparametric regression includes regression models that combine parametric and nonparametric models. They are often used in situations where the fully nonparametric model may not perform well or when the researcher wants to use a parametric model but the functional form with respect to a subset of the regressors or the density of the errors is not known. Semiparametric regression models are a particular type of semiparametric modelling and, since semiparametric models contain a parametric component, they rely on parametric assumptions and may be misspecified and inconsistent, just like a fully parametric model.
Bayesian linear regression is a type of conditional modeling in which the mean of one variable is described by a linear combination of other variables, with the goal of obtaining the posterior probability of the regression coefficients and ultimately allowing the out-of-sample prediction of the regressand conditional on observed values of the regressors. The simplest and most widely used version of this model is the normal linear model, in which $y$ given $X$ is distributed Gaussian. In this model, and under a particular choice of prior probabilities for the parameters (so-called conjugate priors), the posterior can be found analytically. With more arbitrarily chosen priors, the posteriors generally have to be approximated.
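As an illustration, a minimal sketch of the analytic conjugate-posterior case in Python, assuming a known noise variance and a zero-mean Gaussian prior (both simplifying assumptions; the data and parameter values are hypothetical):

```python
import numpy as np

# Conjugate normal posterior for the coefficients, assuming the noise
# variance sigma2 is known and the prior is beta ~ N(0, tau2 * I).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=30)

sigma2, tau2 = 0.25, 10.0
prior_precision = np.eye(X.shape[1]) / tau2

# Posterior: beta | X, y ~ N(mu_n, S_n)
S_n = np.linalg.inv(prior_precision + X.T @ X / sigma2)
mu_n = S_n @ (X.T @ y / sigma2)
print("posterior mean of coefficients:", mu_n)
```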
The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedasticity-robust standard errors or Eicker–Huber–White standard errors, recognizing the contributions of Friedhelm Eicker, Peter J. Huber, and Halbert White.
In statistical theory, the field of high-dimensional statistics studies data whose dimension is larger than typically considered in classical multivariate analysis. The area arose owing to the emergence of many modern data sets in which the dimension of the data vectors may be comparable to, or even larger than, the sample size, so that justification for the use of traditional techniques, often based on asymptotic arguments with the dimension held fixed as the sample size increased, was lacking.
In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.
Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.
Numerical methods for linear least squares entail the numerical analysis of linear least squares problems.
In statistics, linear regression is a model that estimates the linear relationship between a scalar response and one or more explanatory variables. A model with exactly one explanatory variable is a simple linear regression; a model with two or more explanatory variables is a multiple linear regression. This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.