Quantile regression

Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional mean of the response variable across values of the predictor variables, quantile regression estimates the conditional median (or other quantiles) of the response variable. Quantile regression is an extension of linear regression that is used when the assumptions of linear regression are not met.

[Figure (Quantilsregression.svg): Example for quantile regression]

Advantages and applications

One advantage of quantile regression relative to ordinary least squares regression is that the quantile regression estimates are more robust against outliers in the response measurements. However, the main attraction of quantile regression goes beyond this and is advantageous when conditional quantile functions are of interest. Different measures of central tendency and statistical dispersion can be useful to obtain a more comprehensive analysis of the relationship between variables. [1]

In ecology, quantile regression has been proposed and used as a way to discover more useful predictive relationships between variables in cases where there is no relationship or only a weak relationship between the means of such variables. The need for and success of quantile regression in ecology has been attributed to the complexity of interactions between different factors leading to data with unequal variation of one variable for different ranges of another variable. [2]

Another application of quantile regression is in the areas of growth charts, where percentile curves are commonly used to screen for abnormal growth. [3] [4]

History

The idea of estimating a median regression slope, a major theorem about minimizing the sum of absolute deviations, and a geometrical algorithm for constructing median regression were proposed in 1760 by Ruđer Josip Bošković, a Jesuit Catholic priest from Dubrovnik. [1] :4 [5] He was interested in the ellipticity of the earth, building on Isaac Newton's suggestion that its rotation could cause it to bulge at the equator with a corresponding flattening at the poles. [6] He eventually produced the first geometric procedure for determining the equator of a rotating planet from three observations of a surface feature. More importantly for quantile regression, he developed the first formulation of the least absolute criterion, preceding the method of least squares introduced by Legendre in 1805 by fifty years. [7]

Other thinkers began building upon Bošković's idea, such as Pierre-Simon Laplace, who developed the so-called "méthode de situation". This led to Francis Edgeworth's plural median, [8] a geometric approach to median regression that is recognized as a precursor of the simplex method. [7] The works of Bošković, Laplace, and Edgeworth were recognized as a prelude to Roger Koenker's contributions to quantile regression.

Median regression computations for larger data sets are quite tedious compared to the least squares method, which is why median regression historically remained unpopular among statisticians until the widespread adoption of computers in the latter part of the 20th century.

Quantiles

Quantile regression expresses the conditional quantiles of a dependent variable as a linear function of the explanatory variables. Crucial to the practicality of quantile regression is that the quantiles can be expressed as the solution of a minimization problem, as we will show in this section before discussing conditional quantiles in the next section.

Quantile of a random variable

Let $Y$ be a real-valued random variable with cumulative distribution function $F_{Y}(y)=P(Y\leq y)$. The $\tau$th quantile of $Y$ is given by

$$q_{Y}(\tau)=F_{Y}^{-1}(\tau)=\inf\{y:F_{Y}(y)\geq\tau\},$$

where $\tau\in(0,1)$.

Define the loss function as $\rho_{\tau}(m)=m\,(\tau-\mathbb{I}_{(m<0)})$, where $\mathbb{I}$ is an indicator function.

A specific quantile can be found by minimizing the expected loss of $Y-u$ with respect to $u$: [1] (pp. 56)

$$\min_{u}\operatorname{E}[\rho_{\tau}(Y-u)]=\min_{u}\left\{(\tau-1)\int_{-\infty}^{u}(y-u)\,dF_{Y}(y)+\tau\int_{u}^{\infty}(y-u)\,dF_{Y}(y)\right\}.$$

This can be shown by computing the derivative of the expected loss via an application of the Leibniz integral rule, setting it to 0, and letting $q_{\tau}$ be the solution of

$$0=(1-\tau)\int_{-\infty}^{q_{\tau}}dF_{Y}(y)-\tau\int_{q_{\tau}}^{\infty}dF_{Y}(y).$$

This equation reduces to

$$0=F_{Y}(q_{\tau})-\tau,$$

and then to

$$F_{Y}(q_{\tau})=\tau.$$

If the solution $q_{\tau}$ is not unique, then we have to take the smallest such solution to obtain the $\tau$th quantile of the random variable $Y$.
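
As a quick numerical illustration of this result (not part of the original derivation), the sketch below approximates the expected loss by a Monte Carlo average for a standard normal $Y$ and minimizes it over $u$; the minimizer should agree with the theoretical 0.9 quantile. The sample size, seed, and use of SciPy's bounded scalar minimizer are illustrative choices.

```python
# Minimal sketch: minimizing E[rho_tau(Y - u)] recovers the tau-th quantile of Y.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
tau = 0.9
y = rng.standard_normal(200_000)        # Monte Carlo sample approximating Y ~ N(0, 1)

def rho(m, tau):
    """Tilted absolute value (check) loss: rho_tau(m) = m * (tau - 1{m < 0})."""
    return m * (tau - (m < 0))

def expected_loss(u):
    # Sample-mean approximation of E[rho_tau(Y - u)]
    return rho(y - u, tau).mean()

res = optimize.minimize_scalar(expected_loss, bounds=(-5, 5), method="bounded")
print(res.x)                 # approximately 1.2816, the 0.9 quantile of N(0, 1)
print(stats.norm.ppf(tau))   # theoretical value for comparison
```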

Example

Let $Y$ be a discrete random variable that takes the values $1,2,\ldots,9$ with equal probabilities. The task is to find the median of $Y$, and hence the value $\tau=0.5$ is chosen. Then the expected loss of $u$ is

$$L(u)=\operatorname{E}[\rho_{\tau}(Y-u)]=\frac{0.5}{9}\left(-\sum_{y_{i}<u}(y_{i}-u)+\sum_{y_{i}\geq u}(y_{i}-u)\right).$$

Since $0.5/9$ is a constant, it can be taken out of the expected loss function (this is only true if $\tau=0.5$). Then, at $u=3$,

$$L(3)\propto\sum_{i=1}^{2}-(i-3)+\sum_{i=3}^{9}(i-3)=(2+1)+(0+1+2+\cdots+6)=24.$$

Suppose that $u$ is increased by 1 unit. Then the expected loss will change by $3-6=-3$ on changing $u$ to 4. If $u=5$, the expected loss is

$$L(5)\propto\sum_{i=1}^{4}(5-i)+\sum_{i=5}^{9}(i-5)=10+10=20,$$

and any further change in $u$ will increase the expected loss. Thus $u=5$ is the median. The table below shows the expected loss (divided by the constant $0.5/9$) for different values of $u$.

u              1   2   3   4   5   6   7   8   9
Expected loss  36  29  24  21  20  21  24  29  36
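
The table can be reproduced with a few lines of code. The sketch below evaluates the expected loss, divided by the constant $0.5/9$ so that it reduces to the sum of absolute deviations, for $u=1,\ldots,9$.

```python
# Sketch reproducing the table above: Y uniform on {1, ..., 9}, tau = 0.5.
import numpy as np

y = np.arange(1, 10)                     # equally likely values 1, ..., 9
for u in range(1, 10):
    scaled_loss = np.abs(y - u).sum()    # at tau = 0.5 the scaled loss is sum |y_i - u|
    print(u, scaled_loss)
# Prints 36 29 24 21 20 21 24 29 36; the minimum (20) is attained at u = 5, the median.
```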

Intuition

Consider $\tau=0.5$ and let $q$ be an initial guess for $q_{\tau}$. The expected loss evaluated at $q$ is

$$L(q)=-0.5\int_{-\infty}^{q}(y-q)\,dF_{Y}(y)+0.5\int_{q}^{\infty}(y-q)\,dF_{Y}(y).$$

In order to minimize the expected loss, we move the value of $q$ a little bit to see whether the expected loss will rise or fall. Suppose we increase $q$ by 1 unit. Then the change in expected loss, up to the positive factor 0.5, would be

$$\int_{-\infty}^{q}1\,dF_{Y}(y)-\int_{q}^{\infty}1\,dF_{Y}(y).$$

The first term of the expression is $F_{Y}(q)$ and the second term is $1-F_{Y}(q)$. Therefore, the change in the expected loss function is negative if and only if $F_{Y}(q)<0.5$, that is, if and only if $q$ is smaller than the median. Similarly, if we reduce $q$ by 1 unit, the change in the expected loss function is negative if and only if $q$ is larger than the median.

In order to minimize the expected loss function, we would increase (decrease) $q$ if $q$ is smaller (larger) than the median, until $q$ reaches the median. The idea behind the minimization is to count the number of points (weighted with the density) that are larger or smaller than $q$ and then move $q$ to a point where $q$ is larger than $\tau\cdot 100\%$ of the points.

Sample quantile

The $\tau$ sample quantile can be obtained by solving the following minimization problem

$$\hat{q}_{\tau}=\underset{q\in\mathbb{R}}{\arg\min}\sum_{i=1}^{n}\rho_{\tau}(y_{i}-q),$$

where the function $\rho_{\tau}$ is the tilted absolute value function. The intuition is the same as for the population quantile.
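
A minimal sketch of this sample-quantile minimization (the data-generating distribution and optimizer are illustrative assumptions) is:

```python
# Sketch: the sample quantile as the minimizer of the tilted absolute value criterion.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=1000)
tau = 0.75

def check_loss(q):
    m = y - q
    return np.sum(m * (tau - (m < 0)))   # sum of rho_tau(y_i - q)

res = minimize_scalar(check_loss, bounds=(y.min(), y.max()), method="bounded")
print(res.x, np.quantile(y, tau))        # the two values should essentially agree
```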

Conditional quantile and quantile regression

The $\tau$th conditional quantile of $Y$ given $X$ is the $\tau$th quantile of the conditional probability distribution of $Y$ given $X$,

$$Q_{Y|X}(\tau)=\inf\{y:F_{Y|X}(y)\geq\tau\}.$$

We use a capital $Q$ to denote the conditional quantile to indicate that it is a random variable.

In quantile regression for the $\tau$th quantile we make the assumption that the $\tau$th conditional quantile is given as a linear function of the explanatory variables:

$$Q_{Y|X}(\tau)=X\beta_{\tau}.$$

Given the distribution function of $Y$, $\beta_{\tau}$ can be obtained by solving

$$\beta_{\tau}=\underset{\beta\in\mathbb{R}^{k}}{\arg\min}\;\operatorname{E}\!\left[\rho_{\tau}(Y-X\beta)\right].$$

Solving the sample analog

$$\hat{\beta}_{\tau}=\underset{\beta\in\mathbb{R}^{k}}{\arg\min}\sum_{i=1}^{n}\rho_{\tau}\!\left(Y_{i}-X_{i}\beta\right)$$

gives the estimator of $\beta$.

Note that when $\tau=0.5$, the loss function $\rho_{\tau}$ is proportional to the absolute value function, and thus median regression is the same as linear regression by least absolute deviations.
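
As an illustration, the following sketch estimates several conditional quantiles of a simulated heteroscedastic relationship by solving the sample analog above; it assumes the statsmodels package, whose QuantReg class implements this estimator. The simulated data are purely illustrative.

```python
# Sketch: linear quantile regression fits at several values of tau.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + (0.5 + 0.1 * x) * rng.standard_normal(n)  # heteroscedastic noise
X = sm.add_constant(x)

for tau in (0.25, 0.5, 0.75):
    fit = sm.QuantReg(y, X).fit(q=tau)
    print(tau, fit.params)   # intercept and slope of the tau-th conditional quantile
# At tau = 0.5 this is median (least absolute deviations) regression.
```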

Computation of estimates for regression parameters

The mathematical forms arising from quantile regression are distinct from those arising in the method of least squares. The method of least squares leads to a consideration of problems in an inner product space, involving projection onto subspaces, and thus the problem of minimizing the squared errors can be reduced to a problem in numerical linear algebra. Quantile regression does not have this structure, and instead the minimization problem can be reformulated as a linear programming problem

$$\min_{\beta,\,u^{+},\,u^{-}}\left\{\tau\,1_{n}^{\top}u^{+}+(1-\tau)\,1_{n}^{\top}u^{-}\;\middle|\;X\beta+u^{+}-u^{-}=Y\right\},$$

where

$$u_{j}^{+}=\max(u_{j},0)\geq 0,\qquad u_{j}^{-}=-\min(u_{j},0)\geq 0,$$

and $u_{j}=Y_{j}-X_{j}\beta$ is the $j$th residual.

Simplex methods [1] :181 or interior point methods [1] :190 can be applied to solve the linear programming problem.
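
A minimal sketch of this linear-programming formulation, solved here with SciPy's general-purpose linprog routine rather than a specialized simplex or interior point implementation, might look as follows; the simulated data are illustrative.

```python
# Sketch: quantile regression as a linear program.
# Variable order: [beta (k), u_plus (n), u_minus (n)].
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, y, tau):
    n, k = X.shape
    c = np.concatenate([np.zeros(k), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])        # X beta + u+ - u- = y
    bounds = [(None, None)] * k + [(0, None)] * (2 * n) # beta free, u+, u- nonnegative
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k]                                    # the estimated beta_tau

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.uniform(0, 10, 200)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(200)
print(quantile_regression_lp(X, y, tau=0.5))
```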

Asymptotic properties

For $\tau\in(0,1)$, under some regularity conditions, $\hat{\beta}_{\tau}$ is asymptotically normal:

$$\sqrt{n}\left(\hat{\beta}_{\tau}-\beta_{\tau}\right)\;\overset{d}{\to}\;N\!\left(0,\;\tau(1-\tau)D^{-1}\Omega_{x}D^{-1}\right),$$

where, treating each $X_{i}$ as a row vector of regressors,

$$D=\operatorname{E}\!\left[f_{Y|X}(X\beta_{\tau})\,X^{\top}X\right]\qquad\text{and}\qquad\Omega_{x}=\operatorname{E}\!\left[X^{\top}X\right].$$

Direct estimation of the asymptotic variance-covariance matrix is not always satisfactory. Inference for quantile regression parameters can be made with the regression rank-score tests or with the bootstrap methods. [9]
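
As a sketch of the bootstrap approach mentioned above (a pairs bootstrap, with the number of replications and the use of statsmodels' QuantReg being illustrative assumptions):

```python
# Sketch: pairs-bootstrap standard errors for quantile regression coefficients.
import numpy as np
import statsmodels.api as sm

def bootstrap_se(X, y, tau, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # resample (x_i, y_i) pairs
        draws[b] = sm.QuantReg(y[idx], X[idx]).fit(q=tau).params
    return draws.std(axis=0)                              # bootstrap standard errors
```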

Equivariance

See invariant estimator for background on invariance or see equivariance.

Scale equivariance

For any $a>0$ and $\tau\in[0,1]$,

$$\hat{\beta}(\tau;aY,X)=a\hat{\beta}(\tau;Y,X),$$
$$\hat{\beta}(\tau;-aY,X)=-a\hat{\beta}(1-\tau;Y,X).$$

Shift equivariance

For any $\gamma\in\mathbb{R}^{k}$ and $\tau\in[0,1]$,

$$\hat{\beta}(\tau;Y+X\gamma,X)=\hat{\beta}(\tau;Y,X)+\gamma.$$

Equivariance to reparameterization of design

Let $A$ be any $p\times p$ nonsingular matrix and $\tau\in[0,1]$. Then

$$\hat{\beta}(\tau;Y,XA)=A^{-1}\hat{\beta}(\tau;Y,X).$$

Invariance to monotone transformations

If $h$ is a nondecreasing function on $\mathbb{R}$, the following invariance property applies:

$$Q_{h(Y)|X}(\tau)\equiv h\!\left(Q_{Y|X}(\tau)\right).$$

Example (1):

If $W=\exp(Y)$ and $Q_{Y|X}(\tau)=X\beta_{\tau}$, then $Q_{W|X}(\tau)=\exp(X\beta_{\tau})$. The mean regression does not have the same property, since $\operatorname{E}[\exp(Y)\mid X]\neq\exp(\operatorname{E}[Y\mid X])$.
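
A small numerical check of this invariance (with an illustrative simulated sample) is:

```python
# Sketch: quantiles commute with the monotone map exp, whereas the mean does not.
import numpy as np

rng = np.random.default_rng(4)
y = rng.standard_normal(100_000)
w = np.exp(y)

tau = 0.9
print(np.quantile(w, tau), np.exp(np.quantile(y, tau)))  # equal up to sampling error
print(w.mean(), np.exp(y.mean()))                        # clearly different
```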

Bayesian methods for quantile regression

Because quantile regression does not normally assume a parametric likelihood for the conditional distributions of Y|X, Bayesian methods work with a working likelihood. A convenient choice is the asymmetric Laplace likelihood, [10] because the mode of the resulting posterior under a flat prior is the usual quantile regression estimate. The posterior inference, however, must be interpreted with care. Yang, Wang and He [11] provided a posterior variance adjustment for valid inference. In addition, Yang and He [12] showed that one can have asymptotically valid posterior inference if the working likelihood is chosen to be the empirical likelihood.
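
The connection between the asymmetric Laplace working likelihood and the usual estimate can be sketched as follows: under a flat prior, the posterior mode maximizes the working likelihood, whose negative logarithm is, up to constants, the sum of tilted absolute value losses. The code below is an illustrative sketch of this point only, not of full Bayesian posterior sampling; the simulated data and optimizer are assumptions.

```python
# Sketch: the posterior mode under a flat prior and an asymmetric Laplace working
# likelihood coincides (approximately, given the simple optimizer used here) with
# the ordinary quantile regression estimate.
import numpy as np
from scipy.optimize import minimize

def neg_log_working_likelihood(beta, X, y, tau):
    m = y - X @ beta
    return np.sum(m * (tau - (m < 0)))   # sum_i rho_tau(y_i - x_i beta), up to constants

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(300), rng.uniform(0, 5, 300)])
y = X @ np.array([2.0, 1.0]) + rng.standard_normal(300)

map_fit = minimize(neg_log_working_likelihood, x0=np.zeros(2),
                   args=(X, y, 0.5), method="Nelder-Mead")
print(map_fit.x)   # approximately the median-regression (LAD) coefficients
```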

Machine learning methods for quantile regression

Beyond simple linear regression, there are several machine learning methods that can be extended to quantile regression. A switch from the squared error to the tilted absolute value loss function allows gradient descent based learning algorithms to learn a specified quantile instead of the mean, which means that neural network and deep learning algorithms can be applied to quantile regression. [13] [14] Tree-based learning algorithms are also available for quantile regression (see, e.g., Quantile Regression Forests, [15] a simple generalization of random forests).
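
As an illustration of switching the loss function, the sketch below assumes the scikit-learn package and fits gradient boosted trees with the quantile (pinball) loss, so that they learn the 0.9 conditional quantile rather than the conditional mean; the simulated data are illustrative.

```python
# Sketch: gradient boosted trees trained with the quantile (pinball) loss.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(1000)

model = GradientBoostingRegressor(loss="quantile", alpha=0.9, n_estimators=200)
model.fit(X, y)
q90_pred = model.predict(X)          # estimated 0.9 conditional quantile at each x
print((y <= q90_pred).mean())        # roughly 0.9 of observations fall below it
```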

Censored quantile regression

If the response variable is subject to censoring, the conditional mean is not identifiable without additional distributional assumptions, but the conditional quantile is often identifiable. For recent work on censored quantile regression, see Portnoy [16] and Wang and Wang. [17]

Example (2):

Let $y^{c}=\max(0,y)$ and $Q_{y|x}(\tau)=x\beta_{\tau}$. Then

$$Q_{y^{c}|x}(\tau)=\max(0,x\beta_{\tau}).$$

This is the censored quantile regression model: estimated values can be obtained without making any distributional assumptions, but at the cost of computational difficulty, [18] some of which can be avoided by using a simple three-step censored quantile regression procedure as an approximation. [19]
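
A rough sketch of estimating this model by directly minimizing the tilted loss of the censored response against $\max(0,x\beta)$ is given below; the objective is non-convex, so the simple optimizer and starting value used here are illustrative assumptions rather than the procedures cited above.

```python
# Sketch: a direct (non-convex) censored quantile regression objective for
# left-censoring at zero.
import numpy as np
from scipy.optimize import minimize

def censored_objective(beta, X, y_c, tau):
    fitted = np.maximum(0.0, X @ beta)   # max(0, x beta)
    m = y_c - fitted
    return np.sum(m * (tau - (m < 0)))

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(500), rng.uniform(-2, 4, 500)])
y_latent = X @ np.array([-1.0, 1.0]) + rng.standard_normal(500)
y_c = np.maximum(0.0, y_latent)                      # observed, censored response

start = np.linalg.lstsq(X, y_c, rcond=None)[0]       # crude starting value
fit = minimize(censored_objective, x0=start, args=(X, y_c, 0.5),
               method="Nelder-Mead")
print(fit.x)   # rough estimate of beta_0.5 for the latent model
```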

For random censoring on the response variables, the censored quantile regression of Portnoy (2003) [16] provides consistent estimates of all identifiable quantile functions based on reweighting each censored point appropriately.

Implementations

Numerous statistical software packages include implementations of quantile regression:

- MATLAB, via the quantreg function from the MATLAB Central File Exchange [20]
- gretl, via the quantreg command [21]
- R, via the quantreg, [22] gbm, [23] quantregForest, [24] qrnn, [25] and qgam [26] packages
- Python, via Scikit-garden [27] and statsmodels [28]
- SAS, through proc quantreg [29]
- Stata, via the qreg command [30] [31]
- Vowpal Wabbit [32]
- Mathematica, via the QuantileRegression.m package [33]

References

  1. Koenker, Roger (2005). Quantile Regression. Cambridge University Press. pp. 146–7. ISBN 978-0-521-60827-5.
  2. Cade, Brian S.; Noon, Barry R. (2003). "A gentle introduction to quantile regression for ecologists" (PDF). Frontiers in Ecology and the Environment. 1 (8): 412–420. doi:10.2307/3868138. JSTOR   3868138.
  3. Wei, Y.; Pere, A.; Koenker, R.; He, X. (2006). "Quantile Regression Methods for Reference Growth Charts". Statistics in Medicine . 25 (8): 1369–1382. doi:10.1002/sim.2271. PMID   16143984.
  4. Wei, Y.; He, X. (2006). "Conditional Growth Charts (with discussions)". Annals of Statistics . 34 (5): 2069–2097 and 2126–2131. arXiv: math/0702634 . doi:10.1214/009053606000000623.
  5. Stigler, S. (1984). "Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation". Biometrika. 71 (3): 615–620. doi:10.1093/biomet/71.3.615.
  6. Koenker, Roger (2005). Quantile Regression . Cambridge: Cambridge University Press. pp.  2. ISBN   9780521845731.
  7. Furno, Marilena; Vistocco, Domenico (2018). Quantile Regression: Estimation and Simulation. Hoboken, NJ: John Wiley & Sons. pp. xv. ISBN 9781119975281.
  8. Koenker, Roger (August 1998). "Galton, Edgeworth, Frisch, and prospects for quantile regression in economics" (PDF). UIUC.edu. Retrieved August 22, 2018.
  9. Kocherginsky, M.; He, X.; Mu, Y. (2005). "Practical Confidence Intervals for Regression Quantiles". Journal of Computational and Graphical Statistics . 14 (1): 41–55. doi:10.1198/106186005X27563.
  10. Kozumi, H.; Kobayashi, G. (2011). "Gibbs sampling methods for Bayesian quantile regression" (PDF). Journal of Statistical Computation and Simulation . 81 (11): 1565–1578. doi:10.1080/00949655.2010.496117.
  11. Yang, Y.; Wang, H.X.; He, X. (2016). "Posterior Inference in Bayesian Quantile Regression with Asymmetric Laplace Likelihood". International Statistical Review . 84 (3): 327–344. doi:10.1111/insr.12114. hdl: 2027.42/135059 .
  12. Yang, Y.; He, X. (2010). "Bayesian empirical likelihood for quantile regression". Annals of Statistics . 40 (2): 1102–1131. arXiv: 1207.5378 . doi:10.1214/12-AOS1005.
  13. Petneházi, Gábor (2019-08-21). "QCNN: Quantile Convolutional Neural Network". arXiv: 1908.07978 [cs.LG].
  14. Rodrigues, Filipe; Pereira, Francisco C. (2018-08-27). "Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems". arXiv: 1808.08798 [stat].
  15. Meinshausen, Nicolai (2006). "Quantile Regression Forests" (PDF). Journal of Machine Learning Research. 7 (6): 983–999.
  16. Portnoy, S. L. (2003). "Censored Regression Quantiles". Journal of the American Statistical Association. 98 (464): 1001–1012. doi:10.1198/016214503000000954.
  17. Wang, H.; Wang, L. (2009). "Locally Weighted Censored Quantile Regression". Journal of the American Statistical Association . 104 (487): 1117–1128. CiteSeerX   10.1.1.504.796 . doi:10.1198/jasa.2009.tm08230.
  18. Powell, James L. (1986). "Censored Regression Quantiles". Journal of Econometrics . 32 (1): 143–155. doi:10.1016/0304-4076(86)90016-3.
  19. Chernozhukov, Victor; Hong, Han (2002). "Three-Step Censored Quantile Regression and Extramarital Affairs". J. Amer. Statist. Assoc. 97 (459): 872–882. doi:10.1198/016214502388618663.
  20. "quantreg(x,y,tau,order,Nboot) - File Exchange - MATLAB Central". www.mathworks.com. Retrieved 2016-02-01.
  21. "Gretl Command Reference" (PDF). April 2017.
  22. "quantreg: Quantile Regression". R Project. 2018-12-18.
  23. "gbm: Generalized Boosted Regression Models". R Project. 2019-01-14.
  24. "quantregForest: Quantile Regression Forests". R Project. 2017-12-19.
  25. "qrnn: Quantile Regression Neural Networks". R Project. 2018-06-26.
  26. "qgam: Smooth Additive Quantile Regression Models". R Project. 2019-05-23.
  27. "Quantile Regression Forests". Scikit-garden. Retrieved 3 January 2019.
  28. "Statsmodels: Quantile Regression". Statsmodels. Retrieved 15 November 2019.
  29. "An Introduction to Quantile Regression and the QUANTREG Procedure" (PDF). SAS Support.
  30. "qreg — Quantile regression" (PDF). Stata Manual.
  31. Cameron, A. Colin; Trivedi, Pravin K. (2010). "Quantile Regression". Microeconometrics Using Stata (Revised ed.). College Station: Stata Press. pp. 211–234. ISBN   978-1-59718-073-3.
  32. "JohnLangford/vowpal_wabbit". GitHub. Retrieved 2016-07-09.
  33. "QuantileRegression.m". MathematicaForPrediction. Retrieved 3 January 2019.

Further reading