Kernel regression


In statistics, kernel regression is a non-parametric technique to estimate the conditional expectation $\operatorname{E}(Y \mid X)$ of a random variable $Y$ given a random variable $X$. The objective is to find a non-linear relation between the pair of random variables $X$ and $Y$.


In any nonparametric regression, the conditional expectation of a variable $Y$ relative to a variable $X$ may be written:

$\operatorname{E}(Y \mid X) = m(X)$

where $m$ is an unknown function.

Nadaraya–Watson kernel regression

Nadaraya and Watson, both in 1964, proposed to estimate $m$ as a locally weighted average, using a kernel as a weighting function. [1] [2] [3] The Nadaraya–Watson estimator is:

$\widehat{m}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{i=1}^{n} K_h(x - x_i)}$

where $K_h(t) = \frac{1}{h} K\!\left(\frac{t}{h}\right)$ is a kernel with a bandwidth $h$ such that $K(\cdot)$ is of order at least 1, that is $\int_{-\infty}^{\infty} u\, K(u)\, du = 0$.
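As a concrete illustration, the estimator can be computed directly; the following is a minimal R sketch using a Gaussian kernel and a fixed bandwidth. The function name nw_estimate, the simulated data, and the value h = 0.3 are illustrative choices, not part of any package.

# Nadaraya–Watson estimator with a Gaussian kernel (illustrative sketch)
nw_estimate <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)   # kernel weights; the common 1/(n*h) factor cancels in the ratio
  sum(w * y) / sum(w)        # locally weighted average of the responses
}

# toy data: noisy sine curve evaluated on a grid
set.seed(1)
x <- sort(runif(200, 0, 2 * pi))
y <- sin(x) + rnorm(200, sd = 0.3)
grid <- seq(0, 2 * pi, length.out = 100)
m_hat <- sapply(grid, nw_estimate, x = x, y = y, h = 0.3)

Larger values of h give smoother but more biased estimates; in practice the bandwidth is usually chosen by cross-validation rather than fixed in advance.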

Derivation

Starting from $\operatorname{E}(Y \mid X = x) = \int y\, f(y \mid x)\, dy = \int y\, \frac{f(x,y)}{f(x)}\, dy$ and using the kernel density estimates of the joint distribution $f(x,y)$ and of $f(x)$ with a kernel $K$,

$\hat{f}(x,y) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i)\, K_h(y - y_i), \qquad \hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i),$

we get

$\operatorname{\hat{E}}(Y \mid X = x) = \int y\, \frac{\hat{f}(x,y)}{\hat{f}(x)}\, dy = \frac{\sum_{i=1}^{n} K_h(x - x_i) \int y\, K_h(y - y_i)\, dy}{\sum_{i=1}^{n} K_h(x - x_i)} = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{i=1}^{n} K_h(x - x_i)},$

which is the Nadaraya–Watson estimator.
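To see why the inner integral collapses to $y_i$, substitute $u = (y - y_i)/h$ and use the defining properties of the kernel (it integrates to one and has zero first moment):

$\int y\, K_h(y - y_i)\, dy = \int (y_i + h u)\, K(u)\, du = y_i \int K(u)\, du + h \int u\, K(u)\, du = y_i.$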

Priestley–Chao kernel estimator

$\widehat{m}_{PC}(x) = h^{-1} \sum_{i=2}^{n} (x_i - x_{i-1})\, K\!\left(\frac{x - x_i}{h}\right) y_i$

where $h$ is the bandwidth (or smoothing parameter).
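A corresponding R sketch, again with a Gaussian kernel, could look as follows; the function name pc_estimate is illustrative, and the design points x are assumed to be sorted in increasing order.

# Priestley–Chao estimator (illustrative sketch; x assumed sorted)
pc_estimate <- function(x0, x, y, h) {
  i <- 2:length(x)
  # spacing-weighted kernel sum, divided by the bandwidth
  sum((x[i] - x[i - 1]) * dnorm((x0 - x[i]) / h) * y[i]) / h
}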

Gasser–Müller kernel estimator [4]

$\widehat{m}_{GM}(x) = h^{-1} \sum_{i=1}^{n} \left[\int_{s_{i-1}}^{s_i} K\!\left(\frac{x - u}{h}\right) du\right] y_i$

where $s_i = \frac{x_i + x_{i+1}}{2}$ for $i = 1, \dots, n-1$, with $s_0 = -\infty$ and $s_n = +\infty$.
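For a Gaussian kernel the inner integral has a closed form in terms of the normal distribution function, which the following R sketch exploits; the function name gm_estimate is illustrative and the design points x are assumed to be sorted.

# Gasser–Müller estimator with a Gaussian kernel (illustrative sketch; x assumed sorted)
gm_estimate <- function(x0, x, y, h) {
  n <- length(x)
  s <- c(-Inf, (x[-n] + x[-1]) / 2, Inf)                            # interval boundaries s_0, ..., s_n
  w <- pnorm((x0 - s[1:n]) / h) - pnorm((x0 - s[2:(n + 1)]) / h)    # integral of K_h over (s_{i-1}, s_i]
  sum(w * y)                                                        # weights sum to one by construction
}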

Example

[Figure: Estimated regression function.]

This example is based upon Canadian cross-section wage data consisting of a random sample taken from the 1971 Canadian Census Public Use Tapes for male individuals having common education (grade 13). There are 205 observations in total.

The figure above shows the estimated regression function using a second-order Gaussian kernel along with asymptotic variability bounds.

Script for example

The following R commands use the npreg() function to deliver optimal smoothing and to create the figure given above. These commands can be entered at the command prompt via cut and paste.

install.packages("np")library(np)# non parametric librarydata(cps71)attach(cps71)m<-npreg(logwage~age)plot(m,plot.errors.method="asymptotic",plot.errors.style="band",ylim=c(11,15.2))points(age,logwage,cex=.25)detach(cps71)

According to David Salsburg, the algorithms used in kernel regression were independently developed and used in fuzzy systems: "Coming up with almost exactly the same computer algorithm, fuzzy systems and kernel density-based regressions appear to have been developed completely independently of one another." [5]

Statistical implementation

Kernel regression is available in a number of statistical software packages, for example the np package for R [7] [8] and the MATLAB kernel smoothing toolbox of Horová, Koláček and Zelinka. [6]



References

  1. Nadaraya, E. A. (1964). "On Estimating Regression". Theory of Probability and Its Applications. 9 (1): 141–142. doi:10.1137/1109020.
  2. Watson, G. S. (1964). "Smooth regression analysis". Sankhyā: The Indian Journal of Statistics, Series A. 26 (4): 359–372. JSTOR 25049340.
  3. Bierens, Herman J. (1994). "The Nadaraya–Watson kernel regression function estimator". Topics in Advanced Econometrics. New York: Cambridge University Press. pp. 212–247. ISBN 0-521-41900-X.
  4. Gasser, Theo; Müller, Hans-Georg (1979). "Kernel estimation of regression functions". Smoothing Techniques for Curve Estimation. Lecture Notes in Mathematics. Springer. pp. 23–68.
  5. Salsburg, D. (2002). The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. W.H. Freeman. pp. 290–291. ISBN 0-8050-7134-2.
  6. Horová, I.; Koláček, J.; Zelinka, J. (2012). Kernel Smoothing in MATLAB: Theory and Practice of Kernel Smoothing. Singapore: World Scientific Publishing. ISBN 978-981-4405-48-5.
  7. np: Nonparametric kernel smoothing methods for mixed data types
  8. Kloke, John; McKean, Joseph W. (2014). Nonparametric Statistical Methods Using R. CRC Press. pp. 98–106. ISBN 978-1-4398-7343-4.
