
**Nonparametric regression** is a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data. That is, no parametric form is assumed for the relationship between predictors and dependent variable. Nonparametric regression requires larger sample sizes than regression based on parametric models because the data must supply the model structure as well as the model estimates.

In nonparametric regression, we have random variables *X* and *Y* and assume the following relationship:

E[*Y* | *X* = *x*] = *m*(*x*),

where *m*(*x*) is some deterministic function. Linear regression is a restricted case of nonparametric regression where *m*(*x*) is assumed to be affine. Some authors use a slightly stronger assumption of additive noise:

*Y* = *m*(*X*) + *U*,

where the random variable *U* is the 'noise term', with mean 0. Without the assumption that *m* belongs to a specific parametric family of functions it is impossible to get an unbiased estimate for *m*; however, most estimators are consistent under suitable conditions.
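Because no parametric form is assumed for the regression function, it must be estimated directly from the data. As a minimal sketch (the data and function names here are invented for illustration), a *k*-nearest-neighbour average is one of the simplest such estimators:

```python
def knn_regress(x0, xs, ys, k=3):
    """Estimate the regression function at x0 as the mean response
    of the k sample points nearest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

# Samples from an unknown relationship (here y = x**2; noise is
# omitted to keep the sketch deterministic).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [x * x for x in xs]
estimate = knn_regress(1.0, xs, ys, k=3)  # averages y at x = 0.5, 1.0, 1.5
```

No functional form for the relationship is ever written down; the estimate at each point is assembled locally from nearby observations, which is why such methods need more data than a parametric fit.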

This is a non-exhaustive list of non-parametric models for regression.

In Gaussian process regression, also known as Kriging, a Gaussian prior is assumed for the regression curve. The errors are assumed to have a multivariate normal distribution and the regression curve is estimated by its posterior mode. The Gaussian prior may depend on unknown hyperparameters, which are usually estimated via empirical Bayes. The hyperparameters typically specify a prior covariance kernel. In case the kernel should also be inferred nonparametrically from the data, the critical filter can be used.
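A bare-bones sketch of the Gaussian process posterior mean, with the kernel hyperparameters fixed rather than estimated, and a small hand-rolled linear solver standing in for a proper library (all names and data here are illustrative):

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential (RBF) covariance between scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior_mean(x_star, xs, ys, noise=1e-6):
    """Posterior mean at x_star: k_*^T (K + noise*I)^{-1} y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(rbf(x_star, a) * w for a, w in zip(xs, alpha))

xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 4.0]
# With tiny observation noise, the posterior mean interpolates the data.
mean_at_1 = gp_posterior_mean(1.0, xs, ys)
```

In practice the length scale and noise level would be treated as hyperparameters and tuned, e.g. by empirical Bayes, rather than fixed as above.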

Smoothing splines have an interpretation as the posterior mode of a Gaussian process regression.

Kernel regression estimates the continuous dependent variable from a limited set of data points by convolving the data points' locations with a kernel function—approximately speaking, the kernel function specifies how to "blur" the influence of the data points so that their values can be used to predict the value for nearby locations.
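The classic instance of this idea is the Nadaraya–Watson estimator, a kernel-weighted average of the observed responses. A minimal sketch with a Gaussian kernel and invented data:

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth=0.5):
    """Nadaraya-Watson estimate at x0: sum(w_i * y_i) / sum(w_i),
    with Gaussian kernel weights w_i = K((x0 - x_i) / bandwidth)."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 0.0, 1.0]
v = nadaraya_watson(1.0, xs, ys, bandwidth=0.5)
```

The bandwidth controls the "blur": a small bandwidth tracks the data closely, a large one averages over a wide neighbourhood.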

Decision tree learning algorithms can be applied to learn to predict a dependent variable from data.^{ [1] } Although the original Classification And Regression Tree (CART) formulation applied only to predicting univariate data, the framework can be used to predict multivariate data, including time series.^{ [2] }
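The building block of a regression tree is a split of the predictor space into regions, each predicting its own mean response. A depth-1 tree ("stump") can be sketched in a few lines (data invented; real CART implementations recurse and prune):

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: choose the split threshold that
    minimizes the total squared error of the two leaf means."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for cut in range(1, len(xs)):
        left = [ys[order[i]] for i in range(cut)]
        right = [ys[order[i]] for i in range(cut, len(xs))]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        threshold = (xs[order[cut - 1]] + xs[order[cut]]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
predict = fit_stump(xs, ys)  # splits between the two clusters at x = 6.5
```

Growing the tree consists of applying the same split search recursively inside each leaf.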

**Nonparametric statistics** is the branch of statistics that is not based solely on parametrized families of probability distributions. Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated.

In probability theory and statistics, a **Gaussian process** is a stochastic process (a collection of random variables indexed by time or space) such that every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.

In statistics, originally in geostatistics, **kriging** or **Kriging**, also known as **Gaussian process regression**, is a method of interpolation based on a Gaussian process governed by prior covariances. Under suitable assumptions on the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as **Wiener–Kolmogorov prediction**, after Norbert Wiener and Andrey Kolmogorov.

In statistical modeling, **regression analysis** is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.
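For a single explanatory variable, the ordinary least squares line has a closed form; a small sketch with invented data (the slope is the ratio of the sample covariance to the sample variance of *x*):

```python
def ols_line(xs, ys):
    """Slope and intercept of the line minimizing the sum of
    squared vertical differences from the data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
slope, intercept = ols_line(xs, ys)
```

The fitted value slope * x + intercept then estimates the conditional expectation of *y* at each *x*.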

The **general linear model** or **general multivariate regression model** is a compact way of simultaneously writing several multiple linear regression models. In that sense it is not a separate statistical linear model. The various multiple linear regression models may be compactly written as

**Y** = **XB** + **U**,

where **Y** is a matrix of multivariate measurements, **X** is a design matrix, **B** is a matrix of parameters to be estimated, and **U** is a matrix of errors.

In statistics, a **probit model** is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from *probability* + *unit*.
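The probit link maps a linear index to a probability through the standard normal CDF, which the standard library exposes via the error function. A tiny sketch (the coefficients b0, b1 are placeholders, not fitted values):

```python
import math

def probit_prob(x, b0, b1):
    """P(Y = 1 | x) = Phi(b0 + b1*x), the standard normal CDF
    evaluated at the linear index."""
    z = b0 + b1 * x
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = probit_prob(0.0, 0.0, 1.0)  # index 0 maps to probability 0.5
```

Fitting b0 and b1 by maximum likelihood is the substantive part of probit estimation and is omitted here.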

In statistics, **Poisson regression** is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable *Y* has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.
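The log link makes the Poisson log-likelihood concave, so Newton's method fits the model reliably. A sketch restricted to a single coefficient with no intercept, for brevity, on invented counts:

```python
import math

def fit_poisson(xs, ys, steps=30):
    """One-parameter Poisson regression, log E[Y] = b*x, fitted by
    Newton's method: score g = sum((y - mu)*x), information
    h = sum(mu * x**2), update b += g/h."""
    b = 0.0
    for _ in range(steps):
        mus = [math.exp(b * x) for x in xs]
        g = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        h = sum(m * x * x for x, m in zip(xs, mus))
        b += g / h
    return b

xs = [1.0, 2.0, 3.0]
ys = [2, 4, 9]
b = fit_poisson(xs, ys)  # counts roughly double per unit of x
```

A full implementation would include an intercept and further covariates, turning the scalar update into a small linear solve each iteration.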

In statistics, a **generalized additive model (GAM)** is a generalized linear model in which the linear response variable depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions.

**Local regression** or **local polynomial regression**, also known as **moving regression**, is a generalization of the moving average and polynomial regression. Its most common methods, initially developed for scatterplot smoothing, are **LOESS** (locally estimated scatterplot smoothing) and **LOWESS** (locally weighted scatterplot smoothing). They are two strongly related non-parametric regression methods that combine multiple regression models in a *k*-nearest-neighbor-based meta-model. In some fields, LOESS is known as the Savitzky–Golay filter.
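At each query point, LOESS-style methods fit a low-degree polynomial by weighted least squares to the nearest neighbours, down-weighting distant points. A degree-1 sketch with the usual tricube weights (data invented):

```python
def local_linear(x0, xs, ys, k=4):
    """LOESS-style estimate at x0: a weighted least-squares line fitted
    to the k nearest points with tricube weights, evaluated at x0."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    dmax = max(abs(xs[i] - x0) for i in order) or 1.0
    w = {i: (1 - (abs(xs[i] - x0) / dmax) ** 3) ** 3 for i in order}
    sw = sum(w.values())
    mx = sum(w[i] * xs[i] for i in order) / sw
    my = sum(w[i] * ys[i] for i in order) / sw
    sxx = sum(w[i] * (xs[i] - mx) ** 2 for i in order)
    sxy = sum(w[i] * (xs[i] - mx) * (ys[i] - my) for i in order)
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (x0 - mx)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # data on a line: the fit recovers it
v = local_linear(2.5, xs, ys)
```

Sweeping x0 across the range of the data and connecting the fitted values produces the smooth curve.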

**Multilevel models** are statistical models of parameters that vary at more than one level. An example could be a model of student performance that contains measures for individual students as well as measures for classrooms within which the students are grouped. These models can be seen as generalizations of linear models, although they can also extend to non-linear models. These models became much more popular after sufficient computing power and software became available.

In statistics, **semiparametric regression** includes regression models that combine parametric and nonparametric models. They are often used in situations where the fully nonparametric model may not perform well or when the researcher wants to use a parametric model but the functional form with respect to a subset of the regressors or the density of the errors is not known. Semiparametric regression models are a particular type of semiparametric modelling and, since semiparametric models contain a parametric component, they rely on parametric assumptions and may be misspecified and inconsistent, just like a fully parametric model.

The term **kernel** is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.

In statistics, **kernel regression** is a non-parametric technique to estimate the conditional expectation of a random variable. The objective is to find a non-linear relation between a pair of random variables *X* and *Y*.

In statistics, **multivariate adaptive regression splines** (**MARS**) is a form of regression analysis introduced by Jerome H. Friedman in 1991. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.

**Smoothing splines** are function estimates, f̂(x), obtained from a set of noisy observations yᵢ of the target f(xᵢ), chosen to balance a measure of goodness of fit of f̂(xᵢ) to yᵢ against a derivative-based measure of the smoothness of f̂(x). They provide a means for smoothing noisy data. The most familiar example is the cubic smoothing spline, but there are many other possibilities, including the case where x is a vector quantity.

In statistics, **polynomial regression** is a form of regression analysis in which the relationship between the independent variable *x* and the dependent variable *y* is modelled as an *n*th degree polynomial in *x*. Polynomial regression fits a nonlinear relationship between the value of *x* and the corresponding conditional mean of *y*, denoted E(*y* |*x*). Although *polynomial regression* fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(*y* | *x*) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.
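The "linear in the parameters" point can be made concrete: a quadratic fit is ordinary least squares on the expanded features (1, x, x²), solved via the normal equations. A self-contained sketch with invented data:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def polyfit2(xs, ys):
    """Least-squares quadratic: solve the normal equations
    (X^T X) b = X^T y for the design matrix X with rows [1, x, x**2].
    The problem is linear in the coefficients b."""
    X = [[1.0, x, x * x] for x in xs]
    XtX = [[sum(row[r] * row[c] for row in X) for c in range(3)]
           for r in range(3)]
    Xty = [sum(X[i][r] * ys[i] for i in range(len(X))) for r in range(3)]
    return solve(XtX, Xty)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]  # exact quadratic, so the fit recovers [0, 0, 1]
coeffs = polyfit2(xs, ys)
```

Only the feature map is nonlinear; the estimation itself is the same linear algebra as multiple linear regression.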

In statistics, **errors-in-variables models** or **measurement error models** are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.
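Ignoring measurement error in a regressor biases the slope toward zero (attenuation). A small seeded simulation illustrates this (the data-generating numbers are invented):

```python
import random

random.seed(0)
# True relationship y = 2*x; x is then observed with unit-variance error.
n = 10000
true_x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * x for x in true_x]
noisy_x = [x + random.gauss(0, 1) for x in true_x]

def slope(xs, ys):
    """OLS slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

s_true = slope(true_x, y)    # close to 2
s_noisy = slope(noisy_x, y)  # attenuated toward 2 * 1/(1+1) = 1
```

Errors-in-variables models correct for exactly this bias by modelling the error in the regressor rather than assuming it away.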

In statistics, **linear regression** is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called *simple linear regression*; for more than one, the process is called **multiple linear regression**. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

A **partially linear model** is a form of semiparametric model, since it contains both parametric and nonparametric elements. Least-squares estimators can be applied to the partially linear model if the nonparametric element is known. Partially linear equations were first used by Engle, Granger, Rice and Weiss (1986) in the analysis of the relationship between temperature and electricity usage. A typical application of the partially linear model in microeconomics was presented by Tripathi in 1997, on the profitability of firms' production. The model has also been applied successfully in other academic fields: in 1994, Zeger and Diggle introduced it into biometrics, and in 2000, Parda-Sanchez et al. used it in environmental science to analyse collected data. The partially linear model has since been refined with many other statistical methods: in 1988, Robinson applied the Nadaraya–Watson kernel estimator to the nonparametric element to build a least-squares estimator, and in 1997 a local linear method was introduced by Truong.

- ↑ Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). *Classification and regression trees*. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.
- ↑ Segal, M. R. (1992). "Tree-structured methods for longitudinal data". *Journal of the American Statistical Association*. **87** (418): 407–418. doi:10.2307/2290271. JSTOR 2290271.



- HyperNiche, software for nonparametric multiplicative regression.
- Scale-adaptive nonparametric regression (with Matlab software).

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
