Smoothing splines are function estimates, $\hat f(x)$, obtained from a set of noisy observations $y_i$ of the target $f(x_i)$, in order to balance a measure of goodness of fit of $\hat f(x_i)$ to $y_i$ with a derivative-based measure of the smoothness of $\hat f(x)$. They provide a means for smoothing noisy $x_i, y_i$ data. The most familiar example is the cubic smoothing spline, but there are many other possibilities, including for the case where $x$ is a vector quantity.
Let $\{x_i, Y_i : i = 1, \dots, n\}$ be a set of observations, modeled by the relation $Y_i = f(x_i) + \epsilon_i$, where the $\epsilon_i$ are independent, zero-mean random variables (usually assumed to have constant variance). The cubic smoothing spline estimate $\hat f$ of the function $f$ is defined to be the minimizer (over the class of twice differentiable functions) of [1] [2]

$$\sum_{i=1}^n \{Y_i - \hat f(x_i)\}^2 + \lambda \int \hat f''(x)^2 \, dx.$$
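In practice this penalized criterion can be minimized by library routines. The sketch below uses SciPy's make_smoothing_spline (available in recent SciPy versions); the data and the choice of smoothing parameter are purely illustrative.

```python
# Minimal sketch: fit a cubic smoothing spline by minimizing the penalized
# least-squares criterion above.  Data and the smoothing parameter `lam`
# are illustrative.
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)   # noisy observations

spline = make_smoothing_spline(x, y, lam=1e-4)   # larger lam => smoother estimate
y_hat = spline(x)                                # fitted values at the data points
```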
Remarks:

- $\lambda \ge 0$ is a smoothing parameter, controlling the trade-off between fidelity to the data and roughness of the function estimate.
- As $\lambda \to 0$ (no smoothing), the smoothing spline converges to the interpolating spline.
- As $\lambda \to \infty$ (infinite smoothing), the roughness penalty becomes paramount and the estimate converges to a linear least squares fit.
It is useful to think of fitting a smoothing spline in two steps:

1. First, derive the values $\hat f(x_i)$, $i = 1, \dots, n$.
2. From these values, derive $\hat f(x)$ for all $x$.
Now, treat the second step first.
Given the vector $\hat m = (\hat f(x_1), \dots, \hat f(x_n))^{\mathrm T}$ of fitted values, the sum-of-squares part of the spline criterion is fixed. It remains only to minimize $\int \hat f''(x)^2 \, dx$, and the minimizer is a natural cubic spline that interpolates the points $(x_i, \hat f(x_i))$. This interpolating spline is a linear operator, and can be written in the form

$$\hat f(x) = \sum_{i=1}^n \hat f(x_i)\, f_i(x),$$
where $f_i(x)$ are a set of spline basis functions. As a result, the roughness penalty has the form

$$\int \hat f''(x)^2 \, dx = \hat m^{\mathrm T} A \hat m,$$
where the elements of $A$ are $\int f_i''(x) f_j''(x) \, dx$. The basis functions, and hence the matrix $A$, depend on the configuration of the predictor variables $x_i$, but not on the responses $Y_i$ or $\hat m$.
$A$ is an $n \times n$ matrix given by $A = \Delta^{\mathrm T} W^{-1} \Delta$.
$\Delta$ is an $(n-2) \times n$ matrix of second differences with elements:

$$\Delta_{ii} = \frac{1}{h_i}, \qquad \Delta_{i,i+1} = -\frac{1}{h_i} - \frac{1}{h_{i+1}}, \qquad \Delta_{i,i+2} = \frac{1}{h_{i+1}},$$
$W$ is an $(n-2) \times (n-2)$ symmetric tri-diagonal matrix with elements:

$$W_{i-1,i} = W_{i,i-1} = \frac{h_i}{6}, \qquad W_{ii} = \frac{h_i + h_{i+1}}{3},$$

and $h_i = x_{i+1} - x_i$, the distances between successive knots (or $x$ values).
Now back to the first step. The penalized sum-of-squares can be written as

$$\|Y - \hat m\|^2 + \lambda \hat m^{\mathrm T} A \hat m,$$
where $Y = (Y_1, \dots, Y_n)^{\mathrm T}$.
Minimizing over $\hat m$ by setting the derivative with respect to $\hat m$ to zero results in [6]

$$-2(Y - \hat m) + 2\lambda A \hat m = 0,$$

and hence

$$\hat m = (I + \lambda A)^{-1} Y.$$
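The following NumPy sketch assembles $\Delta$, $W$ and $A$ as defined above and computes the fitted values $\hat m = (I + \lambda A)^{-1} Y$ directly. The function name and example data are illustrative.

```python
# Sketch of the fitted-value computation: build the second-difference matrix
# Delta and the tri-diagonal matrix W from the knot spacings h_i, form
# A = Delta^T W^{-1} Delta, and solve m_hat = (I + lam * A)^{-1} Y.
import numpy as np

def smoothing_spline_fitted_values(x, y, lam):
    """Return the smoothing-spline fitted values at the data points x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = np.diff(x)                      # h_i = x_{i+1} - x_i, length n-1

    # Delta: (n-2) x n matrix of second differences
    Delta = np.zeros((n - 2, n))
    for i in range(n - 2):
        Delta[i, i]     = 1.0 / h[i]
        Delta[i, i + 1] = -1.0 / h[i] - 1.0 / h[i + 1]
        Delta[i, i + 2] = 1.0 / h[i + 1]

    # W: (n-2) x (n-2) symmetric tri-diagonal matrix
    W = np.zeros((n - 2, n - 2))
    for i in range(n - 2):
        W[i, i] = (h[i] + h[i + 1]) / 3.0
        if i > 0:
            W[i, i - 1] = W[i - 1, i] = h[i] / 6.0

    A = Delta.T @ np.linalg.solve(W, Delta)          # A = Delta^T W^{-1} Delta
    return np.linalg.solve(np.eye(n) + lam * A, y)   # (I + lam A)^{-1} Y

# Illustrative usage on noisy data from a smooth target:
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
m_hat = smoothing_spline_fitted_values(x, y, lam=1e-4)
```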
De Boor's approach exploits the same idea of finding a balance between having a smooth curve and being close to the given data: [7]

$$p \sum_{i=1}^n \left( \frac{Y_i - \hat f(x_i)}{\delta_i} \right)^2 + (1 - p) \int \left( \hat f^{(m)}(x) \right)^2 dx,$$
where $p$ is a parameter called the smooth factor and belongs to the interval $[0, 1]$, and $\delta_i$ are the quantities controlling the extent of smoothing (they represent the weight $\delta_i^{-2}$ of each point $Y_i$). In practice, since cubic splines are mostly used, $m$ is usually $2$. The solution for $m = 2$ was proposed by Christian Reinsch in 1967. [8] For $m = 2$, when $p$ approaches $1$, $\hat f$ converges to the "natural" spline interpolant to the given data. [7] As $p$ approaches $0$, $\hat f$ converges to a straight line (the smoothest curve). Since finding a suitable value of $p$ is a task of trial and error, a redundant constant $S$ was introduced for convenience. [8] $S$ is used to numerically determine the value of $p$ so that the function $\hat f$ meets the following condition:

$$\sum_{i=1}^n \left( \frac{Y_i - \hat f(x_i)}{\delta_i} \right)^2 \le S.$$
The algorithm described by de Boor starts with $p = 0$ and increases $p$ until the condition is met. [7] If $\delta_i$ is an estimate of the standard deviation of $Y_i$, the constant $S$ is recommended to be chosen in the interval $[\,n - \sqrt{2n},\; n + \sqrt{2n}\,]$. Having $S = 0$ means the solution is the "natural" spline interpolant. [8] Increasing $S$ means we obtain a smoother curve by getting farther from the given data.
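This style of control through a target $S$ on the weighted residual sum of squares is what FITPACK-based routines expose. The sketch below uses SciPy's UnivariateSpline, whose s argument plays a closely related role; the data, weights and choice of $S$ are illustrative.

```python
# Sketch: smoothing controlled by a target S on the weighted residual sum of
# squares, in the spirit of the condition above.  Data, noise level and S are
# illustrative.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
sigma = 0.2                                    # assumed noise standard deviation
y = np.exp(-x / 3.0) + sigma * rng.standard_normal(x.size)

n = x.size
S = n                                          # a value in [n - sqrt(2n), n + sqrt(2n)]
spl = UnivariateSpline(x, y, w=np.full(n, 1.0 / sigma), k=3, s=S)

y_smooth = spl(x)                              # smoothed values
# s=0 would reproduce an interpolating spline; larger S gives a smoother curve.
```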
There are two main classes of method for generalizing from smoothing with respect to a scalar $x$ to smoothing with respect to a vector $x$. The first approach simply generalizes the spline smoothing penalty to the multidimensional setting. For example, if trying to estimate $f(x, z)$ we might use the thin plate spline penalty and find the $\hat f(x, z)$ minimizing

$$\sum_{i=1}^n \{Y_i - \hat f(x_i, z_i)\}^2 + \lambda \int \left[ \left( \frac{\partial^2 \hat f}{\partial x^2} \right)^2 + 2 \left( \frac{\partial^2 \hat f}{\partial x \, \partial z} \right)^2 + \left( \frac{\partial^2 \hat f}{\partial z^2} \right)^2 \right] dx \, dz.$$
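A hedged sketch of this kind of multidimensional smoothing follows, using SciPy's RBFInterpolator with a thin-plate-spline kernel and a positive smoothing parameter (a closely related penalized formulation); the data and parameter values are illustrative.

```python
# Sketch of smoothing a noisy surface f(x, z) with a thin-plate-spline kernel.
# Locations, noise level and the `smoothing` value are illustrative.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
xz = rng.uniform(0, 1, size=(200, 2))                      # (x_i, z_i) locations
vals = (np.sin(2 * np.pi * xz[:, 0]) * np.cos(2 * np.pi * xz[:, 1])
        + 0.05 * rng.standard_normal(200))                 # noisy responses

f_hat = RBFInterpolator(xz, vals, kernel='thin_plate_spline', smoothing=1e-2)

grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), axis=-1).reshape(-1, 2)
surface = f_hat(grid)                                      # smoothed surface estimate
```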
The thin plate spline approach can be generalized to smoothing with respect to more than two dimensions and to other orders of differentiation in the penalty. [1] As the dimension increases there are some restrictions on the smallest order of differential that can be used, [1] but Duchon's original paper [9] gives slightly more complicated penalties that avoid this restriction.
The thin plate splines are isotropic, meaning that if we rotate the co-ordinate system the estimate will not change, but also that we are assuming that the same level of smoothing is appropriate in all directions. This is often considered reasonable when smoothing with respect to spatial location, but in many other cases isotropy is not an appropriate assumption and can lead to sensitivity to apparently arbitrary choices of measurement units. For example, if smoothing with respect to distance and time, an isotropic smoother will give different results if distance is measured in metres and time in seconds than if the units are changed to centimetres and hours.
The second class of generalizations to multi-dimensional smoothing deals directly with this scale invariance issue using tensor product spline constructions. [10] [11] [12] Such splines have smoothing penalties with multiple smoothing parameters, which is the price that must be paid for not assuming that the same degree of smoothness is appropriate in all directions.
Smoothing splines are related to, but distinct from:
Source code for spline smoothing can be found in the examples from Carl de Boor's book A Practical Guide to Splines. The examples are written in the Fortran programming language. Updated sources are also available on Carl de Boor's official site.
In the mathematical subfield of numerical analysis, a B-spline or basis spline is a spline function that has minimal support with respect to a given degree, smoothness, and domain partition. Any spline function of given degree can be expressed as a linear combination of B-splines of that degree. Cardinal B-splines have knots that are equidistant from each other. B-splines can be used for curve-fitting and numerical differentiation of experimental data.
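As an illustration of a spline written as a linear combination of B-spline basis functions, the following sketch evaluates such a combination with SciPy's BSpline; the knot vector and coefficients are illustrative.

```python
# Sketch: a cubic spline as the linear combination sum_j c_j B_{j,k}(x) of
# B-spline basis functions.  Knots and coefficients are illustrative.
import numpy as np
from scipy.interpolate import BSpline

k = 3                                                            # cubic
t = np.array([0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4], dtype=float)     # clamped knot vector
c = np.array([1.0, 2.0, 0.5, -1.0, 0.0, 1.5, 2.0])               # basis coefficients
spl = BSpline(t, c, k)

x = np.linspace(0, 4, 9)
print(spl(x))    # values of the spline at x
```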
The method of least squares is a parameter estimation method in regression analysis based on minimizing the sum of the squares of the residuals made in the results of each individual equation (a residual being the difference between an observed value and the value fitted by the model).
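A minimal sketch of the idea, fitting a straight line by minimizing the sum of squared residuals with NumPy's lstsq; the data are illustrative.

```python
# Ordinary least squares: fit y ≈ a + b*x by minimizing the residual sum of squares.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

X = np.column_stack([np.ones_like(x), x])          # design matrix [1, x]
coef, residual_ss, rank, sv = np.linalg.lstsq(X, y, rcond=None)
a, b = coef                                        # intercept and slope
```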
In statistics, the logistic model is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables. In regression analysis, logistic regression estimates the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names.
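A small sketch of the model itself: the log-odds are a linear combination of the predictors, and the logistic function converts them to probabilities. The coefficients below are illustrative, not fitted values.

```python
# Logistic model: linear predictor on the log-odds scale, mapped to a
# probability by the logistic function.  Coefficients are illustrative.
import numpy as np

beta0, beta1 = -1.0, 0.8                 # illustrative intercept and slope
x = np.array([0.0, 1.0, 2.0, 3.0])

log_odds = beta0 + beta1 * x             # linear predictor (logits)
prob = 1.0 / (1.0 + np.exp(-log_odds))   # probability of the outcome labeled "1"
```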
Simultaneous equations models are a type of statistical model in which the dependent variables are functions of other dependent variables, rather than just independent variables. This means some of the explanatory variables are jointly determined with the dependent variable, which in economics usually is the consequence of some underlying equilibrium mechanism. Take the typical supply and demand model: whilst typically one would determine the quantity supplied and demanded to be a function of the price set by the market, it is also possible for the reverse to be true, where producers observe the quantity that consumers demand and then set the price.
In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model that tries to find the line of best fit for a two-dimensional data set. It differs from the simple linear regression in that it accounts for errors in observations on both the x- and the y- axis. It is a special case of total least squares, which allows for any number of predictors and a more complicated error structure.
In mathematics, a spline is a function defined piecewise by polynomials. In interpolating problems, spline interpolation is often preferred to polynomial interpolation because it yields similar results, even when using low degree polynomials, while avoiding Runge's phenomenon for higher degrees.
In mathematics, statistics, finance, and computer science, particularly in machine learning and inverse problems, regularization is a process that changes the answer of a problem to a simpler one. It is often used to obtain results for ill-posed problems or to prevent overfitting.
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear response variable depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions.
Thin plate splines (TPS) are a spline-based technique for data interpolation and smoothing (a spline being a function defined piecewise by polynomials). They were introduced to geometric design by Duchon. They are an important special case of a polyharmonic spline. Robust point matching (RPM) is a common extension, and the combination is known as the TPS-RPM algorithm.
Non-linear least squares is the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters (m ≥ n). It is used in some forms of nonlinear regression. The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. There are many similarities to linear least squares, but also some significant differences. In economic theory, the non-linear least squares method is applied in (i) probit regression, (ii) threshold regression, (iii) smooth regression, (iv) logistic link regression, and (v) regression with Box–Cox transformed regressors.
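A brief sketch of non-linear least squares using SciPy's curve_fit, which refines the parameters iteratively from a starting guess; the exponential-decay model and data are illustrative.

```python
# Non-linear least squares: iteratively refine parameters of a model that is
# non-linear in its parameters.  Model form and data are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(-b * x)            # non-linear in the parameter b

rng = np.random.default_rng(3)
x = np.linspace(0, 5, 40)
y = model(x, 2.0, 1.3) + 0.05 * rng.standard_normal(x.size)

params, cov = curve_fit(model, x, y, p0=[1.0, 1.0])   # p0: starting values for the iterations
a_hat, b_hat = params
```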
A kernel smoother is a statistical technique to estimate a real valued function as the weighted average of neighboring observed data. The weight is defined by the kernel, such that closer points are given higher weights. The estimated function is smooth, and the level of smoothness is set by a single parameter. Kernel smoothing is a type of weighted moving average.
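A minimal sketch of a kernel smoother in the weighted-average form described above, with an illustrative Gaussian kernel and bandwidth.

```python
# Kernel smoother (Nadaraya-Watson form): the estimate at each point is a
# kernel-weighted average of the observed responses.  Kernel choice, bandwidth
# and data are illustrative.
import numpy as np

def kernel_smooth(x_eval, x, y, bandwidth):
    # Gaussian kernel weights; closer observations receive larger weights.
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 80))
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

x_grid = np.linspace(0, 10, 200)
y_smooth = kernel_smooth(x_grid, x, y, bandwidth=0.5)
```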
In statistics and machine learning, lasso is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. The lasso method assumes that the coefficients of the linear model are sparse, meaning that few of them are non-zero. It was originally introduced in geophysics, and later by Robert Tibshirani, who coined the term.
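A short sketch of lasso regression with scikit-learn, showing how the L1 penalty drives many coefficients to exactly zero; the data and penalty strength are illustrative.

```python
# Lasso: L1-penalized least squares that performs variable selection by
# shrinking many coefficients exactly to zero.  Data and alpha are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 10))
true_coef = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # sparse truth
y = X @ true_coef + 0.1 * rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)        # most entries are driven to exactly zero
```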
In statistics, the backfitting algorithm is a simple iterative procedure used to fit a generalized additive model. It was introduced in 1985 by Leo Breiman and Jerome Friedman along with generalized additive models. In most cases, the backfitting algorithm is equivalent to the Gauss–Seidel method, an algorithm used for solving a certain linear system of equations.
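A compact sketch of backfitting for a two-term additive model, cycling Gauss–Seidel style through the components; the simple Gaussian kernel smoother and the data are illustrative.

```python
# Backfitting for y ≈ alpha + f1(x1) + f2(x2): repeatedly smooth the partial
# residuals against each predictor in turn.  Smoother choice and data are
# illustrative.
import numpy as np

def smooth(x, r, bandwidth=0.5):
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * r).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(7)
n = 200
x1, x2 = rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)
y = np.sin(x1) + 0.5 * x2 ** 2 + 0.1 * rng.standard_normal(n)

alpha = y.mean()
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):                        # Gauss-Seidel style sweeps
    f1 = smooth(x1, y - alpha - f2)
    f1 -= f1.mean()                        # centre for identifiability
    f2 = smooth(x2, y - alpha - f1)
    f2 -= f2.mean()

fitted = alpha + f1 + f2
```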
In digital signal processing, multidimensional sampling is the process of converting a function of a multidimensional variable into a discrete collection of values of the function measured on a discrete set of points. This article presents the basic result due to Petersen and Middleton on conditions for perfectly reconstructing a wavenumber-limited function from its measurements on a discrete lattice of points. This result, also known as the Petersen–Middleton theorem, is a generalization of the Nyquist–Shannon sampling theorem for sampling one-dimensional band-limited functions to higher-dimensional Euclidean spaces.
Functional principal component analysis (FPCA) is a statistical method for investigating the dominant modes of variation of functional data. Using this method, a random function is represented in the eigenbasis, which is an orthonormal basis of the Hilbert space L2 that consists of the eigenfunctions of the autocovariance operator. FPCA represents functional data in the most parsimonious way, in the sense that when using a fixed number of basis functions, the eigenfunction basis explains more variation than any other basis expansion. FPCA can be applied for representing random functions, or in functional regression and classification.
The generalized functional linear model (GFLM) is an extension of the generalized linear model (GLM) that allows one to regress univariate responses of various types on functional predictors, which are mostly random trajectories generated by a square-integrable stochastic process. Similarly to GLM, a link function relates the expected value of the response variable to a linear predictor, which in the case of GFLM is obtained by forming the scalar product of the random predictor function with a smooth parameter function. Functional linear regression, functional Poisson regression and functional binomial regression, with functional logistic regression as an important special case, are special cases of GFLM. Applications of GFLM include classification and discrimination of stochastic processes and functional data.
Regularized least squares (RLS) is a family of methods for solving the least-squares problem while using regularization to further constrain the resulting solution.
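A minimal sketch of regularized least squares in its ridge form, solving the penalized normal equations directly; the data and penalty are illustrative.

```python
# Regularized (ridge) least squares: add an L2 penalty to the least-squares
# objective and solve (X^T X + lam I) w = X^T y.  Data and lam are illustrative.
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, 0.5, -0.5, 2.0, 0.0]) + 0.1 * rng.standard_normal(50)

lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```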
In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses each coming from a particular statistical distribution with possibly different parameter values.
A partially linear model is a form of semiparametric model, since it contains both parametric and nonparametric elements. Least squares estimators can be applied to the partially linear model if the nonparametric element is assumed to be known. Partially linear equations were first used in the analysis of the relationship between temperature and electricity usage by Engle, Granger, Rice and Weiss (1986). A typical application of the partially linear model in microeconomics was presented by Tripathi (1997) in a study of the profitability of firms' production. The partially linear model has also been applied successfully in other fields: in 1994, Zeger and Diggle introduced it into biometrics, and in environmental science, Parda-Sanchez et al. (2000) used it to analyse collected data. The partially linear model has since been refined with many other statistical methods. In 1988, Robinson applied the Nadaraya–Watson kernel estimator to estimate the nonparametric element and build a least-squares estimator; later, in 1997, a local linear method was proposed by Truong.