Radial basis function

A radial basis function (RBF) is a real-valued function $\varphi$ whose value depends only on the distance between the input and some fixed point, either the origin, so that $\varphi(\mathbf{x}) = \hat{\varphi}(\|\mathbf{x}\|)$, or some other fixed point $\mathbf{c}$, called a center, so that $\varphi(\mathbf{x}) = \hat{\varphi}(\|\mathbf{x} - \mathbf{c}\|)$. Any function $\varphi$ that satisfies the property $\varphi(\mathbf{x}) = \hat{\varphi}(\|\mathbf{x}\|)$ is a radial function. The distance is usually Euclidean distance, although other metrics are sometimes used. They are often used as a collection which forms a basis for some function space of interest, hence the name.

Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally applied to machine learning, in work by David Broomhead and David Lowe in 1988, [1] [2] which stemmed from Michael J. D. Powell's seminal research from 1977. [3] [4] [5] RBFs are also used as a kernel in support vector classification. [6] The technique has proven effective and flexible enough that radial basis functions are now applied in a variety of engineering applications. [7] [8]

Definition

A radial function is a function $\varphi : [0,\infty) \to \mathbb{R}$. When paired with a metric on a vector space, $\|\cdot\| : V \to [0,\infty)$, a function $\varphi_{\mathbf{c}} = \varphi(\|\mathbf{x} - \mathbf{c}\|)$ is said to be a radial kernel centered at $\mathbf{c} \in V$. A radial function and the associated radial kernels are said to be radial basis functions if, for any set of distinct nodes $\{\mathbf{x}_k\}_{k=1}^{n} \subseteq V$, the interpolation matrix

$$A = \left[\varphi(\|\mathbf{x}_j - \mathbf{x}_k\|)\right]_{j,k=1}^{n}$$

is non-singular. [9] [10]
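As a minimal illustrative sketch (in Python with NumPy; the kernel choice, node set, and function names are assumptions for the example, not part of any particular library), the interpolation matrix for a set of distinct nodes can be formed with a Gaussian kernel, which is strictly positive definite, and checked for non-singularity:

```python
import numpy as np

def gaussian_rbf(r, epsilon=1.0):
    """Gaussian kernel phi(r) = exp(-(epsilon * r)^2); strictly positive definite."""
    return np.exp(-(epsilon * r) ** 2)

# Distinct nodes x_1, ..., x_n in R^2 (an arbitrary example set).
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Interpolation matrix A_jk = phi(||x_j - x_k||).
pairwise = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
A = gaussian_rbf(pairwise)

# For a strictly positive definite kernel and distinct nodes, A is non-singular,
# so interpolation on these nodes has a unique solution.
print(np.linalg.cond(A))  # a finite condition number indicates non-singularity
```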

Examples

Commonly used types of radial basis functions include (writing $r = \|\mathbf{x} - \mathbf{x}_i\|$ and using $\varepsilon$ to indicate a shape parameter that can be used to scale the input of the radial kernel [11] ):

These radial basis functions are from $C^\infty(\mathbb{R})$ and are strictly positive definite functions [12] that require tuning a shape parameter $\varepsilon$:

  • Gaussian: $\varphi(r) = e^{-(\varepsilon r)^2}$
  • Multiquadric: $\varphi(r) = \sqrt{1 + (\varepsilon r)^2}$
  • Inverse quadratic: $\varphi(r) = \dfrac{1}{1 + (\varepsilon r)^2}$
  • Inverse multiquadric: $\varphi(r) = \dfrac{1}{\sqrt{1 + (\varepsilon r)^2}}$

[Figure: Gaussian function for several choices of the shape parameter $\varepsilon$.]

These RBFs are compactly supported and thus are non-zero only within a radius of $1/\varepsilon$, and thus have sparse differentiation matrices:

  • Bump function: $\varphi(r) = \begin{cases} \exp\!\left(-\dfrac{1}{1-(\varepsilon r)^{2}}\right) & \text{for } r < \dfrac{1}{\varepsilon} \\ 0 & \text{otherwise} \end{cases}$

[Figure: Plot of the scaled bump function with several choices of $\varepsilon$.]
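For concreteness, the kernels above can be written as plain functions of $r \ge 0$ and a shape parameter; a hedged Python/NumPy sketch (the function names and the sample grid are illustrative, not taken from any library):

```python
import numpy as np

def gaussian(r, eps):
    return np.exp(-(eps * r) ** 2)

def multiquadric(r, eps):
    return np.sqrt(1.0 + (eps * r) ** 2)

def inverse_quadratic(r, eps):
    return 1.0 / (1.0 + (eps * r) ** 2)

def inverse_multiquadric(r, eps):
    return 1.0 / np.sqrt(1.0 + (eps * r) ** 2)

def bump(r, eps):
    """Compactly supported: identically zero once eps * r >= 1."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    inside = eps * r < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - (eps * r[inside]) ** 2))
    return out

# Example: evaluate two of the kernels on a grid of distances with eps = 2.
r = np.linspace(0.0, 2.0, 5)
print(gaussian(r, 2.0), bump(r, 2.0))
```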

Approximation

Radial basis functions are typically used to build up function approximations of the form

$$y(\mathbf{x}) = \sum_{i=1}^{N} w_i \, \varphi(\|\mathbf{x} - \mathbf{x}_i\|),$$

where the approximating function $y(\mathbf{x})$ is represented as a sum of $N$ radial basis functions, each associated with a different center $\mathbf{x}_i$ and weighted by an appropriate coefficient $w_i$. The weights $w_i$ can be estimated using the matrix methods of linear least squares, because the approximating function is linear in the weights $w_i$.
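A short sketch of this linear-least-squares estimate of the weights (Python/NumPy; the target function, the number of centers, and the shape parameter are arbitrary choices made for the example):

```python
import numpy as np

def gaussian_rbf(r, eps=4.0):
    return np.exp(-(eps * r) ** 2)

# Target function to approximate (an arbitrary illustrative choice).
f = lambda x: np.sin(2.0 * np.pi * x)

centers = np.linspace(0.0, 1.0, 10)   # the centers x_i
x_fit = np.linspace(0.0, 1.0, 50)     # points where the target is sampled

# Design matrix Phi[j, i] = phi(|x_j - x_i|); y(x) is linear in the weights w_i.
Phi = gaussian_rbf(np.abs(x_fit[:, None] - centers[None, :]))

# Linear least squares for the weights w_i.
w, *_ = np.linalg.lstsq(Phi, f(x_fit), rcond=None)

# Evaluate the approximant y(x) = sum_i w_i * phi(|x - x_i|) at new points.
x_new = np.linspace(0.0, 1.0, 200)
y_new = gaussian_rbf(np.abs(x_new[:, None] - centers[None, :])) @ w
print(np.max(np.abs(y_new - f(x_new))))  # rough approximation error
```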

Approximation schemes of this kind have been particularly used[citation needed] in time series prediction, in the control of nonlinear systems exhibiting sufficiently simple chaotic behaviour, and in 3D reconstruction in computer graphics (for example, hierarchical RBF and Pose Space Deformation).

RBF Network

[Figure: Two unnormalized Gaussian radial basis functions in one input dimension, with basis function centers located at $x_1 = 0.75$ and $x_2 = 3.25$.]

The sum

$$y(\mathbf{x}) = \sum_{i=1}^{N} w_i \, \varphi(\|\mathbf{x} - \mathbf{x}_i\|)$$

can also be interpreted as a rather simple single-layer type of artificial neural network called a radial basis function network, with the radial basis functions taking on the role of the activation functions of the network. It can be shown that any continuous function on a compact interval can in principle be interpolated with arbitrary accuracy by a sum of this form, if a sufficiently large number of radial basis functions is used.

The approximant $y(\mathbf{x})$ is differentiable with respect to the weights $w_i$. The weights could thus be learned using any of the standard iterative methods for neural networks.
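A minimal sketch of one such iterative scheme, plain gradient descent on a mean-squared-error loss (the learning rate, number of centers, shape parameter, and target function are illustrative assumptions, not a prescribed training recipe):

```python
import numpy as np

def gaussian_rbf(r, eps=4.0):
    return np.exp(-(eps * r) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
t = np.sin(2.0 * np.pi * x)            # training targets (illustrative)

centers = np.linspace(0.0, 1.0, 15)    # fixed basis-function centers
Phi = gaussian_rbf(np.abs(x[:, None] - centers[None, :]))

w = rng.normal(scale=0.1, size=centers.size)   # initial weights
lr = 0.05                                      # learning rate (assumed value)
for _ in range(2000):
    y = Phi @ w                                # network output at the training points
    grad = Phi.T @ (y - t) / x.size            # gradient of 1/2 * mean squared error
    w -= lr * grad                             # gradient-descent update on the weights
```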

Using radial basis functions in this manner yields a reasonable interpolation approach provided that the fitting set has been chosen such that it covers the entire range systematically (equidistant data points are ideal). However, without a polynomial term that is orthogonal to the radial basis functions, estimates outside the fitting set tend to perform poorly. [ citation needed ]

See also

  • Kernel method
  • Polyharmonic spline
  • Radial basis function network
  • Radial basis function kernel
  • Radial basis function interpolation
  • Hierarchical RBF
  • Kansa method

References

  1. Radial Basis Function networks Archived 2014-04-23 at the Wayback Machine
  2. Broomhead, David H.; Lowe, David (1988). "Multivariable Functional Interpolation and Adaptive Networks" (PDF). Complex Systems. 2: 321–355. Archived from the original (PDF) on 2014-07-14.
  3. Michael J. D. Powell (1977). "Restart procedures for the conjugate gradient method". Mathematical Programming. 12 (1): 241–254. doi:10.1007/bf01593790. S2CID 9500591.
  4. Sahin, Ferat (1997). A Radial Basis Function Approach to a Color Image Classification Problem in a Real Time Industrial Application (M.Sc.). Virginia Tech. p. 26. hdl:10919/36847. Radial basis functions were first introduced by Powell to solve the real multivariate interpolation problem.
  5. Broomhead & Lowe 1988, p. 347: "We would like to thank Professor M.J.D. Powell at the Department of Applied Mathematics and Theoretical Physics at Cambridge University for providing the initial stimulus for this work."
  6. VanderPlas, Jake (6 May 2015). "Introduction to Support Vector Machines". O'Reilly. Retrieved 14 May 2015.
  7. Buhmann, Martin Dietrich (2003). Radial basis functions: theory and implementations. Cambridge University Press. ISBN 978-0511040207. OCLC 56352083.
  8. Biancolini, Marco Evangelos (2018). Fast radial basis functions for engineering applications. Springer International Publishing. ISBN 9783319750118. OCLC 1030746230.
  9. Fasshauer, Gregory E. (2007). Meshfree Approximation Methods with MATLAB. Singapore: World Scientific Publishing Co. Pte. Ltd. pp. 17–25. ISBN 9789812706331.
  10. Wendland, Holger (2005). Scattered Data Approximation. Cambridge: Cambridge University Press. pp. 11, 18–23, 64–66. ISBN 0521843359.
  11. Fasshauer, Gregory E. (2007). Meshfree Approximation Methods with MATLAB. Singapore: World Scientific Publishing Co. Pte. Ltd. p. 37. ISBN 9789812706331.
  12. Fasshauer, Gregory E. (2007). Meshfree Approximation Methods with MATLAB. Singapore: World Scientific Publishing Co. Pte. Ltd. pp. 37–45. ISBN 9789812706331.
