Radial basis function kernel

Last updated

In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification. [1]

Contents

The RBF kernel on two samples and x', represented as feature vectors in some input space, is defined as [2]

may be recognized as the squared Euclidean distance between the two feature vectors. is a free parameter. An equivalent definition involves a parameter :

Since the value of the RBF kernel decreases with distance and ranges between zero (in the infinite-distance limit) and one (when x = x'), it has a ready interpretation as a similarity measure. [2] The feature space of the kernel has an infinite number of dimensions; for , its expansion using the multinomial theorem is: [3]

where ,

Approximations

Because support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been introduced. [4] Typically, these take the form of a function z that maps a single vector to a vector of higher dimensionality, approximating the kernel:

where is the implicit mapping embedded in the RBF kernel.

Fourier random features


One way to construct such a z is to randomly sample from the Fourier transformation of the kernel [5]

where are independent samples from the normal distribution .

Theorem:

Proof: It suffices to prove the case of . Use the trigonometric identity , the spherical symmetry of gaussian distribution, then evaluate the integral

Theorem:. (Appendix A.2 [6] ).

Nyström method

Another approach uses the Nyström method to approximate the eigendecomposition of the Gram matrix K, using only a random sample of the training set. [7]

See also

Related Research Articles

<span class="mw-page-title-main">Pauli matrices</span> Matrices important in quantum mechanics and the study of spin

In mathematical physics and mathematics, the Pauli matrices are a set of three 2 × 2 complex matrices that are Hermitian, involutory and unitary. Usually indicated by the Greek letter sigma, they are occasionally denoted by tau when used in connection with isospin symmetries.

<span class="mw-page-title-main">Uncertainty principle</span> Foundational principle in quantum physics

The uncertainty principle, also known as Heisenberg's indeterminacy principle, is a fundamental concept in quantum mechanics. It states that there is a limit to the precision with which certain pairs of physical properties, such as position and momentum, can be simultaneously known. In other words, the more accurately one property is measured, the less accurately the other property can be known.

In mechanics and geometry, the 3D rotation group, often denoted O(3), is the group of all rotations about the origin of three-dimensional Euclidean space under the operation of composition.

In mathematics, particularly in linear algebra, a skew-symmetricmatrix is a square matrix whose transpose equals its negative. That is, it satisfies the condition

<span class="mw-page-title-main">Green's function</span> Impulse response of an inhomogeneous linear differential operator

In mathematics, a Green's function is the impulse response of an inhomogeneous linear differential operator defined on a domain with specified initial conditions or boundary conditions.

<span class="mw-page-title-main">Stellar dynamics</span>

Stellar dynamics is the branch of astrophysics which describes in a statistical way the collective motions of stars subject to their mutual gravity. The essential difference from celestial mechanics is that the number of body

<span class="mw-page-title-main">Radon transform</span> Integral transform

In mathematics, the Radon transform is the integral transform which takes a function f defined on the plane to a function Rf defined on the (two-dimensional) space of lines in the plane, whose value at a particular line is equal to the line integral of the function over that line. The transform was introduced in 1917 by Johann Radon, who also provided a formula for the inverse transform. Radon further included formulas for the transform in three dimensions, in which the integral is taken over planes. It was later generalized to higher-dimensional Euclidean spaces and more broadly in the context of integral geometry. The complex analogue of the Radon transform is known as the Penrose transform. The Radon transform is widely applicable to tomography, the creation of an image from the projection data associated with cross-sectional scans of an object.

<span class="mw-page-title-main">Rabi cycle</span> Quantum mechanical phenomenon

In physics, the Rabi cycle is the cyclic behaviour of a two-level quantum system in the presence of an oscillatory driving field. A great variety of physical processes belonging to the areas of quantum computing, condensed matter, atomic and molecular physics, and nuclear and particle physics can be conveniently studied in terms of two-level quantum mechanical systems, and exhibit Rabi flopping when coupled to an optical driving field. The effect is important in quantum optics, magnetic resonance and quantum computing, and is named after Isidor Isaac Rabi.

In physics, the S-matrix or scattering matrix relates the initial state and the final state of a physical system undergoing a scattering process. It is used in quantum mechanics, scattering theory and quantum field theory (QFT).

<span class="mw-page-title-main">Reproducing kernel Hilbert space</span> In functional analysis, a Hilbert space

In functional analysis, a reproducing kernel Hilbert space (RKHS) is a Hilbert space of functions in which point evaluation is a continuous linear functional. Roughly speaking, this means that if two functions and in the RKHS are close in norm, i.e., is small, then and are also pointwise close, i.e., is small for all . The converse does not need to be true. Informally, this can be shown by looking at the supremum norm: the sequence of functions converges pointwise, but does not converge uniformly i.e. does not converge with respect to the supremum norm.

<span class="mw-page-title-main">Bloch sphere</span> Geometrical representation of the pure state space of a two-level quantum mechanical system

In quantum mechanics and computing, the Bloch sphere is a geometrical representation of the pure state space of a two-level quantum mechanical system (qubit), named after the physicist Felix Bloch.

<span class="mw-page-title-main">LSZ reduction formula</span> Connection between correlation functions and the S-matrix

In quantum field theory, the Lehmann–Symanzik–Zimmermann (LSZ) reduction formula is a method to calculate S-matrix elements from the time-ordered correlation functions of a quantum field theory. It is a step of the path that starts from the Lagrangian of some quantum field theory and leads to prediction of measurable quantities. It is named after the three German physicists Harry Lehmann, Kurt Symanzik and Wolfhart Zimmermann.

In mathematics, the Fubini–Study metric is a Kähler metric on a complex projective space CPn endowed with a Hermitian form. This metric was originally described in 1904 and 1905 by Guido Fubini and Eduard Study.

In solid-state physics, the tight-binding model is an approach to the calculation of electronic band structure using an approximate set of wave functions based upon superposition of wave functions for isolated atoms located at each atomic site. The method is closely related to the LCAO method used in chemistry. Tight-binding models are applied to a wide variety of solids. The model gives good qualitative results in many cases and can be combined with other models that give better results where the tight-binding model fails. Though the tight-binding model is a one-electron model, the model also provides a basis for more advanced calculations like the calculation of surface states and application to various kinds of many-body problem and quasiparticle calculations.

<span class="mw-page-title-main">Jaynes–Cummings model</span> Model in quantum optics

The Jaynes–Cummings model is a theoretical model in quantum optics. It describes the system of a two-level atom interacting with a quantized mode of an optical cavity, with or without the presence of light. It was originally developed to study the interaction of atoms with the quantized electromagnetic field in order to investigate the phenomena of spontaneous emission and absorption of photons in a cavity.

In physics and mathematics, the solid harmonics are solutions of the Laplace equation in spherical polar coordinates, assumed to be (smooth) functions . There are two kinds: the regular solid harmonics, which are well-defined at the origin and the irregular solid harmonics, which are singular at the origin. Both sets of functions play an important role in potential theory, and are obtained by rescaling spherical harmonics appropriately:

In functional analysis, the dual norm is a measure of size for a continuous linear function defined on a normed vector space.

In mathematics, vector spherical harmonics (VSH) are an extension of the scalar spherical harmonics for use with vector fields. The components of the VSH are complex-valued functions expressed in the spherical coordinate basis vectors.

<span class="mw-page-title-main">Polynomial kernel</span> Machine learning kernel function

In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models, that represents the similarity of vectors in a feature space over polynomials of the original variables, allowing learning of non-linear models.

In pure and applied mathematics, quantum mechanics and computer graphics, a tensor operator generalizes the notion of operators which are scalars and vectors. A special class of these are spherical tensor operators which apply the notion of the spherical basis and spherical harmonics. The spherical basis closely relates to the description of angular momentum in quantum mechanics and spherical harmonic functions. The coordinate-free generalization of a tensor operator is known as a representation operator.

References

  1. Chang, Yin-Wen; Hsieh, Cho-Jui; Chang, Kai-Wei; Ringgaard, Michael; Lin, Chih-Jen (2010). "Training and testing low-degree polynomial data mappings via linear SVM". Journal of Machine Learning Research. 11: 1471–1490.
  2. 1 2 Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf (2004). "A primer on kernel methods". Kernel Methods in Computational Biology.
  3. Shashua, Amnon (2009). "Introduction to Machine Learning: Class Notes 67577". arXiv: 0904.3664v1 [cs.LG].
  4. Andreas Müller (2012). Kernel Approximations for Efficient SVMs (and other feature extraction methods).
  5. Rahimi, Ali; Recht, Benjamin (2007). "Random Features for Large-Scale Kernel Machines". Advances in Neural Information Processing Systems. Curran Associates, Inc. 20.
  6. Peng, Hao; Pappas, Nikolaos; Yogatama, Dani; Schwartz, Roy; Smith, Noah A.; Kong, Lingpeng (2021-03-19). "Random Feature Attention". arXiv: 2103.02143 [cs.CL].
  7. C.K.I. Williams; M. Seeger (2001). "Using the Nyström method to speed up kernel machines". Advances in Neural Information Processing Systems. 13.