Hyper basis function network

In machine learning, a hyper basis function network, or HyperBF network, is a generalization of the radial basis function (RBF) network concept in which a Mahalanobis-like distance is used in place of the Euclidean distance measure. Hyper basis function networks were first introduced by Poggio and Girosi in the 1990 paper "Networks for Approximation and Learning". [1] [2]

Network Architecture

The typical HyperBF network structure consists of a real input vector $x \in \mathbb{R}^d$, a hidden layer of activation functions, and a linear output layer. The output of the network, a scalar function of the input vector $\phi : \mathbb{R}^d \to \mathbb{R}$, is given by

$$\phi(x) = \sum_{j=1}^{N} a_j \, \rho_j(\lVert x - \mu_j \rVert)$$

where $N$ is the number of neurons in the hidden layer, and $\mu_j$ and $a_j$ are the center and weight of neuron $j$. The activation function $\rho_j(\lVert x - \mu_j \rVert)$ of the HyperBF network takes the form

$$\rho_j(\lVert x - \mu_j \rVert) = e^{-(x - \mu_j)^{T} R_j (x - \mu_j)}$$

where $R_j$ is a positive definite $d \times d$ matrix. Depending on the application, the following types of matrices $R_j$ are usually considered: [3]

- $R_j = \frac{1}{2\sigma^2}\,\mathbb{I}_{d \times d}$, where $\sigma > 0$. This case corresponds to the regular RBF network.
- $R_j = \frac{1}{2\sigma_j^2}\,\mathbb{I}_{d \times d}$, where $\sigma_j > 0$. The basis functions remain radially symmetric, but each neuron has its own width.
- $R_j = \operatorname{diag}\left(\frac{1}{2\sigma_{j1}^2}, \ldots, \frac{1}{2\sigma_{jd}^2}\right)$, where $\sigma_{ji} > 0$. Every neuron has an elliptic shape of varying size.
- A full positive definite matrix $R_j$ that is not necessarily diagonal.
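
To make the architecture concrete, below is a minimal NumPy sketch of the forward pass. It is an illustration, not code from the cited papers; the function name, array shapes, and vectorized evaluation are assumptions made for this example.

```python
import numpy as np

def hyperbf_forward(x, centers, weights, R):
    """Evaluate phi(x) = sum_j a_j * exp(-(x - mu_j)^T R_j (x - mu_j)).

    x       : (d,)       input vector
    centers : (N, d)     neuron centers mu_j
    weights : (N,)       output weights a_j
    R       : (N, d, d)  positive definite matrices R_j, one per neuron
    """
    diffs = x - centers                                # (N, d): x - mu_j for each neuron
    quad = np.einsum('nd,nde,ne->n', diffs, R, diffs)  # quadratic forms (x - mu_j)^T R_j (x - mu_j)
    return weights @ np.exp(-quad)                     # scalar output phi(x)

# Example: choosing R_j = I / (2 * sigma^2) for every neuron recovers a plain
# Gaussian RBF network, the first case in the list above.
d, N, sigma = 3, 5, 0.5
rng = np.random.default_rng(0)
centers = rng.normal(size=(N, d))
weights = rng.normal(size=N)
R = np.repeat((np.eye(d) / (2 * sigma**2))[None, :, :], N, axis=0)
print(hyperbf_forward(np.zeros(d), centers, weights, R))
```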

Training

Training HyperBF networks involves estimating the weights $a_j$ and the shapes and centers of the neurons, $R_j$ and $\mu_j$. Poggio and Girosi (1990) describe a training method with moving centers and adaptable neuron shapes. An outline of the method is provided below.

Consider the quadratic loss of the network,

$$H[\phi^*] = \sum_{i=1}^{S} \left( y_i - \phi^*(x_i) \right)^2 .$$

The following conditions must be satisfied at the optimum:

$$\frac{\partial H(\phi^*)}{\partial a_j} = 0, \quad \frac{\partial H(\phi^*)}{\partial \mu_j} = 0, \quad \frac{\partial H(\phi^*)}{\partial W} = 0,$$

where $R_j = W^{T} W$. Then in the gradient descent method the values of $a_j$, $\mu_j$, and $W$ that minimize $H[\phi^*]$ can be found as a stable fixed point of the following dynamic system:

$$\dot{a}_j = -\omega \, \frac{\partial H(\phi^*)}{\partial a_j}, \quad \dot{\mu}_j = -\omega \, \frac{\partial H(\phi^*)}{\partial \mu_j}, \quad \dot{W} = -\omega \, \frac{\partial H(\phi^*)}{\partial W},$$

where $\omega$ determines the rate of convergence.
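
As a concrete illustration of these updates, here is a minimal NumPy sketch of the training loop. It is a sketch under stated assumptions rather than the paper's implementation: each neuron is given its own factor $W_j$ with $R_j = W_j^T W_j$ (which keeps $R_j$ positive semidefinite), the gradients are written out analytically for the Gaussian activation above, and the learning rate and step count are arbitrary.

```python
import numpy as np

def train_hyperbf(X, y, centers, weights, W, lr=1e-3, steps=1000):
    """Gradient descent on H = sum_i (y_i - phi(x_i))^2 for a Gaussian HyperBF net.

    X : (S, d) training inputs;  y : (S,) targets.
    centers : (N, d) centers mu_j;  weights : (N,) weights a_j.
    W : (N, d, d) factors defining R_j = W_j^T W_j (assumed per neuron here).
    """
    for _ in range(steps):
        R = np.einsum('nkd,nke->nde', W, W)              # R_j = W_j^T W_j
        diffs = X[:, None, :] - centers[None, :, :]      # (S, N, d): x_i - mu_j
        quad = np.einsum('snd,nde,sne->sn', diffs, R, diffs)
        rho = np.exp(-quad)                              # (S, N) activations rho_j(x_i)
        err = y - rho @ weights                          # (S,) residuals e_i

        # Analytic gradients of H with respect to a_j, mu_j, and W_j.
        coef = err[:, None] * rho * weights[None, :]     # (S, N): e_i * rho_j(x_i) * a_j
        grad_a = -2.0 * err @ rho                        # dH/da_j
        Rd = np.einsum('nde,sne->snd', R, diffs)         # R_j (x_i - mu_j)
        grad_mu = -4.0 * np.einsum('sn,snd->nd', coef, Rd)
        Wd = np.einsum('nkd,snd->snk', W, diffs)         # W_j (x_i - mu_j)
        grad_W = 4.0 * np.einsum('sn,snk,snd->nkd', coef, Wd, diffs)

        # Discrete analogue of the dynamic system: theta <- theta - omega * dH/dtheta.
        weights -= lr * grad_a
        centers -= lr * grad_mu
        W -= lr * grad_W
    return centers, weights, W
```

In practice one would add a stopping criterion and step-size control; the fixed step count here simply mirrors running the continuous-time system above to a stable fixed point.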

Overall, training HyperBF networks can be computationally challenging. Moreover, the large number of degrees of freedom in a HyperBF network can lead to overfitting and poor generalization. However, HyperBF networks have the important advantage that a small number of neurons is enough to learn complex functions. [2]


References

  1. T. Poggio and F. Girosi (1990). "Networks for Approximation and Learning". Proceedings of the IEEE 78 (9): 1481–1497.
  2. R. N. Mahdi and E. C. Rouchka (2011). "Reduced HyperBF Networks: Regularization by Explicit Complexity Reduction and Scaled Rprop-Based Training". IEEE Transactions on Neural Networks 22: 673–686.
  3. F. Schwenker, H. A. Kestler and G. Palm (2001). "Three Learning Phases for Radial-Basis-Function Networks". Neural Networks 14: 439–458.