Kernel Fisher discriminant analysis


In statistics, kernel Fisher discriminant analysis (KFD), [1] also known as generalized discriminant analysis [2] and kernel discriminant analysis, [3] is a kernelized version of linear discriminant analysis (LDA). It is named after Ronald Fisher.


Linear discriminant analysis

Intuitively, the idea of LDA is to find a projection where class separation is maximized. Given two sets of labeled data, $\mathbf{C}_1$ and $\mathbf{C}_2$, we can calculate the mean value of each class, $\mathbf{m}_1$ and $\mathbf{m}_2$, as

$$\mathbf{m}_i = \frac{1}{l_i}\sum_{n=1}^{l_i}\mathbf{x}_n^i,$$

where $l_i$ is the number of examples of class $\mathbf{C}_i$. The goal of linear discriminant analysis is to give a large separation of the class means while also keeping the in-class variance small. [4] This is formulated as maximizing, with respect to $\mathbf{w}$, the following ratio:

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w}},$$

where $\mathbf{S}_B$ is the between-class covariance matrix and $\mathbf{S}_W$ is the total within-class covariance matrix:

$$\begin{aligned}
\mathbf{S}_B &= (\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^{\text{T}} \\
\mathbf{S}_W &= \sum_{i=1,2}\sum_{n=1}^{l_i}\left(\mathbf{x}_n^i-\mathbf{m}_i\right)\left(\mathbf{x}_n^i-\mathbf{m}_i\right)^{\text{T}}.
\end{aligned}$$

The maximum of the above ratio is attained at

$$\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2-\mathbf{m}_1),$$

as can be shown by the Lagrange multiplier method (sketch of proof):

Maximizing $J(\mathbf{w})$ is equivalent to maximizing

$$\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w}$$

subject to

$$\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w} = 1.$$

This, in turn, is equivalent to maximizing $I(\mathbf{w},\lambda) = \mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w} - \lambda\left(\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w} - 1\right)$, where $\lambda$ is the Lagrange multiplier.

At the maximum, the derivatives of $I$ with respect to $\mathbf{w}$ and $\lambda$ must be zero. Taking $\frac{dI}{d\mathbf{w}} = 0$ yields

$$\mathbf{S}_B\mathbf{w} = \lambda\,\mathbf{S}_W\mathbf{w},$$

which is trivially satisfied by $\mathbf{w} = c\,\mathbf{S}_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$ and $\lambda = (\mathbf{m}_2-\mathbf{m}_1)^{\text{T}}\mathbf{S}_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$, since $\mathbf{S}_B\mathbf{w} = (\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^{\text{T}}\mathbf{w}$ always points in the direction of $\mathbf{m}_2-\mathbf{m}_1$.
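
The closed-form solution above is straightforward to compute. The following is a minimal NumPy sketch of two-class LDA on synthetic data; the toy data and all variable names are illustrative, not part of the cited presentation.

```python
import numpy as np

# Minimal two-class LDA sketch on synthetic, illustrative data.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))   # class 1 examples
X2 = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(60, 2))   # class 2 examples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)                  # class means

# Within-class scatter S_W: sum of centred outer products over both classes
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Optimal direction (up to scale): w ∝ S_W^{-1} (m2 - m1)
w = np.linalg.solve(S_W, m2 - m1)
w /= np.linalg.norm(w)

# One-dimensional projections; their means are well separated along w
print("projected class means:", (X1 @ w).mean(), (X2 @ w).mean())
```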

Extending LDA

To extend LDA to non-linear mappings, the data, given as the points $\mathbf{x}_i$, can be mapped to a new feature space, $F$, via some function $\phi$. In this new feature space, the function that needs to be maximized is [1]

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}},$$

where

$$\mathbf{S}_B^{\phi} = \left(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi}\right)\left(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi}\right)^{\text{T}}$$

and

$$\mathbf{S}_W^{\phi} = \sum_{i=1,2}\sum_{n=1}^{l_i}\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)^{\text{T}}.$$

Further, note that $\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{l_i}\phi\left(\mathbf{x}_j^i\right)$. Explicitly computing the mappings $\phi(\mathbf{x}_i)$ and then performing LDA can be computationally expensive, and in many cases intractable. For example, $F$ may be infinite-dimensional. Thus, rather than explicitly mapping the data to $F$, the data can be implicitly embedded by rewriting the algorithm in terms of dot products and using kernel functions, in which the dot product in the new feature space is replaced by a kernel function, $k(\mathbf{x},\mathbf{y}) = \phi(\mathbf{x})\cdot\phi(\mathbf{y})$.
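
As a concrete, hypothetical example of such a kernel, the sketch below builds a Gaussian RBF Gram matrix whose entries stand in for the dot products $\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$; the kernel choice and the width parameter `gamma` are illustrative assumptions, not prescribed by the sources.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2), evaluated pairwise."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-gamma * sq_dists)

# The Gram matrix K holds every pairwise kernel evaluation; these values play
# the role of the dot products phi(x_i) . phi(x_j) in the induced feature space.
X = np.random.default_rng(1).normal(size=(5, 3))
K = rbf_kernel(X, X)
print(K.shape)  # (5, 5)
```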

LDA can be reformulated in terms of dot products by first noting that $\mathbf{w}$ will have an expansion of the form [5]

$$\mathbf{w} = \sum_{i=1}^{l}\alpha_i\phi(\mathbf{x}_i),$$

where $l$ is the total number of labeled examples. Then note that

$$\mathbf{w}^{\text{T}}\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{l}\sum_{k=1}^{l_i}\alpha_j k\left(\mathbf{x}_j,\mathbf{x}_k^i\right) = \boldsymbol{\alpha}^{\text{T}}\mathbf{M}_i,$$

where

$$(\mathbf{M}_i)_j = \frac{1}{l_i}\sum_{k=1}^{l_i}k\left(\mathbf{x}_j,\mathbf{x}_k^i\right).$$

The numerator of $J(\mathbf{w})$ can then be written as

$$\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w} = \boldsymbol{\alpha}^{\text{T}}\mathbf{M}\boldsymbol{\alpha}, \qquad \mathbf{M} = (\mathbf{M}_2-\mathbf{M}_1)(\mathbf{M}_2-\mathbf{M}_1)^{\text{T}}.$$

Similarly, the denominator can be written as

$$\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w} = \boldsymbol{\alpha}^{\text{T}}\mathbf{N}\boldsymbol{\alpha}, \qquad \mathbf{N} = \sum_{j=1,2}\mathbf{K}_j\left(\mathbf{I}-\mathbf{1}_{l_j}\right)\mathbf{K}_j^{\text{T}},$$

with the $(n,m)^{\text{th}}$ component of $\mathbf{K}_j$ defined as $k\left(\mathbf{x}_n,\mathbf{x}_m^j\right)$, $\mathbf{I}$ the identity matrix, and $\mathbf{1}_{l_j}$ the matrix with all entries equal to $1/l_j$. This identity can be derived by starting out with the expression for $\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}$ and using the expansion of $\mathbf{w}$ and the definitions of $\mathbf{S}_W^{\phi}$ and $\mathbf{m}_i^{\phi}$.

With these equations for the numerator and denominator of $J(\mathbf{w})$, the equation for $J$ can be rewritten as

$$J(\boldsymbol{\alpha}) = \frac{\boldsymbol{\alpha}^{\text{T}}\mathbf{M}\boldsymbol{\alpha}}{\boldsymbol{\alpha}^{\text{T}}\mathbf{N}\boldsymbol{\alpha}}.$$

Then, differentiating and setting equal to zero gives

$$\left(\boldsymbol{\alpha}^{\text{T}}\mathbf{M}\boldsymbol{\alpha}\right)\mathbf{N}\boldsymbol{\alpha} = \left(\boldsymbol{\alpha}^{\text{T}}\mathbf{N}\boldsymbol{\alpha}\right)\mathbf{M}\boldsymbol{\alpha}.$$

Since only the direction of $\mathbf{w}$, and hence the direction of $\boldsymbol{\alpha}$, matters, the above can be solved for $\boldsymbol{\alpha}$ as

$$\boldsymbol{\alpha} = \mathbf{N}^{-1}(\mathbf{M}_2-\mathbf{M}_1).$$

Note that in practice, $\mathbf{N}$ is usually singular and so a multiple of the identity is added to it: [1]

$$\mathbf{N}_{\epsilon} = \mathbf{N} + \epsilon\mathbf{I}.$$

Given the solution for $\boldsymbol{\alpha}$, the projection of a new data point is given by [1]

$$y(\mathbf{x}) = \left(\mathbf{w}\cdot\phi(\mathbf{x})\right) = \sum_{i=1}^{l}\alpha_i k(\mathbf{x}_i,\mathbf{x}).$$
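
The following is a hedged sketch of the two-class procedure just described, assuming the `rbf_kernel` helper from the earlier snippet; `eps` plays the role of the regularization constant $\epsilon$ and its value is illustrative, not one prescribed by the sources.

```python
import numpy as np

def kfd_two_class(X1, X2, kernel, eps=1e-3):
    """Two-class kernel Fisher discriminant: returns training data and alpha."""
    X = np.vstack([X1, X2])                        # all l training points
    K1, K2 = kernel(X, X1), kernel(X, X2)          # (K_j)_{nm} = k(x_n, x_m^j)
    M1, M2 = K1.mean(axis=1), K2.mean(axis=1)      # (M_j)_n = (1/l_j) sum_m k(x_n, x_m^j)

    # N = sum_j K_j (I - 1_{l_j}) K_j^T, with 1_{l_j} the matrix of entries 1/l_j
    N = np.zeros((len(X), len(X)))
    for K_j, l_j in ((K1, len(X1)), (K2, len(X2))):
        N += K_j @ (np.eye(l_j) - np.full((l_j, l_j), 1.0 / l_j)) @ K_j.T
    N += eps * np.eye(len(X))                      # regularize the (usually singular) N

    alpha = np.linalg.solve(N, M2 - M1)            # alpha ∝ N^{-1} (M_2 - M_1)
    return X, alpha

def project(X_new, X, alpha, kernel):
    """Projection y(x) = sum_i alpha_i k(x_i, x), for each row of X_new."""
    return kernel(np.atleast_2d(X_new), X) @ alpha
```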

Multi-class KFD

The extension to cases where there are more than two classes is relatively straightforward. [2] [6] [7] Let $c$ be the number of classes. Then multi-class KFD involves projecting the data into a $(c-1)$-dimensional space using $(c-1)$ discriminant functions

$$y_i = \mathbf{w}_i^{\text{T}}\phi(\mathbf{x}), \qquad i = 1,\ldots,c-1.$$

This can be written in matrix notation

$$\mathbf{y} = \mathbf{W}^{\text{T}}\phi(\mathbf{x}),$$

where the $\mathbf{w}_i$ are the columns of $\mathbf{W}$. [6] Further, the between-class covariance matrix is now

$$\mathbf{S}_B^{\phi} = \sum_{i=1}^{c}l_i\left(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi}\right)\left(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi}\right)^{\text{T}},$$

where $\mathbf{m}^{\phi}$ is the mean of all the data in the new feature space. The within-class covariance matrix is

$$\mathbf{S}_W^{\phi} = \sum_{i=1}^{c}\sum_{n=1}^{l_i}\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)^{\text{T}}.$$

The solution is now obtained by maximizing

$$J(\mathbf{W}) = \frac{\left|\mathbf{W}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{W}\right|}{\left|\mathbf{W}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{W}\right|}.$$

The kernel trick can again be used and the goal of multi-class KFD becomes [7]

$$\mathbf{A}^{*} = \underset{\mathbf{A}}{\operatorname{argmax}}\; \frac{\left|\mathbf{A}^{\text{T}}\mathbf{M}\mathbf{A}\right|}{\left|\mathbf{A}^{\text{T}}\mathbf{N}\mathbf{A}\right|},$$

where $\mathbf{A} = [\boldsymbol{\alpha}_1,\ldots,\boldsymbol{\alpha}_{c-1}]$ and

$$\begin{aligned}
\mathbf{M} &= \sum_{j=1}^{c}l_j\left(\mathbf{M}_j-\mathbf{M}_{*}\right)\left(\mathbf{M}_j-\mathbf{M}_{*}\right)^{\text{T}} \\
\mathbf{N} &= \sum_{j=1}^{c}\mathbf{K}_j\left(\mathbf{I}-\mathbf{1}_{l_j}\right)\mathbf{K}_j^{\text{T}}.
\end{aligned}$$

The $\mathbf{M}_j$ are defined as in the above section and $\mathbf{M}_{*}$ is defined as

$$\left(\mathbf{M}_{*}\right)_j = \frac{1}{l}\sum_{k=1}^{l}k(\mathbf{x}_j,\mathbf{x}_k).$$

$\mathbf{A}^{*}$ can then be computed by finding the $(c-1)$ leading eigenvectors of $\mathbf{N}^{-1}\mathbf{M}$. [7] Furthermore, the projection of a new input, $\mathbf{x}_t$, is given by [7]

$$\mathbf{y}(\mathbf{x}_t) = \left(\mathbf{A}^{*}\right)^{\text{T}}\mathbf{K}_t,$$

where the $i^{\text{th}}$ component of $\mathbf{K}_t$ is given by $k(\mathbf{x}_i,\mathbf{x}_t)$.
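
A minimal sketch of the multi-class case under the same assumptions (the `rbf_kernel` helper and an illustrative regularization constant `eps`); it computes the leading eigenvectors of $\mathbf{N}^{-1}\mathbf{M}$ and keeps the top $c-1$ of them as the columns of $\mathbf{A}^{*}$.

```python
import numpy as np

def kfd_multiclass(Xs, kernel, eps=1e-3):
    """Multi-class KFD sketch: Xs is a list of per-class arrays; returns X and A*."""
    X = np.vstack(Xs)
    l = len(X)
    M_star = kernel(X, X).mean(axis=1)             # (M_*)_n = (1/l) sum_k k(x_n, x_k)

    M = np.zeros((l, l))
    N = eps * np.eye(l)                            # regularized as in the two-class case
    for X_j in Xs:
        K_j = kernel(X, X_j)
        M_j = K_j.mean(axis=1)
        l_j = len(X_j)
        M += l_j * np.outer(M_j - M_star, M_j - M_star)
        N += K_j @ (np.eye(l_j) - np.full((l_j, l_j), 1.0 / l_j)) @ K_j.T

    # Columns of A* are the leading eigenvectors of N^{-1} M; keep the top c-1.
    # (Eigenvalues are real up to round-off, hence the .real.)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(N, M))
    order = np.argsort(eigvals.real)[::-1]
    A = eigvecs[:, order[: len(Xs) - 1]].real
    return X, A
```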

Classification using KFD

In both two-class and multi-class KFD, the class label of a new input can be assigned as [7]

$$f(\mathbf{x}) = \underset{j}{\operatorname{arg\,min}}\; D\left(\mathbf{y}(\mathbf{x}), \bar{\mathbf{y}}_j\right),$$

where $\bar{\mathbf{y}}_j$ is the projected mean for class $j$ and $D(\cdot,\cdot)$ is a distance function.
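
A short sketch of this decision rule, assuming the `project` helper from the two-class snippet above and using Euclidean distance as the distance function $D$; both choices are illustrative.

```python
import numpy as np

def classify(x_new, X, alpha, kernel, projected_class_means):
    """Assign the class whose projected mean is nearest (Euclidean distance as D)."""
    y = project(x_new, X, alpha, kernel)
    return int(np.argmin([np.linalg.norm(y - m) for m in projected_class_means]))
```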

Applications

Kernel discriminant analysis has been used in a variety of applications. These include:

  - face recognition [8] [9] [12] and detection [10] [11]
  - palmprint recognition [13]
  - classification of malignant and benign clustered microcalcifications [14]

See also

  - Linear discriminant analysis
  - Kernel principal component analysis
  - Kernel trick

References

  1. Mika, S.; Rätsch, G.; Weston, J.; Schölkopf, B.; Müller, K.R. (1999). Fisher discriminant analysis with kernels. Neural Networks for Signal Processing. Vol. IX. pp. 41–48. CiteSeerX 10.1.1.35.9904. doi:10.1109/NNSP.1999.788121. ISBN 978-0-7803-5673-3. S2CID 8473401.
  2. Baudat, G.; Anouar, F. (2000). "Generalized discriminant analysis using a kernel approach". Neural Computation. 12 (10): 2385–2404. CiteSeerX 10.1.1.412.760. doi:10.1162/089976600300014980. PMID 11032039. S2CID 7036341.
  3. Li, Y.; Gong, S.; Liddell, H. (2003). "Recognising trajectories of facial identities using kernel discriminant analysis". Image and Vision Computing. 21 (13–14): 1077–1086. CiteSeerX 10.1.1.2.6315. doi:10.1016/j.imavis.2003.08.010.
  4. Bishop, C.M. (2006). Pattern Recognition and Machine Learning. New York, NY: Springer.
  5. Schölkopf, B.; Herbrich, R.; Smola, A. (2001). A generalized representer theorem. Computational Learning Theory. Lecture Notes in Computer Science. Vol. 2111. pp. 416–426. CiteSeerX 10.1.1.42.8617. doi:10.1007/3-540-44581-1_27. ISBN 978-3-540-42343-0.
  6. Duda, R.; Hart, P.; Stork, D. (2001). Pattern Classification. New York, NY: Wiley.
  7. Zhang, J.; Ma, K.K. (2004). "Kernel fisher discriminant for texture classification".
  8. Liu, Q.; Lu, H.; Ma, S. (2004). "Improving kernel Fisher discriminant analysis for face recognition". IEEE Transactions on Circuits and Systems for Video Technology. 14 (1): 42–49. doi:10.1109/tcsvt.2003.818352. S2CID 39657721.
  9. Liu, Q.; Huang, R.; Lu, H.; Ma, S. (2002). "Face recognition using kernel-based Fisher discriminant analysis". IEEE International Conference on Automatic Face and Gesture Recognition.
  10. Kurita, T.; Taguchi, T. (2002). A modification of kernel-based Fisher discriminant analysis for face detection. IEEE International Conference on Automatic Face and Gesture Recognition. pp. 300–305. CiteSeerX 10.1.1.100.3568. doi:10.1109/AFGR.2002.1004170. ISBN 978-0-7695-1602-8. S2CID 7581426.
  11. Feng, Y.; Shi, P. (2004). "Face detection based on kernel fisher discriminant analysis". IEEE International Conference on Automatic Face and Gesture Recognition.
  12. Yang, J.; Frangi, A.F.; Yang, J.Y.; Zhang, D.; Jin, Z. (2005). "KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 27 (2): 230–244. CiteSeerX 10.1.1.330.1179. doi:10.1109/tpami.2005.33. PMID 15688560. S2CID 9771368.
  13. Wang, Y.; Ruan, Q. (2006). "Kernel fisher discriminant analysis for palmprint recognition". International Conference on Pattern Recognition.
  14. Wei, L.; Yang, Y.; Nishikawa, R.M.; Jiang, Y. (2005). "A study on several machine-learning methods for classification of malignant and benign clustered microcalcifications". IEEE Transactions on Medical Imaging. 24 (3): 371–380. doi:10.1109/tmi.2004.842457. PMID 15754987. S2CID 36691320.
  15. Malmgren, T. (1997). "An iterative nonlinear discriminant analysis program: IDA 1.0". Computer Physics Communications. 106 (3): 230–236. Bibcode:1997CoPhC.106..230M. doi:10.1016/S0010-4655(97)00100-8.