In statistics, kernel Fisher discriminant analysis (KFD),[1] also known as generalized discriminant analysis[2] and kernel discriminant analysis,[3] is a kernelized version of linear discriminant analysis (LDA). It is named after Ronald Fisher.
Intuitively, the idea of LDA is to find a projection where class separation is maximized. Given two sets of labeled data, $\mathbf{C}_1$ and $\mathbf{C}_2$, we can calculate the mean value of each class, $\mathbf{m}_1$ and $\mathbf{m}_2$, as

$$\mathbf{m}_i = \frac{1}{l_i}\sum_{n=1}^{l_i}\mathbf{x}_n^i,$$

where $l_i$ is the number of examples of class $\mathbf{C}_i$. The goal of linear discriminant analysis is to give a large separation of the class means while also keeping the in-class variance small.[4] This is formulated as maximizing, with respect to $\mathbf{w}$, the following ratio:

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w}},$$

where $\mathbf{S}_B$ is the between-class covariance matrix and $\mathbf{S}_W$ is the total within-class covariance matrix:

$$\begin{align}
\mathbf{S}_B &= (\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^{\text{T}}, \\
\mathbf{S}_W &= \sum_{i=1,2}\sum_{n=1}^{l_i}(\mathbf{x}_n^i-\mathbf{m}_i)(\mathbf{x}_n^i-\mathbf{m}_i)^{\text{T}}.
\end{align}$$

Maximizing this ratio is equivalent to maximizing $\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w}$ subject to the constraint $\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w} = 1$. This, in turn, is equivalent to maximizing $I(\mathbf{w},\lambda) = \mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w} - \lambda\left(\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w} - 1\right)$, where $\lambda$ is the Lagrange multiplier.
At the maximum, the derivatives of $I(\mathbf{w},\lambda)$ with respect to $\mathbf{w}$ and $\lambda$ must be zero. Taking $\frac{dI}{d\mathbf{w}} = \mathbf{0}$ yields

$$\mathbf{S}_B\mathbf{w} - \lambda\mathbf{S}_W\mathbf{w} = \mathbf{0},$$

which is trivially satisfied by $\mathbf{w} = c\,\mathbf{S}_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$ and $\lambda = c\,(\mathbf{m}_2-\mathbf{m}_1)^{\text{T}}\mathbf{S}_W^{-1}(\mathbf{m}_2-\mathbf{m}_1)$.
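As a brief illustration of the closed-form solution above (a minimal sketch, not part of the original article; the function name and variable names are assumptions), the Fisher direction for two classes can be computed directly with NumPy:

    import numpy as np

    def fisher_direction(X1, X2):
        """Two-class LDA: return w proportional to S_W^{-1} (m_2 - m_1).
        X1, X2 are arrays of shape (l_i, d) holding the examples of each class."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        # total within-class covariance (scatter) matrix S_W
        S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
        return np.linalg.solve(S_W, m2 - m1)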
Extending LDA
To extend LDA to non-linear mappings, the data, given as the $\ell$ points $\mathbf{x}_i$, can be mapped to a new feature space, $F$, via some function $\phi$. In this new feature space, the function that needs to be maximized is[1]

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}},$$

where

$$\begin{align}
\mathbf{S}_B^{\phi} &= \left(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi}\right)\left(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi}\right)^{\text{T}}, \\
\mathbf{S}_W^{\phi} &= \sum_{i=1,2}\sum_{n=1}^{l_i}\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)^{\text{T}},
\end{align}$$

and

$$\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{l_i}\phi(\mathbf{x}_j^i).$$

Further, note that $\mathbf{w}\in F$. Explicitly computing the mappings $\phi(\mathbf{x}_i)$ and then performing LDA can be computationally expensive, and in many cases intractable. For example, $F$ may be infinite-dimensional. Thus, rather than explicitly mapping the data to $F$, the data can be implicitly embedded by rewriting the algorithm in terms of dot products and using the kernel trick, in which the dot product in the new feature space is replaced by a kernel function, $k(\mathbf{x},\mathbf{y}) = \phi(\mathbf{x})\cdot\phi(\mathbf{y})$.
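As a concrete illustration of such a kernel function (not part of the original derivation; the function names and the bandwidth parameter sigma are assumptions), a Gaussian (RBF) kernel acts as an implicit dot product in an infinite-dimensional feature space:

    import numpy as np

    def rbf_kernel(x, y, sigma=1.0):
        """Gaussian (RBF) kernel: an implicit dot product phi(x).phi(y)
        in an infinite-dimensional feature space F."""
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

    def kernel_matrix(X, Y, kernel=rbf_kernel):
        """Gram matrix with entries k(X[i], Y[j])."""
        return np.array([[kernel(x, y) for y in Y] for x in X])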
LDA can be reformulated in terms of dot products by first noting that $\mathbf{w}$ will have an expansion of the form[5]

$$\mathbf{w} = \sum_{i=1}^{\ell}\alpha_i\phi(\mathbf{x}_i).$$

Then note that

$$\mathbf{w}^{\text{T}}\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{\ell}\sum_{k=1}^{l_i}\alpha_j\,k\!\left(\mathbf{x}_j,\mathbf{x}_k^i\right) = \boldsymbol{\alpha}^{\text{T}}\mathbf{M}_i,$$

where

$$(\mathbf{M}_i)_j = \frac{1}{l_i}\sum_{k=1}^{l_i}k\!\left(\mathbf{x}_j,\mathbf{x}_k^i\right).$$

The numerator of $J(\mathbf{w})$ can then be written as

$$\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w} = \boldsymbol{\alpha}^{\text{T}}\mathbf{M}\boldsymbol{\alpha}, \qquad \mathbf{M} = (\mathbf{M}_2-\mathbf{M}_1)(\mathbf{M}_2-\mathbf{M}_1)^{\text{T}}.$$

Similarly, the denominator can be written as

$$\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w} = \boldsymbol{\alpha}^{\text{T}}\mathbf{N}\boldsymbol{\alpha}, \qquad \mathbf{N} = \sum_{j=1,2}\mathbf{K}_j\left(\mathbf{I}-\mathbf{1}_{l_j}\right)\mathbf{K}_j^{\text{T}},$$

with the $(n,m)^{\text{th}}$ component of $\mathbf{K}_j$ defined as $k(\mathbf{x}_n,\mathbf{x}_m^j)$, $\mathbf{I}$ the identity matrix, and $\mathbf{1}_{l_j}$ the matrix with all entries equal to $1/l_j$. This identity can be derived by starting out with the expression for $\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}$ and using the expansion of $\mathbf{w}$ and the definitions of $\mathbf{S}_W^{\phi}$ and $\mathbf{m}_i^{\phi}$.
With these equations for the numerator and denominator of $J(\mathbf{w})$, the equation for $J$ can be rewritten as

$$J(\boldsymbol{\alpha}) = \frac{\boldsymbol{\alpha}^{\text{T}}\mathbf{M}\boldsymbol{\alpha}}{\boldsymbol{\alpha}^{\text{T}}\mathbf{N}\boldsymbol{\alpha}}.$$

Then, differentiating and setting equal to zero gives

$$(\boldsymbol{\alpha}^{\text{T}}\mathbf{M}\boldsymbol{\alpha})\,\mathbf{N}\boldsymbol{\alpha} = (\boldsymbol{\alpha}^{\text{T}}\mathbf{N}\boldsymbol{\alpha})\,\mathbf{M}\boldsymbol{\alpha}.$$

Since only the direction of $\mathbf{w}$, and hence the direction of $\boldsymbol{\alpha}$, matters, the above can be solved for $\boldsymbol{\alpha}$ as

$$\boldsymbol{\alpha} = \mathbf{N}^{-1}(\mathbf{M}_2-\mathbf{M}_1).$$

Note that in practice, $\mathbf{N}$ is usually singular and so a multiple of the identity is added to it:[1]

$$\mathbf{N}_\epsilon = \mathbf{N}+\epsilon\mathbf{I}.$$

Given the solution for $\boldsymbol{\alpha}$, the projection of a new data point is given by[1]

$$y(\mathbf{x}) = \left(\mathbf{w}\cdot\phi(\mathbf{x})\right) = \sum_{i=1}^{\ell}\alpha_i\,k(\mathbf{x}_i,\mathbf{x}).$$
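The two-class procedure above can be summarized in a short sketch (an illustrative implementation of the formulas in this section, not reference code from the cited papers; the function names and the regularization constant eps are assumptions):

    import numpy as np

    def kfd_two_class(K1, K2, eps=1e-3):
        """Two-class kernel Fisher discriminant.
        K1: (l, l1) kernel matrix with entries k(x_i, x_n^1) over all l training points x_i;
        K2: (l, l2) kernel matrix for class 2.
        Returns the expansion coefficients alpha."""
        l1, l2 = K1.shape[1], K2.shape[1]
        M1, M2 = K1.mean(axis=1), K2.mean(axis=1)   # (M_i)_j = (1/l_i) sum_k k(x_j, x_k^i)
        N = sum(Kj @ (np.eye(lj) - np.full((lj, lj), 1.0 / lj)) @ Kj.T
                for Kj, lj in ((K1, l1), (K2, l2)))
        N += eps * np.eye(N.shape[0])                # N_eps = N + eps*I, since N is usually singular
        return np.linalg.solve(N, M2 - M1)           # alpha = N^{-1}(M_2 - M_1)

    def project(alpha, k_new):
        """Project a new point: y(x) = sum_i alpha_i k(x_i, x);
        k_new is the vector of kernel evaluations k(x_i, x)."""
        return alpha @ k_new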
Multi-class KFD
The extension to cases where there are more than two classes is relatively straightforward.[2][6][7] Let $c$ be the number of classes. Then multi-class KFD involves projecting the data into a $(c-1)$-dimensional space using $(c-1)$ discriminant functions

$$y_i = \mathbf{w}_i^{\text{T}}\phi(\mathbf{x}), \qquad i = 1,\ldots,c-1.$$

This can be written in matrix notation

$$\mathbf{y} = \mathbf{W}^{\text{T}}\phi(\mathbf{x}),$$

where the $\mathbf{w}_i$ are the columns of $\mathbf{W}$.[6] Further, the between-class covariance matrix is now

$$\mathbf{S}_B^{\phi} = \sum_{i=1}^{c} l_i\left(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi}\right)\left(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi}\right)^{\text{T}},$$

where $\mathbf{m}^{\phi}$ is the mean of all the data in the new feature space. The within-class covariance matrix is

$$\mathbf{S}_W^{\phi} = \sum_{i=1}^{c}\sum_{n=1}^{l_i}\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)\left(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi}\right)^{\text{T}}.$$

The solution is now obtained by maximizing

$$J(\mathbf{W}) = \frac{\left|\mathbf{W}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{W}\right|}{\left|\mathbf{W}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{W}\right|}.$$

The kernel trick can again be used and the goal of multi-class KFD becomes[7]

$$\mathbf{A}^{*} = \underset{\mathbf{A}}{\operatorname{argmax}}\;\frac{\left|\mathbf{A}^{\text{T}}\mathbf{M}\mathbf{A}\right|}{\left|\mathbf{A}^{\text{T}}\mathbf{N}\mathbf{A}\right|},$$

where $\mathbf{A} = [\boldsymbol{\alpha}_1,\ldots,\boldsymbol{\alpha}_{c-1}]$ and

$$\begin{align}
\mathbf{M} &= \sum_{j=1}^{c} l_j\left(\mathbf{M}_j-\mathbf{M}_{*}\right)\left(\mathbf{M}_j-\mathbf{M}_{*}\right)^{\text{T}}, \\
\mathbf{N} &= \sum_{j=1}^{c}\mathbf{K}_j\left(\mathbf{I}-\mathbf{1}_{l_j}\right)\mathbf{K}_j^{\text{T}}.
\end{align}$$

The $\mathbf{M}_i$ are defined as in the above section and $\mathbf{M}_{*}$ is defined as

$$(\mathbf{M}_{*})_j = \frac{1}{\ell}\sum_{k=1}^{\ell}k(\mathbf{x}_j,\mathbf{x}_k).$$

$\mathbf{A}^{*}$ can then be computed by finding the $(c-1)$ leading eigenvectors of $\mathbf{N}^{-1}\mathbf{M}$.[7] Furthermore, the projection of a new input, $\mathbf{x}_t$, is given by[7]

$$\mathbf{y}(\mathbf{x}_t) = \left(\mathbf{A}^{*}\right)^{\text{T}}\mathbf{K}_t,$$

where the $i^{\text{th}}$ component of $\mathbf{K}_t$ is given by $k(\mathbf{x}_i,\mathbf{x}_t)$.
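A corresponding multi-class sketch (again only illustrative, under the definitions above; the eigen-solver choice, function names, and the regularization constant eps are assumptions) computes $\mathbf{A}^{*}$ from the leading eigenvectors of $\mathbf{N}^{-1}\mathbf{M}$:

    import numpy as np

    def kfd_multi_class(M, N, c, eps=1e-3):
        """Return A*, the c-1 leading eigenvectors of N^{-1} M.
        M and N are the kernelized between- and within-class matrices defined above."""
        N_reg = N + eps * np.eye(N.shape[0])
        eigvals, eigvecs = np.linalg.eig(np.linalg.solve(N_reg, M))
        order = np.argsort(eigvals.real)[::-1]       # sort by decreasing eigenvalue
        return eigvecs[:, order[:c - 1]].real        # columns alpha_1, ..., alpha_{c-1}

    def project_multi(A, K_t):
        """y(x_t) = A^T K_t, where (K_t)_i = k(x_i, x_t)."""
        return A.T @ K_t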
Classification using KFD
In both two-class and multi-class KFD, the class label of a new input can be assigned as[7]

$$f(\mathbf{x}) = \arg\min_j D\left(\mathbf{y}(\mathbf{x}),\bar{\mathbf{y}}_j\right),$$

where $\bar{\mathbf{y}}_j$ is the projected mean for class $j$ and $D(\cdot,\cdot)$ is a distance function.
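For instance, taking $D$ to be the Euclidean distance, the assignment rule can be sketched as follows (a minimal example; the names are illustrative):

    import numpy as np

    def classify(y_new, class_means):
        """Assign the label of the nearest projected class mean:
        f(x) = argmin_j D(y(x), ybar_j), with D the Euclidean distance.
        class_means is a (c, c-1) array whose rows are the projected class means ybar_j."""
        distances = np.linalg.norm(class_means - y_new, axis=1)
        return int(np.argmin(distances))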
Applications
Kernel discriminant analysis has been used in a variety of applications, including face recognition and detection, palmprint recognition, and the classification of malignant and benign clustered microcalcifications.
Liu, Q.; Lu, H.; Ma, S. (2004). "Improving kernel Fisher discriminant analysis for face recognition". IEEE Transactions on Circuits and Systems for Video Technology. 14 (1): 42–49. doi:10.1109/tcsvt.2003.818352. S2CID 39657721.
Liu, Q.; Huang, R.; Lu, H.; Ma, S. (2002). "Face recognition using kernel-based Fisher discriminant analysis". IEEE International Conference on Automatic Face and Gesture Recognition.
Feng, Y.; Shi, P. (2004). "Face detection based on kernel Fisher discriminant analysis". IEEE International Conference on Automatic Face and Gesture Recognition.
Yang, J.; Frangi, A.F.; Yang, J.Y.; Zhang, D.; Jin, Z. (2005). "KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 27 (2): 230–244. CiteSeerX 10.1.1.330.1179. doi:10.1109/tpami.2005.33. PMID 15688560. S2CID 9771368.
Wang, Y.; Ruan, Q. (2006). "Kernel Fisher discriminant analysis for palmprint recognition". International Conference on Pattern Recognition.
Wei, L.; Yang, Y.; Nishikawa, R.M.; Jiang, Y. (2005). "A study on several machine-learning methods for classification of malignant and benign clustered microcalcifications". IEEE Transactions on Medical Imaging. 24 (3): 371–380. doi:10.1109/tmi.2004.842457. PMID 15754987. S2CID 36691320.