Manifold alignment

Last updated

Manifold alignment is a class of machine learning algorithms that produce projections between sets of data, given that the original data sets lie on a common manifold. The concept was first introduced as such by Ham, Lee, and Saul in 2003, [1] adding a manifold constraint to the general problem of correlating sets of high-dimensional vectors. [2]

Contents

Overview

Manifold alignment assumes that disparate data sets produced by similar generating processes will share a similar underlying manifold representation. By learning projections from each original space to the shared manifold, correspondences are recovered and knowledge from one domain can be transferred to another. Most manifold alignment techniques consider only two data sets, but the concept extends to arbitrarily many initial data sets.

Consider the case of aligning two data sets, and , with and .

Manifold alignment algorithms attempt to project both and into a new d-dimensional space such that the projections both minimize distance between corresponding points and preserve the local manifold structure of the original data. The projection functions are denoted:

Let represent the binary correspondence matrix between points in and :

Let and represent pointwise similarities within data sets. This is usually encoded as the heat kernel of the adjacency matrix of a k-nearest neighbor graph.

Finally, introduce a coefficient , which can be tuned to adjust the weight of the 'preserve manifold structure' goal, versus the 'minimize corresponding point distances' goal.

With these definitions in place, the loss function for manifold alignment can be written:

Solving this optimization problem is equivalent to solving a generalized eigenvalue problem using the graph laplacian [3] of the joint matrix, G:

Inter-data correspondences

The algorithm described above requires full pairwise correspondence information between input data sets; a supervised learning paradigm. However, this information is usually difficult or impossible to obtain in real world applications. Recent work has extended the core manifold alignment algorithm to semi-supervised [4] , unsupervised [5] , and multiple-instance [6] settings.

One-step vs. two-step alignment

The algorithm described above performs a "one-step" alignment, finding embeddings for both data sets at the same time. A similar effect can also be achieved with "two-step" alignments [7] [8] , following a slightly modified procedure:

  1. Project each input data set to a lower-dimensional space independently, using any of a variety of dimension reduction algorithms.
  2. Perform linear manifold alignment on the embedded data, holding the first data set fixed, mapping each additional data set onto the first's manifold. This approach has the benefit of decomposing the required computation, which lowers memory overhead and allows parallel implementations.

Instance-level vs. feature-level projections

Manifold alignment can be used to find linear (feature-level) projections, or nonlinear (instance-level) embeddings. While the instance-level version generally produces more accurate alignments, it sacrifices a great degree of flexibility as the learned embedding is often difficult to parameterize. Feature-level projections allow any new instances to be easily embedded in the manifold space, and projections may be combined to form direct mappings between the original data representations. These properties are especially important for knowledge-transfer applications.

Applications

Manifold alignment is suited to problems with several corpora that lie on a shared manifold, even when each corpus is of a different dimensionality. Many real-world problems fit this description, but traditional techniques are not able to take advantage of all corpora at the same time. Manifold alignment also facilitates transfer learning, in which knowledge of one domain is used to jump-start learning in correlated domains.

Applications of manifold alignment include:

See also

Related Research Articles

In mathematics, particularly linear algebra, an orthonormal basis for an inner product space V with finite dimension is a basis for whose vectors are orthonormal, that is, they are all unit vectors and orthogonal to each other. For example, the standard basis for a Euclidean space is an orthonormal basis, where the relevant inner product is the dot product of vectors. The image of the standard basis under a rotation or reflection is also orthonormal, and every orthonormal basis for arises in this fashion.

<span class="mw-page-title-main">Vapnik–Chervonenkis theory</span> Branch of statistical computational learning theory

Vapnik–Chervonenkis theory was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Summary of algorithms for nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

In mathematics, the total variation identifies several slightly different concepts, related to the structure of the codomain of a function or a measure. For a real-valued continuous function f, defined on an interval [a, b] ⊂ R, its total variation on the interval of definition is a measure of the one-dimensional arclength of the curve with parametric equation xf(x), for x ∈ [a, b]. Functions whose total variation is finite are called functions of bounded variation.

In mathematics, a Killing vector field, named after Wilhelm Killing, is a vector field on a Riemannian manifold that preserves the metric. Killing fields are the infinitesimal generators of isometries; that is, flows generated by Killing fields are continuous isometries of the manifold. More simply, the flow generates a symmetry, in the sense that moving each point of an object the same distance in the direction of the Killing vector will not distort distances on the object.

Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function. The physical continuum over which these functions are defined is often time, but may also be spatial location, wavelength, probability, etc. Intrinsically, functional data are infinite dimensional. The high intrinsic dimensionality of these data brings challenges for theory as well as computation, where these challenges vary with how the functional data were sampled. However, the high or infinite dimensional structure of the data is a rich source of information and there are many interesting challenges for research and data analysis.

In physics, a sigma model is a field theory that describes the field as a point particle confined to move on a fixed manifold. This manifold can be taken to be any Riemannian manifold, although it is most commonly taken to be either a Lie group or a symmetric space. The model may or may not be quantized. An example of the non-quantized version is the Skyrme model; it cannot be quantized due to non-linearities of power greater than 4. In general, sigma models admit (classical) topological soliton solutions, for example, the Skyrmion for the Skyrme model. When the sigma field is coupled to a gauge field, the resulting model is described by Ginzburg–Landau theory. This article is primarily devoted to the classical field theory of the sigma model; the corresponding quantized theory is presented in the article titled "non-linear sigma model".

The concept of an abstract Wiener space is a mathematical construction developed by Leonard Gross to understand the structure of Gaussian measures on infinite-dimensional spaces. The construction emphasizes the fundamental role played by the Cameron–Martin space. The classical Wiener space is the prototypical example.

In mathematics and economics, transportation theory or transport theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781.

<span class="mw-page-title-main">Autoencoder</span> Neural network that learns efficient data encoding in an unsupervised manner

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”).

In 3-dimensional topology, a part of the mathematical field of geometric topology, the Casson invariant is an integer-valued invariant of oriented integral homology 3-spheres, introduced by Andrew Casson.

Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods.

Coherent states have been introduced in a physical context, first as quasi-classical states in quantum mechanics, then as the backbone of quantum optics and they are described in that spirit in the article Coherent states. However, they have generated a huge variety of generalizations, which have led to a tremendous amount of literature in mathematical physics. In this article, we sketch the main directions of research on this line. For further details, we refer to several existing surveys.

<span class="mw-page-title-main">Diffusion map</span>

Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Different from linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps is part of the family of nonlinear dimensionality reduction methods which focus on discovering the underlying manifold that the data has been sampled from. By integrating local similarities at different scales, diffusion maps give a global description of the data-set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.

Diffusion wavelets are a fast multiscale framework for the analysis of functions on discrete structures like graphs, manifolds, and point clouds in Euclidean space. Diffusion wavelets are an extension of classical wavelet theory from harmonic analysis. Unlike classical wavelets whose basis functions are predetermined, diffusion wavelets are adapted to the geometry of a given diffusion operator . Moreover, the diffusion wavelet basis functions are constructed by dilation using the dyadic powers of . These dyadic powers of diffusion over the space and propagate local relationships in the function throughout the space until they become global. And if the rank of higher powers of decrease, then these higher powers become compressible. From these decaying dyadic powers of comes a chain of decreasing subspaces. These subspaces are the scaling function approximation subspaces, and the differences in the subspace chain are the wavelet subspaces.

Lagrangian field theory is a formalism in classical field theory. It is the field-theoretic analogue of Lagrangian mechanics. Lagrangian mechanics is used to analyze the motion of a system of discrete particles each with a finite number of degrees of freedom. Lagrangian field theory applies to continua and fields, which have an infinite number of degrees of freedom.

Proximal gradientmethods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is regularization of the form

In machine learning, the kernel embedding of distributions comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space on which a sensible kernel function may be defined. For example, various kernels have been proposed for learning from data which are: vectors in , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song , Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in.

<span class="mw-page-title-main">Variational autoencoder</span> Deep learning generative model to encode data representation

In machine learning, a variational autoencoder (VAE), is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling, belonging to the families of probabilistic graphical models and variational Bayesian methods.

In statistics, EM algorithm handles latent variables, while GMM is the Gaussian mixture model.

References

  1. Ham, Ji Hun; Daniel D. Lee; Lawrence K. Saul (2003). "Learning high dimensional correspondences from low dimensional manifolds" (PDF). Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003).
  2. Hotelling, H (1936). "Relations between two sets of variates" (PDF). Biometrika. 28 (3–4): 321–377. doi:10.2307/2333955. JSTOR   2333955.
  3. Belkin, M; P Niyogi (2003). "Laplacian eigenmaps for dimensionality reduction and data representation" (PDF). Neural Computation . 15 (6): 1373–1396. CiteSeerX   10.1.1.192.8814 . doi:10.1162/089976603321780317. S2CID   14879317.
  4. Ham, Ji Hun; Daniel D. Lee; Lawrence K. Saul (2005). "Semisupervised alignment of manifolds" (PDF). Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence.
  5. Wang, Chang; Sridhar Mahadevan (2009). Manifold Alignment without Correspondence (PDF). The 21st International Joint Conference on Artificial Intelligence.[ permanent dead link ]
  6. Wang, Chang; Sridhar Mahadevan (2011). Heterogeneous Domain Adaptation using Manifold Alignment (PDF). The 22nd International Joint Conference on Artificial Intelligence. Archived from the original (PDF) on 2012-04-15. Retrieved 2011-12-14.
  7. Lafon, Stephane; Yosi Keller; Ronald R. Coifman (2006). "Data fusion and multicue data matching by diffusion maps" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 28 (11): 1784–1797. CiteSeerX   10.1.1.419.1814 . doi:10.1109/tpami.2006.223. PMID   17063683. S2CID   1186335.[ permanent dead link ]
  8. 1 2 3 4 Wang, Chang; Sridhar Mahadevan (2008). Manifold Alignment using Procrustes Analysis (PDF). The 25th International Conference on Machine Learning.[ permanent dead link ]
  9. Makondo, Ndivhuwo; Benjamin Rosman; Osamu Hasegawa (2015). Knowledge Transfer for Learning Robot Models via Local Procrustes Analysis. The 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids). doi:10.1109/HUMANOIDS.2015.7363502.

Further reading