Generative topographic map

Last updated

Generative topographic map (GTM) is a machine learning method that is a probabilistic counterpart of the self-organizing map (SOM), is probably convergent and does not require a shrinking neighborhood or a decreasing step size. It is a generative model: the data is assumed to arise by first probabilistically picking a point in a low-dimensional space, mapping the point to the observed high-dimensional input space (via a smooth function), then adding noise in that space. The parameters of the low-dimensional probability distribution, the smooth map and the noise are all learned from the training data using the expectation-maximization (EM) algorithm. GTM was introduced in 1996 in a paper by Christopher Bishop, Markus Svensen, and Christopher K. I. Williams.

Machine learning branch of statistics and computer science, which studies algorithms and architectures that learn from observed facts

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.

Self-organizing map type of artificial neural network

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map, and is therefore a method to do dimensionality reduction. Self-organizing maps differ from other artificial neural networks as they apply competitive learning as opposed to error-correction learning, and in the sense that they use a neighborhood function to preserve the topological properties of the input space.

Neighbourhood (mathematics) an open set in a topological space that contains a given point or subset

In topology and related areas of mathematics, a neighbourhood is one of the basic concepts in a topological space. It is closely related to the concepts of open set and interior. Intuitively speaking, a neighbourhood of a point is a set of points containing that point where one can move some amount in any direction away from that point without leaving the set.

Contents

Details of the algorithm

The approach is strongly related to density networks which use importance sampling and a multi-layer perceptron to form a non-linear latent variable model. In the GTM the latent space is a discrete grid of points which is assumed to be non-linearly projected into data space. A Gaussian noise assumption is then made in data space so that the model becomes a constrained mixture of Gaussians. Then the model's likelihood can be maximized by EM.

In statistics, importance sampling is a general technique for estimating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. It is related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.

A latent variable model is a statistical model that relates a set of observable variables to a set of latent variables.

Gaussian noise

Gaussian noise, named after Carl Friedrich Gauss, is statistical noise having a probability density function (PDF) equal to that of the normal distribution, which is also known as the Gaussian distribution. In other words, the values that the noise can take on are Gaussian-distributed.

In theory, an arbitrary nonlinear parametric deformation could be used. The optimal parameters could be found by gradient descent, etc.

The suggested approach to the nonlinear mapping is to use a radial basis function network (RBF) to create a nonlinear mapping between the latent space and the data space. The nodes of the RBF network then form a feature space and the nonlinear mapping can then be taken as a linear transform of this feature space. This approach has the advantage over the suggested density network approach that it can be optimised analytically.

In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment.

Uses

In data analysis, GTMs are like a nonlinear version of principal components analysis, which allows high-dimensional data to be modelled as resulting from Gaussian noise added to sources in lower-dimensional latent space. For example, to locate stocks in plottable 2D space based on their hi-D time-series shapes. Other applications may want to have fewer sources than data points, for example mixture models.

In generative deformational modelling, the latent and data spaces have the same dimensions, for example, 2D images or 1 audio sound waves. Extra 'empty' dimensions are added to the source (known as the 'template' in this form of modelling), for example locating the 1D sound wave in 2D space. Further nonlinear dimensions are then added, produced by combining the original dimensions. The enlarged latent space is then projected back into the 1D data space. The probability of a given projection is, as before, given by the product of the likelihood of the data under the Gaussian noise model with the prior on the deformation parameter. Unlike conventional spring-based deformation modelling, this has the advantage of being analytically optimizable. The disadvantage is that it is a 'data-mining' approach, i.e. the shape of the deformation prior is unlikely to be meaningful as an explanation of the possible deformations, as it is based on a very high, artificial- and arbitrarily constructed nonlinear latent space. For this reason the prior is learned from data rather than created by a human expert, as is possible for spring-based models.

Comparison with Kohonen's self-organizing maps

While nodes in the self-organizing map (SOM) can wander around at will, GTM nodes are constrained by the allowable transformations and their probabilities. If the deformations are well-behaved the topology of the latent space is preserved.

The SOM was created as a biological model of neurons and is a heuristic algorithm. By contrast, the GTM has nothing to do with neuroscience or cognition and is a probabilistically principled model. Thus, it has a number of advantages over SOM, namely:

GTM was introduced by Bishop, Svensen and Williams in their Technical Report in 1997 (Technical Report NCRG/96/015, Aston University, UK) published later in Neural Computation. It was also described in the PhD thesis of Markus Svensen (Aston, 1998).

Applications

See also

Related Research Articles

Supervised learning machine learning task of learning a function that maps an input to an output based on example input-output pairs

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

Artificial neural network computational model used in machine learning, computer science and other research disciplines, which is based on a large collection of connected simple units called artificial neurons, loosely analogous to axons in a biological brain

Artificial neural networks (ANN) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains. The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images. They do this without any prior knowledge about cats, for example, that they have fur, tails, whiskers and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material that they process.

Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.

Unsupervised learning machine learning technique

Unsupervised learning is a term used for Hebbian learning, associated to learning without a teacher, also known as self-organization and a method of modelling the probability density of inputs.

Nonlinear dimensionality reduction Summary of algorithms for nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Dimensionality reduction process of reducing the number of random variables under consideration

In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.

Intelligent control is a class of control techniques that use various artificial intelligence computing approaches like neural networks, Bayesian probability, fuzzy logic, machine learning, reinforcement learning, evolutionary computation and genetic algorithms.

Simultaneous localization and mapping computational problem of constructing a map while tracking an agents location within it

In robotic mapping and navigation, simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. While this initially appears to be a chicken-and-egg problem there are several algorithms known for solving it, at least approximately, in tractable time for certain environments. Popular approximate solution methods include the particle filter, extended Kalman filter, and GraphSLAM.

The forward algorithm, in the context of a hidden Markov model (HMM), is used to calculate a 'belief state': the probability of a state at a certain time, given the history of evidence. The process is also known as filtering. The forward algorithm is closely related to, but distinct from, the Viterbi algorithm.

In statistical classification, including machine learning, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished, following Jebara (2004):

Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis, from which PLSA evolved.

Autoencoder

An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise.” Along with the reduction side, a reconstructing side is learnt, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name. Recently, the autoencoder concept has become more widely used for learning generative models of data. Some of the most powerful AI in the 2010s have involved sparse autoencoders stacked inside of deep neural networks.

Linear-nonlinear-Poisson cascade model

The linear-nonlinear-Poisson (LNP) cascade model is a simplified functional model of neural spike responses. It has been successfully used to describe the response characteristics of neurons in early sensory pathways, especially the visual system. The LNP model is generally implicit when using reverse correlation or the spike-triggered average to characterize neural responses with white-noise stimuli.

There are many types of artificial neural networks (ANN).

In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification.

t-distributed stochastic neighbor embedding Technique for dimensionality reduction

T-distributed Stochastic Neighbor Embedding (t-SNE) is a machine learning algorithm for visualization developed by Laurens van der Maaten and Geoffrey Hinton. It is a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.

System identification is a method of identifying or measuring the mathematical model of a system from measurements of the system inputs and outputs. The applications of system identification include any system where the inputs and outputs can be measured and include industrial processes, control systems, economic data, biology and the life sciences, medicine, social systems and many more.

Word embedding method in natural language processing

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with one dimension per word to a continuous vector space with a much lower dimension.

Outline of machine learning Overview of and topical guide to machine learning

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.