Growing self-organizing map

A growing self-organizing map (GSOM) is a growing variant of a self-organizing map (SOM). The GSOM was developed to address the issue of identifying a suitable map size in the SOM. It starts with a minimal number of nodes (usually four) and grows new nodes on the boundary based on a heuristic. Through a value called the spread factor (SF), the data analyst can control the growth of the GSOM.

All the starting nodes of the GSOM are boundary nodes, i.e. each node has the freedom to grow in its own direction at the beginning (Fig. 1). New nodes are grown from the boundary nodes. Once a node is selected for growing, new nodes are grown in all of its free neighboring positions. The figure shows the three possible node growth options for a rectangular GSOM.

Node growth options in GSOM: (a) one new node, (b) two new nodes and (c) three new nodes.

The algorithm

The GSOM process is as follows (a minimal code sketch follows the list):

  1. Initialization phase:
    1. Initialize the weight vectors of the starting nodes (usually four) with random numbers between 0 and 1.
    2. Calculate the growth threshold (GT) for the given data set of dimension D according to the spread factor (SF) using the formula GT = −D × ln(SF).
  2. Growing Phase:
    1. Present input to the network.
    2. Determine the weight vector that is closest to the input vector mapped to the current feature map (winner), using Euclidean distance (similar to the SOM). This step can be summarized as: find q′ such that |v − w_{q′}| ≤ |v − w_q| ∀ q ∈ N, where v and w are the input and weight vectors respectively, q is the position vector for nodes, and N is the set of natural numbers.
    3. The weight vector adaptation is applied only to the neighborhood of the winner and the winner itself. The neighborhood is a set of neurons around the winner, but in the GSOM the starting neighborhood selected for weight adaptation is smaller than in the SOM (localized weight adaptation). The amount of adaptation (learning rate) is also reduced exponentially over the iterations. Even within the neighborhood, weights that are closer to the winner are adapted more than those further away. The weight adaptation can be described by w_j(k + 1) = w_j(k) if j ∉ N_{k+1}, and w_j(k + 1) = w_j(k) + LR(k) × (x_k − w_j(k)) if j ∈ N_{k+1}, where the learning rate LR(k), k ∈ ℕ, is a sequence of positive parameters converging to zero as k → ∞; w_j(k) and w_j(k + 1) are the weight vectors of node j before and after the adaptation, and N_{k+1} is the neighborhood of the winning neuron at the (k + 1)th iteration. The decreasing value of LR(k) in the GSOM depends on the number of nodes existing in the map at time k.
    4. Increase the error value of the winner (the error value is the difference between the input vector and the winner's weight vector).
    5. When TE_i > GT (where TE_i is the total error of node i and GT is the growth threshold): grow new nodes if i is a boundary node; distribute weights to the neighbors if i is a non-boundary node.
    6. Initialize the new node weight vectors to match the neighboring node weights.
    7. Initialize the learning rate (LR) to its starting value.
    8. Repeat steps 2 – 7 until all inputs have been presented and node growth is reduced to a minimum level.
  3. Smoothing phase.
    1. Reduce learning rate and fix a small starting neighborhood.
    2. Find the winner and adapt the weights of the winner and neighbors in the same way as in the growing phase.
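
To make the growing phase concrete, here is a minimal Python sketch. It is illustrative only: it assumes a rectangular grid, a per-epoch exponentially decaying learning rate, and ad hoc constants for distributing and resetting the accumulated error (the 0.25 fraction and the GT/2 reset are illustrative choices, not prescribed by the algorithm), and it omits re-initializing the learning rate when new nodes are grown.

```python
import numpy as np

def gsom_growing_phase(data, spread_factor=0.5, lr0=0.1, epochs=10):
    """Minimal sketch of the GSOM growing phase on a rectangular grid."""
    dim = data.shape[1]
    gt = -dim * np.log(spread_factor)   # growth threshold: GT = -D * ln(SF)
    rng = np.random.default_rng(0)
    # Start with four nodes; keys are grid positions, values are weight vectors.
    nodes = {(0, 0): rng.random(dim), (0, 1): rng.random(dim),
             (1, 0): rng.random(dim), (1, 1): rng.random(dim)}
    error = {p: 0.0 for p in nodes}

    def neighbours(p):
        x, y = p
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

    for epoch in range(epochs):
        lr = lr0 * np.exp(-epoch)       # learning rate decays over the iterations
        for v in data:
            # Winner: node whose weight vector is closest to the input (Euclidean).
            win = min(nodes, key=lambda p: np.linalg.norm(v - nodes[p]))
            # Localized weight adaptation: the winner and its existing neighbours.
            for p in [win] + [q for q in neighbours(win) if q in nodes]:
                nodes[p] += lr * (v - nodes[p])
            # Accumulate the winner's quantization error.
            error[win] += np.linalg.norm(v - nodes[win])
            if error[win] > gt:
                free = [q for q in neighbours(win) if q not in nodes]
                if free:        # boundary node: grow into every free position
                    for q in free:
                        nodes[q] = nodes[win].copy()   # new weights match the parent
                        error[q] = 0.0
                else:           # non-boundary node: spread the error to neighbours
                    for q in neighbours(win):
                        error[q] += 0.25 * error[win]
                error[win] = gt / 2
    return nodes
```
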
Approximation of a spiral with noise by 1D SOM (the upper row) and GSOM (the lower row) with 50 (the first column) and 100 (the second column) nodes. The fraction of variance unexplained is: a) 4.68% (SOM, 50 nodes); b) 1.69% (SOM, 100 nodes); c) 4.20% (GSOM, 50 nodes); d) 2.32% (GSOM, 100 nodes). The initial approximation for SOM was equidistribution of nodes in a segment on the first principal component with the same variance as for the data set. The initial approximation for GSOM was the mean point.

Applications

The GSOM can be used for many preprocessing tasks in data mining, for nonlinear dimensionality reduction, for approximation of principal curves and manifolds, and for clustering and classification. It often gives a better representation of the data geometry than the SOM (see the classical benchmark for principal curves in the figure above).

Related Research Articles

Self-organizing map

A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. For example, a data set with p variables measured in n observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.
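
A toy numpy sketch of the underlying SOM training loop may make this concrete; the grid size, linear decay schedules, and Gaussian neighborhood below are illustrative defaults, not canonical settings.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Toy SOM trainer on a 2-D grid (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # Grid coordinates of every neuron, used by the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            t = step / n_steps
            lr, sigma = lr0 * (1 - t), sigma0 * (1 - t) + 1e-3
            # Best-matching unit: the neuron whose weight vector is closest to x.
            bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(axis=-1)), (h, w))
            # Gaussian neighborhood on the grid, centred at the BMU.
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            theta = np.exp(-d2 / (2 * sigma ** 2))
            weights += lr * theta[..., None] * (x - weights)
            step += 1
    return weights
```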

Perceptron

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
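
The learning rule is short enough to state directly. A minimal sketch, assuming class labels encoded as -1/+1 and a fixed number of passes over the data:

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Classic perceptron learning rule; y holds labels -1 or +1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Linear predictor function: predict sign(w . x + b).
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # move the hyperplane toward xi
                b += yi
    return w, b
```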

Unsupervised learning is a paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data.

Nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. These techniques can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
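
For comparison, the linear special case that these methods generalize is compact. A sketch of PCA computed via the SVD:

```python
import numpy as np

def pca_project(X, k=2):
    """Project data onto its top-k principal components (linear baseline)."""
    Xc = X - X.mean(axis=0)                # centre the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                   # coordinates in the component basis
```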

In computer science, learning vector quantization (LVQ) is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems.
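
A sketch of the basic LVQ1 update, assuming prototype vectors and their class labels are given (the initialization strategy is left out):

```python
import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, lr=0.05, epochs=30):
    """LVQ1: pull the nearest prototype toward same-class inputs, push it away otherwise."""
    W = prototypes.astype(float).copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            j = np.argmin(((W - xi) ** 2).sum(axis=1))    # nearest prototype
            sign = 1.0 if proto_labels[j] == yi else -1.0
            W[j] += sign * lr * (xi - W[j])
    return W
```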

Backpropagation

In machine learning, backpropagation computes the gradient of a loss function (for example, the mean squared error, MSE) with respect to a network's weights by performing a backward pass through the network. For a network with a single hidden layer, training with backpropagation uses the following steps (a sketch of one such step follows the list):

  1. Traverse the network from the input to the output by computing the hidden layer's outputs and then the output layer (the forward pass).
  2. Starting at the output layer, apply the chain rule to compute the derivative of the cost function with respect to the weights of the output and hidden layers (the backward pass).
  3. Repeatedly update the weights with these gradients until they converge or the model has undergone enough iterations.
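
A minimal sketch of one such gradient step, for a network with a single hidden layer and MSE loss (the tanh activation is an illustrative choice):

```python
import numpy as np

def backprop_step(x, t, W1, W2, lr=0.1):
    """One gradient-descent step for a one-hidden-layer network with MSE loss."""
    # Forward pass.
    h = np.tanh(W1 @ x)                    # hidden activations
    y_hat = W2 @ h                         # linear output layer
    # Backward pass: chain rule from the loss back to each weight matrix.
    d_out = y_hat - t                      # dL/dy_hat for L = 0.5 * ||y_hat - t||^2
    grad_W2 = np.outer(d_out, h)
    d_hid = (W2.T @ d_out) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = np.outer(d_hid, x)
    # Update.
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
    return W1, W2
```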

Neural gas is an artificial neural network, inspired by the self-organizing map and introduced in 1991 by Thomas Martinetz and Klaus Schulten. The neural gas is a simple algorithm for finding optimal data representations based on feature vectors. The algorithm was coined "neural gas" because of the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space. It is applied where data compression or vector quantization is an issue, for example speech recognition, image processing or pattern recognition. As a robustly converging alternative to the k-means clustering it is also used for cluster analysis.
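
A sketch of a single neural-gas adaptation step: unlike the SOM, every unit moves, weighted by its distance rank rather than by grid position (the schedules for the learning rate and the decay constant lam are omitted):

```python
import numpy as np

def neural_gas_step(x, W, lr=0.1, lam=2.0):
    """One neural-gas update: all units move toward x, weighted by distance rank."""
    d = np.linalg.norm(W - x, axis=1)
    ranks = np.argsort(np.argsort(d))    # 0 for the closest unit, 1 for the next, ...
    W += lr * np.exp(-ranks / lam)[:, None] * (x - W)
    return W
```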

Oja's learning rule, or simply Oja's rule, named after Finnish computer scientist Erkki Oja, is a model of how neurons in the brain or in artificial neural networks change connection strength, or learn, over time. It is a modification of the standard Hebb's Rule that, through multiplicative normalization, solves all stability problems and generates an algorithm for principal components analysis. This is a computational form of an effect which is believed to happen in biological neurons.
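
Oja's rule is a one-line modification of the Hebbian update; as a single step in code:

```python
import numpy as np

def oja_step(x, w, lr=0.01):
    """Oja's rule: Hebbian update with multiplicative normalization.
    Iterated over inputs, w converges to the first principal component."""
    y = w @ x                       # neuron output
    w += lr * y * (x - y * w)       # Hebbian term y*x minus the decay term y^2 * w
    return w
```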

The generalized Hebbian algorithm (GHA), also known in the literature as Sanger's rule, is a linear feedforward neural network model for unsupervised learning with applications primarily in principal components analysis. First defined in 1989, it is similar to Oja's rule in its formulation and stability, except it can be applied to networks with multiple outputs. The name originates because of the similarity between the algorithm and a hypothesis made by Donald Hebb about the way in which synaptic strengths in the brain are modified in response to experience, i.e., that changes are proportional to the correlation between the firing of pre- and post-synaptic neurons.
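
The rule differs from Oja's only in a lower-triangular term that decorrelates the outputs, which is what lets successive rows converge to successive principal components. A sketch of one update:

```python
import numpy as np

def gha_step(x, W, lr=0.01):
    """Sanger's rule for a multi-output linear layer.
    Rows of W converge to the leading principal components, in order."""
    y = W @ x
    # dW = lr * (y x^T - LT(y y^T) W), where LT keeps the lower triangle.
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```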

Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.

The U-matrix is a representation of a self-organizing map (SOM) where the Euclidean distance between the codebook vectors of neighboring neurons is depicted in a grayscale image. This image is used to visualize the data in a high-dimensional space using a 2D image.
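
A simplified U-matrix computation (averaging each unit's distance to its grid neighbors, rather than building the full interleaved matrix) might look like:

```python
import numpy as np

def u_matrix(weights):
    """Average distance from each SOM unit to its grid neighbors.
    weights has shape (rows, cols, dim); returns a (rows, cols) grayscale image."""
    rows, cols, _ = weights.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = [np.linalg.norm(weights[i, j] - weights[i + di, j + dj])
                     for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= i + di < rows and 0 <= j + dj < cols]
            U[i, j] = np.mean(dists)
    return U
```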

Extension neural network (ENN) is a pattern recognition method proposed by M. H. Wang and C. P. Hung in 2003 to classify instances of data sets. It combines artificial neural network and extension theory concepts, using the fast and adaptive learning capability of neural networks and the correlation estimation property of extension theory by calculating an extension distance.

Restricted Boltzmann machine

A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.
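
RBMs are commonly trained with contrastive divergence. A minimal CD-1 sketch for binary units; the single Gibbs step and the use of probabilities instead of samples in the statistics are common simplifications:

```python
import numpy as np

def cd1_step(v0, W, b, c, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    v0: visible vector; W: weight matrix; b, c: visible/hidden biases."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # Up: hidden probabilities and a binary sample, given the data.
    ph0 = sigmoid(W @ v0 + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down-up: one Gibbs step for the "negative" (model) statistics.
    pv1 = sigmoid(W.T @ h0 + b)
    ph1 = sigmoid(W @ pv1 + c)
    # Gradient approximation: data statistics minus model statistics.
    W += lr * (np.outer(ph0, v0) - np.outer(ph1, pv1))
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return W, b, c
```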

Fusion adaptive resonance theory (fusion ART) is a generalization of the self-organizing neural networks known as the original Adaptive Resonance Theory (ART) models for learning recognition categories (or cognitive codes) across multiple pattern channels. A separate stream of work, fusion ARTMAP, extends fuzzy ARTMAP, which consists of two fuzzy ART modules connected by an inter-ART map field, to an architecture consisting of multiple ART modules.

Transformer

A Transformer is a deep learning architecture that relies on the attention mechanism. It is notable for requiring less training time than previous recurrent architectures, such as long short-term memory (LSTM), and has been widely adopted for training large language models on large datasets, such as the Wikipedia corpus and Common Crawl, by virtue of its parallelized processing of the input sequence. More specifically, at each layer the model contextualizes each input token with the other (unmasked) tokens in parallel via the attention mechanism. Though the Transformer model came out in 2017, the core attention mechanism was proposed earlier, in 2014, by Bahdanau, Cho, and Bengio for machine translation. The architecture is now used not only in natural language processing and computer vision, but also in audio and multimodal processing. It has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs) and BERT.
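
The core of the mechanism fits in a few lines. A sketch of single-head, unmasked scaled dot-product attention:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V have shape (sequence_length, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise token affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)              # row-wise softmax
    return w @ V                                    # each token mixes the values
```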

References

  1. The illustration was prepared using free software: E. M. Mirkes, Principal Component Analysis and Self-Organizing Maps: applet. University of Leicester, 2011.
