Neural gas

Last updated

Neural gas is an artificial neural network, inspired by the self-organizing map and introduced in 1991 by Thomas Martinetz and Klaus Schulten. [1] The neural gas is a simple algorithm for finding optimal data representations based on feature vectors. The algorithm was coined "neural gas" because of the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space. It is applied where data compression or vector quantization is an issue, for example speech recognition, [2] image processing [3] or pattern recognition. As a robustly converging alternative to the k-means clustering it is also used for cluster analysis. [4]

Contents

Algorithm

Suppose we want to model a probability distribution of data vectors using a finite number of feature vectors , where .

  1. For each time step
    1. Sample data vector from
    2. Compute the distance between and each feature vector. Rank the distances.
    3. Let be the index of the closest feature vector, the index of the second closest feature vector, and so on.
    4. Update each feature vector by:

In the algorithm, can be understood as the learning rate, and as the neighborhood range. and are reduced with increasing so that the algorithm converges after many adaptation steps.

The adaptation step of the neural gas can be interpreted as gradient descent on a cost function. By adapting not only the closest feature vector but all of them with a step size decreasing with increasing distance order, compared to (online) k-means clustering a much more robust convergence of the algorithm can be achieved. The neural gas model does not delete a node and also does not create new nodes.

Comparison with SOM

Compared to self-organized map, the neural gas model does not assume that some vectors are neighbors. If two vectors happen to be close together, they would tend to move together, and if two vectors happen to be apart, they would tend to not move together. In contrast, in an SOM, if two vectors are neighbors in the underlying graph, then they will always tend to move together, no matter whether the two vectors happen to be neighbors in the Euclidean space.

The name "neural gas" is because one can imagine it to be what an SOM would be like if there is no underlying graph, and all points are free to move without the bonds that bind them together.

Variants

A number of variants of the neural gas algorithm exists in the literature so as to mitigate some of its shortcomings. More notable is perhaps Bernd Fritzke's growing neural gas, [5] but also one should mention further elaborations such as the Growing When Required network [6] and also the incremental growing neural gas. [7] A performance-oriented approach that avoids the risk of overfitting is the Plastic Neural gas model. [8]

Growing neural gas

Fritzke describes the growing neural gas (GNG) as an incremental network model that learns topological relations by using a "Hebb-like learning rule", [5] only, unlike the neural gas, it has no parameters that change over time and it is capable of continuous learning, i.e. learning on data streams. GNG has been widely used in several domains, [9] demonstrating its capabilities for clustering data incrementally. The GNG is initialized with two randomly positioned nodes which are initially connected with a zero age edge and whose errors are set to 0. Since the in the GNG input data is presented sequentially one by one, the following steps are followed at each iteration:

Incremental growing neural gas

Another neural gas variant inspired in the GNG algorithm is the incremental growing neural gas (IGNG). The authors propose the main advantage of this algorithm to be "learning new data (plasticity) without degrading the previously trained network and forgetting the old input data (stability)." [7]

Growing when required

Having a network with a growing set of nodes, like the one implemented by the GNG algorithm was seen as a great advantage, however some limitation on the learning was seen by the introduction of the parameter λ, in which the network would only be able to grow when iterations were a multiple of this parameter. [6] The proposal to mitigate this problem was a new algorithm, the Growing When Required network (GWR), which would have the network grow more quickly, by adding nodes as quickly as possible whenever the network identified that the existing nodes would not describe the input well enough.

Plastic neural gas

The ability to only grow a network may quickly introduce overfitting; on the other hand, removing nodes on the basis of age only, as in the GNG model, does not ensure that the removed nodes are actually useless, because removal depends on a model parameter that should be carefully tuned to the "memory length" of the stream of input data.

The "Plastic Neural Gas" model [8] solves this problem by making decisions to add or remove nodes using an unsupervised version of cross-validation, which controls an equivalent notion of "generalization ability" for the unsupervised setting.

While growing-only methods only cater for the incremental learning scenario, the ability to grow and shrink is suited to the more general streaming data problem.

Implementations

To find the ranking of the feature vectors, the neural gas algorithm involves sorting, which is a procedure that does not lend itself easily to parallelization or implementation in analog hardware. However, implementations in both parallel software [10] and analog hardware [11] were actually designed.

Related Research Articles

<span class="mw-page-title-main">Supervised learning</span> A paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. Developed in the early 1980s by Robert M. Gray, it was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. In simpler terms, vector quantization chooses a set of points to represent a larger set of points.

<span class="mw-page-title-main">Self-organizing map</span> Machine learning technique useful for dimensionality reduction

A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. For example, a data set with variables measured in observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

Unsupervised learning is a method in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a concise representation of its world and then generate imaginative content from it.

Instantaneously trained neural networks are feedforward artificial neural networks that create a new hidden neuron node for each novel training sample. The weights to this hidden neuron separate out not only this training sample but others that are near it, thus providing generalization. This separation is done using the nearest hyperplane that can be written down instantaneously. In the two most important implementations the neighborhood of generalization either varies with the training sample or remains constant. These networks use unary coding for an effective representation of the data sets.

In computer science, learning vector quantization (LVQ) is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems.

In machine learning, backpropagation is a gradient estimation method used to train neural network models. The gradient estimate is used by the optimization algorithm to compute the network parameter updates.

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that can not be unrolled.

<span class="mw-page-title-main">Feedforward neural network</span> One of two broad types of artificial neural network

A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes and to the output nodes, without any cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow. Modern feedforward networks are trained using the backpropagation method and are colloquially referred to as the "vanilla" neural networks.

A multilayer perceptron (MLP) is a name for a modern feedforward artificial neural network, consisting of fully connected neurons with a nonlinear kind of activation function, organized in at least three layers, notable for being able to distinguish data that is not linearly separable. It is a misnomer because the original perceptron used a Heaviside step function, instead of a nonlinear kind of activation function.

<span class="mw-page-title-main">ADALINE</span> Early single-layer artificial neural network

ADALINE is an early single-layer artificial neural network and the name of the physical device that implemented this network. The network uses memistors. It was developed by professor Bernard Widrow and his doctoral student Ted Hoff at Stanford University in 1960. It is based on the perceptron. It consists of a weight, a bias and a summation function.

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms. It is also used in situations where it is necessary for the algorithm to dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., stock price prediction. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches.

Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.

Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks. It can be used to train Elman networks. The algorithm was independently derived by numerous researchers.

There are many types of artificial neural networks (ANN).

Extension neural network is a pattern recognition method found by M. H. Wang and C. P. Hung in 2003 to classify instances of data sets. Extension neural network is composed of artificial neural network and extension theory concepts. It uses the fast and adaptive learning capability of neural network and correlation estimation property of extension theory by calculating extension distance.
ENN was used in:

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weights and bias levels of a network when a network is simulated in a specific data environment. A learning rule may accept existing conditions of the network and will compare the expected result and actual result of the network to give new and improved values for weights and bias. Depending on the complexity of actual model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.

Fusion adaptive resonance theory (fusion ART) is a generalization of self-organizing neural networks known as the original Adaptive Resonance Theory models for learning recognition categories across multiple pattern channels. There is a separate stream of work on fusion ARTMAP, that extends fuzzy ARTMAP consisting of two fuzzy ART modules connected by an inter-ART map field to an extended architecture consisting of multiple ART modules.

References

  1. Thomas Martinetz and Klaus Schulten (1991). "A "neural gas" network learns topologies" (PDF). Artificial Neural Networks. Elsevier. pp. 397–402.
  2. F. Curatelli; O. Mayora-Iberra (2000). "Competitive learning methods for efficient Vector Quantizations in a speech recognition environment". In Osvaldo Cairó; L. Enrique Sucar; Francisco J. Cantú-Ortiz (eds.). MICAI 2000: Advances in artificial intelligence : Mexican International Conference on Artificial Intelligence, Acapulco, Mexico, April 2000 : proceedings . Springer. p. 109. ISBN   978-3-540-67354-5.
  3. Angelopoulou, Anastassia; Psarrou, Alexandra; Garcia Rodriguez, Jose; Revett, Kenneth (2005). "Computer Vision for Biomedical Image Applications". In Yanxi Liu; Tianzi Jiang; Changshui Zhang (eds.). Computer vision for biomedical image applications: first international workshop, CVBIA 2005, Beijing, China, October 21, 2005 : proceedings. Lecture Notes in Computer Science. Vol. 3765. Springer. p. 210. doi:10.1007/11569541_22. ISBN   978-3-540-29411-5.
  4. Fernando Canales; Max Chacon (2007). "Progress in Pattern Recognition, Image Analysis and Applications". In Luis Rueda; Domingo Mery (eds.). Progress in pattern recognition, image analysis and applications: 12th Iberoamerican Congress on Pattern Recognition, CIARP 2007, Viña del Mar-Valparaiso, Chile, November 13–16, 2007; proceedings. Lecture Notes in Computer Science. Vol. 4756. Springer. pp. 684–693. doi: 10.1007/978-3-540-76725-1_71 . ISBN   978-3-540-76724-4.
  5. 1 2 Fritzke, Bernd (1995). "A Growing Neural Gas Network Learns Topologies". Advances in Neural Information Processing Systems. 7: 625–632. Retrieved 2016-04-26.
  6. 1 2 Marsland, Stephen; Shapiro, Jonathan; Nehmzow, Ulrich (2002). "A self-organising network that grows when required". Neural Networks. 15 (8): 1041–1058. CiteSeerX   10.1.1.14.8763 . doi:10.1016/s0893-6080(02)00078-3. PMID   12416693.
  7. 1 2 Prudent, Yann; Ennaji, Abdellatif (2005). "An incremental growing neural gas learns topologies". Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. Vol. 2. pp. 1211–1216. doi:10.1109/IJCNN.2005.1556026. ISBN   978-0-7803-9048-5. S2CID   41517545.
  8. 1 2 Ridella, Sandro; Rovetta, Stefano; Zunino, Rodolfo (1998). "Plastic algorithm for adaptive vector quantisation". Neural Computing & Applications. 7: 37–51. doi:10.1007/BF01413708. S2CID   1184174.
  9. Iqbal, Hafsa; Campo, Damian; Baydoun, Mohamad; Marcenaro, Lucio; Martin, David; Regazzoni, Carlo (2019). "Clustering Optimization for Abnormality Detection in Semi-Autonomous Systems". 1st International Workshop on Multimodal Understanding and Learning for Embodied Applications. pp. 33–41. doi: 10.1145/3347450.3357657 . ISBN   978-1-4503-6918-3.
  10. Ancona, Fabio; Rovetta, Stefano; Zunino, Rodolfo (1996). "A parallel approach to plastic neural gas". Proceedings of International Conference on Neural Networks (ICNN'96). Vol. 1. pp. 126–130. doi:10.1109/ICNN.1996.548878. ISBN   0-7803-3210-5. S2CID   61686854.
  11. Ancona, Fabio; Rovetta, Stefano; Zunino, Rodolfo (1997). "Hardware implementation of the neural gas". Proceedings of International Conference on Neural Networks (ICNN'97). Vol. 2. pp. 991–994. doi:10.1109/ICNN.1997.616161. ISBN   0-7803-4122-8. S2CID   62480597.

Further reading