Competitive learning

Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. [1] [2] A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.

Models and algorithms based on the principle of competitive learning include vector quantization and self-organizing maps (Kohonen maps).

Principle

There are three basic elements to a competitive learning rule: [3] [4]

- a set of neurons that are all the same except for some randomly distributed synaptic weights, and which therefore respond differently to a given set of input patterns;
- a limit imposed on the "strength" of each neuron;
- a mechanism that permits the neurons to compete for the right to respond to a given subset of inputs, such that only one output neuron (or only one neuron per group) is active at a time. The neuron that wins the competition is called a "winner-take-all" neuron.

Accordingly, the individual neurons of the network learn to specialize on ensembles of similar patterns and in so doing become 'feature detectors' for different classes of input patterns.

By recoding sets of correlated inputs to one of a few output neurons, competitive networks remove redundancy in representation; such redundancy reduction is an essential part of processing in biological sensory systems. [5] [6]

Architecture and implementation

[Figure: Competitive neural network architecture]

Competitive learning is usually implemented with neural networks that contain a hidden layer commonly known as the “competitive layer”. [7] Every competitive neuron $i$ is described by a vector of weights $\mathbf{w}_i = (w_{i1}, \dots, w_{id})^T$, $i = 1, \dots, M$, and calculates the similarity measure between the input data $\mathbf{x} = (x_1, \dots, x_d)^T \in \mathbb{R}^d$ and the weight vector $\mathbf{w}_i$.

For every input vector, the competitive neurons “compete” with each other to see which one of them is the most similar to that particular input vector. The winner neuron $m$ sets its output $o_m = 1$, and all the other competitive neurons set their output $o_i = 0$ for $i = 1, \dots, M$, $i \neq m$.

Usually, in order to measure similarity, the inverse of the Euclidean distance between the input vector $\mathbf{x}$ and the weight vector $\mathbf{w}_i$ is used: $s(\mathbf{x}, \mathbf{w}_i) = \lVert \mathbf{x} - \mathbf{w}_i \rVert^{-1}$.
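As a concrete sketch (not from the cited sources), winner selection with the $M$ weight vectors stored as the rows of a NumPy matrix might look like this; the function names are illustrative:

```python
import numpy as np

def find_winner(x, W):
    """Index of the competitive neuron whose weight vector is closest
    to the input x: smallest Euclidean distance, i.e. largest
    inverse-distance similarity."""
    distances = np.linalg.norm(W - x, axis=1)  # one distance per neuron
    return int(np.argmin(distances))

def outputs(x, W):
    """Winner-take-all outputs: the winner fires 1, all others fire 0."""
    o = np.zeros(len(W))
    o[find_winner(x, W)] = 1.0
    return o
```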

Example algorithm

Here is a simple competitive learning algorithm to find three clusters within some input data.

1. (Set-up.) Let a set of sensors all feed into three different nodes, so that every node is connected to every sensor. Let the weights that each node gives to its sensors be set randomly between 0.0 and 1.0. Let the output of each node be the sum of its sensors' signal strengths, each multiplied by the weight of the corresponding connection.

2. When the net is shown an input, the node with the highest output is deemed the winner. The input is classified as being within the cluster corresponding to that node.

3. The winner updates each of its weights, moving weight from the connections that gave it weaker signals to the connections that gave it stronger signals.

Thus, as more data are received, each node converges on the centre of the cluster that it has come to represent and activates more strongly for inputs in this cluster and more weakly for inputs in other clusters.
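A minimal NumPy sketch of these three steps (an illustration, not a reference implementation) under two added assumptions: inputs are non-negative, and each node's weights are kept normalized to sum to one, in the spirit of Rumelhart and Zipser's rule [3]:

```python
import numpy as np

rng = np.random.default_rng(0)

def competitive_learning(data, n_nodes=3, lr=0.1, epochs=50):
    """Cluster `data` (one row per input) with a winner-take-all net."""
    n_sensors = data.shape[1]
    # 1. Set-up: random weights in [0, 1], one row per node, normalized
    #    so each node's weights sum to 1.
    W = rng.random((n_nodes, n_sensors))
    W /= W.sum(axis=1, keepdims=True)
    for _ in range(epochs):
        for x in data:
            # 2. The node with the highest weighted-sum output wins.
            winner = np.argmax(W @ x)
            # 3. The winner moves weight toward the connections that
            #    carried stronger signals; its row still sums to 1.
            W[winner] += lr * (x / x.sum() - W[winner])
    return W

# Example: inputs drawn from three distinct non-negative patterns.
centers = np.array([[0.9, 0.1, 0.1, 0.1],
                    [0.1, 0.9, 0.1, 0.1],
                    [0.1, 0.1, 0.9, 0.1]])
data = np.abs(np.vstack([rng.normal(c, 0.05, (30, 4)) for c in centers]))
W = competitive_learning(data)
labels = np.argmax(data @ W.T, axis=1)  # cluster assignment per input
```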

Related Research Articles

<span class="mw-page-title-main">Self-organizing map</span> Machine learning technique useful for dimensionality reduction

A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. For example, a data set with variables measured in observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.
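As a rough, hypothetical illustration (function and parameter names are assumptions, not from any particular library), one SOM training step pulls the best-matching unit and its grid neighbours toward the input:

```python
import numpy as np

def som_step(x, W, grid, lr=0.5, sigma=1.0):
    """One SOM update. W: (n_units, d) weight vectors; grid:
    (n_units, 2) coordinates of the units on the 2-D map."""
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)    # squared map distances
    h = np.exp(-d2 / (2 * sigma ** 2))              # neighbourhood kernel
    W += lr * h[:, None] * (x - W)                  # pull toward x
    return W
```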

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
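A minimal sketch of the perceptron's predict-and-update cycle, assuming 0/1 class labels (the weights change only on a mistake):

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Linear predictor: class 1 if w.x + b > 0, else class 0."""
    return int(np.dot(w, x) + b > 0)

def perceptron_update(x, y, w, b, lr=1.0):
    """Classic perceptron rule: nudge w and b by the prediction error."""
    err = y - perceptron_predict(x, w, b)
    return w + lr * err * x, b + lr * err
```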

Hebbian theory is a neuropsychological theory claiming that an increase in synaptic efficacy arises from a presynaptic cell's repeated and persistent stimulation of a postsynaptic cell. It is an attempt to explain synaptic plasticity, the adaptation of brain neurons during the learning process. It was introduced by Donald Hebb in his 1949 book The Organization of Behavior. The theory is also called Hebb's rule, Hebb's postulate, and cell assembly theory. Hebb states it as follows:

Let us assume that the persistence or repetition of a reverberatory activity tends to induce lasting cellular changes that add to its stability. ... When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.
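In its simplest rate-based textbook reading (a common formalization, not Hebb's own notation), the postulate becomes a weight update proportional to the product of pre- and postsynaptic activity:

```python
def hebb_update(w, x, y, lr=0.01):
    """Plain Hebbian update: w grows with the correlation between the
    presynaptic input x and the postsynaptic output y. Unmodified, the
    weights can only grow, the instability that Oja's rule (below) fixes."""
    return w + lr * y * x
```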

Instantaneously trained neural networks are feedforward artificial neural networks that create a new hidden neuron node for each novel training sample. The weights to this hidden neuron separate out not only this training sample but others that are near it, thus providing generalization. This separation is done using the nearest hyperplane that can be written down instantaneously. In the two most important implementations the neighborhood of generalization either varies with the training sample or remains constant. These networks use unary coding for an effective representation of the data sets.

In machine learning, backpropagation is a gradient estimation method used to train neural network models. The gradient estimate is used by the optimization algorithm to compute the network parameter updates.

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that can not be unrolled.

<span class="mw-page-title-main">Feedforward neural network</span> One of two broad types of artificial neural network

A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes and to the output nodes, without any cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow. Modern feedforward networks are trained using the backpropagation method and are colloquially referred to as the "vanilla" neural networks.

A multilayer perceptron (MLP) is a name for a modern feedforward artificial neural network consisting of fully connected neurons with nonlinear activation functions, organized in at least three layers, and notable for being able to distinguish data that is not linearly separable. The name is something of a misnomer, because the original perceptron used a Heaviside step function rather than a continuous nonlinear activation function.

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the Representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.

The softmax function, also known as softargmax or normalized exponential function, converts a vector of K real numbers into a probability distribution of K possible outcomes. It is a generalization of the logistic function to multiple dimensions, and used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network to normalize the output of a network to a probability distribution over predicted output classes.
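For illustration, a minimal NumPy softmax; subtracting the maximum is the usual numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """Map K real scores to a probability distribution over K outcomes."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

softmax(np.array([2.0, 1.0, 0.1]))  # entries are positive and sum to 1
```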

Oja's learning rule, or simply Oja's rule, named after Finnish computer scientist Erkki Oja, is a model of how neurons in the brain or in artificial neural networks change connection strength, or learn, over time. It is a modification of the standard Hebb's Rule that, through multiplicative normalization, solves all stability problems and generates an algorithm for principal components analysis. This is a computational form of an effect which is believed to happen in biological neurons.
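A minimal sketch of Oja's update for a single linear neuron with output y = w·x; the subtracted decay term is the multiplicative normalization that keeps the weight vector bounded:

```python
import numpy as np

def oja_update(w, x, lr=0.01):
    """Oja's rule: Hebbian growth (lr*y*x) minus decay (lr*y*y*w).
    Over many samples, w converges toward the first principal
    component of the input distribution."""
    y = np.dot(w, x)
    return w + lr * y * (x - y * w)
```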

In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment.
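As a hypothetical sketch with Gaussian basis functions (one common choice; the centres, width, and output weights are assumed to be given):

```python
import numpy as np

def rbf_output(x, centers, width, weights):
    """RBF network output: a weighted linear combination of Gaussian
    bumps, one centred on each row of `centers`."""
    phi = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * width ** 2))
    return np.dot(weights, phi)
```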

The generalized Hebbian algorithm (GHA), also known in the literature as Sanger's rule, is a linear feedforward neural network for unsupervised learning with applications primarily in principal components analysis. First defined in 1989, it is similar to Oja's rule in its formulation and stability, except it can be applied to networks with multiple outputs. The name originates because of the similarity between the algorithm and a hypothesis made by Donald Hebb about the way in which synaptic strengths in the brain are modified in response to experience, i.e., that changes are proportional to the correlation between the firing of pre- and post-synaptic neurons.
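A minimal sketch of Sanger's update, assuming linear outputs y = Wx; the lower-triangular term is what decorrelates each output from those above it:

```python
import numpy as np

def gha_update(W, x, lr=0.01):
    """Generalized Hebbian algorithm: like Oja's rule applied per row,
    plus a lower-triangular correction so the rows of W converge to
    successive principal components."""
    y = W @ x
    return W + lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
```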

There are many types of artificial neural networks (ANN).

Extension neural network is a pattern recognition method introduced by M. H. Wang and C. P. Hung in 2003 to classify instances of data sets. It combines artificial neural networks with concepts from extension theory, using the fast, adaptive learning capability of neural networks and the correlation-estimation property of extension theory by calculating an extension distance.

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weights and bias levels of a network when a network is simulated in a specific data environment. A learning rule may accept existing conditions of the network and will compare the expected result and actual result of the network to give new and improved values for weights and bias. Depending on the complexity of actual model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.

Dilution and dropout are regularization techniques for reducing overfitting in artificial neural networks by preventing complex co-adaptations on training data. They are an efficient way of performing model averaging with neural networks. Dilution refers to thinning weights, while dropout refers to randomly "dropping out", or omitting, units during the training process of a neural network. Both trigger the same type of regularization.
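A minimal sketch of the common "inverted dropout" variant, in which surviving activations are rescaled during training so that nothing needs to change at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5):
    """Zero each unit with probability p; rescale the survivors by
    1/(1-p) so the expected activation is unchanged."""
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)
```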

Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of the hidden nodes need not be tuned. These hidden nodes can be randomly assigned and never updated, or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model.
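A minimal sketch of that single-step fit, assuming one tanh hidden layer with random, fixed input weights and a pseudo-inverse solve for the output weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, Y, n_hidden=50):
    """ELM training: the hidden layer is random and never updated;
    only the output weights are learned, in one least-squares solve."""
    W_in = rng.normal(size=(X.shape[1], n_hidden))  # random, fixed
    H = np.tanh(X @ W_in)                           # hidden activations
    beta = np.linalg.pinv(H) @ Y                    # one-step solution
    return W_in, beta
```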

A capsule neural network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization.

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs.

References

  1. Rumelhart, David E.; Zipser, David; McClelland, James L.; et al. (1986). Parallel Distributed Processing, Vol. 1. MIT Press. pp. 151–193.
  2. Grossberg, Stephen (1987). "Competitive learning: From interactive activation to adaptive resonance" (PDF). Cognitive Science. 11 (1): 23–63. doi:10.1016/S0364-0213(87)80025-3. ISSN 0364-0213.
  3. Rumelhart, David E.; Zipser, David (1985). "Feature discovery by competitive learning". Cognitive Science. 9 (1): 75–112.
  4. Haykin, Simon (2004). Neural Networks: A Comprehensive Foundation. 2nd ed.
  5. Barlow, Horace B. (1989). "Unsupervised learning". Neural Computation. 1 (3): 295–311.
  6. Rolls, Edmund T.; Deco, Gustavo (2002). Computational Neuroscience of Vision. Oxford: Oxford University Press.
  7. Salatas, John (24 August 2011). "Implementation of Competitive Learning Networks for WEKA". ICT Research Blog. Retrieved 28 January 2012.
