Adaptive resonance theory

Last updated

Adaptive resonance theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of artificial neural network models which use supervised and unsupervised learning methods, and address problems such as pattern recognition and prediction.

Contents

The primary intuition behind the ART model is that object identification and recognition generally occur as a result of the interaction of 'top-down' observer expectations with 'bottom-up' sensory information. The model postulates that 'top-down' expectations take the form of a memory template or prototype that is then compared with the actual features of an object as detected by the senses. This comparison gives rise to a measure of category belongingness. As long as this difference between sensation and expectation does not exceed a set threshold called the 'vigilance parameter', the sensed object will be considered a member of the expected class. The system thus offers a solution to the 'plasticity/stability' problem, i.e. the problem of acquiring new knowledge without disrupting existing knowledge that is also called incremental learning.

Learning model

Basic ART structure ART.png
Basic ART structure

The basic ART system is an unsupervised learning model. It typically consists of a comparison field and a recognition field composed of neurons, a vigilance parameter (threshold of recognition), and a reset module.

Training

There are two basic methods of training ART-based neural networks: slow and fast. In the slow learning method, the degree of training of the recognition neuron's weights towards the input vector is calculated to continuous values with differential equations and is thus dependent on the length of time the input vector is presented. With fast learning, algebraic equations are used to calculate degree of weight adjustments to be made, and binary values are used. While fast learning is effective and efficient for a variety of tasks, the slow learning method is more biologically plausible and can be used with continuous-time networks (i.e. when the input vector can vary continuously).

Types

ART 1 [1] [2] is the simplest variety of ART networks, accepting only binary inputs. ART 2 [3] extends network capabilities to support continuous inputs. ART 2-A [4] is a streamlined form of ART-2 with a drastically accelerated runtime, and with qualitative results being only rarely inferior to the full ART-2 implementation. ART 3 [5] builds on ART-2 by simulating rudimentary neurotransmitter regulation of synaptic activity by incorporating simulated sodium (Na+) and calcium (Ca2+) ion concentrations into the system's equations, which results in a more physiologically realistic means of partially inhibiting categories that trigger mismatch resets.

ARTMAP overview ARTMAP.png
ARTMAP overview

ARTMAP [6] also known as Predictive ART, combines two slightly modified ART-1 or ART-2 units into a supervised learning structure where the first unit takes the input data and the second unit takes the correct output data, then used to make the minimum possible adjustment of the vigilance parameter in the first unit in order to make the correct classification.

Fuzzy ART [7] implements fuzzy logic into ART's pattern recognition, thus enhancing generalizability. An optional (and very useful) feature of fuzzy ART is complement coding, a means of incorporating the absence of features into pattern classifications, which goes a long way towards preventing inefficient and unnecessary category proliferation. The applied similarity measures are based on the L1 norm. Fuzzy ART is known to be very sensitive to noise.

Fuzzy ARTMAP [8] is merely ARTMAP using fuzzy ART units, resulting in a corresponding increase in efficacy.

Simplified Fuzzy ARTMAP (SFAM) [9] constitutes a strongly simplified variant of fuzzy ARTMAP dedicated to classification tasks.

Gaussian ART [10] and Gaussian ARTMAP [10] use Gaussian activation functions and computations based on probability theory. Therefore, they have some similarity with Gaussian mixture models. In comparison to fuzzy ART and fuzzy ARTMAP, they are less sensitive to noise. But the stability of learnt representations is reduced which may lead to category proliferation in open-ended learning tasks.

Fusion ART and related networks [11] [12] [13] extend ART and ARTMAP to multiple pattern channels. They support several learning paradigms, including unsupervised learning, supervised learning and reinforcement learning.

TopoART [14] combines fuzzy ART with topology learning networks such as the growing neural gas. Furthermore, it adds a noise reduction mechanism. There are several derived neural networks which extend TopoART to further learning paradigms.

Hypersphere ART [15] and Hypersphere ARTMAP [15] are closely related to fuzzy ART and fuzzy ARTMAP, respectively. But as they use a different type of category representation (namely hyperspheres), they do not require their input to be normalised to the interval [0, 1]. They apply similarity measures based on the L2 norm.

LAPART [16] The Laterally Primed Adaptive Resonance Theory (LAPART) neural networks couple two Fuzzy ART algorithms to create a mechanism for making predictions based on learned associations. The coupling of the two Fuzzy ARTs has a unique stability that allows the system to converge rapidly towards a clear solution. Additionally, it can perform logical inference and supervised learning similar to fuzzy ARTMAP.

Criticism

It has been noted that results of Fuzzy ART and ART 1 (i.e., the learnt categories) depend critically upon the order in which the training data are processed. The effect can be reduced to some extent by using a slower learning rate, but is present regardless of the size of the input data set. Hence Fuzzy ART and ART 1 estimates do not possess the statistical property of consistency. [17] This problem can be considered as a side effect of the respective mechanisms ensuring stable learning in both networks.

More advanced ART networks such as TopoART and Hypersphere TopoART that summarise categories to clusters may solve this problem as the shapes of the clusters do not depend on the order of creation of the associated categories. (cf. Fig. 3(g, h) and Fig. 4 of [18] )

Related Research Articles

<span class="mw-page-title-main">Supervised learning</span> A paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

<span class="mw-page-title-main">Neural network (machine learning)</span> Computational model used in machine learning, based on connected, hierarchical functions

Neural networks are a branch of machine learning models that are built using the principles of neuronal organization found in the biological neural networks constituting animal brains.

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

Unsupervised learning is a method in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a concise representation of its world and then generate imaginative content from it.

Neural gas is an artificial neural network, inspired by the self-organizing map and introduced in 1991 by Thomas Martinetz and Klaus Schulten. The neural gas is a simple algorithm for finding optimal data representations based on feature vectors. The algorithm was coined "neural gas" because of the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space. It is applied where data compression or vector quantization is an issue, for example speech recognition, image processing or pattern recognition. As a robustly converging alternative to the k-means clustering it is also used for cluster analysis.

<span class="mw-page-title-main">Stephen Grossberg</span> American scientist (born 1939)

Stephen Grossberg is a cognitive scientist, theoretical and computational psychologist, neuroscientist, mathematician, biomedical engineer, and neuromorphic technologist. He is the Wang Professor of Cognitive and Neural Systems and a Professor Emeritus of Mathematics & Statistics, Psychological & Brain Sciences, and Biomedical Engineering at Boston University.

<span class="mw-page-title-main">Neuro-fuzzy</span>

In the field of artificial intelligence, the designation neuro-fuzzy refers to combinations of artificial neural networks and fuzzy logic.

For holographic data storage, holographic associative memory (HAM) is an information storage and retrieval system based on the principles of holography. Holograms are made by using two beams of light, called a "reference beam" and an "object beam". They produce a pattern on the film that contains them both. Afterwards, by reproducing the reference beam, the hologram recreates a visual image of the original object. In theory, one could use the object beam to do the same thing: reproduce the original reference beam. In HAM, the pieces of information act like the two beams. Each can be used to retrieve the other from the pattern. It can be thought of as an artificial neural network which mimics the way the brain uses information. The information is presented in abstract form by a complex vector which may be expressed directly by a waveform possessing frequency and magnitude. This waveform is analogous to electrochemical impulses believed to transmit information between biological neuron cells.

Computational neurogenetic modeling (CNGM) is concerned with the study and development of dynamic neuronal models for modeling brain functions with respect to genes and dynamic interactions between genes. These include neural network models and their integration with gene network models. This area brings together knowledge from various scientific disciplines, such as computer and information science, neuroscience and cognitive science, genetics and molecular biology, as well as engineering.

Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for anomaly detection in streaming data. The technology is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian brain.

The generalized Hebbian algorithm (GHA), also known in the literature as Sanger's rule, is a linear feedforward neural network for unsupervised learning with applications primarily in principal components analysis. First defined in 1989, it is similar to Oja's rule in its formulation and stability, except it can be applied to networks with multiple outputs. The name originates because of the similarity between the algorithm and a hypothesis made by Donald Hebb about the way in which synaptic strengths in the brain are modified in response to experience, i.e., that changes are proportional to the correlation between the firing of pre- and post-synaptic neurons.

Neural modeling field (NMF) is a mathematical framework for machine learning which combines ideas from neural networks, fuzzy logic, and model based recognition. It has also been referred to as modeling fields, modeling fields theory (MFT), Maximum likelihood artificial neural networks (MLANS). This framework has been developed by Leonid Perlovsky at the AFRL. NMF is interpreted as a mathematical description of the mind's mechanisms, including concepts, emotions, instincts, imagination, thinking, and understanding. NMF is a multi-level, hetero-hierarchical system. At each level in NMF there are concept-models encapsulating the knowledge; they generate so-called top-down signals, interacting with input, bottom-up signals. These interactions are governed by dynamic equations, which drive concept-model learning, adaptation, and formation of new concept-models for better correspondence to the input, bottom-up signals.

Gail Alexandra Carpenter is an American cognitive scientist, neuroscientist and mathematician. She is now a "Professor Emerita of Mathematics and Statistics, Boston University." She had also been a Professor of Cognitive and Neural Systems at Boston University, and the director of the Department of Cognitive and Neural Systems (CNS) Technology Lab at Boston University.

Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.

There are many types of artificial neural networks (ANN).

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weights and bias levels of a network when a network is simulated in a specific data environment. A learning rule may accept existing conditions of the network and will compare the expected result and actual result of the network to give new and improved values for weights and bias. Depending on the complexity of actual model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.

<span class="mw-page-title-main">Feature learning</span> Set of learning techniques in machine learning

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

Fusion adaptive resonance theory (fusion ART) is a generalization of self-organizing neural networks known as the original Adaptive Resonance Theory models for learning recognition categories across multiple pattern channels. There is a separate stream of work on fusion ARTMAP, that extends fuzzy ARTMAP consisting of two fuzzy ART modules connected by an inter-ART map field to an extended architecture consisting of multiple ART modules.

In computer science, incremental learning is a method of machine learning in which input data is continuously used to extend the existing model's knowledge i.e. to further train the model. It represents a dynamic technique of supervised learning and unsupervised learning that can be applied when training data becomes available gradually over time or its size is out of system memory limits. Algorithms that can facilitate incremental learning are known as incremental machine learning algorithms.

The following outline is provided as an overview of and topical guide to machine learning:

References

  1. Carpenter, G.A. & Grossberg, S. (2003), Adaptive Resonance Theory Archived 2006-05-19 at the Wayback Machine , In Michael A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, Second Edition (pp. 87-90). Cambridge, MA: MIT Press
  2. Grossberg, S. (1987), Competitive learning: From interactive activation to adaptive resonance Archived 2006-09-07 at the Wayback Machine , Cognitive Science (journal), 11, 23-63
  3. Carpenter, G.A. & Grossberg, S. (1987), ART 2: Self-organization of stable category recognition codes for analog input patterns Archived 2006-09-04 at the Wayback Machine , Applied Optics, 26(23), 4919-4930
  4. Carpenter, G.A., Grossberg, S., & Rosen, D.B. (1991a), ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition Archived 2006-05-19 at the Wayback Machine , Neural Networks , 4, 493-504
  5. Carpenter, G.A. & Grossberg, S. (1990), ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures Archived 2006-09-06 at the Wayback Machine , Neural Networks , 3, 129-152
  6. Carpenter, G.A., Grossberg, S., & Reynolds, J.H. (1991), ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network Archived 2006-05-19 at the Wayback Machine , Neural Networks , 4, 565-588
  7. Carpenter, G.A., Grossberg, S., & Rosen, D.B. (1991b), Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system Archived 2006-05-19 at the Wayback Machine , Neural Networks , 4, 759-771
  8. Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., & Rosen, D.B. (1992), Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps Archived 2006-05-19 at the Wayback Machine , IEEE Transactions on Neural Networks, 3, 698-713
  9. Mohammad-Taghi Vakil-Baghmisheh and Nikola Pavešić. (2003) A Fast Simplified Fuzzy ARTMAP Network, Neural Processing Letters, 17(3):273–316
  10. 1 2 James R. Williamson. (1996), Gaussian ARTMAP: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps, Neural Networks, 9(5):881-897
  11. Y.R. Asfour, G.A. Carpenter, S. Grossberg, and G.W. Lesher. (1993) Fusion ARTMAP: an adaptive fuzzy network for multi-channel classification. In: Proceedings of the Third International Conference on Industrial Fuzzy Control and Intelligent Systems (IFIS).
  12. Tan, A.-H.; Carpenter, G. A.; Grossberg, S. (2007). "Intelligence Through Interaction: Towards a Unified Theory for Learning". In Liu, D.; Fei, S.; Hou, Z.-G.; Zhang, H.; Sun, C. (eds.). Advances in Neural Networks – ISNN 2007. Lecture Notes in Computer Science. Vol. 4491. Berlin, Heidelberg: Springer. pp. 1094–1103. doi:10.1007/978-3-540-72383-7_128. ISBN   978-3-540-72383-7.
  13. Tan, A.-H.; Subagdja, B.; Wang, D.; Meng, L. (2019). "Self-organizing neural networks for universal learning and multimodal memory encoding". Neural Networks. 120: 58–73. doi:10.1016/j.neunet.2019.08.020. PMID   31537437. S2CID   202703163.
  14. Marko Tscherepanow. (2010) TopoART: A Topology Learning Hierarchical ART Network, In: Proceedings of the International Conference on Artificial Neural Networks (ICANN), Part III, LNCS 6354, 157-167
  15. 1 2 Georgios C. Anagnostopoulos and Michael Georgiopoulos. (2000), Hypersphere ART and ARTMAP for Unsupervised and Supervised Incremental Learning, In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), vol. 6, 59-64
  16. Sandia National Laboratories (2017) Lapart-python documentation
  17. Sarle, Warren S. (1995), Why Statisticians Should Not FART Archived July 20, 2011, at the Wayback Machine
  18. Marko Tscherepanow. (2012) Incremental On-line Clustering with a Topology-Learning Hierarchical ART Neural Network Using Hyperspherical Categories, In: Poster and Industry Proceedings of the Industrial Conference on Data Mining (ICDM), 22–34

Wasserman, Philip D. (1989), Neural computing: theory and practice, New York: Van Nostrand Reinhold, ISBN   0-442-20743-3