Fusion adaptive resonance theory


Fusion adaptive resonance theory (fusion ART) [1] [2] is a generalization of the self-organizing neural networks known as Adaptive Resonance Theory (ART) [3] models, for learning recognition categories across multiple pattern channels. A separate stream of work on fusion ARTMAP [4] [5] extends fuzzy ARTMAP, which consists of two fuzzy ART modules connected by an inter-ART map field, to an architecture consisting of multiple ART modules.


Fusion ART unifies a number of neural model designs and supports a range of learning paradigms, notably unsupervised learning, supervised learning, reinforcement learning, multimodal learning, and sequence learning. In addition, various extensions have been developed for domain knowledge integration, [6] memory representation, [7] [8] and modelling of high-level cognition.

Overview

Fusion ART is a natural extension of the original adaptive resonance theory (ART) [3] [9] models developed by Stephen Grossberg and Gail A. Carpenter from a single pattern field to multiple pattern channels. Whereas the original ART models perform unsupervised learning of recognition nodes in response to incoming input patterns, fusion ART learns multi-channel mappings simultaneously across multi-modal pattern channels in an online and incremental manner.

The learning model

Fusion ART employs a multi-channel architecture (as shown below), comprising a category field connected to a fixed number (K) of pattern channels or input fields through bidirectional conditionable pathways. The model unifies a number of network designs, most notably Adaptive Resonance Theory (ART), Adaptive Resonance Associative Map (ARAM) [10] and Fusion Architecture for Learning and COgNition (FALCON), [11] developed over the past decades for a wide range of functions and applications.

Figure: Fusion ART architecture.

Given a set of multimodal patterns, each presented at a pattern channel, the fusion ART pattern encoding cycle comprises five key stages, namely code activation, code competition, activity readout, template matching, and template learning.
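The five stages can be illustrated with a minimal sketch. The text above does not give the equations for these stages, so the sketch assumes the fuzzy ART operations (element-wise minimum as fuzzy AND, a choice function for activation, a vigilance test for matching, and a convex-combination learning rule). The class name `FusionART` and the parameter names `alpha`, `beta`, `gamma`, and `rho` are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass, field


def fuzzy_and(p, q):
    """Element-wise minimum (the fuzzy AND operator)."""
    return [min(a, b) for a, b in zip(p, q)]


def norm(p):
    """L1 norm of a non-negative vector."""
    return sum(p)


@dataclass
class FusionART:
    """Illustrative K-channel network; all parameter names are assumptions."""
    rho: list            # vigilance per channel
    gamma: list          # weight of each channel in the choice function
    alpha: float = 0.01  # choice parameter
    beta: float = 1.0    # learning rate (1.0 means fast learning)
    nodes: list = field(default_factory=list)  # nodes[j][k]: template of node j, channel k

    def activation(self, xs, w):
        # Code activation: weighted similarity between inputs and templates.
        return sum(
            g * norm(fuzzy_and(x, wk)) / (self.alpha + norm(wk))
            for g, x, wk in zip(self.gamma, xs, w)
        )

    def matches(self, xs, w):
        # Template matching: every channel must meet its vigilance criterion.
        return all(
            norm(x) == 0 or norm(fuzzy_and(x, wk)) / norm(x) >= r
            for r, x, wk in zip(self.rho, xs, w)
        )

    def readout(self, xs, j):
        # Activity readout: each input is reduced to its overlap with node j.
        return [fuzzy_and(x, wk) for x, wk in zip(xs, self.nodes[j])]

    def learn(self, xs):
        """Run one encoding cycle and return the index of the selected node."""
        # Code competition: visit nodes from highest to lowest activation.
        order = sorted(range(len(self.nodes)),
                       key=lambda j: self.activation(xs, self.nodes[j]),
                       reverse=True)
        for j in order:
            w = self.nodes[j]
            if self.matches(xs, w):
                # Template learning: move each template towards (x AND w).
                self.nodes[j] = [
                    [(1 - self.beta) * wi + self.beta * mi
                     for wi, mi in zip(wk, fuzzy_and(x, wk))]
                    for x, wk in zip(xs, w)
                ]
                return j
        # No node resonates: commit a new node that stores the input itself.
        self.nodes.append([list(x) for x in xs])
        return len(self.nodes) - 1
```

Calling `learn` with one vector per channel either refines the best-matching node or commits a new one. Calling `readout` with an all-ones vector on a channel returns the selected node's template for that channel, which is one way such a network could be queried for a prediction.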

Types of fusion ART

The network dynamics described above can be used to support numerous learning operations. We show how fusion ART can be used for a variety of traditionally distinct learning tasks in the subsequent sections.

Original ART models

With a single pattern channel, the fusion ART architecture reduces to the original ART model. Using a selected vigilance value ρ, an ART model learns a set of recognition nodes in response to an incoming stream of input patterns in a continuous manner. Each recognition node in the field learns to encode a template pattern representing the key characteristics of a set of patterns. ART has been widely used in the context of unsupervised learning for discovering pattern groupings.
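The role of the vigilance value ρ can be sketched with a single-channel example. The code below assumes fuzzy ART operations with fast learning and complement-coded binary inputs. The function name `cluster` and the parameter `alpha` are illustrative assumptions.

```python
def cluster(patterns, rho, alpha=0.01):
    """Group patterns with a single-channel fuzzy ART sketch (fast learning).

    Returns, for each pattern, the index of the recognition node it selected.
    """
    templates, labels = [], []
    for x in patterns:
        # Rank existing nodes by an assumed fuzzy ART choice function.
        ranked = sorted(
            range(len(templates)),
            key=lambda j: sum(map(min, x, templates[j])) / (alpha + sum(templates[j])),
            reverse=True,
        )
        for j in ranked:
            overlap = [min(a, b) for a, b in zip(x, templates[j])]
            if sum(overlap) / sum(x) >= rho:   # vigilance test
                templates[j] = overlap         # template shrinks to the shared part
                labels.append(j)
                break
        else:
            # No existing node is similar enough: commit a new recognition node.
            templates.append(list(x))
            labels.append(len(templates) - 1)
    return labels
```

With a low vigilance the sketch merges similar patterns into one node, whereas a vigilance close to 1 gives each distinct pattern its own node, so ρ acts as a control on how fine-grained the discovered groupings are.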

Adaptive resonance associative map

By synchronizing pattern coding across multiple pattern channels, fusion ART learns to encode associative mappings across distinct pattern spaces. A specific instance of fusion ART with two pattern channels, known as the adaptive resonance associative map (ARAM), learns multi-dimensional supervised mappings from one pattern space to another. An ARAM system consists of an input field, an output field, and a category field. Given a set of feature vectors presented at the input field with their corresponding class vectors presented at the output field, ARAM learns a predictive model (encoded by the recognition nodes in the category field) that associates combinations of key features with their respective classes.
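A two-channel supervised mapping of this kind might be sketched as follows. This is a simplified illustration rather than the published ARAM algorithm. It assumes fuzzy ART operations with fast learning, a vigilance of 1 on the class channel during training, and prediction by the best-matching feature template. The names `train`, `predict`, `complement_code`, `rho_a`, and `rho_b` are hypothetical.

```python
def fuzzy_and(p, q):
    """Element-wise minimum (the fuzzy AND operator)."""
    return [min(a, b) for a, b in zip(p, q)]


def match(x, w):
    """Fraction of input x that is covered by template w."""
    return sum(fuzzy_and(x, w)) / sum(x)


def choice(x, w, alpha=0.01):
    """Assumed fuzzy ART choice function."""
    return sum(fuzzy_and(x, w)) / (alpha + sum(w))


def complement_code(v):
    """Append 1 - v so that every coded vector has the same L1 norm."""
    return list(v) + [1 - x for x in v]


def train(nodes, a, b, rho_a=0.5, rho_b=1.0):
    """Present feature vector a with class vector b; nodes is a list of [wa, wb]."""
    ranked = sorted(nodes,
                    key=lambda n: choice(a, n[0]) + choice(b, n[1]),
                    reverse=True)
    for node in ranked:
        # Both channels must pass their vigilance test for resonance.
        if match(a, node[0]) >= rho_a and match(b, node[1]) >= rho_b:
            node[0] = fuzzy_and(a, node[0])  # fast learning on the feature side
            node[1] = fuzzy_and(b, node[1])  # and on the class side
            return
    nodes.append([list(a), list(b)])         # commit a new category node


def predict(nodes, a):
    """Return the class template of the node whose feature template fits a best."""
    best = max(nodes, key=lambda n: choice(a, n[0]))
    return best[1]
```

In this sketch, training on a few labelled feature vectors builds one node per feature cluster and class, and `predict` reads the class vector out of the winning node.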

Fuzzy ARAM, based on fuzzy ART operations, has been applied to numerous machine learning tasks, including personal profiling, [12] document classification, [13] personalized content management, [14] and DNA gene expression analysis. [15] In many benchmark experiments, ARAM has demonstrated predictive performance superior to that of many state-of-the-art machine learning systems, including C4.5, Backpropagation Neural Network, K Nearest Neighbour, and Support Vector Machines.

Fusion ART with domain knowledge

During learning, fusion ART formulates recognition categories of input patterns across multiple channels. The knowledge that fusion ART discovers during learning is compatible with symbolic rule-based representation. Specifically, the recognition categories learned by the category nodes are compatible with a class of IF-THEN rules that map a set of input attributes (antecedents) in one pattern channel to a disjoint set of output attributes (consequents) in another channel. Due to this compatibility, at any point of the incremental learning process, instructions in the form of IF-THEN rules can be readily translated into the recognition categories of a fusion ART system. The rules are conjunctive in the sense that the attributes in the IF clause and in the THEN clause have an AND relationship. Augmenting a fusion ART network with domain knowledge through explicit instructions serves to improve learning efficiency and predictive accuracy.

The fusion ART rule insertion strategy is similar to that used in Cascade ARTMAP, a generalization of ARTMAP that performs domain knowledge insertion, refinement, and extraction. [16] For direct knowledge insertion, the IF and THEN clauses of each instruction (rule) are translated into a pair of vectors A and B respectively. The vector pairs derived are then used as training patterns for insertion into a fusion ART network. During rule insertion, the vigilance parameters are set to 1 to ensure that each distinct rule is encoded by one category node.
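The translation of rules into vector pairs can be sketched as follows. The encoding is an assumption made for illustration: each clause becomes a binary indicator vector over a fixed attribute list, and the effect of setting the vigilance parameters to 1 is approximated by an exact-match test, so that only identical rules share a node. The function names and the example attributes are hypothetical.

```python
def encode(attributes, vocabulary):
    """Binary indicator vector: 1 where the attribute appears in the clause."""
    return [1 if name in attributes else 0 for name in vocabulary]


def insert_rules(rules, antecedents, consequents):
    """Translate each (IF, THEN) rule into a vector pair (A, B).

    One category node is stored per distinct pair, which approximates
    insertion with all vigilance parameters set to 1.
    """
    nodes = []
    for if_part, then_part in rules:
        pair = (encode(if_part, antecedents), encode(then_part, consequents))
        if pair not in nodes:   # an identical rule resonates with its existing node
            nodes.append(pair)
    return nodes
```

For example, with antecedents `["raining", "cold", "sunny"]` and consequents `["umbrella", "coat"]`, the rule "IF raining AND cold THEN umbrella AND coat" would be stored as the pair A = [1, 1, 0], B = [1, 1].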

Fusion architecture for learning and cognition (FALCON)

Reinforcement learning is a paradigm wherein an autonomous system learns to adjust its behaviour based on reinforcement signals received from the environment. An instance of fusion ART, known as FALCON (fusion architecture for learning and cognition), learns mappings simultaneously across multi-modal input patterns, involving states, actions, and rewards, in an online and incremental manner. Compared with other ART-based reinforcement learning systems, FALCON presents a truly integrated solution in the sense that there is no implementation of a separate reinforcement learning module or Q-value table. Using competitive coding as the underlying principle of computation, the network dynamics encompasses several learning paradigms, including unsupervised learning, supervised learning, as well as reinforcement learning.

FALCON employs a three-channel architecture, comprising a category field and three pattern fields, namely a sensory field for representing current states, a motor field for representing actions, and a feedback field for representing reward values. A class of FALCON networks, known as TD-FALCON, [11] incorporates temporal difference (TD) methods to estimate and learn the value function Q(s,a), which indicates the value of taking a certain action a in a given state s.

The general sense-act-learn algorithm for TD-FALCON can be summarized as follows. Given the current state s, the FALCON network is used to predict the value of performing each available action a in the action set A, based on the corresponding state vector and action vector. The predicted values are then processed by an action selection strategy (also known as a policy) to select an action. Upon receiving feedback (if any) from the environment after performing the action, a TD formula is used to compute a new estimate of the Q-value for performing the chosen action in the current state. The new Q-value is then used as the teaching signal (represented as reward vector R) for FALCON to learn the association of the current state and the chosen action to the estimated value.
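One step of this loop might look like the sketch below. The text does not specify the TD formula or the policy, so the sketch assumes the standard one-step Q-learning update and an epsilon-greedy policy. The callables `predict`, `learn`, and `step` are hypothetical stand-ins for the FALCON network's value prediction, its learning of the state-action-value association, and the environment.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy policy over a dict {action: predicted value}."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def td_estimate(q_old, reward, q_next_max, alpha=0.5, gamma=0.9):
    """Assumed one-step Q-learning rule: Q + alpha * (r + gamma * max Q' - Q)."""
    return q_old + alpha * (reward + gamma * q_next_max - q_old)


def sense_act_learn(state, actions, predict, learn, step, epsilon=0.1):
    """Run one sense-act-learn step and return (action, new Q-value, next state)."""
    # Sense: predict the value of every available action in the current state.
    q_values = {a: predict(state, a) for a in actions}
    # Act: choose an action with the policy and observe the feedback.
    action = select_action(q_values, epsilon)
    reward, next_state = step(state, action)
    # Learn: compute a new Q estimate and use it as the teaching signal.
    q_next = max(predict(next_state, a) for a in actions)
    q_new = td_estimate(q_values[action], reward, q_next)
    learn(state, action, q_new)
    return action, q_new, next_state
```

Passing the predictor and learner in as functions keeps the sketch independent of how the network stores its value estimates, which in FALCON are encoded by the category nodes rather than by a separate table.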


References

  1. Tan, A.-H., Carpenter, G. A. & Grossberg, S. (2007) Intelligence Through Interaction: Towards A Unified Theory for Learning. In proceedings, D. Liu et al. (Eds.): International Symposium on Neural Networks (ISNN'07), LNCS 4491, Part I, pp. 1098-1107.
  2. Tan, A.-H.; Subagdja, B.; Wang, D.; Meng, L. (2019). "Self-organizing neural networks for universal learning and multimodal memory encoding". Neural Networks. 120: 58–73. doi:10.1016/j.neunet.2019.08.020. PMID 31537437. S2CID 202703163.
  3. Carpenter, G.A. & Grossberg, S. (2003), Adaptive Resonance Theory, In Michael A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, Second Edition (pp. 87-90). Cambridge, MA: MIT Press
  4. Y.R. Asfour, G.A. Carpenter, S. Grossberg, and G.W. Lesher. (1993) Fusion ARTMAP: an adaptive fuzzy network for multi-channel classification. In Proceedings of the Third International Conference on Industrial Fuzzy Control and Intelligent Systems (IFIS).
  5. R.F. Harrison and J.M. Borges. (1995) Fusion ARTMAP: Clarification, Implementation and Developments. Research Report No. 589, Department of Automatic Control and Systems Engineering, The University of Sheffield.
  6. Teng, T.-H.; Tan, A.-H.; Zurada, J. M. (2015). "Self-Organizing Neural Networks Integrating Domain Knowledge and Reinforcement Learning". IEEE Transactions on Neural Networks and Learning Systems. 26 (5): 889–902. doi:10.1109/TNNLS.2014.2327636. ISSN 2162-237X. PMID 25881365. S2CID 4664197.
  7. Wang, W.; Subagdja, B.; Tan, A.-H.; Starzyk, J. A. (2012). "Neural Modeling of Episodic Memory: Encoding, Retrieval, and Forgetting". IEEE Transactions on Neural Networks and Learning Systems. 23 (10): 1574–1586. doi:10.1109/TNNLS.2012.2208477. ISSN 2162-237X. PMID 24808003. S2CID 1337309.
  8. Wang, W.; Tan, A.-H.; Teow, L.-N. (2017). "Semantic Memory Modeling and Memory Interaction in Learning Agents". IEEE Transactions on Systems, Man, and Cybernetics: Systems. 47 (11): 2882–2895. doi:10.1109/TSMC.2016.2531683. ISSN 2168-2216. S2CID 12768875.
  9. Grossberg, S. (1987), Competitive learning: From interactive activation to adaptive resonance, Cognitive Science, 11, 23-63
  10. Tan, A.-H. (1995). "Adaptive Resonance Associative Map" (PDF). Neural Networks. 8 (3): 437–446. doi:10.1016/0893-6080(94)00092-z.
  11. Tan, A.-H.; Lu, N.; Xiao, D. (2008). "Integrating Temporal Difference Methods and Self-Organizing Neural Networks for Reinforcement Learning with Delayed Evaluative Feedback" (PDF). IEEE Transactions on Neural Networks. 19 (2): 230–244.
  12. Tan, A.-H.; Soon, H.-S. (2000). "Predictive Adaptive Resonance Theory and Knowledge Discovery in Databases". Proceedings, Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'00), LNAI. 1805: 173–176.
  13. He, J.; Tan, A.-H.; Tan, C.-L. (2003). "On Machine Learning Methods for Chinese Document Classification" (PDF). Applied Intelligence. 18 (3): 311–322. doi:10.1023/A:1023202221875. S2CID 2033181.
  14. Tan, A.-H.; Ong, H.-L.; Pan, H.; Ng, J.; Li, Q.-X. (2004). "Towards Personalized Web Intelligence" (PDF). Knowledge and Information Systems. 6 (5): 595–616. doi:10.1007/s10115-003-0130-9. S2CID 14699173.
  15. Tan, A.-H.; Pan (2005). "Predictive Neural Networks for Gene Expression Data Analysis" (PDF). Neural Networks. 18 (3): 297–306. doi:10.1016/j.neunet.2005.01.003. PMID 15896577. S2CID 5058995.
  16. Tan, A.-H. (1997). "Cascade ARTMAP: Integrating Neural Computation and Symbolic Knowledge Processing" (PDF). IEEE Transactions on Neural Networks. 8 (2): 237–250. doi:10.1109/72.557661. PMID 18255628.