Attractor network

An attractor network is a type of recurrent dynamical network that evolves toward a stable pattern over time. Nodes in the attractor network converge toward a pattern that may be fixed-point (a single state), cyclic (with regularly recurring states), chaotic (locally but not globally unstable) or random (stochastic). [1] Attractor networks have largely been used in computational neuroscience to model neuronal processes such as associative memory [2] and motor behavior, as well as in biologically inspired methods of machine learning.

An attractor network contains a set of n nodes, which can be represented as vectors in a d-dimensional space where n>d. Over time, the network state tends toward one of a set of predefined states on a d-manifold; these are the attractors.

Overview

In attractor networks, an attractor (or attracting set) is a closed subset of states A toward which the system of nodes evolves. A stationary attractor is a state or set of states at which the global dynamics of the network stabilize. Cyclic attractors evolve the network toward a set of states in a limit cycle, which is repeatedly traversed. Chaotic attractors are non-repeating bounded attractors that are continuously traversed.

The network state space is the set of all possible node states. The attractor space is the set of nodes on the attractor. Attractor networks are initialized based on the input pattern. The dimensionality of the input pattern may differ from the dimensionality of the network nodes. The trajectory of the network consists of the set of states along the evolution path as the network converges toward the attractor state. The basin of attraction of an attractor is the set of states from which the network evolves toward that attractor. [1]
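To make this vocabulary concrete, the following toy system, chosen here purely for illustration and not taken from the cited sources, has two fixed-point attractors. The flow $dx/dt = x - x^3$ has attractors at $x = -1$ and $x = +1$ and an unstable fixed point at $x = 0$, so the basin of attraction of $+1$ is every positive starting state and the basin of $-1$ is every negative one. The forward-Euler step size and the number of steps are arbitrary choices.

```python
def step(x: float, dt: float = 0.01) -> float:
    """One forward-Euler step of dx/dt = x - x**3 (attractors at -1 and +1, repeller at 0)."""
    return x + dt * (x - x ** 3)


def settle(x0: float, steps: int = 5000) -> float:
    """Follow the trajectory from x0 for steps * dt time units and return the final state."""
    x = x0
    for _ in range(steps):
        x = step(x)
    return x


def basin(x0: float) -> int:
    """Label which attractor (+1 or -1) the trajectory starting at x0 ends near."""
    return 1 if settle(x0) > 0 else -1


if __name__ == "__main__":
    for x0 in (-2.0, -0.3, 0.05, 1.7):
        print(f"start {x0:+.2f} -> settles near {settle(x0):+.4f} (basin of {basin(x0):+d})")
```

A start exactly at 0 stays there, but any perturbation pushes it into one of the two basins.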

Types

Various types of attractors may be used to model different types of network dynamics. While fixed-point attractor networks are the most common (originating from Hopfield networks [3] ), other types of networks are also examined.

Fixed point attractors

The fixed point attractor naturally follows from the Hopfield network. Conventionally, fixed points in this model represent encoded memories. These models have been used to explain associative memory, classification, and pattern completion. Hopfield nets contain an underlying energy function [4] that allows the network to asymptotically approach a stationary state. One class of point attractor network is initialized with an input, after which the input is removed and the network moves toward a stable state. Another class of attractor network features predefined weights that are probed by different types of input. If the stable state reached during the input differs from the one reached after it, the network serves as a model of associative memory; if the states during and after the input do not differ, the network can be used for pattern completion.
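Fixed-point recall and pattern completion can be sketched with a small Hopfield network. This sketch assumes the standard Hebbian outer-product rule for the weights (the paragraph above does not specify a learning rule) and asynchronous ±1 updates. Because $W$ is symmetric with a zero diagonal, the energy $E(x) = -\tfrac{1}{2}\sum_{ij} W_{ij} x_i x_j$ never increases under these updates, which is why the network settles. The two stored patterns and the corrupted bits are arbitrary examples.

```python
from typing import List, Tuple

Vector = List[int]
Matrix = List[List[float]]


def hebbian_weights(patterns: List[Vector]) -> Matrix:
    """Outer-product (Hebbian) rule: W_ij = (1/N) sum_p p_i p_j, with W_ii = 0."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w: Matrix, x: Vector) -> float:
    """Hopfield energy E(x) = -1/2 sum_ij W_ij x_i x_j."""
    n = len(x)
    return -0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def recall(w: Matrix, x0: Vector, max_sweeps: int = 10) -> Tuple[Vector, List[float]]:
    """Asynchronous sign updates, one unit at a time, until a full sweep changes nothing.

    Returns the final state and the energy after every single-unit update.
    """
    x = list(x0)
    trace = [energy(w, x)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(x)):
            h = sum(w[i][j] * x[j] for j in range(len(x)))
            new = 1 if h >= 0 else -1
            if new != x[i]:
                x[i] = new
                changed = True
            trace.append(energy(w, x))
        if not changed:  # no unit flipped: x is a fixed point
            break
    return x, trace


A = [1] * 8 + [-1] * 8          # two orthogonal 16-unit patterns (illustrative)
B = [1, -1] * 8
W = hebbian_weights([A, B])

probe = list(A)
probe[0], probe[9] = -probe[0], -probe[9]   # corrupt two bits of A
completed, trace = recall(W, probe)
print(completed == A)   # True: the corrupted probe falls back into A's basin
```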

Other stationary attractors

Line attractors and plane attractors are used in the study of oculomotor control. These line attractors, or neural integrators, describe eye position in response to stimuli. Ring attractors have been used to model rodent head direction.
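A neural integrator can be sketched as a linear recurrent network whose weight matrix has an eigenvalue of exactly 1. Activity along that eigenvector neither grows nor decays, so every point on that line is a fixed point, while activity in other directions dies out. The two-unit network, its weights and the input pattern below are hypothetical. The sum of the two rates plays the role of eye position: it equals the integral of the velocity-like input and is held after the input stops.

```python
from typing import List, Sequence

# Hypothetical weights: eigenvalue 1 along (1, 1), eigenvalue 0.5 along (1, -1).
W = [[0.75, 0.25],
     [0.25, 0.75]]
# Hypothetical input weights: the velocity-like command drives unit 0 only.
B = [1.0, 0.0]


def step(x: Sequence[float], u: float) -> List[float]:
    """x(t+1) = W x(t) + B u(t) for a two-unit linear network."""
    return [W[i][0] * x[0] + W[i][1] * x[1] + B[i] * u for i in range(2)]


def run(inputs: Sequence[float], x0: Sequence[float] = (0.0, 0.0)) -> List[float]:
    """Apply the input sequence one step at a time and return the final state."""
    x = list(x0)
    for u in inputs:
        x = step(x, u)
    return x


velocity = [0.1] * 10 + [0.0] * 200    # brief velocity command, then silence
x = run(velocity)
print([round(v, 6) for v in x], "position readout:", round(x[0] + x[1], 6))
```

After the input stops, the component along (1, -1) has decayed away and the state rests on the line of fixed points spanned by (1, 1).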

Cyclic attractors

Cyclic attractors are instrumental in modeling central pattern generators, neural circuits that govern oscillatory activity in animals, such as chewing, walking, and breathing.
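A limit cycle can be illustrated with the Hopf normal form, a textbook oscillator used here only as a stand-in; it is not a model of any specific central pattern generator. In polar coordinates it reads $\dot r = r(1 - r^2)$, $\dot\theta = \omega$, so trajectories starting inside or outside the unit circle both converge onto it and then circulate with period $2\pi/\omega$. The sketch integrates the Cartesian form with forward Euler; the step size and $\omega = 1$ are arbitrary.

```python
import math
from typing import Tuple


def step(x: float, y: float, omega: float = 1.0, dt: float = 1e-3) -> Tuple[float, float]:
    """Euler step of the Hopf normal form; in polar form r' = r(1 - r^2), theta' = omega."""
    r2 = x * x + y * y
    dx = x - omega * y - x * r2
    dy = omega * x + y - y * r2
    return x + dt * dx, y + dt * dy


def final_radius(x0: float, y0: float, t_end: float = 20.0, dt: float = 1e-3) -> float:
    """Integrate for t_end time units and return the distance from the origin."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        x, y = step(x, y, dt=dt)
    return math.hypot(x, y)


for start in [(0.1, 0.0), (2.0, 0.0), (0.0, -0.5)]:
    print(start, "-> radius", round(final_radius(*start), 3))
```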

Chaotic attractors

Chaotic attractors (also called strange attractors) have been hypothesized to reflect patterns in odor recognition. While chaotic attractors have the benefit of more quickly converging upon limit cycles, there is as yet no experimental evidence to support this theory. [5]
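The defining properties of a chaotic attractor (bounded, non-repeating trajectories along which nearby states diverge) can be seen in the Lorenz system with its classic parameters $\sigma = 10$, $\rho = 28$, $\beta = 8/3$. This illustrates chaotic dynamics in general and is not an olfactory model; the fourth-order Runge-Kutta integrator, step size and initial conditions are choices made for the sketch.

```python
from typing import List, Tuple

State = Tuple[float, float, float]


def lorenz(s: State, sigma: float = 10.0, rho: float = 28.0, beta: float = 8.0 / 3.0) -> State:
    """Right-hand side of the Lorenz equations."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(s: State, dt: float) -> State:
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k: State, h: float) -> State:
        return (s[0] + h * k[0], s[1] + h * k[1], s[2] + h * k[2])

    k1 = lorenz(s)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def trajectory(s0: State, steps: int, dt: float = 0.01) -> List[State]:
    """Return the list of states visited, starting with s0."""
    out = [s0]
    for _ in range(steps):
        out.append(rk4_step(out[-1], dt))
    return out


a = trajectory((1.0, 1.0, 1.0), 6000)            # 60 time units
b = trajectory((1.0 + 1e-9, 1.0, 1.0), 6000)     # perturbed by one part in a billion
gap = [abs(p[0] - q[0]) for p, q in zip(a, b)]
print("largest |x| on the trajectory:", max(abs(p[0]) for p in a))
print("first step with |x_a - x_b| > 1:", next((i for i, g in enumerate(gap) if g > 1.0), None))
```

Both trajectories stay in a bounded region, yet the tiny initial difference grows until the two copies are unrelated.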

Continuous attractors

Neighboring stable states (fixed points) of continuous attractors (also called continuous attractor neural networks) code for neighboring values of a continuous variable such as head direction or actual position in space.

Ring attractors

Ring attractors are a subtype of continuous attractors with a particular topology of the neurons (a ring for 1-dimensional and a torus or twisted torus for 2-dimensional networks). The observed activity of grid cells is successfully explained by assuming the presence of ring attractors in the medial entorhinal cortex. [6] Recently, it has been proposed that similar ring attractors are present in the lateral portion of the entorhinal cortex and that their role extends to registering new episodic memories. [7]
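A deliberately tiny ring attractor can be sketched with 12 binary neurons whose preferred directions are spaced evenly around a circle, connected by cosine-shaped excitation minus uniform inhibition. The neuron count, inhibition strength and synchronous threshold update are hypothetical choices, not the models of [6] or [7]. A brief cue at one neuron leaves behind a persistent three-neuron bump centered on the cue. Because the connectivity is rotation-invariant, the same bump at any of the 12 positions is also a fixed point; with a finite number of neurons this is only a discrete approximation of a continuous ring of attractors.

```python
import math
from typing import List

N = 12                     # hypothetical number of neurons on the ring
INHIBITION = 0.6           # hypothetical uniform inhibition strength
ANGLES = [2 * math.pi * i / N for i in range(N)]   # preferred directions
# Local excitation (cosine of the angular difference) minus uniform inhibition.
W = [[math.cos(a - b) - INHIBITION for b in ANGLES] for a in ANGLES]


def step(r: List[int]) -> List[int]:
    """Synchronous binary update: a neuron fires if its net recurrent input is positive."""
    return [1 if sum(w * x for w, x in zip(row, r)) > 0 else 0 for row in W]


def settle(r: List[int], steps: int = 20) -> List[int]:
    """Run the recurrent dynamics with no external input."""
    for _ in range(steps):
        r = step(r)
    return r


def decode(r: List[int]) -> float:
    """Population-vector readout of the bump position, in degrees in [0, 360)."""
    s = sum(x * math.sin(a) for x, a in zip(r, ANGLES))
    c = sum(x * math.cos(a) for x, a in zip(r, ANGLES))
    return math.degrees(math.atan2(s, c)) % 360


cue = [0] * N
cue[5] = 1                 # brief cue at 150 degrees; the input is then removed
bump = settle(cue)
print(bump, round(decode(bump), 3))
```

With synchronous updates some initial states (for example, a two-neuron cue) oscillate instead of settling, which is why the sketch cues a single neuron.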

Implementations

Attractor networks have mainly been implemented as memory models using fixed-point attractors. However, they have been largely impractical for computational purposes because of difficulties in designing the attractor landscape and network wiring, which result in spurious attractors and poorly conditioned basins of attraction. Furthermore, training attractor networks is generally computationally expensive compared to other methods such as k-nearest neighbor classifiers. [8] Nevertheless, their role in the general understanding of biological functions such as locomotor function, memory, and decision-making makes them attractive as biologically realistic models.

Hopfield networks

Hopfield attractor networks are an early implementation of attractor networks with associative memory. These recurrent networks are initialized by the input and tend toward a fixed-point attractor. The update function in discrete time is $x(t+1) = f(W x(t))$, where $x$ is a vector of nodes in the network, $W$ is a symmetric matrix describing their connectivity, and $f$ is the activation function of the nodes. The continuous-time update is $\frac{dx}{dt} = -\lambda x + f(W x)$, where $\lambda$ is a decay rate.
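Both update rules can be sketched directly under illustrative assumptions: $f = \tanh$, $\lambda = 1$, forward-Euler integration for the continuous case, and a single stored pattern $p$ with symmetric weights $W = 2\,pp^{\top}/N$ and a zero diagonal. None of these values comes from the article. Starting from a state with one unit of the wrong sign, both versions settle onto a fixed point whose signs match the stored pattern.

```python
import math
from typing import List

Vector = List[float]


def weights(pattern: List[int], gain: float = 2.0) -> List[Vector]:
    """Hypothetical single-pattern symmetric weights W = gain * p p^T / N, zero diagonal."""
    n = len(pattern)
    return [[0.0 if i == j else gain * pattern[i] * pattern[j] / n for j in range(n)]
            for i in range(n)]


def matvec(w: List[Vector], x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def discrete_update(w: List[Vector], x: Vector) -> Vector:
    """x(t+1) = f(W x(t)) with f = tanh."""
    return [math.tanh(h) for h in matvec(w, x)]


def continuous_update(w: List[Vector], x: Vector, lam: float = 1.0, dt: float = 0.05) -> Vector:
    """One Euler step of dx/dt = -lam * x + f(W x) with f = tanh."""
    fx = [math.tanh(h) for h in matvec(w, x)]
    return [xi + dt * (-lam * xi + fi) for xi, fi in zip(x, fx)]


p = [1, -1, 1, 1, -1]
W = weights(p)
x0 = [0.3, -0.1, -0.2, 0.4, -0.3]      # the third unit has the wrong sign

xd, xc = list(x0), list(x0)
for _ in range(200):                   # discrete-time iterations
    xd = discrete_update(W, xd)
for _ in range(2000):                  # 100 time units of Euler integration
    xc = continuous_update(W, xc)

print([round(v, 3) for v in xd])
print([round(v, 3) for v in xc])
```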

Bidirectional networks are similar to Hopfield networks, with the special case that the matrix $W$ is a block matrix. [4]
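The block structure can be illustrated as follows. With $W = \begin{pmatrix} 0 & M \\ M^{\top} & 0 \end{pmatrix}$, units in one layer connect only to units in the other, so a single Hopfield update of the concatenated state $(x, y)$ computes $\operatorname{sign}(My)$ for the first layer and $\operatorname{sign}(M^{\top}x)$ for the second, and $W$ is symmetric by construction. The sketch stores one hetero-associative pair with the outer product $M = ab^{\top}$; that storage rule and the patterns are assumptions made for illustration.

```python
from typing import List


def sign(v: float) -> int:
    return 1 if v >= 0 else -1


def outer(a: List[int], b: List[int]) -> List[List[int]]:
    """M = a b^T, storing the pair (a, b)."""
    return [[ai * bj for bj in b] for ai in a]


def block_matrix(m: List[List[int]]) -> List[List[int]]:
    """W = [[0, M], [M^T, 0]]: symmetric, with no connections within a layer."""
    n, k = len(m), len(m[0])
    w = [[0] * (n + k) for _ in range(n + k)]
    for i in range(n):
        for j in range(k):
            w[i][n + j] = m[i][j]   # layer-x row, layer-y column
            w[n + j][i] = m[i][j]   # transposed block
    return w


def hopfield_step(w: List[List[int]], z: List[int]) -> List[int]:
    """Discrete Hopfield update z(t+1) = sign(W z(t)) on the concatenated state."""
    return [sign(sum(wij * zj for wij, zj in zip(row, z))) for row in w]


a = [1, -1, 1, -1]       # pattern in layer x
b = [1, 1, -1]           # associated pattern in layer y (a different length)
W = block_matrix(outer(a, b))

z = a + [1, 1, 1]        # cue layer x with a, and layer y with a corrupted copy of b
z = hopfield_step(W, z)
print(z[:4], z[4:])
```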

Localist attractor networks

Zemel and Mozer (2001) [8] proposed a method to reduce the number of spurious attractors that arise from the encoding of multiple attractors by each connection in the network. Localist attractor networks encode knowledge locally by implementing an expectation–maximization algorithm on a mixture of Gaussians representing the attractors, to minimize the free energy in the network and converge only to the most relevant attractor. This results in the following update equations:

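  1. Determine the activity of attractors: $q_i(t) = \frac{\pi_i\, g(y(t), w_i, \sigma(t))}{\sum_j \pi_j\, g(y(t), w_j, \sigma(t))}$
  2. Determine the next state of the network: $y(t+1) = \alpha(t)\,\xi + (1 - \alpha(t)) \sum_i q_i(t)\, w_i$
  3. Determine the attractor width through the network: $\sigma^2(t) = \frac{1}{n} \sum_i q_i(t)\, \lVert y(t) - w_i \rVert^2$

($\pi_i$ denotes basin strength and $w_i$ denotes the center of the basin. $\xi$ denotes the input to the net, $\alpha(t)$ weights that input against the pull of the attractors, $n$ is the dimensionality of the state, and $g(y, w_i, \sigma)$ is an unnormalized Gaussian centered at $w_i$ with standard deviation $\sigma$.)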

The network is then re-observed, and the above steps repeat until convergence. The model also reflects two biologically relevant concepts. The change in $\pi_i$ models stimulus priming by allowing quicker convergence toward a recently visited attractor. Furthermore, the summed activity of attractors allows a gang effect that causes two nearby attractors to mutually reinforce each other's basins.
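A minimal sketch of the three update steps follows. It assumes an unnormalized Gaussian $g(y, w, \sigma) = \exp(-\lVert y - w\rVert^2 / 2\sigma^2)$, a geometric decay schedule for $\alpha(t)$, an initial state $y(0) = \xi$ and an initial width $\sigma(0) = 2$; these schedules, parameter values and the attractor centers are hypothetical, not taken from Zemel and Mozer (2001). Within each iteration the width is computed before the state update because step 3 uses $y(t)$. Activities are normalized in log space, and the width is floored at a small value, purely as numerical safeguards.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def activities(y: Vec, centers: List[Vec], priors: List[float], sigma: float) -> List[float]:
    """Step 1: q_i = pi_i g(y, w_i, sigma) / sum_j pi_j g(y, w_j, sigma), in log space."""
    logs = [math.log(p) - sqdist(y, w) / (2 * sigma ** 2) for p, w in zip(priors, centers)]
    top = max(logs)
    unnorm = [math.exp(lg - top) for lg in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def localist_settle(xi: Vec, centers: List[Vec], priors: List[float], steps: int = 50,
                    alpha0: float = 0.5, decay: float = 0.8,
                    sigma0: float = 2.0) -> Tuple[Vec, List[float]]:
    n = len(xi)
    y, sigma = list(xi), sigma0
    q = activities(y, centers, priors, sigma)
    for t in range(steps):
        alpha = alpha0 * decay ** t                      # hypothetical schedule for alpha(t)
        q = activities(y, centers, priors, sigma)        # step 1: q_i(t)
        # Step 3, using y(t) and q(t): sigma^2 = (1/n) sum_i q_i |y(t) - w_i|^2.
        var = sum(qi * sqdist(y, w) for qi, w in zip(q, centers)) / n
        sigma = max(math.sqrt(var), 1e-6)                # floor: numerical safeguard only
        # Step 2: y(t+1) = alpha xi + (1 - alpha) sum_i q_i w_i.
        y = [alpha * x + (1 - alpha) * sum(qi * w[d] for qi, w in zip(q, centers))
             for d, x in enumerate(xi)]
    return y, q


centers = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]     # hypothetical attractor centers w_i
y, q = localist_settle([0.8, 0.5], centers, priors=[1.0, 1.0, 1.0])
y2, q2 = localist_settle([2.0, 0.0], centers, priors=[1.0, 3.0, 1.0])
print([round(v, 4) for v in y], [round(v, 3) for v in q])
print([round(v, 4) for v in y2], [round(v, 3) for v in q2])
```

In the second call the input lies exactly midway between $w_0$ and $w_1$; the larger basin strength $\pi_1$ tips it into the basin of $w_1$.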

Reconsolidation attractor networks

Siegelmann (2008) [9] generalized the localist attractor network model to include the tuning of attractors themselves. This algorithm uses the EM method above, with the following modifications: (1) early termination of the algorithm when the attractor's activity is most distributed, or when high entropy suggests a need for additional memories, and (2) the ability to update the attractors themselves: $w_i(t) = v\, q_i(t)\, y(t) + [1 - v\, q_i(t)]\, w_i(t-1)$, where $v$ is the step size parameter of the change of $w_i$. This model reflects memory reconsolidation in animals, and shows some of the same dynamics as those found in memory experiments.
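The attractor update can be sketched on its own. To keep the example self-contained, the activities $q_i(t)$ come from a simple Gaussian soft assignment with equal priors and a fixed width instead of the full localist dynamics, the settled state $y$ is held fixed across presentations, and the entropy-based early termination is not modeled; all of these are simplifying assumptions. Repeatedly presenting the same state drags the attractor that claims it toward that state, while the distant attractor barely moves.

```python
import math
from typing import List

Vec = List[float]


def soft_assign(y: Vec, centers: List[Vec], sigma: float = 1.0) -> List[float]:
    """Hypothetical stand-in for q_i(t): Gaussian soft assignment with equal priors."""
    logs = [-sum((a - b) ** 2 for a, b in zip(y, w)) / (2 * sigma ** 2) for w in centers]
    top = max(logs)
    e = [math.exp(lg - top) for lg in logs]
    s = sum(e)
    return [x / s for x in e]


def reconsolidate(centers: List[Vec], q: List[float], y: Vec, v: float = 0.1) -> List[Vec]:
    """w_i(t) = v q_i(t) y(t) + [1 - v q_i(t)] w_i(t-1), applied to every attractor."""
    return [[v * qi * yd + (1 - v * qi) * wd for yd, wd in zip(y, w)]
            for qi, w in zip(q, centers)]


centers = [[0.0, 0.0], [4.0, 0.0]]
y = [1.0, 1.0]                      # hypothetical settled state, presented repeatedly
for _ in range(50):
    q = soft_assign(y, centers)
    centers = reconsolidate(centers, q, y)
print([[round(c, 3) for c in w] for w in centers])
```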

Further developments in attractor networks, such as kernel-based attractor networks, [10] have improved the computational feasibility of attractor networks as a learning algorithm, while maintaining the high-level flexibility to perform pattern completion on complex compositional structures.

References

  1. Amit, D. J. (1989). Modeling Brain Function: The World of Attractor Neural Networks. New York, NY: Cambridge University Press.
  2. Poucet, B. & Save, E. (2005). "Attractors in Memory". Science. 308 (5723): 799–800. doi:10.1126/science.1112555. PMID 15879197. S2CID 9681032.
  3. Hopfield, J. J. (1982). "Neural networks and physical systems with emergent collective computational abilities". Proceedings of the National Academy of Sciences. 79 (8): 2554–2558. Bibcode:1982PNAS...79.2554H. doi:10.1073/pnas.79.8.2554. PMC 346238. PMID 6953413.
  4. Hopfield, J. (ed.). "Hopfield network". Scholarpedia.
  5. Eliasmith, C. (ed.). "Attractor network". Scholarpedia.
  6. McNaughton, B. L.; Battaglia, F. P.; Jensen, O.; Moser, E. I. & Moser, M. B. (August 2006). "Path integration and the neural basis of the 'cognitive map'". Nat. Rev. Neurosci. 7 (8): 663–678. doi:10.1038/nrn1932. PMID 16858394. S2CID 16928213.
  7. Kovács, K. A. (September 2020). "Episodic Memories: How do the Hippocampus and the Entorhinal Ring Attractors Cooperate to Create Them?". Frontiers in Systems Neuroscience. 14: 68. doi:10.3389/fnsys.2020.559186. PMC 7511719. PMID 33013334.
  8. Zemel, R. & Mozer, M. (2001). "Localist attractor networks". Neural Computation. 13 (5): 1045–1064. doi:10.1162/08997660151134325. PMID 11359644. S2CID 2934449.
  9. Siegelmann, H. T. (2008). "Analog-symbolic memory that tracks via reconsolidation". Physica D. 237 (9): 1207–1214. Bibcode:2008PhyD..237.1207S. doi:10.1016/j.physd.2008.03.038.
  10. Nowicki, D. & Siegelmann, H. T. (2010). "Flexible Kernel Memory". PLOS ONE. 5 (6): e10955. Bibcode:2010PLoSO...510955N. doi:10.1371/journal.pone.0010955. PMC 2883999. PMID 20552013.