Sparse distributed memory

Last updated

Sparse distributed memory (SDM) is a mathematical model of human long-term memory introduced by Pentti Kanerva in 1988 while he was at NASA Ames Research Center. [1]

Contents

This memory exhibits behaviors, both in theory and in experiment, that resemble those previously unapproached by machines – e.g., rapid recognition of faces or odors, discovery of new connections between seemingly unrelated ideas, etc. Sparse distributed memory is used for storing and retrieving large amounts ( bits) of information without focusing on the accuracy but on similarity of information. [2] There are some recent applications in robot navigation [3] and experience-based robot manipulation. [4]

General principle

It is a generalized random-access memory (RAM) for long (e.g., 1,000 bit) binary words. These words serve as both addresses to and data for the memory. The main attribute of the memory is sensitivity to similarity. This means that a word can be read back not only by giving the original write address but also by giving one close to it, as measured by the number of mismatched bits (i.e., the Hamming distance between memory addresses). [1]

SDM implements transformation from logical space to physical space using distributed data representation and storage, similarly to encoding processes in human memory. [5] A value corresponding to a logical address is stored into many physical addresses. This way of storing is robust and not deterministic. A memory cell is not addressed directly. If input data (logical addresses) are partially damaged at all, we can still get correct output data. [6]

The theory of the memory is mathematically complete [1] and has been verified by computer simulation. It arose from the observation that the distances between points of a high-dimensional space resemble the proximity relations between concepts in human memory. The theory is also practical in that memories based on it can be implemented with conventional random-access memory elements. [7]

Definition

Human memory has a tendency to congregate memories based on similarities between them (although they may not be related), such as "firetrucks are red and apples are red". [8] Sparse distributed memory is a mathematical representation of human memory, and uses high-dimensional space to help model the large amounts of memory that mimics that of the human neural network. [9] [10] An important property of such high dimensional spaces is that two randomly chosen vectors are relatively far away from each other, meaning that they are uncorrelated. [11] SDM can be considered a realization of locality-sensitive hashing.

The underlying idea behind a SDM is the mapping of a huge binary memory onto a smaller set of physical locations, so-called hard locations. As a general guideline, those hard locations should be uniformly distributed in the virtual space, to mimic the existence of the larger virtual space as accurately as possible. Every datum is stored distributed by a set of hard locations, and retrieved by averaging those locations. Therefore, recall may not be perfect, accuracy depending on the saturation of the memory.

Kanerva's proposal is based on four basic ideas: [12]

  1. The boolean space , or points in dimensions, exhibits properties which are similar to humans' intuitive notions of relationships between the concepts. This means that it makes sense to store data as points of the mentioned space where each memory item is stored as an n-bit vector.
  2. Neurons with n inputs can be used as address decoders of a random-access memory
  3. Unifying principle: data stored into the memory can be used as addresses to the same memory. Distance between two points is a measure of similarity between two memory items. The closer the points, the more similar the stored vectors.
  4. Time can be traced in the memory as a function of where the data are stored, if the data are organized as sequences of events.

The binary space N

The SDM works with n-dimensional vectors with binary components. Depending on the context, the vectors are called points, patterns, addresses, words, memory items, data, or events. This section is mostly about the properties of the vector space N = . Let n be number of dimensions of the space. The number of points, or possible memory items, is then . We will denote this number by N and will use N and to stand also for the space itself. [6]

Concepts Related to the space N: [6]

Properties of the space N: [1] [6]

The space N can be represented by the vertices of the unit cube in n-dimensional Euclidean space. The vertices lie on the surface of an n-dimensional sphere with (Euclidean-metric) radius . This gives rise to the sphere analogy. We will call a space spherical if

  1. any point x has a unique opposite 'x,
  2. the entire space is between any point x and its opposite 'x, and
  3. all points are "equal" (meaning that for any two points x and y there is a distance preserving automorphism of the space that maps x to y, so that from any of its points the space "looks" the same).

The surface of a sphere (in Euclidean 3d-space) clearly is spherical. According to definition, N is also spherical, since y ⊕ x ⊕ (…) is an automorphism that maps x to y. Because N is spherical, it is helpful to think of it as the surface of a sphere with circumference 2n. All points of N are equally qualified as points of origin, and a point and its complement are like two poles at distance n from each other, with the entire space in between. The points halfway between the poles and perpendicular to them are like the equator.

Distribution of the space N

The number of points that are exactly d bits from an arbitrary point x (say, from the point 0) is the number of ways to choose d coordinates from a total of n coordinates, and is therefore given by the binomial coefficient:

The distribution of N thus is the binomial distribution with parameters n and p, where p = 1/2. The mean of the binomial distribution is n/2, and the variance is n/4. This distribution function will be denoted by N(d). The normal distribution F with mean n/2 and standard deviation is a good approximation to it: N(d) = Pr{d(x, y) ≤ d} ≅ F{(d − n / 2)/ }

Tendency to orthogonality

An outstanding property of N is that most of it lies at approximately the mean (indifference) distance n/2 from a point (and its complement). In other words, most of the space is nearly orthogonal to any given point, and the larger n is, the more pronounced is this effect.

As neural network

The SDM may be regarded either as a content-addressable extension of a classical random-access memory (RAM) or as a special type of three layer feedforward neural network. The main SDM alterations to the RAM are: [13]

Neuron model

An idealized description of neuron is as follows: a neuron has a cell body with two kinds of branches: dendrites and an axon . It receives input signals from other neurons via dendrites, integrates (sums) them and generates its own (electric) output signal which is sent to outside neurons via axon. The points of electric contact between neurons are called synapses .

When a neuron generates signal it is firing and after firing it must recover before it fires again. The relative importance of a synapse to the firing of neuron is called synaptic weight (or input coefficient). There are two kinds of synapses: excitatory that trigger neuron to fire and inhibitory that hinder firing. The neuron is either excitatory or inhibitory according to the kinds of synapses its axon makes. [14]

A neuron fires when the sum of inputs exceed a specific threshold. The higher the threshold the more important it is that excitatory synapses have input while inhibitory ones do not. [15] Whether a recovered neuron actually fires depends on whether it received sufficient excitatory input (beyond the threshold) and not too much of inhibitory input within a certain period.

The formal model of neuron makes further simplifying assumptions. [16] An n-input neuron is modeled by a linear threshold function as follows :

For where n is the number of inputs, let be the output at time t: , and let be the i-th input at time t: . Let be the weight of the i-th input and let be the threshold.

The weighted sum of the inputs at time t is defined by

The neuron output at time t is then defined as a boolean function:

Where Ft=1 means that the neuron fires at time t and Ft=0 that it doesn't, i.e. in order for neuron to fire the weighted sum must reach or exceed the threshold . Excitatory inputs increase the sum and inhibitory inputs decrease it.

Neuron as address-decoder

Kanerva's key thesis [1] is that certain neurons could have their input coefficients and thresholds fixed over the entire life of an organism and used as address decoders where n-tuple of input coefficients (the pattern to which neurons respond most readily) determines the n-bit memory address, and the threshold controls the size of the region of similar address patterns to which the neuron responds.

This mechanism is complementary to adjustable synapses or adjustable weights in a neural network (perceptron convergence learning), as this fixed accessing mechanism would be a permanent frame of reference which allows to select the synapses in which the information is stored and from which it is retrieved under given set of circumstances. Furthermore, an encoding of the present circumstance would serve as an address.

The address a of a neuron with input coefficients w where is defined as an n-bit input pattern that maximizes the weighted sum. The maximum occurs when the inhibitory inputs are zeros and the excitatory inputs are ones. The i-th bit of address is:

(assuming weights are non-zero)

The maximum weighted sum is then the sum of all positive coefficients:

And the minimum weighted sum would correspond to a point opposite the neuron address a`:

When the threshold c is in range the output of the neuron is 0 for some addresses (input patterns) and 1 for others. If the threshold is above S the output is always 0, if it's below s the output is always 1. So by a proper choice of the threshold a neuron responds only to just one address. When the threshold is S (the maximum for the weighted sum) the neuron responds only to its own address and acts like an address decoder of a conventional random-access memory.

Memory location

SDM is designed to cope with address patterns that span an enormous address space (order of ). SDM assumes that the address patterns actually describing physical situations of interest are sparsely scattered throughout the input space. It is impossible to reserve a separate physical location corresponding to each possible input; SDM implements only a limited number of physical or hard locations. The physical location is called a memory (or hard) location. [7]

Every hard location has associated with it two items:

In SDM a word could be stored in memory by writing it in a free storage location and at the same time providing the location with the appropriate address decoder. A neuron as an address decoder would select a location based on similarity of the location's address to the retrieval cue. Unlike conventional Turing machines SDM is taking advantage of parallel computing by the address decoders. The mere accessing the memory is regarded as computing, the amount of which increases with memory size. [1]

Address pattern

An N-bit vector used in writing to and reading from the memory. The address pattern is a coded description of an environmental state. (e.g. N = 256.)

Data pattern

An M-bit vector that is the object of the writing and reading operations. Like the address pattern, it is a coded description of an environmental state. (e.g. M = 256.)

Writing

Writing is the operation of storing a data pattern into the memory using a particular address pattern. During a write, the input to the memory consists of an address pattern and a data pattern. The address pattern is used to select hard memory locations whose hard addresses are within a certain cutoff distance from the address pattern. The data pattern is stored into each of the selected locations.

Reading

Reading is the operation of retrieving a data pattern from the memory using a particular address pattern. During a read, an address pattern is used to select a certain number of hard memory locations (just like during a write). The contents of the selected locations are bitwise summed and thresholded to derive an M-bit data pattern. This serves as the output read from the memory.

Pointer chains

All of the items are linked in a single list (or array) of pointers to memory locations, and are stored in RAM. Each address in an array points to an individual line in the memory. That line is then returned if it is similar to other lines. Neurons are utilized as address decoders and encoders, similar to the way neurons work in the brain, and return items from the array that match or are similar.

Critical distance

Kanerva's model of memory has a concept of a critical point: prior to this point, a previously stored item can be easily retrieved; but beyond this point an item cannot be retrieved. Kanerva has methodically calculated this point for a particular set of (fixed) parameters. The corresponding critical distance of a Sparse Distributed Memory can be approximately evaluated minimizing the following equation with the restriction and . The proof can be found in, [17] [18]

Where:

Probabilistic interpretation

An associative memory system using sparse, distributed representations can be reinterpreted as an importance sampler, a Monte Carlo method of approximating Bayesian inference. [19] The SDM can be considered a Monte Carlo approximation to a multidimensional conditional probability integral. The SDM will produce acceptable responses from a training set when this approximation is valid, that is, when the training set contains sufficient data to provide good estimates of the underlying joint probabilities and there are enough Monte Carlo samples to obtain an accurate estimate of the integral. [20]

Biological plausibility

Sparse coding may be a general strategy of neural systems to augment memory capacity. To adapt to their environments, animals must learn which stimuli are associated with rewards or punishments and distinguish these reinforced stimuli from similar but irrelevant ones. Such task requires implementing stimulus-specific associative memories in which only a few neurons out of a population respond to any given stimulus and each neuron responds to only a few stimuli out of all possible stimuli.

Theoretical work on SDM by Kanerva has suggested that sparse coding increases the capacity of associative memory by reducing overlap between representations. Experimentally, sparse representations of sensory information have been observed in many systems, including vision, [21] audition, [22] touch, [23] and olfaction. [24] However, despite the accumulating evidence for widespread sparse coding and theoretical arguments for its importance, a demonstration that sparse coding improves the stimulus-specificity of associative memory has been lacking until recently.

Some progress has been made in 2014 by Gero Miesenböck's lab at the University of Oxford analyzing Drosophila Olfactory system. [25] In Drosophila, sparse odor coding by the Kenyon cells of the mushroom body is thought to generate a large number of precisely addressable locations for the storage of odor-specific memories. Lin et al. [26] demonstrated that sparseness is controlled by a negative feedback circuit between Kenyon cells and the GABAergic anterior paired lateral (APL) neuron. Systematic activation and blockade of each leg of this feedback circuit show that Kenyon cells activate APL and APL inhibits Kenyon cells. Disrupting the Kenyon cell-APL feedback loop decreases the sparseness of Kenyon cell odor responses, increases inter-odor correlations, and prevents flies from learning to discriminate similar, but not dissimilar, odors. These results suggest that feedback inhibition suppresses Kenyon cell activity to maintain sparse, decorrelated odor coding and thus the odor-specificity of memories. A 2017 publication in Science [27] showed that fly olfactory circuit implements an improved version of binary locality sensitive hashing via sparse, random projections.

Applications

In applications of the memory, the words are patterns of features. Some features are produced by a sensory system, others control a motor system. There is a current pattern (of e.g. 1000 bits), which is the current contents of the system's focus. The sensors feed into the focus, the motors are driven from the focus, and the memory is accessed through the focus.

What goes on in the world  the system's "subjective" experience  is represented internally by a sequence of patterns in the focus. The memory stores this sequence and can recreate it later in the focus if addressed with a pattern similar to one encountered in the past. Thus, the memory learns to predict what is about to happen. Wide applications of the memory would be in systems that deal with real-world information in real time.

The applications include vision   detecting and identifying objects in a scene and anticipating subsequent scenes  robotics, signal detection and verification, and adaptive learning and control. On the theoretical side, the working of the memory may help us understand memory and learning in humans and animals. [7] [28]

SDM can be applied to the problem of finding the best match to a test word in a dataset of stored words. [1] [29] or, in other words, the Nearest neighbor search problem.

Consider a memory with N locations where . Let each location have the capacity for one n-bit word (e.g. N= 2100 100-bit words), and let the address decoding be done by N address decoder neurons. Set the threshold of each neuron x to its maximum weighted sum and use a common parameter d to adjust all thresholds when accessing the memory. The effective threshold of neuron x will be then which means that the location x is accessible every time the address x is within d bits of the address presented to memory (i.e. the address held by the address register). With we have a conventional random-access memory. Assume further that each location has a special location-occupied bit that can be accessed in the same way as the regular datum bits. Writing a word to a location sets this location-occupied bit. Assume that only occupied location can be read.

To file the data in memory, start by setting and issue a command to clear the location-occupied bit. This single operation marks all memory as unoccupied regardless of the values of the address register. Then set and write each word y of the data set with y itself as the address. Notice that each write operation affects only one location: the location y. Filing time is thus proportional to the number of words in the dataset.

Finding the best match for a test word z, involves placing z in the address register and finding the least distance d for which there is an occupied location. We can start the search by setting and incrementing d successively until an occupied location is found. This method gives average search times that are proportional to the number of address bits or slightly less than [1] because the nearest occupied location can be expected to be just under bits from z (with binary search on d this would be O(log(n)).

With 100-bit words 2100 locations would be needed, i.e. an enormously large memory. However if we construct the memory as we store the words of the dataset we need only one location (and one address decoder) for each word of the data set. None of the unoccupied locations need to be present. This represents the aspect of sparseness in SDM.

Speech recognition

SDM can be applied in transcribing speech, with the training consisting of "listening" to a large corpus of spoken language. Two hard problems with natural speech are how to detect word boundaries and how to adjust to different speakers. The memory should be able to handle both. First, it stores sequences of patterns as pointer chains. In training  in listening to speech  it will build a probabilistic structure with the highest incidence of branching at word boundaries. In transcribing speech, these branching points are detected and tend to break the stream into segments that correspond to words. Second, the memory's sensitivity to similarity is its mechanism for adjusting to different speakers  and to the variations in the voice of the same speaker. [7]

"Realizing forgetting"

Decay Functions
The exponential decay function Exponential decay mechanism.svg
The exponential decay function
The negated-translated sigmoid function Negated sigmoid function.png
The negated-translated sigmoid function

At the University of Memphis, Uma Ramamurthy, Sidney K. D'Mello, and Stan Franklin created a modified version of the sparse distributed memory system that represents "realizing forgetting." It uses a decay equation to better show interference in data. The sparse distributed memory system distributes each pattern into approximately one hundredth of the locations,[ clarification needed ] so interference can have detrimental results. [30]

Two possible examples of decay from this modified sparse distributed memory are presented

Exponential decay mechanism:

Negated-translated sigmoid decay mechanism:

In the exponential decay function, it approaches zero more quickly as x increases, and a is a constant (usually between 3-9) and c is a counter. For the negated-translated sigmoid function, the decay is similar to the exponential decay function when a is greater than 4. [30]

As the graph approaches 0, it represents how the memory is being forgotten using decay mechanisms.

Genetic sparse distributed memory

Ashraf Anwar, Stan Franklin, and Dipankar Dasgupta at The University of Memphis; proposed a model for SDM initialization using Genetic Algorithms and Genetic Programming (1999).

Genetic memory uses genetic algorithm and sparse distributed memory as a pseudo artificial neural network. It has been considered for use in creating artificial life. [31]

Statistical prediction

SDM has been applied to statistical prediction, the task of associating extremely large perceptual state vectors with future events. In conditions of near- or over- capacity, where the associative memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor and each data counter in an SDM can be viewed as an independent estimate of the conditional probability of a binary function f being equal to the activation set defined by the counter's memory location. [32]

Artificial general intelligence

Reinforcement learning

SDMs provide a linear, local function approximation scheme, designed to work when a very large/high-dimensional input (address) space has to be mapped into a much smaller physical memory. In general, local architectures, SDMs included, can be subject to the curse of dimensionality, as some target functions may require, in the worst case, an exponential number of local units to be approximated accurately across the entire input space. However, it is widely believed that most decision-making systems need high accuracy only around low-dimensional manifolds of the state space, or important state "highways". [37] The work in Ratitch et al. [38] combined the SDM memory model with the ideas from memory-based learning, which provides an approximator that can dynamically adapt its structure and resolution in order to locate regions of the state space that are "more interesting" [39] and allocate proportionally more memory resources to model them accurately.

Object indexing in computer vision

Dana H. Ballard's lab [40] demonstrated a general-purpose object indexing technique for computer vision that combines the virtues of principal component analysis with the favorable matching properties of high-dimensional spaces to achieve high precision recognition. The indexing algorithm uses an active vision system in conjunction with a modified form of SDM and provides a platform for learning the association between an object's appearance and its identity.

Extensions

Many extensions and improvements to SDM have been proposed, e.g.:

Implementation

See also

Related Research Articles

<span class="mw-page-title-main">Self-organizing map</span> Machine learning technique useful for dimensionality reduction

A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher dimensional data set while preserving the topological structure of the data. For example, a data set with variables measured in observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.

<span class="mw-page-title-main">Perceptron</span> Algorithm for supervised learning of binary classifiers

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

An artificial neuron is a mathematical function conceived as a model of biological neurons in a neural network. Artificial neurons are the elementary units of artificial neural networks. The artificial neuron receives one or more inputs and sums them to produce an output. Usually, each input is separately weighted, and the sum is often added to a term known as a bias, before being passed through a non-linear function known as an activation function or transfer function. The transfer functions usually have a sigmoid shape, but they may also take the form of other non-linear functions, piecewise linear functions, or step functions. They are also often monotonically increasing, continuous, differentiable and bounded. Non-monotonic, unbounded and oscillating activation functions with multiple zeros that outperform sigmoidal and ReLU-like activation functions on many tasks have also been recently explored. The thresholding function has inspired building logic gates referred to as threshold logic; applicable to building logic circuits resembling brain processing. For example, new devices such as memristors have been extensively used to develop such logic in recent times.

Instantaneously trained neural networks are feedforward artificial neural networks that create a new hidden neuron node for each novel training sample. The weights to this hidden neuron separate out not only this training sample but others that are near it, thus providing generalization. This separation is done using the nearest hyperplane that can be written down instantaneously. In the two most important implementations the neighborhood of generalization either varies with the training sample or remains constant. These networks use unary coding for an effective representation of the data sets.

In computer science, learning vector quantization (LVQ) is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems.

A Hopfield network is a form of recurrent artificial neural network and a type of spin glass system popularised by John Hopfield in 1982 as described by Shun'ichi Amari in 1972 and by Little in 1974 based on Ernst Ising's work with Wilhelm Lenz on the Ising model. Hopfield networks serve as content-addressable ("associative") memory systems with binary threshold nodes, or with continuous variables. Hopfield networks also provide a model for understanding human memory.

<span class="mw-page-title-main">Backpropagation</span> Optimization algorithm for artificial neural networks

As a machine-learning algorithm, backpropagation performs a backward pass to adjust a neural network model's parameters, aiming to minimize the mean squared error (MSE). In a multi-layered network, backpropagation uses the following steps:

  1. Propagate training data through the model from input to predicted output by computing the successive hidden layers' outputs and finally the final layer's output.
  2. Adjust the model weights to reduce the error relative to the weights.
    1. The error is typically the squared difference between prediction and target.
    2. For each weight, the slope or derivative of the error is found, and the weight adjusted by a negative multiple of this derivative, so as to go downslope toward the minimum-error configuration.
    3. This derivative is easy to calculate for final layer weights, and possible to calculate for one layer given the next layer's derivatives. Starting at the end, then, the derivatives are calculated layer by layer toward the beginning -- thus "backpropagation".
  3. Repeatedly update the weights until they converge or the model has undergone enough iterations.
<span class="mw-page-title-main">Feedforward neural network</span> One of two broad types of artificial neural network

A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes and to the output nodes, without any cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow. Modern feedforward networks are trained using the backpropagation method and are colloquially referred to as the "vanilla" neural networks.

Neural coding is a neuroscience field concerned with characterising the hypothetical relationship between the stimulus and the individual or ensemble neuronal responses and the relationship among the electrical activity of the neurons in the ensemble. Based on the theory that sensory and other information is represented in the brain by networks of neurons, it is thought that neurons can encode both digital and analog information.

<span class="mw-page-title-main">Autoencoder</span> Neural network that learns efficient data encoding in an unsupervised manner

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction.

Oja's learning rule, or simply Oja's rule, named after Finnish computer scientist Erkki Oja, is a model of how neurons in the brain or in artificial neural networks change connection strength, or learn, over time. It is a modification of the standard Hebb's Rule that, through multiplicative normalization, solves all stability problems and generates an algorithm for principal components analysis. This is a computational form of an effect which is believed to happen in biological neurons.

Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for anomaly detection in streaming data. The technology is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian brain.

<span class="mw-page-title-main">Biological neuron model</span> Mathematical descriptions of the properties of certain cells in the nervous system

Biological neuron models, also known as a spiking neuron models, are mathematical descriptions of neurons. In particular, these models describe how the voltage potential across the cell membrane changes over time. In an experimental setting, stimulating neurons with an electrical current generates an action potential, that propagates down the neuron's axon. This spike branches out to a large number of downstream neurons, where the signals terminate at synapses. As many as 85% of neurons in the neocortex, the outermost layer of the mammalian brain, consist of excitatory pyramidal neurons, and each pyramidal neuron receives tens of thousands of inputs from other neurons. Thus, spiking neurons are a major information processing unit of the nervous system.

Bidirectional associative memory (BAM) is a type of recurrent neural network. BAM was introduced by Bart Kosko in 1988. There are two types of associative memory, auto-associative and hetero-associative. BAM is hetero-associative, meaning given a pattern it can return another pattern which is potentially of a different size. It is similar to the Hopfield network in that they are both forms of associative memory. However, Hopfield nets return patterns of the same size.

Gather/scatter is a type of memory addressing that at once collects (gathers) from, or stores (scatters) data to, multiple, arbitrary indices. Examples of its use include sparse linear algebra operations, sorting algorithms, fast Fourier transforms, and some computational graph theory problems. It is the vector equivalent of register indirect addressing, with gather involving indexed reads, and scatter, indexed writes. Vector processors have hardware support for gather and scatter operations, as do many input/output systems, allowing large data sets to be transferred to main memory more rapidly.

Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.

There are many types of artificial neural networks (ANN).

An attractor network is a type of recurrent dynamical network, that evolves toward a stable pattern over time. Nodes in the attractor network converge toward a pattern that may either be fixed-point, cyclic, chaotic or random (stochastic). Attractor networks have largely been used in computational neuroscience to model neuronal processes such as associative memory and motor behavior, as well as in biologically inspired methods of machine learning. An attractor network contains a set of n nodes, which can be represented as vectors in a d-dimensional space where n>d. Over time, the network state tends toward one of a set of predefined states on a d-manifold; these are the attractors.

RAMnets is one of the oldest practical neurally inspired classification algorithms. The RAMnets is also known as a type of "n-tuple recognition method" or "weightless neural network".

The spike response model (SRM) is a spiking neuron model in which spikes are generated by either a deterministic or a stochastic threshold process. In the SRM, the membrane voltage V is described as a linear sum of the postsynaptic potentials (PSPs) caused by spike arrivals to which the effects of refractoriness and adaptation are added. The threshold is either fixed or dynamic. In the latter case it increases after each spike. The SRM is flexible enough to account for a variety of neuronal firing pattern in response to step current input. The SRM has also been used in the theory of computation to quantify the capacity of spiking neural networks; and in the neurosciences to predict the subthreshold voltage and the firing times of cortical neurons during stimulation with a time-dependent current stimulation. The name Spike Response Model points to the property that the two important filters and of the model can be interpreted as the response of the membrane potential to an incoming spike (response kernel , the PSP) and to an outgoing spike (response kernel , also called refractory kernel). The SRM has been formulated in continuous time and in discrete time. The SRM can be viewed as a generalized linear model (GLM) or as an (integrated version of) a generalized integrate-and-fire model with adaptation.

References

  1. 1 2 3 4 5 6 7 8 Kanerva, Pentti (1988). Sparse Distributed Memory. The MIT Press. ISBN   978-0-262-11132-4.
  2. Kanerva, Pentti (1988). Sparse Distributed Memory. The MIT Press. ISBN   978-0-262-11132-4.
  3. Mendes, Mateus; Crisostomo, Manuel; Coimbra, A. Paulo (2008). "Robot navigation using a sparse distributed memory". 2008 IEEE International Conference on Robotics and Automation. pp. 53–58. doi:10.1109/ROBOT.2008.4543186. ISBN   978-1-4244-1646-2. S2CID   10977460.
  4. Jockel, S.; Lindner, F.; Jianwei Zhang (2009). "Sparse distributed memory for experience-based robot manipulation". 2008 IEEE International Conference on Robotics and Biomimetics. pp. 1298–1303. doi:10.1109/ROBIO.2009.4913187. ISBN   978-1-4244-2678-2. S2CID   16650992.
  5. Rissman, Jesse; Wagner, Anthony D. (2012). "Distributed representations in memory: insights from functional brain imaging". Annual Review of Psychology. 63: 101–28. doi:10.1146/annurev-psych-120710-100344. PMC   4533899 . PMID   21943171.
  6. 1 2 3 4 Grebeníček, František. "Sparse Distributed Memory− Pattern Data Analysis. URL: http://www.fit.vutbr.cz/~grebenic/Publikace/mosis2000.pdf"
  7. 1 2 3 4 5 Flynn, Michael J., Pentti Kanerva, and Neil Bhadkamkar. "Sparse distributed memory prototype: principles and operation." (1989).
  8. C. George Boeree (2002). "General Psychology". Shippensburg University.
  9. Pentti Kanerva (1993). "Sparse Distributed Memory and Related Models". Pennsylvania State University: 50–76. CiteSeerX   10.1.1.2.8403 .{{cite journal}}: Cite journal requires |journal= (help)
  10. M. J. Flynn; P. Kanerva & N. Bhadkamkar (December 1989). "Sparse Distributed Memory: Principles and Operation" (PDF). Stanford University. Retrieved 1 November 2011.[ permanent dead link ]
  11. 1 2 Snaider, Javier, and Stan Franklin. "Integer sparse distributed memory Archived 2021-08-02 at the Wayback Machine ." Twenty-fifth international flairs conference. 2012.
  12. Mendes, Mateus Daniel Almeida. "Intelligent robot navigation using a sparse distributed memory." Phd thesis, (2010). URL: https://eg.sib.uc.pt/handle/10316/17781 Archived 2016-03-04 at the Wayback Machine
  13. Grebenıcek, František. Neural Nets as Associative Memories. Diss. Brno University of Technology, 2001. URL: http://www.vutium.vutbr.cz/tituly/pdf/ukazka/80-214-1914-8.pdf Archived 2016-03-04 at the Wayback Machine
  14. Kandel, Eric R., James H. Schwartz, and Thomas M. Jessell, eds. Principles of neural science. Vol. 4. New York: McGraw-Hill, 2000.
  15. Eccles, John G. "Under the Spell of the Synapse." The Neurosciences: Paths of Discovery, I. Birkhäuser Boston, 1992. 159-179.
  16. McCulloch, Warren S.; Pitts, Walter (1943). "A logical calculus of the ideas immanent in nervous activity". Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/bf02478259.
  17. Brogliato, Marcelo Salhab (2012). Understanding Critical Distance in Sparse Distributed Memory (Thesis). hdl:10438/13095.
  18. Brogliato, Marcelo Salhab; Chada, Daniel de Magalhães; Linhares, Alexandre (2014). "Sparse Distributed Memory: understanding the speed and robustness of expert memory". Frontiers in Human Neuroscience. 8 (222): 222. doi: 10.3389/fnhum.2014.00222 . PMC   4009432 . PMID   24808842.
  19. Abbott, Joshua T., Jessica B. Hamrick, and Thomas L. Griffiths. "Approximating Bayesian inference with a sparse distributed memory system." Proceedings of the 35th annual conference of the cognitive science society. 2013.
  20. Anderson (1989). "A conditional probability interpretation of Kanerva's sparse distributed memory". International Joint Conference on Neural Networks. Vol. 1. pp. 415–417. doi:10.1109/ijcnn.1989.118597. S2CID   13935339.
  21. Vinje, WE; Gallant, JL (2000). "Sparse coding and decorrelation in primary visual cortex during natural vision" (PDF). Science. 287 (5456): 1273–1276. Bibcode:2000Sci...287.1273V. CiteSeerX   10.1.1.456.2467 . doi:10.1126/science.287.5456.1273. PMID   10678835. S2CID   13307465. Archived from the original (PDF) on 2017-09-11.
  22. Hromádka, T; Deweese, MR; Zador, AM (2008). "Sparse representation of sounds in the unanesthetized auditory cortex". PLOS Biol. 6 (1): e16. doi: 10.1371/journal.pbio.0060016 . PMC   2214813 . PMID   18232737.
  23. Crochet, S; Poulet, JFA; Kremer, Y; Petersen, CCH (2011). "Synaptic mechanisms underlying sparse coding of active touch". Neuron. 69 (6): 1160–1175. doi: 10.1016/j.neuron.2011.02.022 . PMID   21435560. S2CID   18528092.
  24. Ito, I; Ong, RCY; Raman, B; Stopfer, M (2008). "Sparse odor representation and olfactory learning". Nat Neurosci. 11 (10): 1177–1184. doi:10.1038/nn.2192. PMC   3124899 . PMID   18794840.
  25. A sparse memory is a precise memory. Oxford Science blog. 28 Feb 2014. http://www.ox.ac.uk/news/science-blog/sparse-memory-precise-memory
  26. Lin, Andrew C.; et al. (2014). "Sparse, decorrelated odor coding in the mushroom body enhances learned odor discrimination". Nature Neuroscience. 17 (4): 559–568. doi:10.1038/nn.3660. PMC   4000970 . PMID   24561998.
  27. Dasgupta, Sanjoy; Stevens, Charles F.; Navlakha, Saket (2017). "A neural algorithm for a fundamental computing problem". Science. 358 (6364): 793–796. Bibcode:2017Sci...358..793D. doi: 10.1126/science.aam9868 . PMID   29123069.
  28. Denning, Peter J. Sparse distributed memory. Research Institute for Advanced Computer Science [NASA Ames Research Center], 1989.
  29. Minsky, Marvin, and Papert Seymour. "Perceptrons." (1969). "Time vs. memory for best matching - an open problem" p. 222–225
  30. 1 2 Uma Ramamurthy; Sidney K. D'Mello; Stan Franklin. "Realizing Forgetting in a Modified Sparse Distributed Memory System" (PDF). Computer Science Department and The Institute for Intelligent Systems. The University of Memphis. pp. 1992–1997. Archived from the original on 5 April 2012. Retrieved 1 November 2011.{{cite web}}: CS1 maint: bot: original URL status unknown (link)
  31. Rocha LM, Hordijk W (2005). "Material representations: From the genetic code to the evolution of cellular automata". Artificial Life. 11 (1–2): 189–214. CiteSeerX   10.1.1.115.6605 . doi:10.1162/1064546053278964. PMID   15811227. S2CID   5742197. Archived from the original on 2013-09-20. Retrieved 2013-08-02.
  32. Rogers, David. "Statistical prediction with Kanerva's sparse distributed memory." Advances in neural information processing systems. 1989.
  33. Rao, R. P. N.; Fuentes, O. (1998). "Hierarchical Learning of Navigational Behaviors in an Autonomous Robot using a Predictive Sparse Distributed Memory". Machine Learning. 31: 87–113. doi: 10.1023/a:1007492624519 . S2CID   8305178.
  34. Franklin, Stan, et al. "The role of consciousness in memory." Brains, Minds and Media 1.1 (2005): 38.
  35. Shastri, Lokendra (2002). "Episodic memory and cortico–hippocampal interactions" (PDF). Trends in Cognitive Sciences. 6 (4): 162–168. doi:10.1016/s1364-6613(02)01868-5. PMID   11912039. S2CID   15022802.
  36. Anwar, Ashraf; Franklin, Stan (2003). "Sparse distributed memory for 'conscious' software agents". Cognitive Systems Research. 4 (4): 339–354. doi:10.1016/s1389-0417(03)00015-9. S2CID   13380583.
  37. Ratitch, Bohdana, Swaminathan Mahadevan, and Doina Precup. "Sparse distributed memories in reinforcement learning: Case studies." Proc. of the Workshop on Learning and Planning in Markov Processes-Advances and Challenges. 2004.
  38. Ratitch, Bohdana, and Doina Precup. "Sparse distributed memories for on-line value-based reinforcement learning." Machine Learning: ECML 2004. Springer Berlin Heidelberg, 2004. 347-358.
  39. Bouchard-Côté, Alexandre. "Sparse Memory Structures Detection." (2004).
  40. Rao, Rajesh PN, and Dana H. Ballard. "Object indexing using an iconic sparse distributed memory." Computer Vision, 1995. Proceedings., Fifth International Conference on. IEEE, 1995.
  41. D'Mello, Sidney K., Ramamurthy, U., & Franklin, S. 2005. Encoding and Retrieval Efficiency of Episodic Data in a Modified Sparse Distributed Memory System. In Proceedings of the 27th Annual Meeting of the Cognitive Science Society. Stresa, Ital
  42. Ramamaurthy, U., Sidney K. D'Mello, and Stan Franklin. "Modified sparse distributed memory as transient episodic memory for cognitive software agents [ dead link ]." Systems, Man and Cybernetics, 2004 IEEE International Conference on. Vol. 6. IEEE, 2004.
  43. Snaider, Javier; Franklin, Stan (2012). "Extended sparse distributed memory and sequence storage". Cognitive Computation. 4 (2): 172–180. doi:10.1007/s12559-012-9125-8. S2CID   14319722.
  44. Furber, Steve B.; et al. (2004). "Sparse distributed memory using N-of-M codes". Neural Networks. 17 (10): 1437–1451. doi:10.1016/j.neunet.2004.07.003. PMID   15541946.
  45. Sharp, Thomas: "Application of sparse distributed memory to the Inverted Pendulum Problem". Diss. University of Manchester, 2009. URL: http://studentnet.cs.manchester.ac.uk/resources/library/thesis_abstracts/MSc09/FullText/SharpThomas.pdf
  46. Bose, Joy. Engineering a Sequence Machine Through Spiking Neurons Employing Rank-order Codes [ dead link ]. Diss. University of Manchester, 2007.
  47. Simon Thorpe and Jacques Gautrais. Rank order coding. In Computational Neuroscience: Trends in research, pages 113–118. Plenum Press, 1998.
  48. Furber, Stephen B.; et al. (2007). "Sparse distributed memory using rank-order neural codes". IEEE Transactions on Neural Networks. 18 (3): 648–659. CiteSeerX   10.1.1.686.6196 . doi:10.1109/tnn.2006.890804. PMID   17526333. S2CID   14256161.
  49. Calimera, A; Macii, E; Poncino, M (2013). "The Human Brain Project and neuromorphic computing". Functional Neurology. 28 (3): 191–6. PMC   3812737 . PMID   24139655.
  50. Hely, Tim; Willshaw, David J.; Hayes, Gillian M. (1997). "A new approach to Kanerva's sparse distributed memory". IEEE Transactions on Neural Networks. 8 (3): 791–794. doi:10.1109/72.572115. PMID   18255679. S2CID   18628649.
  51. Caraig, Lou Marvin. "A New Training Algorithm for Kanerva's Sparse Distributed Memory." arXiv preprint arXiv:1207.5774 (2012).
  52. Anwar, Ashraf; Franklin, Stan (2005-01-01). Ng, Michael K.; Doncescu, Andrei; Yang, Laurence T.; Leng, Tau (eds.). A Sparse Distributed Memory Capable of Handling Small Cues, SDMSCue. IFIP — The International Federation for Information Processing. Springer US. pp. 23–38. doi:10.1007/0-387-24049-7_2. ISBN   978-0-387-24048-0. S2CID   10290721.
  53. Method and apparatus for a sparse distributed memory system US 5113507 A, by Louis A. Jaeckel, Universities Space Research Association, 1992, URL: http://www.google.com/patents/US5113507
  54. Method and device for storing and recalling information implementing a kanerva memory system US 5829009 A, by Gary A. Frazier, Texas Instruments Incorporated, 1998, URL: https://www.google.com/patents/US5829009
  55. Furber, Stephen B. "Digital memory." U.S. Patent No. 7,512,572. 31 Mar. 2009.URL: https://www.google.com/patents/US7512572
  56. Emruli, Blerim; Sandin, Fredrik; Delsing, Jerker (2015). "Vector space architecture for emergent interoperability of systems by learning from demonstration". Biologically Inspired Cognitive Architectures. 11: 53–64. doi:10.1016/j.bica.2014.11.015.
  57. Emruli, Blerim; Sandin, Fredrik (2014). "Analogical mapping with sparse distributed memory: A simple model that learns to generalize from examples". Cognitive Computation. 6 (1): 74–88. doi:10.1007/s12559-013-9206-3. S2CID   12139021.
  58. Berchtold, Martin. "Processing Sensor Data with the Common Sense Toolkit (CSTK)." *(2005).
  59. The Mind Wanders by B. Hayes, 2018. url: http://bit-player.org/2018/the-mind-wanders
  60. 1 2 Brogliato, Marcelo S.; Chada, Daniel M.; Linhares, Alexandre (2014). "Sparse distributed memory: understanding the speed and robustness of expert memory". Frontiers in Human Neuroscience. 8: 222. doi: 10.3389/fnhum.2014.00222 . PMC   4009432 . PMID   24808842.
  61. Surkan, Alvin J. (1992). "WSDM: Weighted sparse distributed memory prototype expressed in APL". ACM SIGAPL APL Quote Quad. 23: 235–242. doi:10.1145/144052.144142.
  62. Turk, Andreas, and Günther Görz. "Kanerva's sparse distributed memory: an object-oriented implementation on the connection machine." IJCAI. 1995.
  63. Silva; Tadeu Pinheiro, Marcus; Pádua Braga, Antônio; Soares Lacerda, Wilian (2004). "Reconfigurable co-processor for kanerva's sparse distributed memory" (PDF). Microprocessors and Microsystems. 28 (3): 127–134. doi:10.1016/j.micpro.2004.01.003.
  64. Brown, Robert L. (June 1987). "Two Demonstrators and a Simulator for a Sparse Distributed Memory" (PDF). NASA Technical Reports Archive.