Bidirectional associative memory

Bidirectional associative memory (BAM) is a type of recurrent neural network. BAM was introduced by Bart Kosko in 1988. [1] There are two types of associative memory: auto-associative and hetero-associative. BAM is hetero-associative, meaning that, given a pattern, it can return another pattern which is potentially of a different size. It is similar to the Hopfield network in that both are forms of associative memory; however, Hopfield networks return patterns of the same size.

It is said to be bidirectional because it can respond to inputs presented to either the input layer or the output layer. [2]


Topology

A BAM contains two layers of neurons, which we shall denote X and Y. Layers X and Y are fully connected to each other. Once the weights have been established, an input presented to layer X produces the associated pattern on layer Y, and vice versa.

The layers can be connected in both directions (bidirectionally), with the result that the weight matrix for signals sent from the X layer to the Y layer is M, and the weight matrix for signals sent from the Y layer to the X layer is its transpose, M^T. Thus, a single weight matrix, calculated once, serves both directions. [2]
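
For illustration, here is a minimal sketch, in Python with NumPy, of the bidirectional update dynamics this topology implies. The function name, the tie-breaking convention (a unit keeps its previous state when its net input is zero), and the convergence loop are illustrative assumptions, not a prescribed implementation.

    import numpy as np

    def bam_settle(M, x, max_iters=100):
        """Run the bidirectional update x -> y -> x through M (X to Y) and
        M.T (Y to X) until neither layer changes. Assumes bipolar {-1, +1}
        state vectors."""
        def threshold(h, prev):
            # sign threshold; a unit keeps its previous state when its net input is 0
            return np.where(h > 0, 1, np.where(h < 0, -1, prev))

        # initialise the Y layer with +1s purely for tie-breaking on the first pass
        y = threshold(x @ M, np.ones(M.shape[1], dtype=int))
        for _ in range(max_iters):
            x_new = threshold(y @ M.T, x)    # Y layer drives X layer through M^T
            y_new = threshold(x_new @ M, y)  # X layer drives Y layer through M
            if np.array_equal(x_new, x) and np.array_equal(y_new, y):
                break                        # bidirectionally stable state reached
            x, y = x_new, y_new
        return x, y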

Procedure

Learning

Imagine we wish to store two associations, A1:B1 and A2:B2:

A1 = (1, 0, 1, 0, 1, 0), B1 = (1, 1, 0, 0)
A2 = (1, 1, 1, 0, 0, 0), B2 = (1, 0, 1, 0)

These are then transformed into the bipolar forms (each 0 becomes -1):

X1 = (1, -1, 1, -1, 1, -1), Y1 = (1, 1, -1, -1)
X2 = (1, 1, 1, -1, -1, -1), Y2 = (1, -1, 1, -1)

From there, we calculate M = X1^T Y1 + X2^T Y2, where ^T denotes the transpose of a row vector. So,

M =
  [  2   0   0  -2 ]
  [  0  -2   2   0 ]
  [  2   0   0  -2 ]
  [ -2   0   0   2 ]
  [  0   2  -2   0 ]
  [ -2   0   0   2 ]

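The outer-product construction above can be reproduced in a few lines; the sketch below (Python with NumPy, with variable names chosen to mirror the text) simply recomputes M from the two bipolar pairs.

    import numpy as np

    # Bipolar forms of the two stored associations from the example above.
    X1 = np.array([1, -1, 1, -1, 1, -1]); Y1 = np.array([1, 1, -1, -1])
    X2 = np.array([1, 1, 1, -1, -1, -1]); Y2 = np.array([1, -1, 1, -1])

    # Hebbian outer-product rule: M = X1^T Y1 + X2^T Y2, a 6 x 4 matrix.
    M = np.outer(X1, Y1) + np.outer(X2, Y2)
    print(M)   # matches the matrix shown above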
Recall

To retrieve the association A1:B1, we multiply A1 by M to get (4, 2, -2, -4), which, when run through a threshold, yields (1, 1, 0, 0), which is B1. To find the reverse association, multiply B1 by the transpose of M and threshold again, which recovers A1.
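
A sketch of this recall step follows (Python with NumPy; the threshold here maps positive net inputs to 1 and the rest to 0, which suffices for this example since no net input is zero).

    import numpy as np

    M = np.array([[ 2,  0,  0, -2],
                  [ 0, -2,  2,  0],
                  [ 2,  0,  0, -2],
                  [-2,  0,  0,  2],
                  [ 0,  2, -2,  0],
                  [-2,  0,  0,  2]])      # weight matrix from the learning step

    A1 = np.array([1, 0, 1, 0, 1, 0])     # stored input pattern
    net = A1 @ M                          # -> [ 4  2 -2 -4]
    B1 = (net > 0).astype(int)            # threshold -> [1 1 0 0], i.e. B1

    net_back = B1 @ M.T                   # reverse direction through M^T
    A1_back = (net_back > 0).astype(int)  # -> [1 0 1 0 1 0], recovering A1
    print(net, B1, A1_back)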

Capacity

The memory or storage capacity of BAM may be given as min(n, p), where "n" is the number of units in the X layer and "p" is the number of units in the Y layer. [3]

The internal matrix M has n × p independent degrees of freedom, where n is the dimension of the first vector (6 in this example) and p is the dimension of the second vector (4). This allows the BAM to reliably store and recall up to min(n, p) independent vector pairs, or min(6, 4) = 4 in this example. [1] The capacity can be increased above min(n, p) by sacrificing reliability (accepting some incorrect bits in the recalled output).
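
This degradation can be observed empirically. The sketch below (Python with NumPy) stores k random bipolar pairs and measures one-pass recall accuracy; the exact numbers depend on the random draws, so it is indicative only, not a derivation of the capacity bound.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 6, 4                                  # layer sizes from the example

    def recall_accuracy(k, trials=200):
        """Store k random bipolar pairs and report the fraction of inputs
        whose paired output is recalled exactly after one thresholded pass."""
        hits = 0.0
        for _ in range(trials):
            X = rng.choice([-1, 1], size=(k, n))
            Y = rng.choice([-1, 1], size=(k, p))
            M = X.T @ Y                          # sum of outer products
            recalled = np.where(X @ M >= 0, 1, -1)
            hits += np.all(recalled == Y, axis=1).mean()
        return hits / trials

    for k in (2, 4, 8, 16):
        print(k, round(recall_accuracy(k), 2))   # accuracy falls as k grows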

Stability

A pair (A, B) defines the state of a BAM. To store a pattern, the energy function value for that pattern has to occupy a minimum point in the energy landscape.

The stability analysis of a BAM is based on the definition of a Lyapunov function (energy function) E, evaluated at each state (A, B). When a paired pattern (A, B) is presented to the BAM, the neurons change states until a bidirectionally stable state (A_f, B_f) is reached; Kosko proved that this state corresponds to a local minimum of the energy function. The discrete BAM is thus proved to converge to a stable state.

The energy function proposed by Kosko is E(A, B) = -A M B^T for the bidirectional case, which for the particular case A = B corresponds to Hopfield's auto-associative energy function (i.e. E(A, A) = -A M A^T). [3]
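
As a quick check of this definition, the following sketch (Python with NumPy, reusing the example matrix M and bipolar pair (X1, Y1) from above; the helper name is illustrative) evaluates the energy of the stored pair and of a one-bit perturbation of it.

    import numpy as np

    def bam_energy(A, B, M):
        """Kosko's BAM energy E(A, B) = -A M B^T for bipolar row vectors."""
        return -float(A @ M @ B)

    M = np.array([[ 2,  0,  0, -2],
                  [ 0, -2,  2,  0],
                  [ 2,  0,  0, -2],
                  [-2,  0,  0,  2],
                  [ 0,  2, -2,  0],
                  [-2,  0,  0,  2]])

    X1 = np.array([1, -1, 1, -1, 1, -1]); Y1 = np.array([1, 1, -1, -1])
    noisy = X1.copy(); noisy[0] = -noisy[0]      # flip one bit of X1

    print(bam_energy(X1, Y1, M))                 # stored pair:    -24 (lower)
    print(bam_energy(noisy, Y1, M))              # perturbed pair: -16 (higher)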

References

  1. Kosko, B. (1988). "Bidirectional Associative Memories" (PDF). IEEE Transactions on Systems, Man, and Cybernetics. 18 (1): 49–60. doi:10.1109/21.87054.
  2. "Principles of Soft Computing, 3ed". www.wileyindia.com. Retrieved 2020-08-15.
  3. Rajasekaran, S.; Pai, G. A. Vijayalakshmi (2003). Neural Networks, Fuzzy Logic and Genetic Algorithm: Synthesis and Applications. PHI Learning Pvt. Ltd. ISBN 978-81-203-2186-1.