RAMnets

RAMnets is one of the oldest practical neurally inspired classification algorithms. It is also known as a type of "n-tuple recognition method" or "weightless neural network".

Algorithm

Suppose N sets of n distinct bit locations are selected randomly. These are the n-tuples. The restriction of a pattern to an n-tuple can be regarded as an n-bit number which, together with the identity of the n-tuple, constitutes a "feature" of the pattern. The standard n-tuple recognizer operates simply as follows:

A pattern is classified as belonging to the class for which it has the most features in common with at least one training pattern of that class.

This is the $\theta = 1$ case of a more general rule whereby the class assigned to an unclassified pattern $u$ is

$$\arg\max_{c} \sum_{i=1}^{N} \Theta_{\theta}\!\left( \sum_{v \in D_c} \delta_{\alpha_i(u),\,\alpha_i(v)} \right)$$

where $D_c$ is the set of training patterns in class $c$, $\Theta_{\theta}(x) = x$ for $x \le \theta$ and $\Theta_{\theta}(x) = \theta$ for $x > \theta$, $\delta_{ij}$ is the Kronecker delta (= 1 if $i = j$ and 0 otherwise), and $\alpha_i(u)$ is the $i$th feature of the pattern $u$:

$$\alpha_i(u) = \sum_{j=0}^{n-1} 2^j\, u_{\eta_i(j)}$$

Here $u_k$ is the $k$th bit of $u$ and $\eta_i(j)$ is the $j$th bit location of the $i$th n-tuple.
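To make the feature definition concrete, here is a minimal Python sketch of the address computation. The pattern width R, the constants n and N, and the names `eta` and `features` are illustrative assumptions, not part of the original formulation:

```python
import random

R = 256  # bits per input pattern (assumed for illustration)
n = 4    # bits per n-tuple
N = 64   # number of n-tuples

# eta[i][j] is the j-th bit location of the i-th n-tuple: each n-tuple
# is a set of n distinct bit locations chosen at random.
rng = random.Random(0)
eta = [rng.sample(range(R), n) for _ in range(N)]

def features(u):
    """Return the N features alpha_i(u): the restriction of the bit
    sequence u to each n-tuple, read as an n-bit number."""
    return [sum(u[eta[i][j]] << j for j in range(n)) for i in range(N)]
```

With n = 4, for example, each feature is a 4-bit address in the range 0-15, so each n-tuple indexes one of 16 possible memory cells.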

With C classes to distinguish, the system can be implemented as a network of NC nodes, each of which is a random access memory (RAM); hence the term RAMnet. The memory content at address $\alpha$ of the $i$th node allocated to class $c$ is set to

$$m^{(c)}_{i\alpha} = \Theta_{\theta}\!\left( \sum_{v \in D_c} \delta_{\alpha,\,\alpha_i(v)} \right)$$

In the usual $\theta = 1$ case, the 1-bit content of $m^{(c)}_{i\alpha}$ is set if any pattern of $D_c$ has feature $(i, \alpha)$ and unset otherwise. Recognition is accomplished by summing the contents of the nodes of each class at the addresses given by the features of the unclassified pattern. That is, pattern $u$ is assigned to class

$$\arg\max_{c} \sum_{i=1}^{N} m^{(c)}_{i\,\alpha_i(u)}$$
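Putting the pieces together, here is a minimal sketch of the usual $\theta = 1$ case as a Python class; the constants and the `RAMnet` interface are assumptions made for illustration, not a definitive implementation:

```python
import random

R, n, N = 256, 4, 64           # pattern width, tuple size, tuple count (assumed)
rng = random.Random(0)
eta = [rng.sample(range(R), n) for _ in range(N)]

def features(u):
    # alpha_i(u) = sum_j 2^j * u_{eta_i(j)}
    return [sum(u[eta[i][j]] << j for j in range(n)) for i in range(N)]

class RAMnet:
    def __init__(self, classes):
        # One RAM node per (class, n-tuple), each with 2^n one-bit cells.
        self.memory = {c: [[0] * (1 << n) for _ in range(N)] for c in classes}

    def train(self, u, c):
        # theta = 1: saturate the addressed cell at 1.
        for i, a in enumerate(features(u)):
            self.memory[c][i][a] = 1

    def classify(self, u):
        # Sum the addressed node contents per class and take the argmax.
        scores = {c: sum(mem[i][a] for i, a in enumerate(features(u)))
                  for c, mem in self.memory.items()}
        return max(scores, key=scores.get)
```

Note that training is a single pass with no weight adjustment: each training pattern simply sets N memory bits, which is what makes the method fast in practice.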

RAM-discriminators and WiSARD

The RAMnets formed the basis of a commercial product known as WiSARD (Wilkie, Stonham and Aleksander Recognition Device), which was the first artificial neural network machine to be patented.

A RAM-discriminator consists of a set of X one-bit-word RAMs, each with n address inputs, and a summing device (Σ). Such a RAM-discriminator receives a binary pattern of X⋅n bits as input. The RAM input lines are connected to the input pattern by means of a one-to-one (biunivocal) pseudo-random mapping. The summing device enables this network of RAMs to exhibit – just like other ANN models based on synaptic weights – generalization and noise tolerance.

To train the discriminator, one sets all RAM memory locations to 0 and chooses a training set formed by binary patterns of X⋅n bits. For each training pattern, a 1 is stored in the memory location of each RAM addressed by that pattern. Once training is complete, the RAM memory contents hold a particular arrangement of 0s and 1s.

The information stored in the RAMs during the training phase is used to deal with previously unseen patterns. When one of these is given as input, the RAM memory contents addressed by the input pattern are read and summed by Σ. The number r thus obtained, called the discriminator response, is equal to the number of RAMs that output 1. The response r reaches its maximum X if the input belongs to the training set, and is equal to 0 if no n-bit component of the input pattern appears in the training set (not a single RAM outputs 1). Intermediate values of r express a kind of "similarity measure" of the input pattern with respect to the patterns in the training set.

A system formed by several RAM-discriminators is called a WiSARD. Each RAM-discriminator is trained on a particular class of patterns, and classification by the multi-discriminator system is performed as follows. When a pattern is given as input, each RAM-discriminator gives a response to that input. The responses are evaluated by an algorithm which compares them and computes the relative confidence c of the highest response (e.g., the difference d between the highest response and the second-highest response, divided by the highest response), as sketched below. A schematic representation of a RAM-discriminator and a 10-RAM-discriminator WiSARD is shown in Figure 1. [1]
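The following is a minimal sketch of this multi-discriminator scheme, with the discriminator response r and the relative confidence computed as described above. The values of X and n, the pseudo-random mapping, and all names are illustrative assumptions; a real WiSARD implementation may differ in detail:

```python
import random

n = 4                 # address bits per RAM (assumed)
X = 64                # RAMs per discriminator (assumed)
rng = random.Random(0)
mapping = rng.sample(range(X * n), X * n)  # one-to-one pseudo-random input mapping

def addresses(u):
    # Permute the X*n input bits pseudo-randomly, then split them into
    # X groups of n bits; each group addresses one RAM.
    bits = [u[k] for k in mapping]
    return [sum(bits[x * n + j] << j for j in range(n)) for x in range(X)]

class Discriminator:
    def __init__(self):
        self.rams = [[0] * (1 << n) for _ in range(X)]

    def train(self, u):
        for x, a in enumerate(addresses(u)):
            self.rams[x][a] = 1

    def response(self, u):
        # r = number of RAMs outputting 1; r == X for a training pattern.
        return sum(self.rams[x][a] for x, a in enumerate(addresses(u)))

def wisard_classify(discriminators, u):
    # One trained Discriminator per class (assumes at least two classes).
    # Relative confidence = (highest - second highest) / highest response.
    responses = {cls: d.response(u) for cls, d in discriminators.items()}
    ranked = sorted(responses.values(), reverse=True)
    confidence = (ranked[0] - ranked[1]) / ranked[0] if ranked[0] else 0.0
    return max(responses, key=responses.get), confidence
```

A low confidence value signals that two discriminators responded almost equally strongly, which can be used to flag ambiguous inputs rather than forcing a classification.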


References

  1. Verleysen, Michel, ed. (2009). Advances in Computational Intelligence and Learning: 17th European Symposium on Artificial Neural Networks (ESANN 2009), Bruges, Belgium, April 22–24, 2009, Proceedings. Evere: d-side. ISBN 978-2930307091. OCLC 553956424.

Further reading

  1. Allinson, N. M.; Kolcz, A. R. (1997). N-Tuple Neural Networks. Boston, MA: Springer. ISBN 978-1-4615-6099-9.
  2. Fukunaga, Keinosuke (1990). Introduction to Statistical Pattern Recognition (2nd ed.). Boston: Academic Press. ISBN 0-12-269851-7.
  3. Hornegger, Joachim; Paulus, Dietrich W. R. (1999). Applied Pattern Recognition: A Practical Introduction to Image and Speech Processing in C++ (2nd ed.). San Francisco: Morgan Kaufmann Publishers. ISBN 3-528-15558-2.
  4. An introductory tutorial to classifiers (introducing the basic terms, with a numeric example)