Hyperdimensional computing

Last updated

Hyperdimensional computing (HDC) is an approach to computation, particularly artificial intelligence. HDC is motivated by the observation that the cerebellum cortex operates on high-dimensional data representations. [1] In HDC, information is thereby represented as a hyperdimensional (long) vector called a hypervector. A hyperdimensional vector (hypervector) could include thousands of numbers that represent a point in a space of thousands of dimensions. [2] Vector Symbolic Architectures is an older name for the same broad approach. [2] [ failed verification ]

Contents

Process

Data is mapped from the input space to sparse HD space under an encoding function φ : X → H. HD representations are stored in data structures that are subject to corruption by noise/hardware failures. Noisy/corrupted HD representations can still serve as input for learning, classification, etc. They can also be decoded to recover the input data. H is typically restricted to range-limited integers (-v-v) [3]

This is analogous to the learning process conducted by fruit flies olfactory system. The input is a roughly 50-dimensional vector corresponding to odor receptor neuron types. The HD representation uses ~2,000-dimensions. [3]

Transparency

HDC algebra reveals the logic of how and why systems makes decisions, unlike artificial neural networks. Physical world objects can be mapped to hypervectors, to be processed by the algebra. [2]

Performance

HDC is suitable for "in-memory computing systems", which compute and hold data on a single chip, avoiding data transfer delays. Analog devices operate at low voltages. They are energy-efficient, but prone to error-generating noise. HDC's can tolerate such errors. [2]

Various teams have developed low-power HDC hardware accelerators. [3]

Nanoscale memristive devices can be exploited to perform computation. An in-memory hyperdimensional computing system can implement operations on two memristive crossbar engines together with peripheral digital CMOS circuits. Experiments using 760,000 phase-change memory devices performing analog in-memory computing achieved accuracy comparable to software implementations. [4]

Errors

HDC is robust to errors such as an individual bit error (a 0 flips to 1 or vice versa) missed by error-correcting mechanisms. Eliminating such error-correcting mechanisms can save up to 25% of compute cost. This is possible because such errors leave the result "close" to the correct vector. Reasoning using vectors is not compromised. HDC is at least 10x more error tolerant than traditional artificial neural networks, which are already orders of magnitude more tolerant than traditional computing. [2]

Example

A simple example considers images containing black circles and white squares. Hypervectors can represent SHAPE and COLOR variables and hold the corresponding values: CIRCLE, SQUARE, BLACK and WHITE. Bound hypervectors can hold the pairs BLACK and CIRCLE, etc. [2]

Orthogonality

High-dimensional space allows many mutually orthogonal vectors. However, If vectors are instead allowed to be nearly orthogonal, the number of distinct vectors in high-dimensional space is vastly larger. [2]

HDC uses the concept of distributed representations, in which an object/observation is represented by a pattern of values across many dimensions rather than a single constant. [3]

Operations

HDC can combine hypervectors into new hypervectors using well-defined vector space operations.

Groups, rings, and fields over hypervectors become the underlying computing structures with addition, multiplication, permutation, mapping, and inverse as primitive computing operations. [4] All computational tasks are performed in high-dimensional space using simple operations like element-wise additions and dot products. [3]

Binding creates ordered point tuples and is also a function ⊗ : H × H → H. The input is two points in H, while the output is a dissimilar point. Multiplying the SHAPE vector with CIRCLE binds the two, representing the idea “SHAPE is CIRCLE”. This vector is "nearly orthogonal" to SHAPE and CIRCLE. The components are recoverable from the vector (e.g., answer the question "is the shape a circle?"). [3]

Addition creates a vector that combines concepts. For example, adding “SHAPE is CIRCLE” to “COLOR is RED,” creates a vector that represents a red circle.

Permutation rearranges the vector elements. For example, permuting a three-dimensional vector with values labeled x, y and z, can interchange x to y, y to z, and z to x. Events represented by hypervectors A and B can be added, forming one vector, but that would sacrifice the event sequence. Combining addition with permutation preserves the order; the event sequence can be retrieved by reversing the operations.

Bundling combines a set of elements in H as function ⊕ : H ×H → H. The input is two points in H and the output is a third point that is similar to both. [3]

History

Vector symbolic architectures (VSA) provided a systematic approach to high-dimensional symbol representations to support operations such as establishing relationships. Early examples include holographic reduced representations, binary spatter codes, and matrix binding of additive terms. HD computing advanced these models, particularly emphasizing hardware efficiency. [3]

In 2018, Eric Weiss showed how to fully represent an image as a hypervector. A vector could contain information about all the objects in the image, including properties such as color, position, and size. [2]

In 2023, Abbas Rahimi et al., used HDC with neural networks to solve Raven's progressive matrices. [2]

In 2023, Mike Heddes et Al. under the supervision of Professors Givargis, Nicolau and Veidenbaum created a hyper-dimensional computing library [5] that is built on top of PyTorch.

Applications

Image recognition

HDC algorithms can replicate tasks long completed by deep neural networks, such as classifying images. [2]

Classifying an annotated set of handwritten digits uses an algorithm to analyze the features of each image, yielding a hypervector per image. The algorithm then adds the hypervectors for all labeled images of e.g., zero, to create a prototypical hypervector for the concept of zero and repeats this for the other digits. [2]

Classifying an unlabeled image involves creating a hypervector for it and comparing it to the reference hypervectors. This comparison identifies the digit that the new image most resembles. [2]

Given labeled example set is the class of a particular xi. [3]

Given query xq ∈ X the most similar prototype can be found with . The similarity metric ρ is typically the dot-product. [3]

Reasoning

Hypervectors can also be used for reasoning. Raven's progressive matrices presents images of objects in a grid. One position in the grid is blank. The test is to choose from candidate images the one that best fits. [2]

A dictionary of hypervectors represents individual objects. Each hypervector represents an object concept with its attributes. For each test image a neural network generates a binary hypervector (values are +1 or −1) that is as close as possible to some set of dictionary hypervectors. The generated hypervector thus describes all the objects and their attributes in the image. [2]

Another algorithm creates probability distributions for the number of objects in each image and their characteristics. These probability distributions describe the likely characteristics of both the context and candidate images. They too are transformed into hypervectors, then algebra predicts the most likely candidate image to fill the slot. [2]

This approach achieved 88% accuracy on one problem set, beating neural network–only solutions that were 61% accurate. For 3-by-3 grids, the system was 250x faster than a method that used symbolic logic to reason, because of the size of the associated rulebook. [2]

Other

Other applications include bio-signal processing, natural language processing, and robotics. [3]

See also

Related Research Articles

<span class="mw-page-title-main">Supervised learning</span> Paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data to expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

<span class="mw-page-title-main">Neural network (machine learning)</span> Computational model used in machine learning, based on connected, hierarchical functions

In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.

<span class="mw-page-title-main">Self-organizing map</span> Machine learning technique useful for dimensionality reduction

A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. For example, a data set with variables measured in observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Projection of data onto lower-dimensional manifolds

Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

<span class="mw-page-title-main">Artificial neuron</span> Mathematical function conceived as a crude model

An artificial neuron is a mathematical function conceived as a model of biological neurons in a neural network. Artificial neurons are the elementary units of artificial neural networks. The artificial neuron is a function that receives one or more inputs, applies weights to these inputs, and sums them to produce an output.

The Hough transform is a feature extraction technique used in image analysis, computer vision, pattern recognition, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

In machine learning, backpropagation is a gradient estimation method commonly used for training neural networks to compute the network parameter updates.

Recurrent neural networks (RNNs) are a class of artificial neural network commonly used for sequential data processing. Unlike feedforward neural networks, which process data in a single pass, RNNs process data across multiple time steps, making them well-adapted for modelling and processing text, speech, and time series.

Neural gas is an artificial neural network, inspired by the self-organizing map and introduced in 1991 by Thomas Martinetz and Klaus Schulten. The neural gas is a simple algorithm for finding optimal data representations based on feature vectors. The algorithm was coined "neural gas" because of the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space. It is applied where data compression or vector quantization is an issue, for example speech recognition, image processing or pattern recognition. As a robustly converging alternative to the k-means clustering it is also used for cluster analysis.

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the Representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.

For holographic data storage, holographic associative memory (HAM) is an information storage and retrieval system based on the principles of holography. Holograms are made by using two beams of light, called a "reference beam" and an "object beam". They produce a pattern on the film that contains them both. Afterwards, by reproducing the reference beam, the hologram recreates a visual image of the original object. In theory, one could use the object beam to do the same thing: reproduce the original reference beam. In HAM, the pieces of information act like the two beams. Each can be used to retrieve the other from the pattern. It can be thought of as an artificial neural network which mimics the way the brain uses information. The information is presented in abstract form by a complex vector which may be expressed directly by a waveform possessing frequency and magnitude. This waveform is analogous to electrochemical impulses believed to transmit information between biological neuron cells.

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction, to generate lower-dimensional embeddings for subsequent use by other machine learning algorithms.

The generalized Hough transform (GHT), introduced by Dana H. Ballard in 1981, is the modification of the Hough transform using the principle of template matching. The Hough transform was initially developed to detect analytically defined shapes. In these cases, we have knowledge of the shape and aim to find out its location and orientation in the image. This modification enables the Hough transform to be used to detect an arbitrary object described with its model.

There are many types of artificial neural networks (ANN).

Sparse distributed memory (SDM) is a mathematical model of human long-term memory introduced by Pentti Kanerva in 1988 while he was at NASA Ames Research Center.

<span class="mw-page-title-main">Feature learning</span> Set of learning techniques in machine learning

In machine learning (ML), feature learning or representation learning is a set of techniques that allow a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns features by itself via filter optimization. This type of deep learning network has been applied to process and make predictions from many different types of data including text, images and audio. Convolution-based networks are the de-facto standard in deep learning-based approaches to computer vision and image processing, and have only recently have been replaced -- in some cases -- by newer deep learning architectures such as the transformer. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, for each neuron in the fully-connected layer, 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded convolution kernels, only 25 neurons are required to process 5x5-sized tiles. Higher-layer features are extracted from wider context windows, compared to lower-layer features.

A capsule neural network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization.

<span class="mw-page-title-main">Transformer (deep learning architecture)</span> Deep learning architecture for modelling sequential data

A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

References

  1. Zou, Zhuowen; Alimohamadi, Haleh; Imani, Farhad; Kim, Yeseong; Imani, Mohsen (2021-10-01), Spiking Hyperdimensional Network: Neuromorphic Models Integrated with Memory-Inspired Framework, arXiv: 2110.00214
  2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Ananthaswamy, Anan (April 13, 2023). "A New Approach to Computation Reimagines Artificial Intelligence". Quanta Magazine.
  3. 1 2 3 4 5 6 7 8 9 10 11 Thomas, Anthony; Dasgupta, Sanjoy; Rosing, Tajana (2021-10-05). "A Theoretical Perspective on Hyperdimensional Computing" (PDF). Journal of Artificial Intelligence Research. 72: 215–249. doi:10.1613/jair.1.12664. ISSN   1076-9757. S2CID   239007517.
  4. 1 2 Karunaratne, Geethan; Le Gallo, Manuel; Cherubini, Giovanni; Benini, Luca; Rahimi, Abbas; Sebastian, Abu (June 2020). "In-memory hyperdimensional computing". Nature Electronics. 3 (6): 327–337. arXiv: 1906.01548 . doi:10.1038/s41928-020-0410-3. ISSN   2520-1131. S2CID   174797921.
  5. Heddes, Mike; Nunes, Igor; Vergés, Pere; Kleyko, Denis; Abraham, Danny; Givargis, Tony; Nicolau, Alexandru; Veidenbaum, Alexander (2022-05-18). "Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures". arXiv: 2205.09208 [cs.LG].