Hierarchical temporal memory

Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for anomaly detection in streaming data. The technology is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian (in particular, human) brain.

At the core of HTM are learning algorithms that can store, learn, infer, and recall high-order sequences. Unlike most other machine learning methods, HTM constantly learns (in an unsupervised process) time-based patterns in unlabeled data. HTM is robust to noise, and has high capacity (it can learn multiple patterns simultaneously). When applied to computers, HTM is well suited for prediction, [1] anomaly detection, [2] classification, and ultimately sensorimotor applications. [3]

HTM has been tested and implemented in software through example applications from Numenta and a few commercial applications from Numenta's partners.

Structure and algorithms

A typical HTM network is a tree-shaped hierarchy of levels (not to be confused with the "layers" of the neocortex, as described below). These levels are composed of smaller elements called regions (or nodes). A single level in the hierarchy may contain several regions. Higher hierarchy levels often have fewer regions. Higher hierarchy levels can reuse patterns learned at the lower levels by combining them to memorize more complex patterns.

Each HTM region has the same basic function. In learning and inference modes, sensory data (e.g. data from the eyes) comes into bottom-level regions. In generation mode, the bottom-level regions output the generated pattern of a given category. The top level usually has a single region that stores the most general and most permanent categories (concepts); these determine, or are determined by, smaller concepts at lower levels that are more restricted in time and space. When set in inference mode, a region (in each level) interprets information coming up from its "child" regions as probabilities of the categories it has in memory.

Each HTM region learns by identifying and memorizing spatial patterns—combinations of input bits that often occur at the same time. It then identifies temporal sequences of spatial patterns that are likely to occur one after another.

As an evolving model

HTM is the algorithmic component of Jeff Hawkins' Thousand Brains Theory of Intelligence. New findings on the neocortex are therefore progressively incorporated into the HTM model, which changes over time in response. The new findings do not necessarily invalidate the previous parts of the model, so ideas from one generation are not necessarily excluded from its successor. Because of the evolving nature of the theory, there have been several generations of HTM algorithms, [4] which are briefly described below.

First generation: zeta 1

The first generation of HTM algorithms is sometimes referred to as zeta 1.

Training

During training, a node (or region) receives a temporal sequence of spatial patterns as its input. The learning process consists of two stages:

  1. Spatial pooling identifies frequently observed patterns in the input and memorises them as "coincidences". Patterns that are significantly similar to each other are treated as the same coincidence, so a large number of possible input patterns is reduced to a manageable number of known coincidences.
  2. Temporal pooling partitions coincidences that are likely to follow each other in the training sequence into temporal groups. Each group of patterns represents a "cause" of the input pattern (or "name" in On Intelligence). A sketch of both stages is given below.
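
A minimal sketch of the two stages, assuming toy binary input patterns, Hamming-distance matching for coincidences, and a naive transition-count grouping; the actual zeta 1 algorithms differed in detail, and all names and thresholds here are illustrative:

```python
import numpy as np

def spatial_pooling(patterns, threshold=2):
    """Stage 1: reduce input patterns to a set of 'coincidences'.

    Patterns within `threshold` Hamming distance of a stored
    coincidence are treated as that coincidence.
    """
    coincidences, labels = [], []
    for p in patterns:
        match = next((i for i, c in enumerate(coincidences)
                      if np.sum(p != c) <= threshold), None)
        if match is None:
            coincidences.append(p.copy())
            match = len(coincidences) - 1
        labels.append(match)
    return coincidences, labels

def temporal_pooling(labels, n_coincidences, min_count=2):
    """Stage 2: merge coincidences that often follow one another
    into temporal groups ('causes')."""
    trans = np.zeros((n_coincidences, n_coincidences))
    for a, b in zip(labels, labels[1:]):
        trans[a, b] += 1                       # count observed transitions
    group_of = list(range(n_coincidences))     # each starts in its own group
    for a in range(n_coincidences):
        for b in range(a + 1, n_coincidences):
            if trans[a, b] + trans[b, a] >= min_count:
                old = max(group_of[a], group_of[b])
                new = min(group_of[a], group_of[b])
                group_of = [new if g == old else g for g in group_of]
    return group_of                            # coincidence index -> group id
```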

The concepts of spatial pooling and temporal pooling are still quite important in the current HTM algorithms. Temporal pooling is not yet well understood, and its meaning has changed over time (as the HTM algorithms evolved).

Inference

During inference, the node calculates the set of probabilities that a pattern belongs to each known coincidence. Then it calculates the probabilities that the input represents each temporal group. The set of probabilities assigned to the groups is called a node's "belief" about the input pattern. (In a simplified implementation, the node's belief consists of only one winning group.) This belief is the result of the inference that is passed to one or more "parent" nodes in the next higher level of the hierarchy.

"Unexpected" patterns to the node do not have a dominant probability of belonging to any one temporal group but have nearly equal probabilities of belonging to several of the groups. If sequences of patterns are similar to the training sequences, then the assigned probabilities to the groups will not change as often as patterns are received. The output of the node will not change as much, and a resolution in time[ clarification needed ] is lost.

In a more general scheme, the node's belief can be sent to the input of any node(s) at any level(s), but the connections between the nodes are still fixed. The higher-level node combines this output with the output from other child nodes thus forming its own input pattern.

Since resolution in space and time is lost in each node as described above, beliefs formed by higher-level nodes represent an even larger range of space and time. This is meant to reflect the organisation of the physical world as it is perceived by the human brain. Larger concepts (e.g. causes, actions, and objects) are perceived to change more slowly and consist of smaller concepts that change more quickly. Jeff Hawkins postulates that brains evolved this type of hierarchy to match, predict, and affect the organisation of the external world.

More details about the functioning of zeta 1 HTM can be found in Numenta's old documentation. [5]

Second generation: cortical learning algorithms

The second generation of HTM learning algorithms, often referred to as cortical learning algorithms (CLA), was drastically different from zeta 1. It relies on a data structure called sparse distributed representations (that is, a data structure whose elements are binary, 1 or 0, and whose number of 1 bits is small compared to the number of 0 bits) to represent brain activity, and on a more biologically realistic neuron model (often also referred to as a cell, in the context of HTM). [6] There are two core components in this HTM generation: a spatial pooling algorithm, [7] which outputs sparse distributed representations (SDR), and a sequence memory algorithm, [8] which learns to represent and predict complex sequences.

In this new generation, the layers and minicolumns of the cerebral cortex are addressed and partially modeled. Each HTM layer (not to be confused with an HTM level of an HTM hierarchy, as described above) consists of a number of highly interconnected minicolumns. An HTM layer creates a sparse distributed representation from its input, so that a fixed percentage of minicolumns are active at any one time. A minicolumn is understood as a group of cells that have the same receptive field. Each minicolumn has a number of cells that are able to remember several previous states. A cell can be in one of three states: active, inactive, or predictive.
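
A hedged sketch of the bookkeeping this implies, with illustrative dimensions (real implementations vary):

```python
import numpy as np

N_COLUMNS = 2048        # minicolumns in the layer (illustrative)
CELLS_PER_COLUMN = 32   # cells per minicolumn (illustrative)

# Per-time-step state: rows are minicolumns, columns are cells.
active_cells = np.zeros((N_COLUMNS, CELLS_PER_COLUMN), dtype=bool)
predictive_cells = np.zeros((N_COLUMNS, CELLS_PER_COLUMN), dtype=bool)
# A cell that is neither active nor predictive is inactive.
```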

Spatial pooling

The receptive field of each minicolumn is a fixed number of inputs that are randomly selected from a much larger number of node inputs. Based on the specific input pattern, some minicolumns will be more or less associated with the active input values. Spatial pooling selects a relatively constant number of the most active minicolumns and inactivates (inhibits) other minicolumns in the vicinity of the active ones. Similar input patterns tend to activate a stable set of minicolumns. The amount of memory used by each layer can be increased to learn more complex spatial patterns or decreased to learn simpler patterns.
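
A minimal sketch of this selection step, assuming random receptive fields and simple global top-k inhibition; production spatial poolers add boosting and local inhibition, and all sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS = 1024    # size of the input bit array (illustrative)
N_COLUMNS = 2048   # number of minicolumns in the layer
POTENTIAL = 0.5    # fraction of inputs in each column's receptive field
SPARSITY = 0.02    # ~2% of columns stay active

# Each minicolumn connects to a fixed random subset of the inputs.
connections = rng.random((N_COLUMNS, N_INPUTS)) < POTENTIAL

def spatial_pooler(input_bits):
    """Return indices of the winning (active) minicolumns."""
    # Overlap: how many active inputs fall in each receptive field.
    overlaps = np.logical_and(connections, input_bits).sum(axis=1)
    k = int(SPARSITY * N_COLUMNS)        # ~40 winners
    winners = np.argsort(overlaps)[-k:]  # top-k columns; rest are inhibited
    return winners

input_bits = rng.random(N_INPUTS) < 0.05   # a sparse toy input
active_columns = spatial_pooler(input_bits)
```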

Active, inactive and predictive cells

As mentioned above, a cell (or a neuron) of a minicolumn, at any point in time, can be in an active, inactive or predictive state. Initially, cells are inactive.

How do cells become active?

If one or more cells in the active minicolumn are in the predictive state (see below), they will be the only cells to become active in the current time step. If none of the cells in the active minicolumn are in the predictive state (which happens during the initial time step or when the activation of this minicolumn was not expected), all cells are made active.
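
In code, this activation rule might look as follows, reusing the state matrices sketched above; this is a simplified rendering of the rule, not Numenta's implementation:

```python
import numpy as np

def activate_cells(active_columns, prev_predictive):
    """Turn on cells in the winning minicolumns for this time step."""
    active = np.zeros_like(prev_predictive)
    for col in active_columns:
        if prev_predictive[col].any():
            # Predicted column: only the previously predictive cells fire.
            active[col] = prev_predictive[col]
        else:
            # Unpredicted column "bursts": every one of its cells fires.
            active[col] = True
    return active
```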

How do cells become predictive?

When a cell becomes active, it gradually forms connections to nearby cells that tend to be active during several previous time steps. Thus a cell learns to recognize a known sequence by checking whether the connected cells are active. If a large number of connected cells are active, this cell switches to the predictive state in anticipation of one of the few next inputs of the sequence.
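
A sketch of this prediction rule, modeling each distal segment as a boolean mask over the layer's cells; segment growth and synapse permanence values are omitted, and the threshold is illustrative:

```python
import numpy as np

ACTIVATION_THRESHOLD = 10   # active connected cells needed to match a segment

def compute_predictive(active, segments):
    """Mark a cell predictive when one of its distal segments sees
    enough currently active cells among its connections.

    `segments` maps (column, cell) -> list of boolean masks, each mask
    marking which cells that segment has grown connections to.
    """
    predictive = np.zeros_like(active)
    for (col, cell), segs in segments.items():
        for seg in segs:
            if np.sum(active & seg) >= ACTIVATION_THRESHOLD:
                predictive[col, cell] = True
                break
    return predictive
```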

The output of a minicolumn

The output of a layer includes minicolumns in both active and predictive states. Thus minicolumns are active over long periods of time, which leads to greater temporal stability seen by the parent layer.

Inference and online learning

Cortical learning algorithms are able to learn continuously from each new input pattern, therefore no separate inference mode is necessary. During inference, HTM tries to match the stream of inputs to fragments of previously learned sequences. This allows each HTM layer to be constantly predicting the likely continuation of the recognized sequences. The index of the predicted sequence is the output of the layer. Since predictions tend to change less frequently than the input patterns, this leads to increasing temporal stability of the output in higher hierarchy levels. Prediction also helps to fill in missing patterns in the sequence and to interpret ambiguous data by biasing the system to infer what it predicted.
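
This matching of input against prediction is also what drives HTM's use in anomaly detection: a simplified version of the anomaly score used in that work [2] is the fraction of currently active minicolumns that were not predicted at the previous time step:

```python
def anomaly_score(active_columns, prev_predictive):
    """0.0 when every active column was predicted, 1.0 when none were."""
    if len(active_columns) == 0:
        return 0.0
    predicted = sum(1 for col in active_columns if prev_predictive[col].any())
    return 1.0 - predicted / len(active_columns)
```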

Applications of the CLAs

Cortical learning algorithms are currently offered as a commercial SaaS by Numenta (such as Grok [9]).

The validity of the CLAs

The following question was posed to Jeff Hawkins in September 2011 with regard to cortical learning algorithms: "How do you know if the changes you are making to the model are good or not?" Hawkins responded: "There are two categories for the answer: one is to look at neuroscience, and the other is methods for machine intelligence. In the neuroscience realm, there are many predictions that we can make, and those can be tested. If our theories explain a vast array of neuroscience observations then it tells us that we’re on the right track. In the machine learning world, they don’t care about that, only how well it works on practical problems. In our case that remains to be seen. To the extent you can solve a problem that no one was able to solve before, people will take notice." [10]

Third generation: sensorimotor inference

The third generation builds on the second generation and adds in a theory of sensorimotor inference in the neocortex. [11] [12] This theory proposes that cortical columns at every level of the hierarchy can learn complete models of objects over time and that features are learned at specific locations on the objects. The theory was expanded in 2018 and referred to as the Thousand Brains Theory. [13]

Comparison of neuron models

(Figure: Comparing the artificial neural network (A), the biological neuron (B), and the HTM neuron (C).)

Artificial neural network (ANN)
  • Few synapses
  • No dendrites
  • Sum of inputs × weights
  • Learns by modifying the weights of synapses

Neocortical pyramidal neuron (biological neuron)
  • Thousands of synapses on the dendrites
  • Active dendrites: the cell recognizes hundreds of unique patterns
  • Co-activation of a set of synapses on a dendritic segment causes an NMDA spike and depolarization at the soma [8]
  • Sources of input to the cell:
    1. Feedforward inputs which form synapses proximal to the soma and directly lead to action potentials
    2. NMDA spikes generated in the more distal basal dendrites
    3. Apical dendrites that depolarize the soma (usually not sufficient to generate a somatic action potential)
  • Learns by growing new synapses

HTM model neuron [8]
  • Inspired by the pyramidal cells in neocortex layers 2/3 and 5
  • Thousands of synapses
  • Active dendrites: the cell recognizes hundreds of unique patterns
  • Models dendrites and NMDA spikes, with each array of coincidence detectors having a set of synapses
  • Learns by modeling the growth of new synapses

Comparing HTM and neocortex

HTM attempts to implement the functionality that is characteristic of a hierarchically related group of cortical regions in the neocortex. A region of the neocortex corresponds to one or more levels in the HTM hierarchy, while the hippocampus is remotely similar to the highest HTM level. A single HTM node may represent a group of cortical columns within a certain region.

Although it is primarily a functional model, several attempts have been made to relate the algorithms of the HTM to the structure of neuronal connections in the layers of neocortex. [14] [15] The neocortex is organized in vertical columns of 6 horizontal layers. The 6 layers of cells in the neocortex should not be confused with levels in an HTM hierarchy.

HTM nodes attempt to model a portion of cortical columns (80 to 100 neurons) with approximately 20 HTM "cells" per column. HTMs model only layers 2 and 3 to detect spatial and temporal features of the input, with 1 cell per column in layer 2 for spatial "pooling" and 1 to 2 dozen per column in layer 3 for temporal pooling. A key to HTMs and the cortex is their ability to deal with noise and variation in the input, which results from using a "sparse distributed representation" in which only about 2% of the columns are active at any given time.

An HTM attempts to model a portion of the cortex's learning and plasticity as described above. Differences between HTMs and neurons are detailed in Numenta's documentation of the cortical learning algorithms. [16]

Sparse distributed representations

Integrating a memory component with neural networks has a long history dating back to early research in distributed representations [17] [18] and self-organizing maps. For example, in sparse distributed memory (SDM), the patterns encoded by neural networks are used as memory addresses for content-addressable memory, with "neurons" essentially serving as address encoders and decoders. [19] [20]

Computers store information in dense representations such as a 32-bit word, where all combinations of 1s and 0s are possible. By contrast, brains use sparse distributed representations (SDRs). [21] The human neocortex has roughly 16 billion neurons, but at any given time only a small percentage are active. The activities of neurons are like bits in a computer, and so the representation is sparse. Similar to the SDM developed by NASA in the 1980s [19] and the vector space models used in latent semantic analysis, HTM uses sparse distributed representations. [22]

The SDRs used in HTM are binary representations of data consisting of many bits with a small percentage of the bits active (1s); a typical implementation might have 2048 columns and 64K artificial neurons, where as few as 40 might be active at once. Although it may seem less efficient for the majority of bits to go "unused" in any given representation, SDRs have two major advantages over traditional dense representations. First, SDRs are tolerant of corruption and ambiguity because the meaning of the representation is shared (distributed) across a small percentage (sparse) of active bits. In a dense representation, flipping a single bit completely changes the meaning, while in an SDR a single bit may not affect the overall meaning much. This leads to the second advantage of SDRs: because the meaning of a representation is distributed across all active bits, the similarity between two representations can be used as a measure of semantic similarity in the objects they represent. That is, if two vectors in an SDR have 1s in the same position, then they are semantically similar in that attribute. The bits in SDRs have semantic meaning, and that meaning is distributed across the bits. [22]
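
These two properties are easy to demonstrate. The sketch below uses the sizes mentioned above (2048 bits, 40 active) and measures similarity as the count of shared active bits:

```python
import numpy as np

rng = np.random.default_rng(42)
N_BITS, N_ACTIVE = 2048, 40

def random_sdr():
    sdr = np.zeros(N_BITS, dtype=bool)
    sdr[rng.choice(N_BITS, N_ACTIVE, replace=False)] = True
    return sdr

def overlap(a, b):
    """Shared active bits: a proxy for semantic similarity."""
    return int(np.sum(a & b))

a, b = random_sdr(), random_sdr()
print(overlap(a, a))   # 40: identical representations
print(overlap(a, b))   # near 0: unrelated SDRs rarely share bits

# Noise tolerance: dropping a few active bits barely changes the overlap.
noisy = a.copy()
noisy[rng.choice(np.flatnonzero(a), 4, replace=False)] = False
print(overlap(a, noisy))   # 36: still unmistakably "the same" representation
```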

The semantic folding theory [23] builds on these SDR properties to propose a new model for language semantics, where words are encoded into word-SDRs and the similarity between terms, sentences, and texts can be calculated with simple distance measures.

Similarity to other models

Bayesian networks

Likened to a Bayesian network, an HTM comprises a collection of nodes that are arranged in a tree-shaped hierarchy. Each node in the hierarchy discovers an array of causes in the input patterns and temporal sequences it receives. A Bayesian belief revision algorithm is used to propagate feed-forward and feedback beliefs from child to parent nodes and vice versa. However, the analogy to Bayesian networks is limited, because HTMs can be self-trained (such that each node has an unambiguous family relationship), cope with time-sensitive data, and grant mechanisms for covert attention.

A theory of hierarchical cortical computation based on Bayesian belief propagation was proposed earlier by Tai Sing Lee and David Mumford. [24] While HTM is mostly consistent with these ideas, it adds details about handling invariant representations in the visual cortex. [25]

Neural networks

Like any system that models details of the neocortex, HTM can be viewed as an artificial neural network. The tree-shaped hierarchy commonly used in HTMs resembles the usual topology of traditional neural networks. HTMs attempt to model cortical columns (80 to 100 neurons) and their interactions with fewer HTM "neurons". The goal of current HTMs is to capture as much of the functions of neurons and the network (as they are currently understood) within the capability of typical computers and in areas that can be made readily useful, such as image processing. For example, feedback from higher levels and motor control are not attempted because it is not yet understood how to incorporate them, and binary synapses are used instead of variable ones because they were determined to be sufficient for current HTM capabilities.

LAMINART and similar neural networks researched by Stephen Grossberg attempt to model both the infrastructure of the cortex and the behavior of neurons in a temporal framework to explain neurophysiological and psychophysical data. However, these networks are, at present, too complex for realistic application. [26]

HTM is also related to work by Tomaso Poggio, including an approach for modeling the ventral stream of the visual cortex known as HMAX. Similarities of HTM to various AI ideas are described in the December 2005 issue of the Artificial Intelligence journal. [27]

Neocognitron

The neocognitron, a hierarchical multilayered neural network proposed by Kunihiko Fukushima in 1980, is one of the first deep learning neural network models. [28]


References

  1. Cui, Yuwei; Ahmad, Subutai; Hawkins, Jeff (2016). "Continuous Online Sequence Learning with an Unsupervised Neural Network Model". Neural Computation. 28 (11): 2474–2504. arXiv: 1512.05463 . doi:10.1162/NECO_a_00893. PMID   27626963. S2CID   3937908.
  2. Ahmad, Subutai; Lavin, Alexander; Purdy, Scott; Agha, Zuha (2017). "Unsupervised real-time anomaly detection for streaming data". Neurocomputing. 262: 134–147. doi: 10.1016/j.neucom.2017.04.070 .
  3. "Preliminary details about new theory work on sensory-motor inference". HTM Forum. 2016-06-03.
  4. HTM Retrospective on YouTube
  5. "Numenta old documentation". numenta.com. Archived from the original on 2009-05-27.
  6. Jeff Hawkins lecture describing cortical learning algorithms on YouTube
  7. Cui, Yuwei; Ahmad, Subutai; Hawkins, Jeff (2017). "The HTM Spatial Pooler—A Neocortical Algorithm for Online Sparse Distributed Coding". Frontiers in Computational Neuroscience. 11: 111. doi: 10.3389/fncom.2017.00111 . PMC   5712570 . PMID   29238299.
  8. Hawkins, Jeff; Ahmad, Subutai (30 March 2016). "Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex". Front. Neural Circuits. 10: 23. doi: 10.3389/fncir.2016.00023 . PMC   4811948 . PMID   27065813.
  9. "Grok Product Page". grokstream.com. Archived from the original on 2019-04-26. Retrieved 2017-08-12.
  10. Laserson, Jonathan (September 2011). "From Neural Networks to Deep Learning: Zeroing in on the Human Brain" (PDF). XRDS. 18 (1). doi:10.1145/2000775.2000787. S2CID   21496694.
  11. Hawkins, Jeff; Ahmad, Subutai; Cui, Yuwei (2017). "A Theory of How Columns in the Neocortex Enable Learning the Structure of the World". Frontiers in Neural Circuits. 11: 81. doi: 10.3389/fncir.2017.00081 . PMC   5661005 . PMID   29118696.
  12. Have We Missed Half of What the Neocortex Does? Allocentric Location as the Basis of Perception on YouTube
  13. "Numenta publishes breakthrough theory for intelligence and cortical computation". eurekalert.org. 2019-01-14.
  14. Hawkins, Jeff; Blakeslee, Sandra. On Intelligence .
  15. George, Dileep; Hawkins, Jeff (2009). "Towards a Mathematical Theory of Cortical Micro-circuits". PLOS Computational Biology. 5 (10): e1000532. Bibcode:2009PLSCB...5E0532G. doi: 10.1371/journal.pcbi.1000532 . PMC   2749218 . PMID   19816557.
  16. "HTM Cortical Learning Algorithms" (PDF). numenta.org.
  17. Hinton, Geoffrey E. (1984). Distributed representations (PDF) (Technical report). Computer Science Department, Carnegie-Mellon University. CMU-CS-84-157.
  18. Plate, Tony (1991). "Holographic Reduced Representations: Convolution Algebra for Compositional Distributed Representations" (PDF). IJCAI.
  19. Kanerva, Pentti (1988). Sparse distributed memory. MIT Press. ISBN   9780262111324.
  20. Snaider, Javier; Franklin, Stan (2012). Integer sparse distributed memory (PDF). Twenty-fifth international flairs conference. S2CID   17547390. Archived from the original (PDF) on 2017-12-29.
  21. Olshausen, Bruno A.; Field, David J. (1997). "Sparse coding with an overcomplete basis set: A strategy employed by V1?". Vision Research. 37 (23): 3311–3325. doi: 10.1016/S0042-6989(97)00169-7 . PMID   9425546. S2CID   14208692.
  22. Ahmad, Subutai; Hawkins, Jeff (2016). "Numenta NUPIC – sparse distributed representations". arXiv: 1601.00720 [q-bio.NC].
  23. De Sousa Webber, Francisco (2015). "Semantic Folding Theory And its Application in Semantic Fingerprinting". arXiv: 1511.08855 [cs.AI].
  24. Lee, Tai Sing; Mumford, David (2002). "Hierarchical Bayesian Inference in the Visual Cortex". Journal of the Optical Society of America A. 20 (7): 1434–48. CiteSeerX   10.1.1.12.2565 . doi:10.1364/josaa.20.001434. PMID   12868647.
  25. George, Dileep (2010-07-24). "Hierarchical Bayesian inference in the visual cortex". dileepgeorge.com. Archived from the original on 2019-08-01.
  26. Grossberg, Stephen (2007). Cisek, Paul; Drew, Trevor; Kalaska, John (eds.). Towards a unified theory of neocortex: Laminar cortical circuits for vision and cognition. Technical Report CAS/CNS-TR-2006-008. For Computational Neuroscience: From Neurons to Theory and Back Again (PDF) (Report). Amsterdam: Elsevier. pp. 79–104. Archived from the original (PDF) on 2017-08-29.
  27. "Special Review Issue". Artificial Intelligence. 169 (2): 103–212. December 2005.
  28. Fukushima, Kunihiko (2007). "Neocognitron". Scholarpedia. 2 (1): 1717. Bibcode:2007SchpJ...2.1717F. doi: 10.4249/scholarpedia.1717 .

Further reading