Differentiable neural computer

A differentiable neural computer being trained to store and recall dense binary numbers. The figure shows performance on a reference task during training. Upper left: the input (red) and target (blue), as 5-bit words and a 1-bit interrupt signal. Upper right: the model's output.

In artificial intelligence, a differentiable neural computer (DNC) is a memory-augmented neural network (MANN) architecture that is typically, but not by definition, recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind. [1]

Applications

The DNC indirectly takes inspiration from the von Neumann architecture, which makes it likely to outperform conventional architectures on tasks that are fundamentally algorithmic and cannot be learned by finding a decision boundary.

So far, DNCs have been demonstrated to handle only relatively simple tasks, which can also be solved using conventional programming. Unlike conventional programs, however, DNCs do not need to be programmed for each problem; they can instead be trained. The attention span provided by their external memory allows the user to feed in complex data structures such as graphs sequentially and recall them for later use. Furthermore, DNCs can learn aspects of symbolic reasoning and apply them to working memory. The researchers who published the method see promise that DNCs can be trained to perform complex, structured tasks [1] [2] and address big-data applications that require some sort of reasoning, such as generating video commentaries or semantic text analysis. [3] [4]

A DNC can be trained to navigate a rapid transit system and then apply what it has learned to a different system. A neural network without memory would typically have to learn about each transit system from scratch. On graph traversal and sequence-processing tasks with supervised learning, DNCs performed better than alternatives such as long short-term memory or a Neural Turing machine. [5] With a reinforcement learning approach to a block puzzle problem inspired by SHRDLU, a DNC was trained via curriculum learning and learned to make a plan. It performed better than a traditional recurrent neural network. [5]

Architecture

DNC system diagram

DNC networks were introduced as an extension of the Neural Turing Machine (NTM), with the addition of memory attention mechanisms that control where information is stored in memory, and temporal attention that records the order of events. This structure allows DNCs to be more robust and abstract than an NTM, while still performing tasks with longer-term dependencies than some predecessors, such as long short-term memory (LSTM), can handle. The memory, which is simply a matrix, can be allocated dynamically and accessed indefinitely. The DNC is differentiable end-to-end (each subcomponent of the model is differentiable, therefore so is the whole model). This makes it possible to optimize the model efficiently using gradient descent. [3] [6] [7]
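To make the idea of a differentiable memory matrix concrete, the following is a minimal sketch in plain Python (lists instead of a tensor library, so no gradients are actually computed). It assumes the erase-then-add write, the weighted read, and the temporal link update in the form given in the original paper [1]. The function names `write`, `read`, `update_links` and `forward_weighting` are hypothetical and chosen only for this illustration.

```python
def write(memory, w, erase, value):
    """Erase-then-add write: M[i][j] <- M[i][j] * (1 - w[i] * erase[j]) + w[i] * value[j]."""
    return [
        [m * (1.0 - w_i * e) + w_i * v for m, e, v in zip(row, erase, value)]
        for row, w_i in zip(memory, w)
    ]


def read(memory, w):
    """Soft read: r[j] = sum_i w[i] * M[i][j], a weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(w_i * row[j] for row, w_i in zip(memory, w)) for j in range(width)]


def update_links(link, precedence, w):
    """Update the temporal link matrix and the precedence weighting after a write.

    link[i][j] grows when slot i is written now and slot j was written just before.
    """
    n = len(w)
    new_link = [
        [
            0.0 if i == j
            else (1.0 - w[i] - w[j]) * link[i][j] + w[i] * precedence[j]
            for j in range(n)
        ]
        for i in range(n)
    ]
    total = sum(w)
    new_precedence = [(1.0 - total) * p + w_i for p, w_i in zip(precedence, w)]
    return new_link, new_precedence


def forward_weighting(link, prev_read_w):
    """Shift a read weighting to the slots written next: f = L @ w_prev."""
    return [sum(l * r for l, r in zip(row, prev_read_w)) for row in link]
```

Because every step is a sum or product of real numbers, a write weighting that is spread over several slots changes the result smoothly, which is what lets gradient descent adjust it. With one-hot weightings the sketch behaves like ordinary addressed memory: writing to slot 0 and then to slot 1 leaves `link[1][0] == 1.0`, so a head that last read slot 0 is pointed to slot 1 by `forward_weighting`.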

The DNC model is similar to the von Neumann architecture, and because its memory can be resized, it is Turing complete. [8]

Traditional DNC

DNC, as originally published [1]

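Independent variables
Input vector: $x_t$
Target vector: $z_t$

Controller
Controller input matrix: $\chi_t = [x_t; r_{t-1}^1; \ldots; r_{t-1}^R]$

Deep (layered) LSTM, for each layer $l$ with $1 \le l \le L$
Input gate vector: $i_t^l = \sigma(W_i^l [\chi_t; h_{t-1}^l; h_t^{l-1}] + b_i^l)$
Output gate vector: $o_t^l = \sigma(W_o^l [\chi_t; h_{t-1}^l; h_t^{l-1}] + b_o^l)$
Forget gate vector: $f_t^l = \sigma(W_f^l [\chi_t; h_{t-1}^l; h_t^{l-1}] + b_f^l)$
State gate vector, $s_0^l = 0$: $s_t^l = f_t^l s_{t-1}^l + i_t^l \tanh(W_s^l [\chi_t; h_{t-1}^l; h_t^{l-1}] + b_s^l)$
Hidden gate vector, $h_0^l = 0$ and $h_t^0 = 0$ for all $t$: $h_t^l = o_t^l \tanh(s_t^l)$

DNC output vector: $y_t = W_y [h_t^1; \ldots; h_t^L] + W_r [r_t^1; \ldots; r_t^R]$

Read & Write heads
Interface parameters: $\xi_t = W_\xi [h_t^1; \ldots; h_t^L] = [k_t^{r,1}; \ldots; k_t^{r,R}; \hat{\beta}_t^{r,1}; \ldots; \hat{\beta}_t^{r,R}; k_t^w; \hat{\beta}_t^w; \hat{e}_t; v_t; \hat{f}_t^1; \ldots; \hat{f}_t^R; \hat{g}_t^a; \hat{g}_t^w; \hat{\pi}_t^1; \ldots; \hat{\pi}_t^R]$

Read heads, for $1 \le i \le R$
Read keys: $k_t^{r,i}$
Read strengths: $\beta_t^{r,i} = \operatorname{oneplus}(\hat{\beta}_t^{r,i})$
Free gates: $f_t^i = \sigma(\hat{f}_t^i)$
Read modes, $\pi_t^i \in \mathbb{R}^3$: $\pi_t^i = \operatorname{softmax}(\hat{\pi}_t^i)$

Write head
Write key: $k_t^w$
Write strength: $\beta_t^w = \operatorname{oneplus}(\hat{\beta}_t^w)$
Erase vector: $e_t = \sigma(\hat{e}_t)$
Write vector: $v_t$
Allocation gate: $g_t^a = \sigma(\hat{g}_t^a)$
Write gate: $g_t^w = \sigma(\hat{g}_t^w)$

Memory
Memory matrix, with matrix of ones $E \in \mathbb{R}^{N \times W}$: $M_t = M_{t-1} \circ (E - w_t^w e_t^\top) + w_t^w v_t^\top$
Usage vector: $u_t = (u_{t-1} + w_{t-1}^w - u_{t-1} \circ w_{t-1}^w) \circ \psi_t$
Precedence weighting, $p_0 = 0$: $p_t = \left(1 - \sum_i w_t^w[i]\right) p_{t-1} + w_t^w$
Temporal link matrix, $L_0 = 0$ and $L_t[i,i] = 0$: $L_t[i,j] = (1 - w_t^w[i] - w_t^w[j]) L_{t-1}[i,j] + w_t^w[i] p_{t-1}[j]$
Write weighting: $w_t^w = g_t^w [g_t^a a_t + (1 - g_t^a) c_t^w]$
Read weighting: $w_t^{r,i} = \pi_t^i[1] \mathbf{b}_t^i + \pi_t^i[2] c_t^{r,i} + \pi_t^i[3] \mathbf{f}_t^i$
Read vectors: $r_t^i = M_t^\top w_t^{r,i}$

Content-based addressing, with lookup key $k$ and key strength $\beta$: $\mathcal{C}(M, k, \beta)[i] = \frac{\exp\{\mathcal{D}(k, M[i,\cdot]) \beta\}}{\sum_j \exp\{\mathcal{D}(k, M[j,\cdot]) \beta\}}$
Indices of $u_t$, sorted in ascending order of usage: $\phi_t$
Allocation weighting: $a_t[\phi_t[j]] = (1 - u_t[\phi_t[j]]) \prod_{i=1}^{j-1} u_t[\phi_t[i]]$
Write content weighting: $c_t^w = \mathcal{C}(M_{t-1}, k_t^w, \beta_t^w)$
Read content weighting: $c_t^{r,i} = \mathcal{C}(M_t, k_t^{r,i}, \beta_t^{r,i})$
Forward weighting (a vector, set in bold to distinguish it from the scalar free gate $f_t^i$): $\mathbf{f}_t^i = L_t w_{t-1}^{r,i}$
Backward weighting: $\mathbf{b}_t^i = L_t^\top w_{t-1}^{r,i}$
Memory retention vector: $\psi_t = \prod_{i=1}^{R} (1 - f_t^i w_{t-1}^{r,i})$

Definitions
Weight matrix, bias vector: $W$, $b$
Zeros matrix, ones matrix, identity matrix: $0$, $E$, $I$
Element-wise multiplication: $\circ$
Cosine similarity: $\mathcal{D}(u, v) = \frac{u \cdot v}{\|u\| \|v\|}$
Sigmoid function: $\sigma(x) = 1 / (1 + e^{-x})$
Oneplus function: $\operatorname{oneplus}(x) = 1 + \log(1 + e^x)$
Softmax function: $\operatorname{softmax}(x)_j = \frac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}}$ for $j = 1, \ldots, K$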
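
The addressing quantities listed above can be illustrated with a short sketch in plain Python. It assumes the definitions from the original paper [1]: content-based addressing as a softmax over strength-scaled cosine similarities, an allocation weighting built from usage values sorted in ascending order, and a write weighting that interpolates between the two. The function names and the small constant `eps` are assumptions made for this example and are not part of the published model.

```python
import math


def oneplus(x):
    """Map a raw strength to a value >= 1: 1 + log(1 + e^x)."""
    return 1.0 + math.log1p(math.exp(x))


def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps (an assumption) avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def content_weighting(memory, key, strength):
    """Weight each memory row by its similarity to the lookup key."""
    return softmax([strength * cosine(key, row) for row in memory])


def allocation_weighting(usage):
    """Prefer the least-used slots: a[phi[j]] = (1 - u[phi[j]]) * prod_{i<j} u[phi[i]]."""
    order = sorted(range(len(usage)), key=lambda i: usage[i])  # least used first
    alloc = [0.0] * len(usage)
    running = 1.0  # product of the usages of all slots visited so far
    for idx in order:
        alloc[idx] = (1.0 - usage[idx]) * running
        running *= usage[idx]
    return alloc


def write_weighting(alloc, content, g_alloc, g_write):
    """Blend allocation and content addressing, then scale by the write gate."""
    return [g_write * (g_alloc * a + (1.0 - g_alloc) * c) for a, c in zip(alloc, content)]
```

For a usage vector `[1.0, 0.0, 0.5]`, `allocation_weighting` puts all of its weight on the unused slot 1. Setting `g_alloc` to 1 writes to freshly allocated slots, setting it to 0 overwrites the slot whose content best matches the write key, and a `g_write` of 0 suppresses the write entirely.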

Extensions

Refinements include sparse memory addressing, which reduces time and space complexity by a factor of thousands. This can be achieved by using an approximate nearest neighbor algorithm, such as locality-sensitive hashing, or a random k-d tree such as the Fast Library for Approximate Nearest Neighbors (FLANN) from UBC. [9] Adding Adaptive Computation Time (ACT) separates computation time from data time, which exploits the fact that problem length and problem difficulty are not always the same. [10] Training using synthetic gradients performs considerably better than backpropagation through time (BPTT). [11] Robustness can be improved by using layer normalization and Bypass Dropout as regularization. [12]
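
The idea behind sparse addressing can be sketched as follows: instead of weighting every memory row, a read keeps only the K best-matching rows and normalizes over those. This is an illustrative sketch, not the method of [9]. In particular, it scores every row exactly and selects the top K with `heapq.nlargest`, which stands in for the approximate nearest neighbor index and therefore does not deliver the complexity savings described above. The function name `sparse_read` and its parameters are hypothetical.

```python
import heapq
import math


def sparse_read(memory, key, strength, k):
    """Read from only the k rows most similar to the key.

    Returns (weights, read_vector), where weights maps row index -> weight.
    """
    def cosine(u, v, eps=1e-8):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / (norm + eps)

    # Exact scoring of all rows; a real sparse variant would query an ANN index here.
    scores = [(strength * cosine(key, row), i) for i, row in enumerate(memory)]
    top = heapq.nlargest(k, scores)

    # Softmax restricted to the selected rows; all other rows get zero weight.
    best = max(s for s, _ in top)
    exps = [(math.exp(s - best), i) for s, i in top]
    total = sum(e for e, _ in exps)
    weights = {i: e / total for e, i in exps}

    width = len(memory[0])
    read_vector = [
        sum(w * memory[i][j] for i, w in weights.items()) for j in range(width)
    ]
    return weights, read_vector
```

Only the K selected rows contribute to the read vector, so both the read and its gradient touch a fixed number of memory rows regardless of the total memory size.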

References

  1. Graves, Alex; Wayne, Greg; Reynolds, Malcolm; Harley, Tim; Danihelka, Ivo; Grabska-Barwińska, Agnieszka; Colmenarejo, Sergio Gómez; Grefenstette, Edward; Ramalho, Tiago (2016-10-12). "Hybrid computing using a neural network with dynamic external memory". Nature. 538 (7626): 471–476. Bibcode:2016Natur.538..471G. doi:10.1038/nature20101. ISSN 1476-4687. PMID 27732574. S2CID 205251479.
  2. "Differentiable neural computers | DeepMind". DeepMind. Retrieved 2016-10-19.
  3. Burgess, Matt. "DeepMind's AI learned to ride the London Underground using human-like reason and memory". WIRED UK. Retrieved 2016-10-19.
  4. Jaeger, Herbert (2016-10-12). "Artificial intelligence: Deep neural reasoning". Nature. 538 (7626): 467–468. Bibcode:2016Natur.538..467J. doi:10.1038/nature19477. ISSN 1476-4687. PMID 27732576.
  5. James, Mike. "DeepMind's Differentiable Neural Network Thinks Deeply". www.i-programmer.info. Retrieved 2016-10-20.
  6. "DeepMind AI 'Learns' to Navigate London Tube". PCMAG. Retrieved 2016-10-19.
  7. Mannes, John. "DeepMind's differentiable neural computer helps you navigate the subway with its memory". TechCrunch. Retrieved 2016-10-19.
  8. "RNN Symposium 2016: Alex Graves - Differentiable Neural Computer". YouTube.
  9. Rae, Jack W.; Hunt, Jonathan J.; Harley, Tim; Danihelka, Ivo; Senior, Andrew; Wayne, Greg; Graves, Alex; Lillicrap, Timothy P. (2016). "Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes". arXiv:1610.09027 [cs.LG].
  10. Graves, Alex (2016). "Adaptive Computation Time for Recurrent Neural Networks". arXiv:1603.08983 [cs.NE].
  11. Jaderberg, Max; Czarnecki, Wojciech Marian; Osindero, Simon; Vinyals, Oriol; Graves, Alex; Silver, David; Kavukcuoglu, Koray (2016). "Decoupled Neural Interfaces using Synthetic Gradients". arXiv:1608.05343 [cs.LG].
  12. Franke, Jörg; Niehues, Jan; Waibel, Alex (2018). "Robust and Scalable Differentiable Neural Computer for Question Answering". arXiv:1807.02658 [cs.CL].