Neural Turing machine

A neural Turing machine (NTM) is a recurrent neural network model of a Turing machine. The approach was published by Alex Graves et al. in 2014. [1] NTMs combine the fuzzy pattern matching capabilities of neural networks with the algorithmic power of programmable computers.

An NTM has a neural network controller coupled to external memory resources, which it interacts with through attentional mechanisms. The memory interactions are differentiable end-to-end, making it possible to optimize them using gradient descent. [2] An NTM with a long short-term memory (LSTM) network controller can infer simple algorithms such as copying, sorting, and associative recall from examples alone. [1]
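The read operation can be made concrete with a short sketch. The following Python fragment is illustrative only; the variable names, sizes, and use of NumPy are assumptions, not the authors' code. It shows content-based addressing followed by a differentiable read: a key emitted by the controller is compared to every memory row, the similarities are turned into a soft weighting, and the read vector is a weighted sum over all rows rather than a hard lookup at one address.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def content_addressing(memory, key, beta):
        # Cosine similarity between the controller-emitted key and each
        # memory row, sharpened by the key strength beta and normalised
        # with a softmax; every step is differentiable.
        sim = memory @ key / (np.linalg.norm(memory, axis=1)
                              * np.linalg.norm(key) + 1e-8)
        return softmax(beta * sim)

    def read(memory, weights):
        # A differentiable read: a weighted sum over all memory rows
        # rather than a hard lookup at a single address.
        return weights @ memory

    # Illustrative usage: 128 memory slots, each of width 20.
    memory = np.random.randn(128, 20)
    key = np.random.randn(20)
    w = content_addressing(memory, key, beta=5.0)
    r = read(memory, w)  # the read vector fed back to the controller

Because both addressing and reading are smooth functions of the controller's outputs, the error signal can flow backward through the memory access, which is what allows the whole system to be optimized by gradient descent.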

The authors of the original NTM paper did not publish their source code. [1] The first stable open-source implementation was published in 2018 at the 27th International Conference on Artificial Neural Networks, where it received a best-paper award. [3] [4] [5] Other open-source implementations of NTMs exist, but as of 2018 they were not sufficiently stable for production use. [6] [7] [8] [9] [10] [11] [12] Their developers either report that the gradients of their implementations sometimes become NaN during training for unknown reasons, causing training to fail; [9] [10] [11] report slow convergence; [6] [7] or do not report the speed of learning at all. [8] [12]

Differentiable neural computers are an outgrowth of neural Turing machines, with attention mechanisms that control where the memory is active, improving performance. [13]

Related Research Articles

Artificial neural network – Computational model used in machine learning, based on connected, hierarchical functions

Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains.

Jürgen Schmidhuber – German computer scientist

Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland.

Geoffrey Hinton – British-Canadian computer scientist and psychologist (born 1947)

Geoffrey Everest Hinton is a British-Canadian cognitive psychologist and computer scientist, most noted for his work on artificial neural networks. Since 2013, he has divided his time working for Google and the University of Toronto. In 2017, he co-founded and became the Chief Scientific Advisor of the Vector Institute in Toronto.

A cognitive architecture refers both to a theory about the structure of the human mind and to a computational instantiation of such a theory used in the fields of artificial intelligence (AI) and computational cognitive science. The formalized models can be used to further refine a comprehensive theory of cognition and as a useful artificial intelligence program. Successful cognitive architectures include ACT-R and SOAR. The research on cognitive architectures as software instantiations of cognitive theories was initiated by Allen Newell in 1990.

Recurrent neural network – Computational model used in machine learning

A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. Recurrent neural networks are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs.
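As a rough sketch of the recurrence just described (illustrative Python, not tied to any particular library), a vanilla RNN carries its memory in a hidden state that is updated at every step of the input sequence:

    import numpy as np

    def rnn_step(x, h, W_xh, W_hh, b):
        # The new hidden state depends on both the current input x and
        # the previous hidden state h, creating the cycle that lets
        # earlier inputs influence later outputs.
        return np.tanh(W_xh @ x + W_hh @ h + b)

    # Illustrative sizes: 4-dimensional inputs, 8-dimensional state.
    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(8, 4))
    W_hh = rng.normal(size=(8, 8))
    b = np.zeros(8)
    h = np.zeros(8)
    for x in rng.normal(size=(5, 4)):  # a sequence of five inputs
        h = rnn_step(x, h, W_xh, W_hh, b)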

Neural network – Structure in biology and artificial intelligence

A neural network can refer either to a neural circuit of biological neurons or to a network of artificial neurons or nodes, as in an artificial neural network. Artificial neural networks are used for solving artificial intelligence (AI) problems; they model connections of biological neurons as weights between nodes. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or between −1 and 1.
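The weighted sum and activation described above amount to only a few lines of code. A minimal sketch in Python (names and numbers are illustrative):

    import numpy as np

    def neuron(inputs, weights, bias):
        # Linear combination: each input is modified by a weight
        # and the results are summed.
        z = np.dot(weights, inputs) + bias
        # The activation function controls the output amplitude;
        # the logistic sigmoid bounds it to the range (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    # Positive weights act as excitatory connections,
    # negative weights as inhibitory ones.
    out = neuron(np.array([0.5, -1.0, 2.0]),
                 np.array([0.8, -0.3, 0.1]),
                 bias=0.0)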

Long short-term memory – Artificial recurrent neural network architecture used in deep learning

Long short-term memory (LSTM) is an artificial neural network architecture used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) can process not only single data points, but also entire sequences of data. This characteristic makes LSTM networks well suited to processing and predicting sequential data. For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.

There are many types of artificial neural networks (ANN).

Deep learning – Branch of machine learning

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.

MNIST database – Database of handwritten digits

The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

Stan (software) – Probabilistic programming language for Bayesian inference

Stan is a probabilistic programming language for statistical inference written in C++. The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function.

DeepDream – Software program

DeepDream is a computer vision program created by Google engineer Alexander Mordvintsev that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, thus creating a dream-like appearance reminiscent of a psychedelic experience in the deliberately overprocessed images.

TensorFlow – Machine learning software library

TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.

Alex Graves is a computer scientist. Before working as a research scientist at DeepMind, he earned a BSc in Theoretical Physics from the University of Edinburgh and a PhD in artificial intelligence under Jürgen Schmidhuber at IDSIA. He was also a postdoc under Schmidhuber at the Technical University of Munich and under Geoffrey Hinton at the University of Toronto.

Wojciech Zaremba – Polish-American computer scientist (born 1988)

Wojciech Zaremba is a Polish-American computer scientist and a founding team member of OpenAI (2016–present), where he leads both the Codex research team and the language team. The teams work on AI that writes computer code and on successors to GPT-3, respectively. The mission of OpenAI is to build safe artificial intelligence (AI) and ensure that its benefits are as evenly distributed as possible.

Differentiable neural computer – Artificial neural network architecture

In artificial intelligence, a differentiable neural computer (DNC) is a memory augmented neural network architecture (MANN), which is typically recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind.

Neural architecture search – Machine learning-powered structure design

Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANNs), a widely used model in the field of machine learning. NAS has been used to design networks that are on par with or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, search strategy, and performance estimation strategy used.

SqueezeNet is the name of a deep neural network for computer vision that was released in 2016. SqueezeNet was developed by researchers at DeepScale, University of California, Berkeley, and Stanford University. In designing SqueezeNet, the authors' goal was to create a smaller neural network with fewer parameters that can more easily fit into computer memory and can more easily be transmitted over a computer network.

Owl Scientific Computing – Numerical programming library for the OCaml programming language

Owl Scientific Computing is a software system for scientific and engineering computing developed in the Department of Computer Science and Technology, University of Cambridge. The System Research Group (SRG) in the department recognises Owl as one of the representative systems developed in SRG in the 2010s. The source code is licensed under the MIT License and can be accessed from the GitHub repository.

Attention (machine learning) – Machine learning technique

In artificial neural networks, attention is a technique that is meant to mimic cognitive attention. The effect enhances some parts of the input data while diminishing other parts; the motivation is that the network should devote more focus to the important parts of the data, even though they may be small. Which parts of the data are most important depends on the context, and the weighting is learned by gradient descent.
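A minimal sketch of this weighting scheme in Python (a generic dot-product formulation with illustrative names, not any specific library's API):

    import numpy as np

    def attention(query, keys, values):
        # Scores measure how relevant each input element is to the
        # query; a softmax turns them into weights that enhance some
        # parts of the input while diminishing others.
        scores = keys @ query / np.sqrt(query.size)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ values

    # Illustrative usage: six input elements with 4-dimensional features.
    rng = np.random.default_rng(1)
    keys = rng.normal(size=(6, 4))
    values = rng.normal(size=(6, 4))
    query = rng.normal(size=4)
    context = attention(query, keys, values)

Because every step is differentiable, the network can learn by gradient descent which parts of the data deserve the most weight in a given context.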

References

  1. Graves, Alex; Wayne, Greg; Danihelka, Ivo (2014). "Neural Turing Machines". arXiv:1410.5401 [cs.NE].
  2. "Deep Minds: An Interview with Google's Alex Graves & Koray Kavukcuoglu". Retrieved May 17, 2016.
  3. Collier, Mark; Beel, Joeran (2018). "Implementing Neural Turing Machines". Artificial Neural Networks and Machine Learning – ICANN 2018. Springer International Publishing. pp. 94–104. arXiv:1807.08518. Bibcode:2018arXiv180708518C. doi:10.1007/978-3-030-01424-7_10. ISBN 9783030014230. S2CID 49908746.
  4. "MarkPKCollier/NeuralTuringMachine". GitHub. Retrieved 2018-10-20.
  5. Beel, Joeran (2018-10-20). "Best-Paper Award for our Publication 'Implementing Neural Turing Machines' at the 27th International Conference on Artificial Neural Networks". Trinity College Dublin, School of Computer Science and Statistics Blog. Retrieved 2018-10-20.
  6. "snowkylin/ntm". GitHub. Retrieved 2018-10-20.
  7. "chiggum/Neural-Turing-Machines". GitHub. Retrieved 2018-10-20.
  8. "yeoedward/Neural-Turing-Machine". GitHub. 2017-09-13. Retrieved 2018-10-20.
  9. "camigord/Neural-Turing-Machine". GitHub. Retrieved 2018-10-20.
  10. "carpedm20/NTM-tensorflow". GitHub. Retrieved 2018-10-20.
  11. "snipsco/ntm-lasagne". GitHub. Retrieved 2018-10-20.
  12. "loudinthecloud/pytorch-ntm". GitHub. Retrieved 2018-10-20.
  13. Administrator. "DeepMind's Differentiable Neural Network Thinks Deeply". www.i-programmer.info. Retrieved 2016-10-20.