Neural Turing machine

A neural Turing machine (NTM) is a recurrent neural network model of a Turing machine. The approach was published by Alex Graves et al. in 2014.[1] NTMs combine the fuzzy pattern-matching capabilities of neural networks with the algorithmic power of programmable computers.

An NTM has a neural network controller coupled to external memory resources, which it interacts with through attentional mechanisms. The memory interactions are differentiable end-to-end, making it possible to optimize them using gradient descent.[2] An NTM with a long short-term memory (LSTM) network controller can infer simple algorithms such as copying, sorting, and associative recall from examples alone.[1]
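The core of these memory interactions can be sketched concisely. The following is a minimal NumPy illustration of content-based addressing and a differentiable read, assuming a toy memory of 8 slots of width 4; the function names and dimensions are illustrative, and the full NTM described in the paper additionally interpolates with the previous weighting and applies shift and sharpening steps.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def content_addressing(memory, key, beta):
        """Content-based addressing: compare a key vector emitted by the
        controller against every memory row by cosine similarity, then
        sharpen with beta and normalise into a soft attention weighting."""
        sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
        return softmax(beta * sims)

    def read(memory, weights):
        """Differentiable read: a convex combination of memory rows, so
        gradients flow back into both the weighting and the memory."""
        return weights @ memory

    # Toy example: 8 memory slots of width 4, a random key from the controller.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((8, 4))   # memory matrix, N x W
    k = rng.standard_normal(4)        # key vector from the controller
    w = content_addressing(M, k, beta=5.0)
    r = read(M, w)                    # read vector handed back to the controller

Because every step is a smooth function of the controller's outputs, the whole read path can be trained by backpropagation, which is what "differentiable end-to-end" means here.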

The authors of the original NTM paper did not publish their source code.[1] The first stable open-source implementation was published in 2018 at the 27th International Conference on Artificial Neural Networks, where it received a best-paper award.[3][4][5] Other open-source implementations of NTMs exist, but as of 2018 they were not sufficiently stable for production use.[6][7][8][9][10][11][12] Their developers either report that the gradients sometimes become NaN during training for unknown reasons, causing training to fail;[9][10][11] report slow convergence;[6][7] or do not report the learning speed of their implementation at all.[8][12]
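Exploding gradients in recurrent models are one common source of such NaN failures, and gradient clipping is a standard mitigation. The sketch below shows how it might look in a generic PyTorch training step; the model, loss function, and data here are placeholders of my choosing, not code from any of the repositories cited above.

    import torch

    def train_step(model, optimizer, loss_fn, x, y, max_norm=10.0):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Clip the global gradient norm before the parameter update;
        # unbounded gradients are a common cause of NaNs during training.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
        return loss.item()

    # Minimal usage with a stand-in linear model.
    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    train_step(model, opt, torch.nn.functional.mse_loss, x, y)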

Differentiable neural computers are an outgrowth of neural Turing machines, adding attention mechanisms that control where the memory is active and improve performance.[13]

Related Research Articles

Neural network (machine learning) – Computational model used in machine learning, based on connected, hierarchical functions

In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Jürgen Schmidhuber – German computer scientist

Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.

Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network's performance at a task. For example, the outcome of a game can be easily measured without providing labeled examples of desired strategies. Neuroevolution is commonly used as part of the reinforcement learning paradigm, and it can be contrasted with conventional deep learning techniques that use backpropagation with a fixed topology.

Geoffrey Hinton – British computer scientist (born 1947)

Geoffrey Everest Hinton is a British-Canadian computer scientist, cognitive scientist, and cognitive psychologist, known for his work on artificial neural networks, which earned him the title of "Godfather of AI".

A cognitive architecture refers both to a theory about the structure of the human mind and to a computational instantiation of such a theory used in the fields of artificial intelligence (AI) and computational cognitive science. These formalized models can be used to further refine comprehensive theories of cognition and serve as the frameworks for useful artificial intelligence programs. Successful cognitive architectures include ACT-R and SOAR. The research on cognitive architectures as software instantiations of cognitive theories was initiated by Allen Newell in 1990.

Recurrent neural networks (RNNs) are a class of artificial neural network commonly used for sequential data processing. Unlike feedforward neural networks, which process data in a single pass, RNNs process data across multiple time steps, making them well-adapted for modelling and processing text, speech, and time series.
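As a minimal sketch of this time-step processing, a single Elman-style recurrence might look as follows in NumPy; the weight shapes and sequence length are illustrative.

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        """One recurrent step: the new hidden state depends on the current
        input and the previous hidden state, which is what lets an RNN
        carry information across time steps."""
        return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

    rng = np.random.default_rng(0)
    W_xh = rng.standard_normal((3, 5))
    W_hh = rng.standard_normal((5, 5))
    b_h = np.zeros(5)
    h = np.zeros(5)
    for x_t in rng.standard_normal((10, 3)):  # a sequence of 10 inputs
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)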

Quantum neural network – Quantum mechanics in neural networks

Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty of training classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still at an early stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments.

Long short-term memory – Type of recurrent neural network architecture

Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at mitigating the vanishing gradient problem commonly encountered by traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods. It aims to provide a short-term memory for RNNs that can last thousands of timesteps. The name is an analogy with long-term memory and short-term memory and their relationship, studied by cognitive psychologists since the early 20th century.
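A minimal NumPy sketch of one LSTM step follows, using the standard gate formulation; peephole connections and other variants are omitted, and the shapes are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM step. The gates decide what to forget, what to write,
        and what to expose; the additive cell-state update is what
        mitigates the vanishing-gradient problem."""
        z = np.concatenate([x_t, h_prev]) @ W + b  # all four gates at once
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # additive update
        h = sigmoid(o) * np.tanh(c)
        return h, c

    rng = np.random.default_rng(0)
    n_in, n_hid = 3, 5
    W = rng.standard_normal((n_in + n_hid, 4 * n_hid))
    b = np.zeros(4 * n_hid)
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x_t in rng.standard_normal((100, n_in)):  # a long input sequence
        h, c = lstm_step(x_t, h, c, W, b)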

There are many types of artificial neural networks (ANNs).

Deep learning – Branch of machine learning

Deep learning is a subset of machine learning that focuses on utilizing neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and is centered around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
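As a minimal illustration of this layer stacking, a feedforward "deep" network is just a sequence of linear maps and nonlinearities; the NumPy sketch below uses illustrative shapes and ReLU activations.

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def mlp(x, layers):
        """Apply each (weights, bias) layer in sequence; 'deep' refers to
        the use of several such layers."""
        for W, b in layers[:-1]:
            x = relu(x @ W + b)
        W, b = layers[-1]
        return x @ W + b  # final layer left linear (e.g. for regression)

    rng = np.random.default_rng(0)
    layers = [(rng.standard_normal((4, 16)), np.zeros(16)),
              (rng.standard_normal((16, 16)), np.zeros(16)),
              (rng.standard_normal((16, 1)), np.zeros(1))]
    y = mlp(rng.standard_normal(4), layers)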

TensorFlow – Machine learning software library

TensorFlow is a software library for machine learning and artificial intelligence. It can be used across a range of tasks, but is used mainly for training and inference of neural networks. It is one of the most popular deep learning frameworks, alongside others such as PyTorch and PaddlePaddle. It is free and open-source software released under the Apache License 2.0.

The following tables compare notable software frameworks, libraries, and computer programs for deep learning applications.

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence (AI), its subdisciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

Alex Graves is a computer scientist.

Wojciech Zaremba is a Polish computer scientist, a founding team member of OpenAI (2016–present), where he leads both the Codex research and language teams. The teams actively work on AI that writes computer code and creating successors to GPT-3 respectively.

Differentiable neural computer – Artificial neural network architecture

In artificial intelligence, a differentiable neural computer (DNC) is a memory augmented neural network architecture (MANN), which is typically recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind.

Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by biological neural circuitry. While some of the computational implementations of ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling this period an "AI winter".

LightGBM, short for Light Gradient-Boosting Machine, is a free and open-source distributed gradient-boosting framework for machine learning, originally developed by Microsoft. It is based on decision tree algorithms and used for ranking, classification and other machine learning tasks. The development focus is on performance and scalability.

Owl Scientific Computing – Numerical programming library for the OCaml programming language

Owl Scientific Computing is a software system for scientific and engineering computing developed in the Department of Computer Science and Technology, University of Cambridge. The System Research Group (SRG) in the department recognises Owl as one of the representative systems developed in SRG in the 2010s. The source code is licensed under the MIT License and can be accessed from the GitHub repository.

References

  1. Graves, Alex; Wayne, Greg; Danihelka, Ivo (2014). "Neural Turing Machines". arXiv:1410.5401 [cs.NE].
  2. "Deep Minds: An Interview with Google's Alex Graves & Koray Kavukcuoglu". Retrieved May 17, 2016.
  3. Collier, Mark; Beel, Joeran (2018), "Implementing Neural Turing Machines", Artificial Neural Networks and Machine Learning – ICANN 2018, Springer International Publishing, pp. 94–104, arXiv:1807.08518, Bibcode:2018arXiv180708518C, doi:10.1007/978-3-030-01424-7_10, ISBN 9783030014230, S2CID 49908746
  4. "MarkPKCollier/NeuralTuringMachine". GitHub. Retrieved 2018-10-20.
  5. Beel, Joeran (2018-10-20). "Best-Paper Award for our Publication "Implementing Neural Turing Machines" at the 27th International Conference on Artificial Neural Networks". Trinity College Dublin, School of Computer Science and Statistics Blog. Retrieved 2018-10-20.
  6. "snowkylin/ntm". GitHub. Retrieved 2018-10-20.
  7. "chiggum/Neural-Turing-Machines". GitHub. Retrieved 2018-10-20.
  8. "yeoedward/Neural-Turing-Machine". GitHub. 2017-09-13. Retrieved 2018-10-20.
  9. "camigord/Neural-Turing-Machine". GitHub. Retrieved 2018-10-20.
  10. "carpedm20/NTM-tensorflow". GitHub. Retrieved 2018-10-20.
  11. "snipsco/ntm-lasagne". GitHub. Retrieved 2018-10-20.
  12. "loudinthecloud/pytorch-ntm". GitHub. Retrieved 2018-10-20.
  13. Administrator. "DeepMind's Differentiable Neural Network Thinks Deeply". www.i-programmer.info. Retrieved 2016-10-20.