Alex Graves (computer scientist)

Last updated
Alex Graves
Alma mater
Known for
Scientific career
Fields
Institutions DeepMind
University of Toronto
Dalle Molle Institute for Artificial Intelligence Research
Thesis Supervised sequence labelling with recurrent neural networks  (2008)
Doctoral advisor Jürgen Schmidhuber
Website www.cs.toronto.edu/~graves OOjs UI icon edit-ltr-progressive.svg

Alex Graves is a computer scientist and research scientist at DeepMind. [1]

Contents

Education

Graves earned his Bachelor of Science degree in Theoretical Physics from the University of Edinburgh [ when? ] and a PhD in artificial intelligence from the Technical University of Munich supervised by Jürgen Schmidhuber at the Dalle Molle Institute for Artificial Intelligence Research. [2] [3]

Career and research

After his PhD, Graves was postdoc working with Schmidhuber at the Technical University of Munich and Geoffrey Hinton [4] at the University of Toronto.

At the Dalle Molle Institute for Artificial Intelligence Research, Graves trained long short-term memory (LSTM) neural networks by a novel method called connectionist temporal classification (CTC). [5] This method outperformed traditional speech recognition models in certain applications. [6] In 2009, his CTC-trained LSTM was the first recurrent neural network (RNN) to win pattern recognition contests, winning several competitions in connected handwriting recognition. [7] [8] Google uses CTC-trained LSTM for speech recognition on the smartphone. [9] [10]

Graves is also the creator of neural Turing machines [11] and the closely related differentiable neural computer. [12] [13] In 2023, he published the paper Bayesian Flow Networks. [14]

Related Research Articles

<span class="mw-page-title-main">Neural network (machine learning)</span> Computational model used in machine learning, based on connected, hierarchical functions

In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

<span class="mw-page-title-main">Handwriting recognition</span> Ability of a computer to receive and interpret intelligible handwritten input

Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most possible words.

<span class="mw-page-title-main">Jürgen Schmidhuber</span> German computer scientist

Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.

Recurrent neural networks (RNNs) are a class of artificial neural networks for sequential data processing. Unlike feedforward neural networks, which process data in a single pass, RNNs process data across multiple time steps, making them well-adapted for modelling and processing text, speech, and time series.

Meta-learning is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation, however the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, hence the alternative term learning to learn.

<span class="mw-page-title-main">Long short-term memory</span> Artificial recurrent neural network architecture used in deep learning

Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at dealing with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps, thus "long short-term memory". The name is made in analogy with long-term memory and short-term memory and their relationship, studied by cognitive psychologists since early 20th century.

This is a timeline of artificial intelligence, sometimes alternatively called synthetic intelligence.

There are many types of artificial neural networks (ANN).

<span class="mw-page-title-main">Sepp Hochreiter</span> German computer scientist

Josef "Sepp" Hochreiter is a German computer scientist. Since 2018 he has led the Institute for Machine Learning at the Johannes Kepler University of Linz after having led the Institute of Bioinformatics from 2006 to 2018. In 2017 he became the head of the Linz Institute of Technology (LIT) AI Lab. Hochreiter is also a founding director of the Institute of Advanced Research in Artificial Intelligence (IARAI). Previously, he was at Technische Universität Berlin, at University of Colorado Boulder, and at the Technical University of Munich. He is a chair of the Critical Assessment of Massive Data Analysis (CAMDA) conference.

<span class="mw-page-title-main">Deep learning</span> Branch of machine learning

Deep learning is a subset of machine learning methods based on neural networks with representation learning. The field takes inspiration from biological neuroscience and is centered around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

In machine learning, the vanishing gradient problem is encountered when training neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training each of the neural networks weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that as the network depth or sequence length increases, the gradient magnitude typically is expected to decrease, slowing the training process. In the worst case, this may completely stop the neural network from further training. As one example of the problem cause, traditional activation functions such as the hyperbolic tangent function have gradients in the range [-1,1], and backpropagation computes gradients by the chain rule. This has the effect of multiplying n of these small numbers to compute gradients of the early layers in an n-layer network, meaning that the gradient decreases exponentially with n while the early layers train very slowly.

Bidirectional recurrent neural networks (BRNN) connect two hidden layers of opposite directions to the same output. With this form of generative deep learning, the output layer can get information from past (backwards) and future (forward) states simultaneously. Invented in 1997 by Schuster and Paliwal, BRNNs were introduced to increase the amount of input information available to the network. For example, multilayer perceptron (MLPs) and time delay neural network (TDNNs) have limitations on the input data flexibility, as they require their input data to be fixed. Standard recurrent neural network (RNNs) also have restrictions as the future input information cannot be reached from the current state. On the contrary, BRNNs do not require their input data to be fixed. Moreover, their future input information is reachable from the current state.

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is like a long short-term memory (LSTM) with a gating mechanism to input or forget certain features, but lacks a context vector or output gate, resulting in fewer parameters than LSTM. GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM. GRUs showed that gating is indeed helpful in general, and Bengio's team came to no concrete conclusion on which of the two gating units was better.

<span class="mw-page-title-main">Differentiable neural computer</span> Artificial neural network architecture

In artificial intelligence, a differentiable neural computer (DNC) is a memory augmented neural network architecture (MANN), which is typically recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind.

Felix Gers is a professor of computer science at Berlin University of Applied Sciences Berlin. With Jürgen Schmidhuber and Fred Cummins, he introduced the forget gate to the long short-term memory recurrent neural network architecture. This modification of the original architecture has been shown to be crucial to the success of the LSTM at such tasks as speech and handwriting recognition.

Connectionist temporal classification (CTC) is a type of neural network output and associated scoring function, for training recurrent neural networks (RNNs) such as LSTM networks to tackle sequence problems where the timing is variable. It can be used for tasks like on-line handwriting recognition or recognizing phonemes in speech audio. CTC refers to the outputs and scoring, and is independent of the underlying neural network structure. It was introduced in 2006.

In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous artificial neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The advantage of a Highway Network over the common deep neural networks is that it solves or partially prevents the vanishing gradient problem, thus leading to easier to optimize neural networks. The gating mechanisms facilitate information flow across many layers.

<span class="mw-page-title-main">Residual neural network</span> Deep learning method

A residual neural network is a deep learning architecture in which the weight layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition and won that year's ImageNet Large Scale Visual Recognition Challenge.

Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by neural circuitry. While some of the computational implementations ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling that period an "AI winter".

References

  1. 1 2 Alex Graves publications indexed by Google Scholar OOjs UI icon edit-ltr-progressive.svg
  2. Graves, Alex (2008). Supervised sequence labelling with recurrent neural networks (PDF) (PhD thesis). Technischen Universitat Munchen. OCLC   1184353689.
  3. "Alex Graves". Canadian Institute for Advanced Research. Archived from the original on 1 May 2015.
  4. "Marginally Interesting: What is going on with DeepMind and Google?". Blog.mikiobraun.de. 28 January 2014. Retrieved May 17, 2016.
  5. Alex Graves, Santiago Fernandez, Faustino Gomez, and Jürgen Schmidhuber (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets. Proceedings of ICML’06, pp. 369–376.
  6. Santiago Fernandez, Alex Graves, and Jürgen Schmidhuber (2007). An application of recurrent neural networks to discriminative keyword spotting. Proceedings of ICANN (2), pp. 220–229. doi : 10.1007/978-3-540-74695-9_23
  7. Graves, Alex; and Schmidhuber, Jürgen; Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, in Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; and Culotta, Aron (eds.), Advances in Neural Information Processing Systems 22 (NIPS'22), December 7th–10th, 2009, Vancouver, BC, Neural Information Processing Systems (NIPS) Foundation, 2009, pp. 545–552 doi : 10.5555/2981780.2981848
  8. A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber. A Novel Connectionist System for Improved Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, 2009. doi : 10.1109/TPAMI.2008.137
  9. Google Research Blog. The neural networks behind Google Voice transcription. August 11, 2015. By Françoise Beaufays http://googleresearch.blogspot.co.at/2015/08/the-neural-networks-behind-google-voice.html
  10. Google Research Blog. Google voice search: faster and more accurate. September 24, 2015. By Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays and Johan Schalkwyk – Google Speech Team http://googleresearch.blogspot.co.uk/2015/09/google-voice-search-faster-and-more.html
  11. "Google's Secretive DeepMind Startup Unveils a "Neural Turing Machine"" . Retrieved May 17, 2016.
  12. Graves, Alex; Wayne, Greg; Reynolds, Malcolm; Harley, Tim; Danihelka, Ivo; Grabska-Barwińska, Agnieszka; Colmenarejo, Sergio Gómez; Grefenstette, Edward; Ramalho, Tiago (2016-10-12). "Hybrid computing using a neural network with dynamic external memory". Nature. 538 (7626): 471–476. Bibcode:2016Natur.538..471G. doi:10.1038/nature20101. ISSN   1476-4687. PMID   27732574. S2CID   205251479.
  13. "Differentiable neural computers | DeepMind". DeepMind. Retrieved 2016-10-19.
  14. Alex Graves; Rupesh Kumar Srivastava; Timothy Atkinson; Faustino Gomez (14 August 2023), Bayesian Flow Networks (PDF), arXiv: 2308.07037 , doi:10.48550/ARXIV.2308.07037, Wikidata   Q121625910