Sepp Hochreiter | |
---|---|
Born | |
Nationality | German |
Alma mater | Technische Universität München |
Scientific career | |
Fields | Machine learning, bioinformatics |
Institutions | Johannes Kepler University Linz |
Thesis | Generalisierung bei neuronalen Netzen geringer Komplexität (1999) |
Doctoral advisor | Wilfried Brauer |
Josef "Sepp" Hochreiter (born 14 February 1967) is a German computer scientist. Since 2018 he has led the Institute for Machine Learning at the Johannes Kepler University of Linz after having led the Institute of Bioinformatics from 2006 to 2018. In 2017 he became the head of the Linz Institute of Technology (LIT) AI Lab. Hochreiter is also a founding director of the Institute of Advanced Research in Artificial Intelligence (IARAI). [1] Previously, he was at Technische Universität Berlin, at University of Colorado Boulder, and at the Technical University of Munich. He is a chair of the Critical Assessment of Massive Data Analysis (CAMDA) conference. [2]
Hochreiter has made contributions in the fields of machine learning, deep learning and bioinformatics, most notably the development of the long short-term memory (LSTM) neural network architecture, [3] [4] but also in meta-learning, [5] reinforcement learning [6] [7] and biclustering with application to bioinformatics data.
Hochreiter developed the long short-term memory (LSTM) neural network architecture in his diploma thesis in 1991 leading to the main publication in 1997. [3] [4] LSTM overcomes the problem of numerical instability in training recurrent neural networks (RNNs) that prevents them from learning from long sequences (vanishing or exploding gradient). [3] [8] [9] In 2007, Hochreiter and others successfully applied LSTM with an optimized architecture to very fast protein homology detection without requiring a sequence alignment. [10] LSTM networks have also been used in Google Voice for transcription [11] and search, [12] and in the Google Allo chat app for generating response suggestion with low latency. [13]
Beyond LSTM, Hochreiter has developed "Flat Minimum Search" to increase the generalization of neural networks [14] and introduced rectified factor networks (RFNs) for sparse coding [15] [16] which have been applied in bioinformatics and genetics. [17] Hochreiter introduced modern Hopfield networks with continuous states [18] and applied them to the task of immune repertoire classification. [19]
Hochreiter worked with Jürgen Schmidhuber in the field of reinforcement learning on actor-critic systems that learn by "backpropagation through a model". [6] [20]
Hochreiter has been involved in the development of factor analysis methods with application to bioinformatics, including FABIA for biclustering, [21] HapFABIA for detecting short segments of identity by descent [22] and FARMS for preprocessing and summarizing high-density oligonucleotide DNA microarrays to analyze RNA gene expression. [23]
In 2006, Hochreiter and others proposed an extension of the support vector machine (SVM), the "Potential Support Vector Machine" (PSVM), [24] which can be applied to non-square kernel matrices and can be used with kernels that are not positive definite. Hochreiter and his collaborators have applied PSVM to feature selection, including gene selection for microarray data. [25] [26] [27]
Hochreiter was awarded the IEEE CIS Neural Networks Pioneer Prize in 2021 for his work on LSTM. [28]
In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.
Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
Recurrent neural networks (RNNs) are a class of artificial neural networks for sequential data processing. Unlike feedforward neural networks, which process data in a single pass, RNNs processes data across multiple time steps, making them well-adapted for modelling and processing text, speech, and time series.
Meta-learning is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation, however the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, hence the alternative term learning to learn.
An echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer. The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behavior is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system.
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at dealing with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps, thus "long short-term memory". It is applicable to classification, processing and predicting data based on time series, such as in handwriting, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.
The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is nonlinear. Modern activation functions include the smooth version of the ReLU, the GELU, which was used in the 2018 BERT model, the logistic (sigmoid) function used in the 2012 speech recognition model developed by Hinton et al, the ReLU used in the 2012 AlexNet computer vision model and in the 2015 ResNet model.
Encog is a machine learning framework available for Java and .Net. Encog supports different learning algorithms such as Bayesian Networks, Hidden Markov Models and Support Vector Machines. However, its main strength lies in its neural network algorithms. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using many different techniques. Multithreading is used to allow optimal training performance on multicore machines.
There are many types of artificial neural networks (ANN).
Deep learning is the subset of machine learning methods based on neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
In the context of artificial neural networks, the rectifier or ReLU activation function is an activation function defined as the positive part of its argument:
In machine learning, the vanishing gradient problem is encountered when training neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training each of the neural networks weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that as the sequence length increases, the gradient magnitude typically is expected to decrease, slowing the training process. In the worst case, this may completely stop the neural network from further training. As one example of the problem cause, traditional activation functions such as the hyperbolic tangent function have gradients in the range [-1,1], and backpropagation computes gradients by the chain rule. This has the effect of multiplying n of these small numbers to compute gradients of the early layers in an n-layer network, meaning that the gradient decreases exponentially with n while the early layers train very slowly.
Bidirectional recurrent neural networks (BRNN) connect two hidden layers of opposite directions to the same output. With this form of generative deep learning, the output layer can get information from past (backwards) and future (forward) states simultaneously. Invented in 1997 by Schuster and Paliwal, BRNNs were introduced to increase the amount of input information available to the network. For example, multilayer perceptron (MLPs) and time delay neural network (TDNNs) have limitations on the input data flexibility, as they require their input data to be fixed. Standard recurrent neural network (RNNs) also have restrictions as the future input information cannot be reached from the current state. On the contrary, BRNNs do not require their input data to be fixed. Moreover, their future input information is reachable from the current state.
Alex Graves is a computer scientist and research scientist at DeepMind.
Felix Gers is a professor of computer science at Berlin University of Applied Sciences Berlin. With Jürgen Schmidhuber and Fred Cummins, he introduced the forget gate to the long short-term memory recurrent neural network architecture. This modification of the original architecture has been shown to be crucial to the success of the LSTM at such tasks as speech and handwriting recognition.
In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous artificial neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The advantage of a Highway Network over the common deep neural networks is that it solves or partially prevents the vanishing gradient problem, thus leading to easier to optimize neural networks. The gating mechanisms facilitate information flow across many layers.
A residual neural network is a seminal deep learning model in which the weight layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition and won that year's ImageNet Large Scale Visual Recognition Challenge.
EMRBots are experimental artificially generated electronic medical records (EMRs). The aim of EMRBots is to allow non-commercial entities to use the artificial patient repositories to practice statistical and machine-learning algorithms. Commercial entities can also use the repositories for any purpose, as long as they do not create software products using the repositories.
Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by neural circuitry. While some of the computational implementations ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling that period an "AI winter".
A transformer is a deep learning architecture developed by Google and based on the multi-head attention mechanism, proposed in a 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via looking up from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism allowing the signal for key tokens to be amplified and less important tokens to be diminished. The transformer paper, published in 2017, is based on the softmax-based attention mechanism proposed by Bahdanau et al. in 2014 for machine translation, and the Fast Weight Controller, similar to a transformer, proposed in 1992.