Original author(s) | Bastiaan Quast |
---|---|
Initial release | 30 November 2015 |
Stable release | 1.9.0 / 22 April 2023 |
Preview release | 1.9.0.9000 / 22 April 2023 |
Repository | GitHub |
Written in | R |
Operating system | macOS, Linux, Windows |
Size | 564.2 kB (v. 1.9.0) |
License | GPL v3 |
Website | CRAN |
rnn is an open-source machine learning framework that implements recurrent neural network architectures, such as LSTM and GRU, natively in the R programming language. It has been downloaded over 100,000 times (from the RStudio servers alone). [1]
The rnn package is distributed through the Comprehensive R Archive Network [2] under the open-source GPL v3 license.
The example below, from the rnn documentation, shows how to train a recurrent neural network to solve the problem of bit-by-bit binary addition.
> # install the rnn package, including the dependency sigmoid
> install.packages('rnn')
> # load the rnn package
> library(rnn)
> # create input data
> X1 <- sample(0:127, 10000, replace = TRUE)
> X2 <- sample(0:127, 10000, replace = TRUE)
> # create output data
> Y <- X1 + X2
> # convert from decimal to binary notation
> X1 <- int2bin(X1, length = 8)
> X2 <- int2bin(X2, length = 8)
> Y <- int2bin(Y, length = 8)
> # move input data into single tensor
> X <- array(c(X1, X2), dim = c(dim(X1), 2))
> # train the model
> model <- trainr(Y = Y,
+                 X = X,
+                 learningrate = 1,
+                 hidden_dim = 16)
Trained epoch: 1 - Learning rate: 1
Epoch error: 0.839787019539748
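A natural follow-up is to check the trained model's predictions. The sketch below assumes that the package's predictr and bin2int functions return, respectively, the estimated binary outputs and their integer equivalents; it is illustrative rather than a captured session.

# generate predictions for the training inputs (assumes predictr returns
# values in [0, 1] with the same layout as Y)
B <- predictr(model, X)

# round the network outputs to bits and convert back to integers
predicted <- bin2int(round(B))
actual    <- bin2int(Y)

# inspect a few predicted sums against the true sums
head(cbind(predicted, actual))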
The sigmoid activation functions and derivatives used by rnn were originally included in the package itself; from version 0.8.0 onwards they have been released in the separate R package sigmoid, with the intention of enabling more general use. The sigmoid package is a dependency of the rnn package and is therefore installed automatically with it. [3]
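For illustration, the logistic sigmoid and the output-based form of its derivative used in backpropagation can be written in a few lines of base R. This is a minimal sketch; the function names below are illustrative and are not necessarily the exports of the sigmoid package.

# logistic sigmoid: maps any real input to (0, 1)
logistic_sigmoid <- function(x) 1 / (1 + exp(-x))

# derivative expressed in terms of the sigmoid's output,
# the form typically used during backpropagation
sigmoid_derivative_from_output <- function(s) s * (1 - s)

x <- c(-2, 0, 2)
s <- logistic_sigmoid(x)
sigmoid_derivative_from_output(s)   # largest at x = 0, where s = 0.5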
With the release of version 0.3.0 in April 2016, [4] use of the package in production and research environments became more widespread. Several months later it was reviewed on the R blog The Beginner Programmer, which concluded that "R provides a simple and very user friendly package named rnn for working with recurrent neural networks", [5] further increasing usage. [6]
The book Neural Networks in R by Balaji Venkateswaran and Giuseppe Ciaburro uses rnn to demonstrate recurrent neural networks to R users. [7] [8] It is also used in the r-exercises.com course "Neural network exercises". [9] [10]
The RStudio CRAN mirror download logs [11] show that the package is downloaded about 2,000 times per month from those servers, [12] with a total of over 100,000 downloads since the first release. [13] According to RDocumentation.org, this places the package in the 15th percentile of the most popular R packages. [14]
Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that can not be unrolled.
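As a rough illustration of this recurrence, the hidden state at each time step can be computed from the current input and the previous hidden state using a single shared set of weights. The following is a minimal base-R sketch of an Elman-style update, not the internals of any particular package.

# minimal Elman-style recurrent step: h_t = tanh(W_x x_t + W_h h_{t-1})
set.seed(1)
input_dim  <- 3
hidden_dim <- 4
W_x <- matrix(rnorm(hidden_dim * input_dim,  sd = 0.1), hidden_dim, input_dim)
W_h <- matrix(rnorm(hidden_dim * hidden_dim, sd = 0.1), hidden_dim, hidden_dim)

# a sequence of 5 input vectors (one per column)
x_seq <- matrix(rnorm(input_dim * 5), input_dim, 5)

h <- rep(0, hidden_dim)            # initial hidden state
for (t in seq_len(ncol(x_seq))) {
  h <- tanh(W_x %*% x_seq[, t] + W_h %*% h)   # same weights reused at every step
}
h                                  # final hidden state summarizing the sequence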
An echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer. The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behavior is non-linear, the only weights modified during training are those of the synapses connecting the hidden neurons to the output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be minimized easily by solving a linear system.
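Because only the readout weights are learned, fitting an echo state network reduces to a linear (ridge) regression once the reservoir states have been collected. The base-R sketch below illustrates the idea; it is not a production reservoir-computing implementation.

set.seed(2)
n_res <- 50                                   # reservoir size
T_len <- 200                                  # sequence length

# fixed, randomly assigned, sparsely connected reservoir weights
W_res <- matrix(rnorm(n_res^2), n_res) * (matrix(runif(n_res^2), n_res) < 0.1)
W_res <- W_res * (0.9 / max(abs(eigen(W_res, only.values = TRUE)$values)))  # scale spectral radius
W_in  <- matrix(rnorm(n_res), n_res, 1)

u <- sin(seq_len(T_len) / 5)                  # input signal
y <- cos(seq_len(T_len) / 5)                  # target signal to reproduce

# drive the reservoir and record its states
states <- matrix(0, T_len, n_res)
x <- rep(0, n_res)
for (t in seq_len(T_len)) {
  x <- tanh(W_in %*% u[t] + W_res %*% x)
  states[t, ] <- x
}

# only the output weights are learned: ridge regression of y on the states
lambda <- 1e-6
W_out <- solve(t(states) %*% states + lambda * diag(n_res), t(states) %*% y)
y_hat <- states %*% W_out
mean((y_hat - y)^2)                           # training error of the linear readout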
A long short-term memory (LSTM) network is a recurrent neural network (RNN) designed to address the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNNs that can last thousands of timesteps, hence "long short-term memory". It is applicable to classification, processing and prediction of time series data, such as in handwriting recognition, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.
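The core of the approach is a memory cell whose content is regulated by input, forget, and output gates. One step of a standard LSTM cell can be sketched in base R as follows; this is an illustrative sketch, not the internals of the rnn package.

sigm <- function(x) 1 / (1 + exp(-x))

# one step of a standard LSTM cell
lstm_step <- function(x, h_prev, c_prev, p) {
  z <- c(x, h_prev)                          # concatenate input and previous hidden state
  i <- sigm(p$W_i %*% z + p$b_i)             # input gate
  f <- sigm(p$W_f %*% z + p$b_f)             # forget gate
  o <- sigm(p$W_o %*% z + p$b_o)             # output gate
  g <- tanh(p$W_g %*% z + p$b_g)             # candidate cell content
  c_new <- f * c_prev + i * g                # cell state: gated mix of old and new
  h_new <- o * tanh(c_new)                   # hidden state exposed to the next step
  list(h = h_new, c = c_new)
}

# tiny usage example with random parameters
set.seed(3)
d_in <- 2; d_h <- 3
rand_mat <- function() matrix(rnorm(d_h * (d_in + d_h), sd = 0.1), d_h, d_in + d_h)
p <- list(W_i = rand_mat(), W_f = rand_mat(), W_o = rand_mat(), W_g = rand_mat(),
          b_i = rep(0, d_h), b_f = rep(1, d_h), b_o = rep(0, d_h), b_g = rep(0, d_h))
step <- lstm_step(x = rnorm(d_in), h_prev = rep(0, d_h), c_prev = rep(0, d_h), p = p)
step$h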
There are many types of artificial neural networks (ANN).
Josef "Sepp" Hochreiter is a German computer scientist. Since 2018 he has led the Institute for Machine Learning at the Johannes Kepler University of Linz after having led the Institute of Bioinformatics from 2006 to 2018. In 2017 he became the head of the Linz Institute of Technology (LIT) AI Lab. Hochreiter is also a founding director of the Institute of Advanced Research in Artificial Intelligence (IARAI). Previously, he was at the Technical University of Berlin, at the University of Colorado at Boulder, and at the Technical University of Munich. He is a chair of the Critical Assessment of Massive Data Analysis (CAMDA) conference.
Deep learning is the subset of machine learning methods based on artificial neural networks (ANNs) with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
In the context of artificial neural networks, the rectifier or ReLU activation function is an activation function defined as the positive part of its argument: f(x) = max(0, x), where x is the input to a neuron.
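In R the rectifier is a one-liner; the vectorised pmax applies it elementwise (a trivial sketch).

relu <- function(x) pmax(0, x)    # positive part of the argument, elementwise
relu(c(-2, -0.5, 0, 1.5))         # returns 0.0 0.0 0.0 1.5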
In machine learning, the vanishing gradient problem is encountered when training recurrent neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training, each of the neural network's weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that as the sequence length increases, the gradient magnitude is typically expected to decrease, slowing the training process. In the worst case, this may completely stop the neural network from further training. As one example of the cause of the problem, traditional activation functions such as the hyperbolic tangent have gradients in the range [-1, 1], and backpropagation computes gradients by the chain rule. This has the effect of multiplying n of these small numbers to compute gradients of the early layers in an n-layer network, meaning that the gradient decreases exponentially with n and the early layers train very slowly.
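The exponential shrinkage is easy to reproduce numerically: multiplying the local tanh derivatives encountered along a deep chain of layers drives the gradient factor reaching the early layers towards zero. The small base-R sketch below uses arbitrary example values.

set.seed(4)
n_layers <- 50
# pre-activations at each layer of a deep chain (illustrative values)
a <- rnorm(n_layers)

# derivative of tanh is 1 - tanh(x)^2, which lies in (0, 1]
local_grads <- 1 - tanh(a)^2

# the chain rule multiplies these factors when backpropagating to early layers
gradient_factor <- cumprod(rev(local_grads))
gradient_factor[c(1, 10, 25, 50)]   # shrinks roughly exponentially with depth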
A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on it, by traversing a given structure in topological order. Recursive neural networks, sometimes abbreviated as RvNNs, have been successful, for instance, in learning sequence and tree structures in natural language processing, mainly phrase and sentence continuous representations based on word embedding. RvNNs have first been introduced to learn distributed representations of structure, such as logical terms. Models and general frameworks have been developed in further works since the 1990s.
Bidirectional recurrent neural networks (BRNNs) connect two hidden layers of opposite directions to the same output. With this form of generative deep learning, the output layer can get information from past (backward) and future (forward) states simultaneously. Invented in 1997 by Schuster and Paliwal, BRNNs were introduced to increase the amount of input information available to the network. For example, multilayer perceptrons (MLPs) and time delay neural networks (TDNNs) have limitations on input data flexibility, as they require their input data to be fixed. Standard recurrent neural networks (RNNs) also have restrictions, as future input information cannot be reached from the current state. By contrast, BRNNs do not require their input data to be fixed, and their future input information is reachable from the current state.
Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is like a long short-term memory (LSTM) with a gating mechanism to input or forget certain features, but lacks a context vector or output gate, resulting in fewer parameters than LSTM. GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM. GRUs showed that gating is indeed helpful in general, and Bengio's team came to no concrete conclusion on which of the two gating units was better.
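One step of a GRU can be sketched in base R much like the LSTM cell above, with an update gate and a reset gate but no separate cell state or output gate (an illustrative sketch).

sigm <- function(x) 1 / (1 + exp(-x))

# one step of a GRU (Cho et al. convention)
gru_step <- function(x, h_prev, p) {
  z <- sigm(p$W_z %*% c(x, h_prev) + p$b_z)          # update gate
  r <- sigm(p$W_r %*% c(x, h_prev) + p$b_r)          # reset gate
  h_cand <- tanh(p$W_h %*% c(x, r * h_prev) + p$b_h) # candidate activation
  (1 - z) * h_prev + z * h_cand                      # new hidden state
}

# tiny usage example with random parameters
set.seed(5)
d_in <- 2; d_h <- 3
rand_mat <- function() matrix(rnorm(d_h * (d_in + d_h), sd = 0.1), d_h, d_in + d_h)
p <- list(W_z = rand_mat(), W_r = rand_mat(), W_h = rand_mat(),
          b_z = rep(0, d_h), b_r = rep(0, d_h), b_h = rep(0, d_h))
gru_step(x = rnorm(d_in), h_prev = rep(0, d_h), p = p)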
In artificial intelligence, a differentiable neural computer (DNC) is a memory augmented neural network architecture (MANN), which is typically recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind.
Caffe is a deep learning framework, originally developed at University of California, Berkeley. It is open source, under a BSD license. It is written in C++, with a Python interface.
The GEKKO Python package solves large-scale mixed-integer and differential algebraic equations with nonlinear programming solvers. Modes of operation include machine learning, data reconciliation, real-time optimization, dynamic simulation, and nonlinear model predictive control. In addition, the package solves Linear programming (LP), Quadratic programming (QP), Quadratically constrained quadratic program (QCQP), Nonlinear programming (NLP), Mixed integer programming (MIP), and Mixed integer linear programming (MILP). GEKKO is available in Python and installed with pip from PyPI of the Python Software Foundation.
Bastiaan Quast is a Dutch Machine learning researcher. He is the author and lead maintainer of the open-source rnn and transformer deep-learning frameworks in the R programming language, and the datasets.load GUI package, as well as R packages on Global Value Chain decomposition & WIOD and on Regression Discontinuity Design. Quast is a great-great-grandson of the Nobel Peace Prize laureate Tobias Asser.
Artificial intelligence and machine learning techniques are used in video games for a wide variety of applications such as non-player character (NPC) control and procedural content generation (PCG). Machine learning is a subset of artificial intelligence that uses historical data to build predictive and analytical models. This is in sharp contrast to traditional methods of artificial intelligence such as search trees and expert systems.
A transformer is a deep learning architecture based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". It has no recurrent units and thus requires less training time than previous recurrent neural architectures such as long short-term memory (LSTM), and its later variants have been widely adopted for training large language models (LLMs) on large (language) datasets, such as the Wikipedia corpus and Common Crawl. Input text is split into n-grams encoded as tokens, and each token is converted into a vector by looking it up in a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. The 2017 transformer paper builds on the softmax-based attention mechanism proposed by Bahdanau et al. in 2014 for machine translation, and on the Fast Weight Controller, similar to a transformer, proposed in 1992.
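The central operation is scaled dot-product attention, in which each token's query is compared against all keys and the resulting softmax weights mix the value vectors. The base-R sketch below shows a single head with toy matrices; a full transformer stacks many such heads with learned projections.

set.seed(6)
n_tokens <- 4; d_k <- 8

# toy query, key and value matrices (one row per token)
Q <- matrix(rnorm(n_tokens * d_k), n_tokens, d_k)
K <- matrix(rnorm(n_tokens * d_k), n_tokens, d_k)
V <- matrix(rnorm(n_tokens * d_k), n_tokens, d_k)

# softmax over a numeric vector
softmax <- function(x) { e <- exp(x - max(x)); e / sum(e) }

scores  <- Q %*% t(K) / sqrt(d_k)            # similarity of every query to every key
weights <- t(apply(scores, 1, softmax))      # each row sums to 1
output  <- weights %*% V                     # attention-weighted mix of the values
rowSums(weights)                             # check: 1 1 1 1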
R packages are extensions to the R statistical programming language. R packages contain code, data, and documentation in a standardised collection format that can be installed by users of R, typically via a centralised software repository such as CRAN. The large number of packages available for R, and the ease of installing and using them, has been cited as a major factor driving the widespread adoption of the language in data science.
Brain.js is a JavaScript library used for neural networking, which is released as free and open-source software under the MIT License. It can be used in both the browser and Node.js backends.