Recursive neural network

A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, traversing the given structure in topological order to produce a structured prediction over variable-size input structures, or a scalar prediction on them. Recursive neural networks, sometimes abbreviated as RvNNs, have been successful, for instance, in learning sequence and tree structures in natural language processing, mainly continuous representations of phrases and sentences based on word embeddings. RvNNs were first introduced to learn distributed representations of structure, such as logical terms. [1] Models and general frameworks have been developed in further work since the 1990s. [2] [3]

Architectures

Basic

A simple recursive neural network architecture

In the simplest architecture, nodes are combined into parents using a weight matrix that is shared across the whole network, and a non-linearity such as tanh. If $c_1$ and $c_2$ are n-dimensional vector representations of nodes, their parent will also be an n-dimensional vector, calculated as

$p_{1,2} = \tanh\left(W \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}\right)$

where $W$ is a learned $n \times 2n$ weight matrix applied to the concatenation of the two child vectors.
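To make the composition concrete, the following sketch (an illustration, not from the original article; the toy tree, the dimension n = 4, and the random embeddings are assumptions) applies the shared-weight rule bottom-up over a binary tree in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                       # embedding dimension (assumed)
W = rng.standard_normal((n, 2 * n)) * 0.1   # shared weight matrix, shape n x 2n

def compose(tree):
    """Recursively combine child vectors into an n-dimensional parent."""
    if isinstance(tree, np.ndarray):        # leaf: already an n-dim embedding
        return tree
    c1, c2 = (compose(child) for child in tree)
    return np.tanh(W @ np.concatenate([c1, c2]))

# Toy parse tree ((a, b), c) over random "word embeddings".
a, b, c = (rng.standard_normal(n) for _ in range(3))
root = compose(((a, b), c))                 # representation of the whole structure
print(root.shape)                           # (4,)
```

Because the same W is applied at every internal node, the network handles trees of arbitrary size and shape.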

This architecture, with a few improvements, has been used successfully for parsing natural scenes, for syntactic parsing of natural language sentences, [4] and for recursive autoencoding and generative modeling of 3D shape structures in the form of cuboid abstractions. [5]

Recursive cascade correlation (RecCC)

RecCC is a constructive neural network approach for dealing with tree domains, [2] with pioneering applications to chemistry [6] and an extension to directed acyclic graphs. [7]

Unsupervised RvNN

A framework for unsupervised RvNNs was introduced in 2004. [8] [9]

Tensor

Recursive neural tensor networks use a single, tensor-based composition function for all nodes in the tree. [10]
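In this style of model, each output unit of the parent gets its own bilinear form over the stacked children, so the composition becomes $p = \tanh\left(c^\top V c + W c\right)$ with $c = [c_1; c_2]$ and $V$ an $n \times 2n \times 2n$ tensor. A hedged sketch (the dimension and random initialization are assumptions, not values from the cited work):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, 2 * n)) * 0.1         # linear part, as in the basic model
V = rng.standard_normal((n, 2 * n, 2 * n)) * 0.1  # composition tensor, one slice per output unit

def compose_tensor(c1, c2):
    """Tensor composition: each output unit applies its own bilinear form."""
    c = np.concatenate([c1, c2])                  # stacked children, length 2n
    bilinear = np.einsum('i,kij,j->k', c, V, c)   # c^T V[k] c for every slice k
    return np.tanh(bilinear + W @ c)

c1, c2 = rng.standard_normal(n), rng.standard_normal(n)
print(compose_tensor(c1, c2).shape)               # (4,)
```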

Training

Stochastic gradient descent

Typically, stochastic gradient descent (SGD) is used to train the network. The gradient is computed using backpropagation through structure (BPTS), a variant of backpropagation through time used for recurrent neural networks.
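A minimal sketch of this training loop, assuming PyTorch and a toy scalar-regression task at the root (the tree, target, readout, and learning rate below are assumptions): reverse-mode automatic differentiation unrolls the recursion over each tree, so the shared weights accumulate gradients from every composition step, which is exactly backpropagation through structure.

```python
import torch

torch.manual_seed(0)
n = 4
W = torch.randn(n, 2 * n, requires_grad=True)     # shared composition weights
w_out = torch.randn(n, requires_grad=True)        # scalar readout at the root

def compose(tree):
    if isinstance(tree, torch.Tensor):            # leaf embedding
        return tree
    c1, c2 = (compose(child) for child in tree)
    return torch.tanh(W @ torch.cat([c1, c2]))

opt = torch.optim.SGD([W, w_out], lr=0.01)
tree = ((torch.randn(n), torch.randn(n)), torch.randn(n))  # toy input tree
target = torch.tensor(1.0)                        # toy scalar label

for step in range(100):
    opt.zero_grad()
    loss = (w_out @ compose(tree) - target) ** 2
    loss.backward()                               # BPTS via autodiff over the unrolled tree
    opt.step()
```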

Properties

The universal approximation capability of RvNNs over trees has been proved in the literature. [11] [12]

Recurrent neural networks

Recurrent neural networks are recursive artificial neural networks with a certain structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step.
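In other words, a recurrent network is the special case in which the "tree" is a chain: the same shared-weight composition combines the previous hidden state with the current input. A sketch of that view (the names, dimensions, and toy sequence are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, 2 * n)) * 0.1

def recurrent_step(h_prev, x_t):
    """One time step: the same composition rule, applied along a linear chain."""
    return np.tanh(W @ np.concatenate([h_prev, x_t]))

h = np.zeros(n)                           # initial hidden state
for x_t in rng.standard_normal((5, n)):   # a length-5 input sequence
    h = recurrent_step(h, x_t)            # h now summarizes the prefix seen so far
```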

Tree Echo State Networks

An efficient approach to implement recursive neural networks is given by the Tree Echo State Network [13] within the reservoir computing paradigm.
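The idea can be sketched as follows: keep the recursive weights fixed and random (scaled toward contractivity for stability), compute reservoir states bottom-up over each tree, and train only a linear readout, here on the root state via least squares. The scaling heuristic, dimensions, and toy data below are assumptions, not details from the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
W = rng.standard_normal((n, 2 * n))
W *= 0.9 / np.linalg.norm(W, 2)           # crude contractivity scaling (assumption)

def reservoir_state(tree):
    """Bottom-up state computation with fixed, untrained weights."""
    if isinstance(tree, np.ndarray):
        return tree
    c1, c2 = (reservoir_state(child) for child in tree)
    return np.tanh(W @ np.concatenate([c1, c2]))

# Train only the readout: collect root states X and targets y, solve X w = y.
trees = [((rng.standard_normal(n), rng.standard_normal(n)), rng.standard_normal(n))
         for _ in range(20)]
X = np.stack([reservoir_state(t) for t in trees])
y = rng.standard_normal(20)               # toy targets
w, *_ = np.linalg.lstsq(X, y, rcond=None) # trained linear readout
```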

Extension to graphs

Extensions to graphs include the graph neural network (GNN), [14] the Neural Network for Graphs (NN4G), [15] and, more recently, convolutional neural networks for graphs.

Related Research Articles

Artificial neural network

Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains.

Jürgen Schmidhuber

Jürgen Schmidhuber is a German computer scientist noted for his work in the fields of artificial intelligence, deep learning, and artificial neural networks. He is a co-director of the Dalle Molle Institute for Artificial Intelligence Research in Lugano, in Ticino in southern Switzerland. According to Google Scholar, he received more than 100,000 scientific citations between 2016 and 2021. He has been referred to as the "father of modern AI" and the "father of deep learning."

Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network's performance at a task. For example, the outcome of a game can be easily measured without providing labeled examples of desired strategies. Neuroevolution is commonly used as part of the reinforcement learning paradigm, and it can be contrasted with conventional deep learning techniques that use gradient descent on a neural network with a fixed topology.

Recurrent neural network

A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. Recurrent neural networks are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs.

Neural network

A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological neurons, or an artificial neural network, used for solving artificial intelligence (AI) problems. The connections of the biological neuron are modeled in artificial neural networks as weights between nodes. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be −1 and 1.

Molecule mining is the mining of molecules. Since molecules may be represented by molecular graphs, this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules while discriminating between data instances. One way to do this is chemical similarity metrics, which have a long tradition in the field of cheminformatics.

Reservoir computing is a framework for computation derived from recurrent neural network theory that maps input signals into higher dimensional computational spaces through the dynamics of a fixed, non-linear system called a reservoir. After the input signal is fed into the reservoir, which is treated as a "black box," a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output. The first key benefit of this framework is that training is performed only at the readout stage, as the reservoir dynamics are fixed. The second is that the computational power of naturally available systems, both classical and quantum mechanical, can be used to reduce the effective computational cost.

Long short-term memory

Long short-term memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) can process not only single data points, but also entire sequences of data. For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, machine translation, robot control, video games, and healthcare.

There are many types of artificial neural networks (ANN).

An adaptive neuro-fuzzy inference system or adaptive network-based fuzzy inference system (ANFIS) is a kind of artificial neural network that is based on the Takagi–Sugeno fuzzy inference system. The technique was developed in the early 1990s. Since it integrates both neural networks and fuzzy logic principles, it has the potential to capture the benefits of both in a single framework. Its inference system corresponds to a set of fuzzy IF–THEN rules that have learning capability to approximate nonlinear functions. Hence, ANFIS is considered to be a universal estimator. To use ANFIS more efficiently and optimally, one can use the best parameters obtained by a genetic algorithm. It has uses in intelligent, situationally aware energy management systems.

Deep learning

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.

Yasuo Matsuyama

Yasuo Matsuyama is a Japanese researcher in machine learning and human-aware information processing.

Convolutional neural network

In deep learning, a convolutional neural network (CNN) is a class of artificial neural network (ANN) most commonly applied to analyze visual imagery. CNNs are also known as Shift Invariant or Space Invariant Artificial Neural Networks (SIANN), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Counter-intuitively, most convolutional neural networks are not invariant to translation, due to the downsampling operation they apply to the input. They have applications in image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain–computer interfaces, and financial time series.

Vanishing gradient problem

In machine learning, the vanishing gradient problem is encountered when training artificial neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training each of the neural network's weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that in some cases, the gradient will be vanishingly small, effectively preventing the weight from changing its value. In the worst case, this may completely stop the neural network from further training. As one example of the problem cause, traditional activation functions such as the hyperbolic tangent function have gradients in the range (0,1], and backpropagation computes gradients by the chain rule. This has the effect of multiplying n of these small numbers to compute gradients of the early layers in an n-layer network, meaning that the gradient decreases exponentially with n while the early layers train very slowly.

Backpropagation through structure (BPTS) is a gradient-based technique for training recursive neural nets and is extensively described in a 1996 paper written by Christoph Goller and Andreas Küchler.

Extreme learning machine

Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of the hidden nodes need not be tuned. These hidden nodes can be randomly assigned and never updated, or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model. The name "extreme learning machine" (ELM) was given to such models by their main inventor, Guang-Bin Huang.

AlexNet

AlexNet is the name of a convolutional neural network (CNN) architecture, designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor.

The history of artificial neural networks (ANN) began with Warren McCulloch and Walter Pitts (1943), who created a computational model for neural networks based on algorithms called threshold logic. This model paved the way for research to split into two approaches: one focused on biological processes, while the other focused on the application of neural networks to artificial intelligence. This research led to work on nerve networks and their link to finite automata.

Silvia Ferrari

Silvia Ferrari is an American aerospace engineer. She is John Brancaccio Professor at the Sibley School of Mechanical and Aerospace Engineering at Cornell University and also the director of the Laboratory for Intelligent Systems and Control (LISC) at the same university.

Graph neural network

A graph neural network (GNN) is a class of artificial neural networks for processing data that can be represented as graphs.

References

  1. Goller, C.; Küchler, A. (1996). "Learning task-dependent distributed representations by backpropagation through structure". Proceedings of International Conference on Neural Networks (ICNN'96). Vol. 1. pp. 347–352. CiteSeerX 10.1.1.52.4759. doi:10.1109/ICNN.1996.548916. ISBN 978-0-7803-3210-2. S2CID 6536466.
  2. Sperduti, A.; Starita, A. (1997-05-01). "Supervised neural networks for the classification of structures". IEEE Transactions on Neural Networks. 8 (3): 714–735. doi:10.1109/72.572108. ISSN 1045-9227. PMID 18255672.
  3. Frasconi, P.; Gori, M.; Sperduti, A. (1998-09-01). "A general framework for adaptive processing of data structures". IEEE Transactions on Neural Networks. 9 (5): 768–786. CiteSeerX 10.1.1.64.2580. doi:10.1109/72.712151. ISSN 1045-9227. PMID 18255765.
  4. Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D. "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" (PDF). The 28th International Conference on Machine Learning (ICML 2011).
  5. Li, Jun; Xu, Kai; Chaudhuri, Siddhartha; Yumer, Ersin; Zhang, Hao; Guibas, Leonidas (2017). "GRASS: Generative Recursive Autoencoders for Shape Structures" (PDF). ACM Transactions on Graphics. 36 (4): 52. arXiv:1705.02090. doi:10.1145/3072959.3073613. S2CID 20432407.
  6. Bianucci, Anna Maria; Micheli, Alessio; Sperduti, Alessandro; Starita, Antonina (2000). "Application of Cascade Correlation Networks for Structures to Chemistry". Applied Intelligence. 12 (1–2): 117–147. doi:10.1023/A:1008368105614. ISSN 0924-669X. S2CID 10031212.
  7. Micheli, A.; Sona, D.; Sperduti, A. (2004-11-01). "Contextual processing of structured data by recursive cascade correlation". IEEE Transactions on Neural Networks. 15 (6): 1396–1410. CiteSeerX 10.1.1.135.8772. doi:10.1109/TNN.2004.837783. ISSN 1045-9227. PMID 15565768. S2CID 12370239.
  8. Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro; Strickert, Marc (2004). "Recursive self-organizing network models". Neural Networks. 17 (8–9): 1061–1085. CiteSeerX 10.1.1.129.6155. doi:10.1016/j.neunet.2004.06.009. PMID 15555852.
  9. Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro; Strickert, Marc (2004-03-01). "A general framework for unsupervised processing of structured data". Neurocomputing. 57: 3–35. CiteSeerX 10.1.1.3.984. doi:10.1016/j.neucom.2004.01.008.
  10. Socher, Richard; Perelygin, Alex; Wu, Jean Y.; Chuang, Jason; Manning, Christopher D.; Ng, Andrew Y.; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). EMNLP 2013.
  11. Hammer, Barbara (2007-10-03). Learning with Recurrent Neural Networks. Springer. ISBN 9781846285677.
  12. Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro (2005-05-01). "Universal Approximation Capability of Cascade Correlation for Structures". Neural Computation. 17 (5): 1109–1159. CiteSeerX 10.1.1.138.2224. doi:10.1162/0899766053491878. S2CID 10845957.
  13. Gallicchio, Claudio; Micheli, Alessio (2013-02-04). "Tree Echo State Networks". Neurocomputing. 101: 319–337. doi:10.1016/j.neucom.2012.08.017. hdl:11568/158480.
  14. Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; Monfardini, G. (2009-01-01). "The Graph Neural Network Model". IEEE Transactions on Neural Networks. 20 (1): 61–80. doi:10.1109/TNN.2008.2005605. ISSN 1045-9227. PMID 19068426. S2CID 206756462.
  15. Micheli, A. (2009-03-01). "Neural Network for Graphs: A Contextual Constructive Approach". IEEE Transactions on Neural Networks. 20 (3): 498–511. doi:10.1109/TNN.2008.2010350. ISSN 1045-9227. PMID 19193509. S2CID 17486263.