Recursive neural network

A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input. By traversing a given structure in topological order, it produces a structured prediction over variable-size input structures, or a scalar prediction for the structure as a whole. Such networks were first introduced to learn distributed representations of structures such as logical terms, [1] but have since been applied successfully in multiple domains, for instance to learn sequence and tree structures in natural language processing (mainly continuous representations of phrases and sentences based on word embeddings).

Architectures

Basic

A simple recursive neural network architecture

In the simplest architecture, nodes are combined into parents using a weight matrix (which is shared across the whole network) and a non-linearity such as the hyperbolic tangent. If $c_1$ and $c_2$ are $n$-dimensional vector representations of nodes, their parent will also be an $n$-dimensional vector, computed as

$$p_{1,2} = \tanh\left(W\,[c_1 ; c_2]\right),$$

where $W$ is a learned $n \times 2n$ weight matrix and $[c_1 ; c_2]$ denotes the concatenation of the two child vectors.
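A minimal sketch of this composition rule, assuming NumPy and a toy binary tree encoded as nested tuples (the tree format, dimensions, and random weights are illustrative, not taken from the cited works):

```python
import numpy as np

n = 4                                         # dimensionality of every node vector
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n, 2 * n))    # single weight matrix shared by all nodes

def compose(c1, c2):
    """Combine two child vectors into an n-dimensional parent: p = tanh(W [c1; c2])."""
    return np.tanh(W @ np.concatenate([c1, c2]))

def encode(tree):
    """Recursively encode a binary tree whose leaves are n-dimensional vectors."""
    if isinstance(tree, np.ndarray):          # leaf, e.g. a word embedding
        return tree
    left, right = tree
    return compose(encode(left), encode(right))

# Example: combine three leaf embeddings bottom-up as ((w1, w2), w3).
w1, w2, w3 = (rng.normal(size=n) for _ in range(3))
root = encode(((w1, w2), w3))
print(root.shape)                             # (4,)
```

Because the same $W$ is applied at every internal node, the network handles trees of any size and shape with a fixed number of parameters.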

This architecture, with a few improvements, has been used successfully for parsing natural scenes, for syntactic parsing of natural language sentences, [2] and for recursive autoencoding and generative modeling of 3D shape structures in the form of cuboid abstractions. [3]

Recursive cascade correlation (RecCC)

RecCC is a constructive neural network approach for dealing with tree domains, [4] with pioneering applications to chemistry [5] and an extension to directed acyclic graphs. [6]

Unsupervised RNN

A framework for unsupervised RNNs was introduced in 2004. [7] [8]

Tensor

Recursive neural tensor networks use a single tensor-based composition function for all nodes in the tree. [9]
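A sketch of such a tensor-based composition in the same NumPy setting as above; the bilinear form with one tensor slice per output dimension follows the standard recursive neural tensor network formulation, but the dimensions and variable names here are illustrative:

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(n, 2 * n))          # ordinary (matrix) part
V = rng.normal(scale=0.01, size=(n, 2 * n, 2 * n))  # one 2n x 2n slice per output dimension

def compose_tensor(c1, c2):
    """RNTN-style composition: p_k = tanh(c^T V[k] c + (W c)_k), with c = [c1; c2]."""
    c = np.concatenate([c1, c2])
    bilinear = np.einsum('i,kij,j->k', c, V, c)     # c^T V[k] c for every slice k
    return np.tanh(bilinear + W @ c)
```

The same $V$ and $W$ are used at every node of the tree, so the composition remains weight-shared; the tensor term simply lets the two children interact multiplicatively.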

Training

Stochastic gradient descent

Typically, stochastic gradient descent (SGD) is used to train the network. The gradient is computed using backpropagation through structure (BPTS), a variant of backpropagation through time used for recurrent neural networks.
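A minimal training sketch, assuming PyTorch and a toy regression loss at the root; letting automatic differentiation unroll the recursive forward pass is one straightforward way to realize backpropagation through structure (the tree, loss, and hyperparameters are illustrative):

```python
import torch

n = 4
torch.manual_seed(0)
W = torch.nn.Parameter(0.1 * torch.randn(n, 2 * n))   # shared composition weights
optimizer = torch.optim.SGD([W], lr=0.01)

def encode(tree):
    """Recursive forward pass; autograd records the whole tree traversal."""
    if isinstance(tree, torch.Tensor):
        return tree                                    # leaf embedding
    left, right = tree
    child = torch.cat([encode(left), encode(right)])
    return torch.tanh(W @ child)

# Toy objective: pull the root representation of one tree towards a target vector.
leaves = [torch.randn(n) for _ in range(3)]
tree = ((leaves[0], leaves[1]), leaves[2])
target = torch.zeros(n)

for step in range(100):
    optimizer.zero_grad()
    loss = torch.sum((encode(tree) - target) ** 2)
    loss.backward()       # gradients flow back through the structure (BPTS)
    optimizer.step()
```

Because $W$ appears once per internal node, its gradient accumulates contributions from every position in the tree, which is exactly what BPTS computes.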

Properties

The universal approximation capability of RNNs over trees has been proved in the literature. [10] [11]

Recurrent neural networks

Recurrent neural networks are recursive artificial neural networks with a particular structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the hidden representation of the previous time step with the input at the current time step to produce the representation for the current time step.
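To make the correspondence concrete, the linear-chain special case can be written with the same kind of shared-weight composition applied step by step (a sketch, assuming the NumPy setup of the earlier examples):

```python
import numpy as np

n = 4
rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(n, 2 * n))   # shared weights, as before

def step(h_prev, x_t):
    """One time step: combine the previous hidden state with the current input."""
    return np.tanh(W @ np.concatenate([h_prev, x_t]))

# A sequence is just a degenerate (left-branching) tree: ((((h0, x1), x2), x3), ...)
h = np.zeros(n)                              # initial hidden state
for x_t in rng.normal(size=(5, n)):          # five input vectors
    h = step(h, x_t)
```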

Tree Echo State Networks

An efficient approach to implementing recursive neural networks is given by the Tree Echo State Network, [12] within the reservoir computing paradigm.
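A rough sketch of the idea, assuming a fixed (untrained) random recursive encoder and a trained linear readout; the contractive scaling and the readout choice are illustrative simplifications rather than the exact formulation of the cited Tree Echo State Network:

```python
import numpy as np

n = 50                                        # reservoir (state) dimension
d = 4                                         # dimension of node labels
rng = np.random.default_rng(3)

W_in  = rng.uniform(-0.1, 0.1, size=(n, d))   # fixed, untrained input weights
W_hat = rng.uniform(-1.0, 1.0, size=(n, n))   # fixed, untrained reservoir weights
W_hat *= 0.9 / max(abs(np.linalg.eigvals(W_hat)))   # illustrative contractive scaling

def state(tree):
    """Reservoir state of a node, driven by its label and its children's states."""
    label, children = tree                    # tree = (label vector, list of subtrees)
    s = W_in @ label + sum((W_hat @ state(c) for c in children), np.zeros(n))
    return np.tanh(s)

# Only a linear readout on the root state (or on an aggregate of all node states)
# is trained, e.g. by ridge regression on pairs of (state(tree), target).
```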

Extension to graphs

Extensions to graphs include the graph neural network (GNN), [13] the Neural Network for Graphs (NN4G), [14] and, more recently, convolutional neural networks for graphs.

References

  1. Goller, C.; Küchler, A. (1996). "Learning task-dependent distributed representations by backpropagation through structure". Proceedings of International Conference on Neural Networks (ICNN'96). Vol. 1. pp. 347–352. CiteSeerX   10.1.1.52.4759 . doi:10.1109/ICNN.1996.548916. ISBN   978-0-7803-3210-2. S2CID   6536466.
  2. Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D. "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" (PDF). The 28th International Conference on Machine Learning (ICML 2011).
  3. Li, Jun; Xu, Kai; Chaudhuri, Siddhartha; Yumer, Ersin; Zhang, Hao; Guibas, Leonidas (2017). "GRASS: Generative Recursive Autoencoders for Shape Structures" (PDF). ACM Transactions on Graphics. 36 (4): 52. arXiv: 1705.02090 . doi:10.1145/3072959.3073613. S2CID   20432407.
  4. Sperduti, A.; Starita, A. (1997-05-01). "Supervised neural networks for the classification of structures". IEEE Transactions on Neural Networks. 8 (3): 714–735. doi:10.1109/72.572108. ISSN   1045-9227. PMID   18255672.
  5. Bianucci, Anna Maria; Micheli, Alessio; Sperduti, Alessandro; Starita, Antonina (2000). "Application of Cascade Correlation Networks for Structures to Chemistry". Applied Intelligence. 12 (1–2): 117–147. doi:10.1023/A:1008368105614. ISSN   0924-669X. S2CID   10031212.
  6. Micheli, A.; Sona, D.; Sperduti, A. (2004-11-01). "Contextual processing of structured data by recursive cascade correlation". IEEE Transactions on Neural Networks. 15 (6): 1396–1410. CiteSeerX   10.1.1.135.8772 . doi:10.1109/TNN.2004.837783. ISSN   1045-9227. PMID   15565768. S2CID   12370239.
  7. Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro; Strickert, Marc (2004). "Recursive self-organizing network models". Neural Networks. 17 (8–9): 1061–1085. CiteSeerX   10.1.1.129.6155 . doi:10.1016/j.neunet.2004.06.009. PMID   15555852.
  8. Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro; Strickert, Marc (2004-03-01). "A general framework for unsupervised processing of structured data". Neurocomputing. 57: 3–35. CiteSeerX   10.1.1.3.984 . doi:10.1016/j.neucom.2004.01.008.
  9. Socher, Richard; Perelygin, Alex; Y. Wu, Jean; Chuang, Jason; D. Manning, Christopher; Y. Ng, Andrew; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). EMNLP 2013.
  10. Hammer, Barbara (2007-10-03). Learning with Recurrent Neural Networks. Springer. ISBN   9781846285677.
  11. Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro (2005-05-01). "Universal Approximation Capability of Cascade Correlation for Structures". Neural Computation. 17 (5): 1109–1159. CiteSeerX   10.1.1.138.2224 . doi:10.1162/0899766053491878. S2CID   10845957.
  12. Gallicchio, Claudio; Micheli, Alessio (2013-02-04). "Tree Echo State Networks". Neurocomputing. 101: 319–337. doi:10.1016/j.neucom.2012.08.017. hdl: 11568/158480 .
  13. Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; Monfardini, G. (2009-01-01). "The Graph Neural Network Model". IEEE Transactions on Neural Networks. 20 (1): 61–80. doi:10.1109/TNN.2008.2005605. ISSN   1045-9227. PMID   19068426. S2CID   206756462.
  14. Micheli, A. (2009-03-01). "Neural Network for Graphs: A Contextual Constructive Approach". IEEE Transactions on Neural Networks. 20 (3): 498–511. doi:10.1109/TNN.2008.2010350. ISSN   1045-9227. PMID   19193509. S2CID   17486263.