Echo state network

Last updated February 16, 2024

An echo state network (ESN)^[1]^[2] is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer (typically about 1% connectivity). The connectivity and weights of the hidden neurons are fixed and randomly assigned. The weights of the output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that, although its behavior is non-linear, the only weights modified during training are those of the synapses connecting the hidden neurons to the output neurons. Thus, the error function is quadratic with respect to the parameter vector, and setting its derivative to zero yields a linear system.
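The step from a quadratic error to a linear system can be written out as a short sketch. The symbols are assumptions introduced here for illustration and do not come from the cited sources: X is the matrix whose rows are the collected hidden-layer states, Y holds the corresponding target outputs, and W_out is the matrix of output weights.

```latex
\begin{aligned}
E(\mathbf{W}_{\text{out}}) &= \lVert \mathbf{X}\mathbf{W}_{\text{out}} - \mathbf{Y} \rVert_F^{2} \\
\nabla E &= 2\,\mathbf{X}^{\top}\bigl(\mathbf{X}\mathbf{W}_{\text{out}} - \mathbf{Y}\bigr) = \mathbf{0}
\;\Longrightarrow\;
\mathbf{X}^{\top}\mathbf{X}\,\mathbf{W}_{\text{out}} = \mathbf{X}^{\top}\mathbf{Y}
\end{aligned}
```

The last equation is an ordinary linear system (the normal equations), which is why no iterative gradient descent through the recurrent weights is needed.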

Alternatively, one may consider a nonparametric Bayesian formulation of the output layer, under which: (i) a prior distribution is imposed over the output weights; and (ii) the output weights are marginalized out when generating predictions, given the training data. This idea has been demonstrated^[3] using Gaussian priors, whereby a Gaussian process model with an ESN-driven kernel function is obtained. Such a solution was shown to outperform ESNs with trainable (finite) sets of weights in several benchmarks.
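As a rough illustration of why marginalizing Gaussian-distributed output weights gives a Gaussian process, consider the simplest case of a linear readout with an isotropic prior. The notation (state vector x(t), prior variance σ_w², kernel k) is assumed here for illustration; the cited work may use other kernels over the reservoir states.

```latex
\begin{aligned}
y(t) &= \mathbf{w}^{\top}\mathbf{x}(t), \qquad \mathbf{w} \sim \mathcal{N}\bigl(\mathbf{0},\, \sigma_w^{2}\mathbf{I}\bigr) \\
\operatorname{Cov}\bigl[y(t),\, y(t')\bigr] &= \sigma_w^{2}\,\mathbf{x}(t)^{\top}\mathbf{x}(t') \;=:\; k(t, t')
\end{aligned}
```

Under these assumptions the outputs are jointly Gaussian with zero mean and a covariance that depends only on the reservoir states, so the reservoir effectively defines the kernel.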

Some publicly available implementations of ESNs are: (i) aureservoir: an efficient C++ library for various kinds of echo state networks with Python/NumPy bindings; (ii) Matlab code: an efficient MATLAB implementation of an echo state network; (iii) ReservoirComputing.jl: an efficient Julia-based implementation of various types of echo state networks; and (iv) pyESN: simple echo state networks in Python.

Background

Echo state networks (ESNs)^[4] belong to the recurrent neural network (RNN) family and provide an architecture and a supervised learning principle for such networks. Unlike feedforward neural networks, recurrent neural networks are dynamical systems rather than functions. Recurrent neural networks are typically used for:

Learning dynamical processes: signal treatment in engineering and telecommunications, vibration analysis, seismology, and control of engines and generators.
Signal forecasting and generation: text, music, electric signals, chaotic signals.^[5]
Modeling of biological systems, neurosciences (cognitive neurodynamics), memory modeling, brain-computer interfaces (BCIs), filtering and Kalman processes, military applications, volatility modeling etc.

For the training of RNNs, a number of learning algorithms are available, such as backpropagation through time and real-time recurrent learning. Convergence is not guaranteed due to instability and bifurcation phenomena.^[4]

The main approach of the ESN is, first, to drive a large, random, fixed recurrent neural network with the input signal, which induces a nonlinear response signal in each neuron within this "reservoir" network, and, second, to obtain the desired output signal as a trainable linear combination of all these response signals.^[2]
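The two steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the tanh nonlinearity, the uniform weight range, the rescaling of the reservoir matrix to a chosen spectral radius, and the ridge term are all choices made here for the example and are not taken from the cited sources.

```python
import numpy as np

def make_reservoir(n_inputs, n_reservoir, density=0.01, spectral_radius=0.9, seed=0):
    """Create fixed random input and reservoir weights; these are never trained."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    W *= rng.random((n_reservoir, n_reservoir)) < density  # keep a sparse subset
    # Assumed heuristic: rescale so the largest |eigenvalue| equals spectral_radius.
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    if radius > 0:
        W *= spectral_radius / radius
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Drive the reservoir with inputs of shape (T, n_inputs); return states (T, n_reservoir)."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ u + W @ x)  # nonlinear response of every neuron
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-6):
    """Fit the linear readout by ridge regression; only these weights are learned."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)  # shape (n_reservoir, n_outputs)

# Illustrative task: predict the next sample of a sine wave.
signal = np.sin(0.2 * np.arange(1200))
u, y = signal[:-1, None], signal[1:, None]
W_in, W = make_reservoir(n_inputs=1, n_reservoir=200, density=0.05)
X = run_reservoir(W_in, W, u)
washout = 100  # discard initial states that still depend on the zero start
W_out = train_readout(X[washout:1000], y[washout:1000])
test_mse = np.mean((X[1000:] @ W_out - y[1000:]) ** 2)
print("held-out mean squared error:", test_mse)
```

Note that `W_in` and `W` are created once and never updated; training reduces to a single linear solve over the collected states.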

Another feature of the ESN is the autonomous operation in prediction: if the Echo State Network is trained with an input that is a backshifted version of the output, then it can be used for signal generation/prediction by using the previous output as input.^[4]^[5]
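The autonomous mode described above can be illustrated with a small closed-loop sketch. Everything named here (`step`, `fit_readout`, `generate`, the reservoir size, the scaling heuristic, the ridge term, and the sine-wave signal) is an assumption made for the example. The readout is first trained with the true previous sample as input (teacher forcing); afterwards the network's own output is fed back as the next input.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                          # reservoir size (illustrative)
W_in = rng.uniform(-0.5, 0.5, size=n)            # fixed input weights, scalar input
W = rng.uniform(-0.5, 0.5, size=(n, n))
W *= rng.random((n, n)) < 0.05                   # sparse, fixed recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # assumed scaling heuristic

def step(x, u):
    """One reservoir update for scalar input u."""
    return np.tanh(W_in * u + W @ x)

def fit_readout(signal, washout=100, ridge=1e-6):
    """Teacher forcing: the state after seeing signal[t] is trained to output signal[t + 1]."""
    x = np.zeros(n)
    states = np.empty((len(signal), n))
    for t, u in enumerate(signal):
        x = step(x, u)
        states[t] = x
    X, y = states[washout:-1], signal[washout + 1:]
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n), X.T @ y)
    return W_out, x  # x is the state after the last known sample

def generate(W_out, x, n_steps):
    """Closed loop: each output is fed back as the next input."""
    out = np.empty(n_steps)
    y = x @ W_out            # prediction of the first unseen sample
    for t in range(n_steps):
        out[t] = y
        x = step(x, y)       # previous output becomes the new input
        y = x @ W_out
    return out

signal = np.sin(0.2 * np.arange(1000))
W_out, x_last = fit_readout(signal)
future = generate(W_out, x_last, 50)  # continuation produced without external input
```

In free-running mode small readout errors are fed back into the reservoir, so long continuations can drift from the true signal; how quickly this happens depends on the signal and on the assumed hyperparameters.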

The main idea of ESNs is tied to Liquid State Machines (LSMs), which were developed independently of, and simultaneously with, ESNs by Wolfgang Maass.^[6] LSMs, ESNs and the more recently studied Backpropagation Decorrelation learning rule for RNNs^[7] are increasingly grouped under the name Reservoir Computing.

Schiller and Steil^[7] also demonstrated that in conventional training approaches for RNNs, in which all weights (not only output weights) are adapted, the dominant changes are in the output weights. In cognitive neuroscience, Peter F. Dominey analysed a related process in the modelling of sequence processing in the mammalian brain, in particular speech recognition in the human brain.^[8] The basic idea also included a model of temporal input discrimination in biological neuronal networks.^[9] An early clear formulation of the reservoir computing idea is due to K. Kirby, who disclosed this concept in a largely forgotten conference contribution.^[10] The first formulation of the reservoir computing idea as it is known today stems from L. Schomaker,^[11] who described how a desired target output could be obtained from an RNN by learning to combine signals from a randomly configured ensemble of spiking neural oscillators.^[2]

Variants

Echo state networks can be built in different ways. They can be set up with or without directly trainable input-to-output connections, with or without feedback from the output to the reservoir, with different neuron types, with different internal connectivity patterns of the reservoir, etc. The output weights can be computed with any linear regression algorithm, whether online or offline. In addition to least-squares solutions, margin maximization criteria, as used in training support vector machines, are used to determine the output weights.^[12] Other variants of echo state networks seek to change the formulation to better match common models of physical systems, such as those typically defined by differential equations. Work in this direction includes echo state networks which partially include physical models,^[13] hybrid echo state networks,^[14] and continuous-time echo state networks.^[15]
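As an illustration of the online option, the readout weights can be updated one sample at a time with recursive least squares (RLS). The sketch below is one possible choice among many; the class name `RLSReadout`, the parameters `forgetting` and `delta`, and the use of random vectors as stand-ins for reservoir states are assumptions made for the example.

```python
import numpy as np

class RLSReadout:
    """Online linear readout trained by recursive least squares (illustrative)."""

    def __init__(self, n_features, forgetting=1.0, delta=1e-2):
        self.w = np.zeros(n_features)        # output weights, updated per sample
        self.P = np.eye(n_features) / delta  # running inverse-correlation estimate
        self.lam = forgetting                # 1.0 weights all past samples equally

    def predict(self, x):
        return self.w @ x

    def update(self, x, target):
        """Consume one (state, target) pair and return the prediction error before the update."""
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        error = target - self.w @ x
        self.w += gain * error
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return error

# Stand-in data: random vectors play the role of reservoir states here.
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
readout = RLSReadout(n_features=5)
for _ in range(500):
    x = rng.normal(size=5)
    readout.update(x, true_w @ x)
```

With `forgetting=1.0` this converges toward the same solution as an offline regularized least-squares fit on all samples seen so far; values slightly below 1 let the readout track slowly changing targets.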

The fixed RNN acts as a random, nonlinear medium whose dynamic response, the "echo", is used as a signal base. The linear combination of this base can be trained to reconstruct the desired output by minimizing some error criterion.^[2]

Significance

RNNs were rarely used in practice before the introduction of the ESN, because of the complexity involved in adjusting their connections (e.g., lack of autodifferentiation, susceptibility to vanishing/exploding gradients, etc.). RNN training algorithms were slow and often vulnerable to issues such as bifurcations.^[16] Convergence could therefore not be guaranteed. ESN training, on the other hand, does not suffer from bifurcations and is easy to implement. In early studies, ESNs were shown to perform well on time series prediction tasks from synthetic datasets.^[1]^[17]

However, today, many of the problems that made RNNs slow and error-prone have been addressed with the advent of autodifferentiation (deep learning) libraries, as well as more stable architectures such as LSTM and GRU; thus, the unique selling point of ESNs has been lost. In addition, RNNs have proven themselves in several practical areas, such as language processing. Coping with tasks of similar complexity using reservoir computing methods would require memory of excessive size.

Nevertheless, ESNs are still used in some areas, such as many signal processing applications. In particular, they have been widely used as a computing principle that mixes well with non-digital computer substrates. Since ESNs do not need to modify the parameters of the RNN, they make it possible to use many different objects as their nonlinear "reservoir", for example optical microchips, mechanical nano-oscillators, polymer mixtures, or even artificial soft limbs.^[2]

References

[1] Jaeger, H.; Haas, H. (2004). "Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication". Science. 304 (5667): 78–80. Bibcode:2004Sci...304...78J. doi:10.1126/science.1091277. PMID 15064413. S2CID 2184251.
[2] Jaeger, Herbert (2007). "Echo state network". Scholarpedia. 2 (9): 2330. Bibcode:2007SchpJ...2.2330J. doi:10.4249/scholarpedia.2330.
[3] Chatzis, S. P.; Demiris, Y. (2011). "Echo State Gaussian Process". IEEE Transactions on Neural Networks. 22 (9): 1435–1445. doi:10.1109/TNN.2011.2162109. PMID 21803684. S2CID 8553623.
[4] Jaeger, Herbert (2002). A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach. Germany: German National Research Center for Information Technology. pp. 1–45.
[5] Antonik, Piotr; Gulina, Marvyn; Pauwels, Jaël; Massar, Serge (2018). "Using a reservoir computer to learn chaotic attractors, with applications to chaos synchronization and cryptography". Phys. Rev. E. 98 (1): 012215. arXiv:1802.02844. Bibcode:2018PhRvE..98a2215A. doi:10.1103/PhysRevE.98.012215. PMID 30110744. S2CID 3616565.
[6] Maass W., Natschlaeger T., and Markram H. (2002). "Real-time computing without stable states: A new framework for neural computation based on perturbations". Neural Computation. 14 (11): 2531–2560. doi:10.1162/089976602760407955. PMID 12433288. S2CID 1045112.
[7] Schiller U.D. and Steil J. J. (2005). "Analyzing the weight dynamics of recurrent learning algorithms". Neurocomputing. 63: 5–23. doi:10.1016/j.neucom.2004.04.006.
[8] Dominey P.F. (1995). "Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning". Biol. Cybernetics. 73 (3): 265–274. doi:10.1007/BF00201428. PMID 7548314. S2CID 1603500.
[9] Buonomano, D.V. and Merzenich, M.M. (1995). "Temporal Information Transformed into a Spatial Code by a Neural Network with Realistic Properties". Science. 267 (5200): 1028–1030. Bibcode:1995Sci...267.1028B. doi:10.1126/science.7863330. PMID 7863330. S2CID 12880807.
[10] Kirby, K. (1991). "Context dynamics in neural sequential learning". Proc. Florida AI Research Symposium: 66–70.
[11] Schomaker, L. (1992). "A neural oscillator-network model of temporal pattern generation". Human Movement Science. 11 (1–2): 181–192. doi:10.1016/0167-9457(92)90059-K.
[12] Schmidhuber J., Gomez F., Wierstra D., and Gagliolo M. (2007). "Training recurrent networks by evolino". Neural Computation. 19 (3): 757–779. CiteSeerX 10.1.1.218.3086. doi:10.1162/neco.2007.19.3.757. PMID 17298232. S2CID 11745761.
[13] Doan N, Polifke W, Magri L (2020). "Physics-Informed Echo State Networks". Journal of Computational Science. 47: 101237. arXiv:2011.02280. doi:10.1016/j.jocs.2020.101237. S2CID 226246385.
[14] Pathak J, Wikner A, Russel R, Chandra S, Hunt B, Girvan M, Ott E (2018). "Hybrid Forecasting of Chaotic Processes: Using Machine Learning in Conjunction with a Knowledge-Based Model". Chaos. 28 (4): 041101. arXiv:1803.04779. Bibcode:2018Chaos..28d1101P. doi:10.1063/1.5028373. PMID 31906641. S2CID 3883587.
[15] Anantharaman, Ranjan; Ma, Yingbo; Gowda, Shashi; Laughman, Chris; Shah, Viral; Edelman, Alan; Rackauckas, Chris (2020). "Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks". arXiv:2010.04004 [cs.LG].
[16] Doya K. (1992). "Bifurcations in the learning of recurrent neural networks". [Proceedings] 1992 IEEE International Symposium on Circuits and Systems. Vol. 6. pp. 2777–2780. doi:10.1109/ISCAS.1992.230622. ISBN 0-7803-0593-0. S2CID 15069221.
[17] Jaeger H. (2007). "Discovering multiscale dynamical features with hierarchical echo state networks". Technical Report 10, School of Engineering and Science, Jacobs University.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
