Echo state network

Last updated January 03, 2025

An echo state network (ESN)^[1]^[2] is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer (typically with 1% connectivity). The connectivity and weights of the hidden neurons are fixed and randomly assigned. The weights of the output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main appeal of this network is that, although its behavior is nonlinear, the only weights modified during training are those of the synapses connecting the hidden neurons to the output neurons. Thus, the error function is quadratic with respect to the parameter vector, and setting its derivative to zero yields a linear system.
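As a brief worked illustration of why training reduces to a linear system, assume a squared-error loss. The symbols below are notation introduced here for illustration and do not come from the cited sources: X is the matrix of collected hidden-neuron states (one row per time step), Y is the matrix of target outputs, and W_out holds the output weights.

```latex
E(W_{\mathrm{out}}) = \lVert X W_{\mathrm{out}} - Y \rVert_F^2
\qquad\Longrightarrow\qquad
\frac{\partial E}{\partial W_{\mathrm{out}}} = 2\, X^{\top} \bigl( X W_{\mathrm{out}} - Y \bigr) = 0
\qquad\Longrightarrow\qquad
X^{\top} X \, W_{\mathrm{out}} = X^{\top} Y
```

The last equation is linear in W_out, so it can be solved with standard linear algebra routines rather than by iterative gradient descent.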

Alternatively, one may consider a nonparametric Bayesian formulation of the output layer, under which: (i) a prior distribution is imposed over the output weights; and (ii) the output weights are marginalized out when generating predictions, given the training data. This idea was demonstrated by Chatzis and Demiris^[3] using Gaussian priors, which yields a Gaussian process model with an ESN-driven kernel function. Such a solution was shown to outperform ESNs with trainable (finite) sets of weights in several benchmarks.

Some publicly available efficient implementations of ESNs are aureservoir (a C++ library for various kinds of echo state networks, with Python/NumPy bindings), MATLAB, ReservoirComputing.jl (a Julia-based implementation of various types of echo state networks) and pyESN (for simple ESNs in Python).

Background

The echo state network (ESN)^[4] belongs to the recurrent neural network (RNN) family and provides an architecture and a supervised learning principle for such networks. Unlike feedforward neural networks, recurrent neural networks are dynamical systems rather than functions. Recurrent neural networks are typically used for:

Learning dynamical processes: signal processing in engineering and telecommunications, vibration analysis, seismology, and control of engines and generators.
Signal forecasting and generation: text, music, electric signals, chaotic signals.^[5]
Modeling of biological systems, neuroscience (cognitive neurodynamics), memory modeling, brain-computer interfaces (BCIs), filtering and Kalman processes, military applications, volatility modeling, etc.

A number of learning algorithms are available for training RNNs, including backpropagation through time and real-time recurrent learning. Convergence is not guaranteed, owing to instability and bifurcation phenomena.^[4]

The main approach of the ESN is, first, to drive a large, random, fixed recurrent neural network with the input signal, which induces a nonlinear response signal in each neuron within this "reservoir" network, and, second, to obtain a desired output signal as a trainable linear combination of all these response signals.^[2]
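The two steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`make_reservoir`, `run_reservoir`, `train_readout`), the tanh nonlinearity, the spectral-radius rescaling, the ridge term, the washout length and the 5% density (chosen because the reservoir here is small) are all choices made for this example rather than details taken from the cited sources.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir, density=0.05, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights; these are never trained."""
    rng = np.random.default_rng(seed)
    # Sparse recurrent matrix: only a fraction `density` of entries is nonzero.
    W = rng.uniform(-1.0, 1.0, (n_reservoir, n_reservoir))
    W *= rng.random((n_reservoir, n_reservoir)) < density
    # Assumption: rescale so the largest eigenvalue magnitude is `spectral_radius`,
    # a common heuristic for keeping the reservoir dynamics stable.
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    if radius > 0:
        W *= spectral_radius / radius
    W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
    return W_in, W


def run_reservoir(W_in, W, inputs):
    """Step 1: drive the fixed reservoir and record every neuron's response."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ u + W @ x)
        states[t] = x
    return states


def train_readout(states, targets, ridge=1e-6):
    """Step 2: fit the linear readout by solving a regularized linear system."""
    n = states.shape[1]
    return np.linalg.solve(states.T @ states + ridge * np.eye(n), states.T @ targets)


# Toy task (illustrative): predict the next sample of a sine wave.
signal = np.sin(0.2 * np.arange(1001))
inputs, targets = signal[:-1, None], signal[1:, None]

W_in, W = make_reservoir(n_inputs=1, n_reservoir=200)
states = run_reservoir(W_in, W, inputs)

washout = 100  # discard the initial transient before fitting
W_out = train_readout(states[washout:], targets[washout:])
mse = np.mean((states[washout:] @ W_out - targets[washout:]) ** 2)
print("training MSE:", mse)
```

Only `W_out` is fitted; `W_in` and `W` stay exactly as they were drawn, which is what distinguishes this scheme from conventional RNN training.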

Another feature of the ESN is autonomous operation in prediction: if the network is trained with an input that is a backshifted (delayed) version of the desired output, it can then be used for signal generation or prediction by feeding its previous output back as input.^[4]^[5]
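The closed-loop idea can be sketched as follows, again only as an illustration under assumptions: the reservoir size, weight scalings, ridge value, sine-wave signal and the helper names `step` and `generate` are all hypothetical choices for this example. The readout is first trained with the true signal as input (one-step-ahead prediction), and the network then runs on its own output. Closed-loop generation is not guaranteed to stay accurate over long horizons, so only a short continuation is produced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_res = 200

# Fixed random reservoir (illustrative sizes and scalings).
W = rng.uniform(-1.0, 1.0, (n_res, n_res))
W *= rng.random((n_res, n_res)) < 0.1
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, n_res)


def step(x, u):
    """One reservoir update for a scalar input u."""
    return np.tanh(W_in * u + W @ x)


signal = np.sin(0.2 * np.arange(1200))

# Training phase: the input is the signal one step behind the target.
x = np.zeros(n_res)
states = np.empty((len(signal) - 1, n_res))
for t, u in enumerate(signal[:-1]):
    x = step(x, u)
    states[t] = x

washout, ridge = 100, 1e-4
X, Y = states[washout:], signal[washout + 1:]
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)


def generate(x, n_steps):
    """Autonomous phase: each output becomes the next input."""
    outputs = np.empty(n_steps)
    for k in range(n_steps):
        y = x @ W_out
        outputs[k] = y
        x = step(x, y)
    return outputs


# Continue the signal from the last training state without any external input.
generated = generate(x, 50)
```

Here `generated[k]` is the network's estimate of the signal at time index `1199 + k`, so it can be compared against `np.sin(0.2 * (1199 + k))` to judge how quickly the free-running prediction drifts.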

The main idea of ESNs is closely related to liquid state machines, which were developed independently of and simultaneously with ESNs by Wolfgang Maass.^[6] Liquid state machines, ESNs and the more recently studied backpropagation-decorrelation learning rule for RNNs^[7] are increasingly grouped under the name reservoir computing.

Schiller and Steil^[7] also demonstrated that in conventional training approaches for RNNs, in which all weights (not only output weights) are adapted, the dominant changes are in the output weights. In cognitive neuroscience, Peter F. Dominey analysed a related process in the modelling of sequence processing in the mammalian brain, in particular speech recognition in the human brain.^[8] The basic idea also appeared in a model of temporal input discrimination in biological neuronal networks.^[9] An early clear formulation of the reservoir computing idea is due to K. Kirby, who disclosed this concept in a largely forgotten conference contribution.^[10] The first formulation of the reservoir computing idea known today stems from L. Schomaker,^[11] who described how a desired target output could be obtained from an RNN by learning to combine signals from a randomly configured ensemble of spiking neural oscillators.^[2]

Variants

Echo state networks can be built in different ways. They can be set up with or without directly trainable input-to-output connections, with or without feedback from the output to the reservoir, with different neuron types, different internal connectivity patterns for the reservoir, etc. The output weights can be computed with any linear regression algorithm, whether online or offline. In addition to least-squares solutions, margin-maximization criteria, as used in training support vector machines, are used to determine the output weights.^[12] Other variants of echo state networks seek to change the formulation to better match common models of physical systems, such as those typically defined by differential equations. Work in this direction includes echo state networks which partially include physical models,^[13] hybrid echo state networks,^[14] and continuous-time echo state networks.^[15]
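To illustrate the point about online regression, the sketch below uses recursive least squares (RLS), one standard online algorithm, chosen here only as an example; the source does not single out any particular method. The class name `RLSReadout`, the parameter `delta` and the random stand-in for reservoir states are assumptions made for this illustration. With no forgetting factor, the weights after the last sample coincide (up to rounding) with the offline ridge solution that uses `delta` as regularizer.

```python
import numpy as np


class RLSReadout:
    """Online least-squares fit of a linear readout (hypothetical helper)."""

    def __init__(self, n_features, delta=1e-2):
        self.w = np.zeros(n_features)
        # P tracks the inverse of (delta * I + sum of x x^T) seen so far.
        self.P = np.eye(n_features) / delta

    def update(self, x, y):
        """Consume one (state, target) pair and return the error before the update."""
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        error = y - self.w @ x
        self.w += gain * error
        # Rank-one downdate of P; P stays symmetric, so x^T P equals (P x)^T.
        self.P -= np.outer(gain, Px)
        return error


# Stand-in data: random "reservoir states" and a noiseless linear target.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 5))
true_weights = rng.normal(size=5)
targets = states @ true_weights

readout = RLSReadout(n_features=5, delta=1e-2)
for x, y in zip(states, targets):
    readout.update(x, y)

# Offline counterpart: one ridge-regression solve over all the data.
offline = np.linalg.solve(states.T @ states + 1e-2 * np.eye(5), states.T @ targets)
```

The online variant never stores the full state matrix, which can matter when the reservoir is driven by a long or streaming input.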

The fixed RNN acts as a random, nonlinear medium whose dynamic response, the "echo", is used as a signal basis. The linear combination of this basis can be trained to reconstruct the desired output by minimizing some error criterion.^[2]

Significance

RNNs were rarely used in practice before the introduction of the ESN, because of the complexity involved in adjusting their connections (e.g., lack of autodifferentiation, susceptibility to vanishing/exploding gradients, etc.). RNN training algorithms were slow and often vulnerable to issues such as bifurcations.^[16] Convergence could therefore not be guaranteed. ESN training, on the other hand, does not suffer from bifurcations and is easy to implement. In early studies, ESNs were shown to perform well on time series prediction tasks from synthetic datasets.^[1]^[17]

Today, many of the problems that made RNNs slow and error-prone have been addressed with the advent of autodifferentiation (deep learning) libraries, as well as more stable architectures such as long short-term memory and gated recurrent units; thus, the unique selling point of ESNs has been lost. RNNs have also proven themselves in several practical areas, such as language processing. Handling tasks of similar complexity with reservoir computing methods would require memory of excessive size.

ESNs are still used in some areas, such as signal processing applications. In particular, they have been widely used as a computing principle that mixes well with non-digital computer substrates. Since ESNs do not need to modify the parameters of the RNN, they make it possible to use many different objects as their nonlinear "reservoir", for example optical microchips, mechanical nano-oscillators, polymer mixtures, or even artificial soft limbs.^[2]

References

[1] Jaeger, H.; Haas, H. (2004). "Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication" (PDF). Science. 304 (5667): 78–80. Bibcode:2004Sci...304...78J. doi:10.1126/science.1091277. PMID 15064413. S2CID 2184251.
[2] Jaeger, Herbert (2007). "Echo state network". Scholarpedia. 2 (9): 2330. Bibcode:2007SchpJ...2.2330J. doi:10.4249/scholarpedia.2330.
[3] Chatzis, S. P.; Demiris, Y. (2011). "Echo State Gaussian Process". IEEE Transactions on Neural Networks. 22 (9): 1435–1445. doi:10.1109/TNN.2011.2162109. PMID 21803684. S2CID 8553623.
[4] Jaeger, Herbert (2002). A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach. Germany: German National Research Center for Information Technology. pp. 1–45.
[5] Antonik, Piotr; Gulina, Marvyn; Pauwels, Jaël; Massar, Serge (2018). "Using a reservoir computer to learn chaotic attractors, with applications to chaos synchronization and cryptography". Phys. Rev. E. 98 (1): 012215. arXiv:1802.02844. Bibcode:2018PhRvE..98a2215A. doi:10.1103/PhysRevE.98.012215. PMID 30110744. S2CID 3616565.
[6] Maass W., Natschlaeger T., and Markram H. (2002). "Real-time computing without stable states: A new framework for neural computation based on perturbations". Neural Computation. 14 (11): 2531–2560. doi:10.1162/089976602760407955. PMID 12433288. S2CID 1045112.
[7] Schiller U.D. and Steil J. J. (2005). "Analyzing the weight dynamics of recurrent learning algorithms". Neurocomputing. 63: 5–23. doi:10.1016/j.neucom.2004.04.006.
[8] Dominey P.F. (1995). "Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning". Biol. Cybernetics. 73 (3): 265–274. doi:10.1007/BF00201428. PMID 7548314. S2CID 1603500.
[9] Buonomano, D.V. and Merzenich, M.M. (1995). "Temporal Information Transformed into a Spatial Code by a Neural Network with Realistic Properties". Science. 267 (5200): 1028–1030. Bibcode:1995Sci...267.1028B. doi:10.1126/science.7863330. PMID 7863330. S2CID 12880807.
[10] Kirby, K. (1991). "Context dynamics in neural sequential learning". Proc. Florida AI Research Symposium: 66–70.
[11] Schomaker, L. (1992). "A neural oscillator-network model of temporal pattern generation". Human Movement Science. 11 (1–2): 181–192. doi:10.1016/0167-9457(92)90059-K.
[12] Schmidhuber J., Gomez F., Wierstra D., and Gagliolo M. (2007). "Training recurrent networks by evolino". Neural Computation. 19 (3): 757–779. CiteSeerX 10.1.1.218.3086. doi:10.1162/neco.2007.19.3.757. PMID 17298232. S2CID 11745761.
[13] Doan N, Polifke W, Magri L (2020). "Physics-Informed Echo State Networks". Journal of Computational Science. 47: 101237. arXiv:2011.02280. doi:10.1016/j.jocs.2020.101237. S2CID 226246385.
[14] Pathak J, Wikner A, Russel R, Chandra S, Hunt B, Girvan M, Ott E (2018). "Hybrid Forecasting of Chaotic Processes: Using Machine Learning in Conjunction with a Knowledge-Based Model". Chaos. 28 (4): 041101. arXiv:1803.04779. Bibcode:2018Chaos..28d1101P. doi:10.1063/1.5028373. PMID 31906641. S2CID 3883587.
[15] Anantharaman, Ranjan; Ma, Yingbo; Gowda, Shashi; Laughman, Chris; Shah, Viral; Edelman, Alan; Rackauckas, Chris (2020). "Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks". arXiv:2010.04004 [cs.LG].
[16] Doya K. (1992). "Bifurcations in the learning of recurrent neural networks". [Proceedings] 1992 IEEE International Symposium on Circuits and Systems. Vol. 6. pp. 2777–2780. doi:10.1109/ISCAS.1992.230622. ISBN 0-7803-0593-0. S2CID 15069221.
[17] Jaeger H. (2007). "Discovering multiscale dynamical features with hierarchical echo state networks". Technical Report 10, School of Engineering and Science, Jacobs University.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
