Wake-sleep algorithm

Last updated December 04, 2023

The wake-sleep algorithm^[1] is an unsupervised learning algorithm for deep generative models, especially Helmholtz Machines.^[2] The algorithm is similar to the expectation-maximization algorithm,^[3] and optimizes the model likelihood for observed data.^[4] The name of the algorithm derives from its use of two learning phases, the “wake” phase and the “sleep” phase, which are performed alternately.^[1] It can be conceived as a model for learning in the brain,^[5] but is also being applied for machine learning.^[6]

Description

The goal of the wake-sleep algorithm is to find a hierarchical representation of observed data.^[7] In a graphical representation of the algorithm, data is applied to the algorithm at the bottom, while higher layers form gradually more abstract representations. Between each pair of layers are two sets of weights: Recognition weights, which define how representations are inferred from data, and generative weights, which define how these representations relate to data.^[8]

Training

Training consists of two phases – the “wake” phase and the “sleep” phase. It has been proven that this learning algorithm is convergent.^[3]

The "wake" phase

Neurons are fired by recognition connections (from what would be input to what would be output). Generative connections (leading from outputs to inputs) are then modified to increase probability that they would recreate the correct activity in the layer below – closer to actual data from sensory input.^[1]

The "sleep" phase

The process is reversed in the “sleep” phase – neurons are fired by generative connections while recognition connections are being modified to increase probability that they would recreate the correct activity in the layer above – further to actual data from sensory input.^[1]

Extensions

Since the recognition network is limited in its flexibility, it might not be able to approximate the posterior distribution of latent variables well.^[6] To better approximate the posterior distribution, it is possible to employ importance sampling, with the recognition network as the proposal distribution. This improved approximation of the posterior distribution also improves the overall performance of the model.^[6]

Related Research Articles

Artificial neural networks are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains.

Pattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions. Recently, generative artificial neural networks have been able to surpass many previous approaches in performance. Machine learning approaches have been applied to large language models, computer vision, speech recognition, email filtering, agriculture and medicine, where it is too costly to develop algorithms to perform the needed tasks.

Unsupervised learning is a paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data.

<span class="mw-page-title-main">Boltzmann machine</span> Type of stochastic recurrent neural network

A Boltzmann machine is a stochastic spin-glass model with an external field, i.e., a Sherrington–Kirkpatrick model, that is a stochastic Ising model. It is a statistical physics technique applied in the context of cognitive science. It is also classified as a Markov random field.

The Helmholtz machine is a type of artificial neural network that can account for the hidden structure of a set of data by being trained to create a generative model of the original set of data. The hope is that by learning economical representations of the data, the underlying structure of the generative model should reasonably approximate the hidden structure of the data set. A Helmholtz machine contains two networks, a bottom-up recognition network that takes the data as input and produces a distribution over hidden variables, and a top-down "generative" network that generates values of the hidden variables and the data itself. At the time, Helmholtz machines were one of a handful of learning architectures that used feedback as well as feedforward to ensure quality of learned models.

A neural network is a neural circuit of biological neurons, sometimes also called a biological neural network, or a network of artificial neurons or nodes in the case of an artificial neural network.

Peter Dayan is a British neuroscientist and computer scientist who is director at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, along with Ivan De Araujo. He is co-author of Theoretical Neuroscience, an influential textbook on computational neuroscience. He is known for applying Bayesian methods from machine learning and artificial intelligence to understand neural function and is particularly recognized for relating neurotransmitter levels to prediction errors and Bayesian uncertainties. He has pioneered the field of reinforcement learning (RL) where he helped develop the Q-learning algorithm, and made contributions to unsupervised learning, including the wake-sleep algorithm for neural networks and the Helmholtz machine.

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction.

Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for anomaly detection in streaming data. The technology is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian brain.

Bayesian approaches to brain function investigate the capacity of the nervous system to operate in situations of uncertainty in a fashion that is close to the optimal prescribed by Bayesian statistics. This term is used in behavioural sciences and neuroscience and studies associated with this term often strive to explain the brain's cognitive abilities based on statistical principles. It is frequently assumed that the nervous system maintains internal probabilistic models that are updated by neural processing of sensory information using methods approximating those of Bayesian probability.

There are many types of artificial neural networks (ANN).

Deep learning is the subset of machine learning methods which are based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

Catastrophic interference, also known as catastrophic forgetting, is the tendency of an artificial neural network to abruptly and drastically forget previously learned information upon learning new information. Neural networks are an important part of the network approach and connectionist approach to cognitive science. With these networks, human capabilities such as memory and learning can be modeled using computer simulations.

In machine learning, a deep belief network (DBN) is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables, with connections between the layers but not between units within each layer.

Radford M. Neal is a professor emeritus at the Department of Statistics and Department of Computer Science at the University of Toronto, where he holds a research chair in statistics and machine learning.

An energy-based model (EBM) is a form of generative model (GM) imported directly from statistical physics to learning. GMs learn an underlying data distribution by analyzing a sample dataset. Once trained, a GM can produce other datasets that also match the data distribution. EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models.

References

1 2 3 4 Hinton, Geoffrey E.; Dayan, Peter; Frey, Brendan J.; Neal, Radford (1995-05-26). "The wake-sleep algorithm for unsupervised neural networks". Science. 268 (5214): 1158–1161. Bibcode:1995Sci...268.1158H. doi:10.1126/science.7761831. PMID 7761831. S2CID 871473.
↑ Dayan, Peter. "Helmholtz Machines and Wake-Sleep Learning" (PDF). Retrieved 2015-11-01.
1 2 Ikeda, Shiro; Amari, Shun-ichi; Nakahara, Hiroyuki (1998). "Convergence of the Wake-Sleep Algorithm". Advances in Neural Information Processing Systems. MIT Press. 11.
↑ Frey, Brendan J.; Hinton, Geoffrey E.; Dayan, Peter (1996-05-01). "Does the wake-sleep algorithm produce good density estimators?" (PDF). Advances in Neural Information Processing Systems.
↑ Katayama, Katsuki; Ando, Masataka; Horiguchi, Tsuyoshi (2004-04-01). "Models of MT and MST areas using wake–sleep algorithm". Neural Networks. 17 (3): 339–351. doi:10.1016/j.neunet.2003.07.004. PMID 15037352.
1 2 3 Bornschein, Jörg; Bengio, Yoshua (2014-06-10). "Reweighted Wake-Sleep". arXiv: 1406.2751 [cs.LG].
↑ Maei, Hamid Reza (2007-01-25). "Wake-sleep algorithm for representational learning". University of Montreal. Retrieved 2011-11-01.
↑ Neal, Radford M.; Dayan, Peter (1996-11-24). "Factor Analysis Using Delta Rules Wake-Sleep Learning" (PDF). University of Toronto. Retrieved 2015-11-01.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[:1-1] 1 2 3 4 Hinton, Geoffrey E.; Dayan, Peter; Frey, Brendan J.; Neal, Radford (1995-05-26). "The wake-sleep algorithm for unsupervised neural networks". Science. 268 (5214): 1158–1161. Bibcode:1995Sci...268.1158H. doi:10.1126/science.7761831. PMID 7761831. S2CID 871473.

[2] Dayan, Peter. "Helmholtz Machines and Wake-Sleep Learning" (PDF). Retrieved 2015-11-01.

[:2-3] 1 2 Ikeda, Shiro; Amari, Shun-ichi; Nakahara, Hiroyuki (1998). "Convergence of the Wake-Sleep Algorithm". Advances in Neural Information Processing Systems. MIT Press. 11.

[4] Frey, Brendan J.; Hinton, Geoffrey E.; Dayan, Peter (1996-05-01). "Does the wake-sleep algorithm produce good density estimators?" (PDF). Advances in Neural Information Processing Systems.

[5] Katayama, Katsuki; Ando, Masataka; Horiguchi, Tsuyoshi (2004-04-01). "Models of MT and MST areas using wake–sleep algorithm". Neural Networks. 17 (3): 339–351. doi:10.1016/j.neunet.2003.07.004. PMID 15037352.

[:0-6] 1 2 3 Bornschein, Jörg; Bengio, Yoshua (2014-06-10). "Reweighted Wake-Sleep". arXiv: 1406.2751 [cs.LG].

[7] Maei, Hamid Reza (2007-01-25). "Wake-sleep algorithm for representational learning". University of Montreal. Retrieved 2011-11-01.

[8] Neal, Radford M.; Dayan, Peter (1996-11-24). "Factor Analysis Using Delta Rules Wake-Sleep Learning" (PDF). University of Toronto. Retrieved 2015-11-01.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]