Manifold hypothesis


The manifold hypothesis posits that many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space. [1] [2] [3] [4] As a consequence of the manifold hypothesis, many data sets that initially appear to require many variables to describe can actually be described by a comparatively small number of variables, akin to the local coordinate system of the underlying manifold. This principle is suggested to underpin the effectiveness of machine learning algorithms in describing high-dimensional data sets by considering a few common features.


The manifold hypothesis is related to the effectiveness of nonlinear dimensionality reduction techniques in machine learning. Many dimensionality reduction techniques, such as manifold sculpting, manifold alignment, and manifold regularization, assume that the data lie along a low-dimensional submanifold.
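As an illustrative sketch (using scikit-learn, which is not cited in this article), the following applies Isomap, one such technique, to points sampled from a two-dimensional "swiss roll" surface embedded in three dimensions, recovering a two-dimensional description of the data:

    # Illustrative sketch: data generated on a 2-D surface ("swiss roll")
    # embedded in 3-D space is mapped back to 2 coordinates by Isomap,
    # which assumes the points lie near a low-dimensional submanifold.
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    X, t = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)  # X has shape (2000, 3)

    # Isomap approximates geodesic distances along the manifold with a
    # nearest-neighbour graph, then embeds the points in 2 dimensions.
    embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

    print(X.shape, "->", embedding.shape)  # (2000, 3) -> (2000, 2)

The three ambient coordinates are redundant here; two latent coordinates (position along the roll and position across it) suffice to describe the data, which is the situation the manifold hypothesis asserts for many real-world data sets.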

The major implications of this hypothesis are that

  1. Machine learning models only have to fit relatively simple, low-dimensional, highly structured subspaces within their potential input space (latent manifolds).
  2. Within one of these manifolds, it is always possible to interpolate between two inputs, that is, to morph one into another via a continuous path along which all points fall on the manifold.

The ability to interpolate between samples is the key to generalization in deep learning. [5]
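A minimal numerical sketch of such interpolation (the latent codes below are placeholder values rather than the output of any model discussed here):

    import numpy as np

    # Hypothetical latent codes of two samples; in practice these would come
    # from a trained encoder rather than being written by hand.
    z0 = np.array([0.2, -1.3, 0.7])
    z1 = np.array([1.1, 0.4, -0.5])

    # Points along the straight-line path between the two codes. Under the
    # manifold hypothesis, decoding each point should morph one sample into
    # the other through plausible intermediates.
    alphas = np.linspace(0.0, 1.0, num=5)
    path = [(1.0 - a) * z0 + a * z1 for a in alphas]
    print(np.round(path, 3))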

The information geometry of statistical manifolds

An empirically motivated approach to the manifold hypothesis focuses on its correspondence with an effective theory for manifold learning, under the assumption that robust machine learning requires encoding the dataset of interest using methods for data compression. This perspective emerged gradually using the tools of information geometry, through the coordinated efforts of scientists working on the efficient coding hypothesis, predictive coding and variational Bayesian methods.

The argument for reasoning about the information geometry on the latent space of distributions rests upon the existence and uniqueness of the Fisher information metric. [6] In this general setting, we are trying to find a stochastic embedding of a statistical manifold. From the perspective of dynamical systems, in the big data regime this manifold generally exhibits certain properties such as homeostasis:

  1. We can sample large amounts of data from the underlying generative process.
  2. Machine learning experiments are reproducible, so the statistics of the generating process exhibit stationarity.

In a sense made precise by theoretical neuroscientists working on the free energy principle, the statistical manifold in question possesses a Markov blanket. [7]
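For reference, the Fisher information metric invoked above is, in standard notation (stated here for completeness rather than drawn from the cited sources), the Riemannian metric on a parametric family of distributions p(x | \theta):

    g_{ij}(\theta)
      = \mathbb{E}_{p(x \mid \theta)}\!\left[
          \frac{\partial \log p(x \mid \theta)}{\partial \theta^{i}}\,
          \frac{\partial \log p(x \mid \theta)}{\partial \theta^{j}}
        \right]
      = \int p(x \mid \theta)\,
          \frac{\partial \log p(x \mid \theta)}{\partial \theta^{i}}\,
          \frac{\partial \log p(x \mid \theta)}{\partial \theta^{j}}\, dx .

Up to an overall scale, this is the unique metric on a statistical manifold that is invariant under sufficient statistics (Chentsov's theorem), which is the existence-and-uniqueness property the argument above relies on.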

The Tower of Babel Paradox

"Babel", M. C. Escher Babel-escher.jpg
"Babel", M. C. Escher
Unsolved problem in computer science:

How do global codes emerge from local codes, given that a global encoding is necessary for the emergence of synchronization in large neural networks?

A fundamental challenge facing any biologically plausible manifold learning algorithm is Romain Brette's Tower of Babel Paradox [8] for the efficient coding hypothesis:

The efficient coding hypothesis stipulates that neurons encode signals into spike trains in an efficient way, that is, it uses a code such that all redundancy is removed from the original message while preserving information, in the sense that the encoded message can be mapped back to the original message (Barlow, 1961; Simoncelli, 2003). This implies that with a perfectly efficient code, encoded messages are indistinguishable from random. Since the code is determined on the statistics of the inputs and only the encoded messages are communicated, a code is efficient to the extent that it is not understandable by the receiver. This is the paradox of the efficient code.

In the neural coding metaphor, the code is private and specific to each neuron. If we follow this metaphor, this means that all neurons speak a different language, a language that allows expressing concepts very concisely but that no one else can understand. Thus, according to the coding metaphor, the brain is a Tower of Babel.

The predictive coding and efficient coding hypotheses predict that each neuron has its own efficient code, derived via maximum entropy inference. We may think of these local codes as distinct languages. Brette then surmises that, since each neuron's language is indistinguishable from random to its neighbors, a global encoding should be impossible.
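A rough numerical illustration of the paradox (an analogy using general-purpose compression, not taken from Brette's post): a message compressed toward its entropy limit has nearly uniform byte statistics, so a receiver without the codebook cannot tell the encoded message from random noise.

    import math
    import os
    import zlib

    def byte_entropy(data: bytes) -> float:
        """Shannon entropy of the byte distribution, in bits per byte (8 = uniform)."""
        counts = [0] * 256
        for b in data:
            counts[b] += 1
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in counts if c)

    message = " ".join(str(i * i) for i in range(20000)).encode()  # a structured, redundant message
    encoded = zlib.compress(message, 9)    # stands in for an efficient code
    noise = os.urandom(len(encoded))       # genuinely random bytes of the same length

    print("original  :", round(byte_entropy(message), 2), "bits/byte")
    print("compressed:", round(byte_entropy(encoded), 2), "bits/byte")
    print("random    :", round(byte_entropy(noise), 2), "bits/byte")

The original message has low byte entropy, while the compressed message and the random bytes are nearly indistinguishable by this statistic.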

Related Research Articles

Neural network (machine learning): Computational model used in machine learning, based on connected, hierarchical functions

In machine learning, a neural network is a model inspired by the structure and function of biological neural networks in animal brains.

A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent Markov process X. An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about the state of X by observing Y. By definition of being a Markov model, an HMM has an additional requirement that the outcome of Y at time t must be "influenced" exclusively by the outcome of X at time t, and that the outcomes of X and Y at earlier times must be conditionally independent of Y at time t given X at time t. Estimation of the parameters in an HMM can be performed using maximum likelihood. For linear chain HMMs, the Baum–Welch algorithm can be used to estimate the parameters.
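A minimal numerical sketch of this structure (the transition, emission, and initial probabilities below are made-up values), using the forward algorithm to compute the likelihood of an observation sequence:

    import numpy as np

    # Made-up HMM with 2 hidden states and 3 observation symbols.
    A = np.array([[0.7, 0.3],        # A[i, j] = P(X_{t+1} = j | X_t = i)
                  [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1],   # B[i, k] = P(Y_t = k | X_t = i)
                  [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])        # initial distribution of the hidden state

    obs = [0, 2, 1, 1]               # observed symbols Y_1, ..., Y_4

    # Forward algorithm: alpha[i] = P(Y_1..Y_t, X_t = i), updated one step at a time.
    alpha = pi * B[:, obs[0]]
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]

    print("P(observations) =", alpha.sum())

Baum–Welch estimates A, B and pi by alternating this kind of forward (and backward) pass with re-estimation of the parameters.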

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Recently, artificial neural networks have been able to surpass many previous approaches in performance.

Unsupervised learning is a method in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. Within such an approach, a machine learning model tries to find any similarities, differences, patterns, and structure in data by itself. No prior human intervention is needed.

Nonlinear dimensionality reduction: Summary of algorithms for nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Optical neural network

An optical neural network (ONN) is a physical implementation of an artificial neural network with optical components.

Neural coding is a neuroscience field concerned with characterising the hypothetical relationship between the stimulus and the neuronal responses, and the relationship among the electrical activities of the neurons in the ensemble. Based on the theory that sensory and other information is represented in the brain by networks of neurons, it is believed that neurons can encode both digital and analog information.

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction.
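A minimal sketch of the two functions described above (using PyTorch as an illustrative choice; dimensions and data are placeholders):

    import torch
    from torch import nn

    # Encoder squeezes 8-D inputs through a 2-D bottleneck; decoder reconstructs them.
    encoder = nn.Linear(8, 2)
    decoder = nn.Linear(2, 8)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

    x = torch.randn(256, 8)                # stand-in for real unlabeled data
    for _ in range(200):
        code = encoder(x)                  # encoding function
        recon = decoder(code)              # decoding function
        loss = ((recon - x) ** 2).mean()   # reconstruction error drives learning
        opt.zero_grad()
        loss.backward()
        opt.step()

    print("reconstruction MSE:", float(loss))

The 2-D code is the learned low-dimensional representation; deeper, nonlinear encoders and decoders follow the same pattern.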

Spiking neural network: Artificial neural network that mimics neurons

Spiking neural networks (SNNs) are artificial neural networks (ANN) that more closely mimic natural neural networks. In addition to neuronal and synaptic state, SNNs incorporate the concept of time into their operating model. The idea is that neurons in the SNN do not transmit information at each propagation cycle, but rather transmit information only when a membrane potential—an intrinsic quality of the neuron related to its membrane electrical charge—reaches a specific value, called the threshold. When the membrane potential reaches the threshold, the neuron fires, and generates a signal that travels to other neurons which, in turn, increase or decrease their potentials in response to this signal. A neuron model that fires at the moment of threshold crossing is also called a spiking neuron model.
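A minimal sketch of the threshold mechanism just described, for a single leaky integrate-and-fire neuron (the parameters are illustrative, not taken from any particular model):

    import numpy as np

    dt, tau, threshold, v_reset = 1.0, 20.0, 1.0, 0.0   # time step, membrane time constant, firing threshold, reset value
    v = 0.0                                              # membrane potential
    spikes = []

    rng = np.random.default_rng(0)
    inputs = rng.uniform(0.0, 0.12, size=200)            # input current at each time step

    for t, i_in in enumerate(inputs):
        v += dt * (-v / tau + i_in)    # potential leaks toward rest and integrates input
        if v >= threshold:             # the neuron transmits only when the threshold is crossed
            spikes.append(t)
            v = v_reset                # reset after firing

    print("spike times:", spikes)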

Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for anomaly detection in streaming data. The technology is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian brain.

There are many types of artificial neural networks (ANN).

Deep learning: Branch of machine learning

Deep learning is a subset of machine learning methods based on neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be supervised, semi-supervised or unsupervised.

Wake-sleep algorithm: Unsupervised learning algorithm

The wake-sleep algorithm is an unsupervised learning algorithm for deep generative models, especially Helmholtz Machines. The algorithm is similar to the expectation-maximization algorithm, and optimizes the model likelihood for observed data. The name of the algorithm derives from its use of two learning phases, the “wake” phase and the “sleep” phase, which are performed alternately. It can be conceived as a model for learning in the brain, but is also being applied for machine learning.

Feature learning: Set of learning techniques in machine learning

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

Quantum machine learning: Interdisciplinary research area at the intersection of quantum physics and machine learning

Quantum machine learning is the integration of quantum algorithms within machine learning programs.

Multimodal learning, in the context of machine learning, is a type of deep learning using a combination of various modalities of data, such as text, audio, or images, in order to create a more robust model of the real-world phenomena in question. In contrast, singular modal learning would analyze text or imaging data independently. Multimodal machine learning combines these fundamentally different statistical analyses using specialized modeling strategies and algorithms, resulting in a model that comes closer to representing the real world.

The following outline is provided as an overview of and topical guide to machine learning.

Variational autoencoder: Deep learning generative model to encode data representation

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling. It is part of the families of probabilistic graphical models and variational Bayesian methods.

A latent space, also known as a latent feature space or embedding space, is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another. Position within the latent space can be viewed as being defined by a set of latent variables that emerge from the resemblances among the objects.

Ila Fiete: American physicist

Ila Fiete is an Indian–American physicist and computational neuroscientist as well as a Professor in the Department of Brain and Cognitive Sciences within the McGovern Institute for Brain Research at the Massachusetts Institute of Technology. Fiete builds theoretical models and analyses neural data to uncover how neural circuits perform computations and how the brain represents and manipulates information involved in memory and reasoning.

References

  1. Gorban, A. N.; Tyukin, I. Y. (2018). "Blessing of dimensionality: mathematical foundations of the statistical physics of data". Phil. Trans. R. Soc. A. 376 (2118): 20170237. Bibcode:2018RSPTA.37670237G. doi:10.1098/rsta.2017.0237. PMC 5869543. PMID 29555807.
  2. Cayton, L. (2005). Algorithms for manifold learning (PDF) (Technical report). University of California at San Diego. 12 (1–17), p. 1.
  3. Fefferman, Charles; Mitter, Sanjoy; Narayanan, Hariharan (2016-02-09). "Testing the manifold hypothesis". Journal of the American Mathematical Society. 29 (4): 983–1049. arXiv:1310.0425. doi:10.1090/jams/852. S2CID 50258911.
  4. Olah, Christopher (2014). "Blog: Neural Networks, Manifolds, and Topology".
  5. Chollet, Francois (2021). Deep Learning with Python (2nd ed.). Manning. pp. 128–129. ISBN 9781617296864.
  6. Caticha, Ariel (2015). "Geometry from Information Geometry". MaxEnt 2015, the 35th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering. arXiv:1512.09076.
  7. Kirchhoff, Michael; Parr, Thomas; Palacios, Ensor; Friston, Karl; Kiverstein, Julian (2018). "The Markov blankets of life: autonomy, active inference and the free energy principle". J. R. Soc. Interface. 15 (138): 20170792. doi:10.1098/rsif.2017.0792. PMC 5805980. PMID 29343629.
  8. Brette, R. (7 December 2017). "The Paradox of the Efficient Code and the Neural Tower of Babel" [Blog post]. Retrieved from http://romainbrette.fr/what-is-computational-neuroscience-xxvii-the-paradox-of-the-efficient-code-and-the-neural-tower-of-babel/

Further reading