Bayesian approaches to brain function investigate the capacity of the nervous system to operate in situations of uncertainty in a fashion that is close to the optimum prescribed by Bayesian statistics. [1] [2] The term is used in the behavioural sciences and neuroscience, and studies associated with it often strive to explain the brain's cognitive abilities based on statistical principles. It is frequently assumed that the nervous system maintains internal probabilistic models that are updated by neural processing of sensory information using methods approximating those of Bayesian probability. [3] [4]
This field of study has its historical roots in numerous disciplines including machine learning, experimental psychology and Bayesian statistics. As early as the 1860s, with the work of Hermann Helmholtz in experimental psychology, the brain's ability to extract perceptual information from sensory data was modeled in terms of probabilistic estimation. [5] [6] The basic idea is that the nervous system needs to organize sensory data into an accurate internal model of the outside world.
Bayesian probability was developed by many important contributors. Pierre-Simon Laplace, Thomas Bayes, Harold Jeffreys, Richard Cox and Edwin Jaynes developed mathematical techniques and procedures for treating probability as the degree of plausibility that could be assigned to a given supposition or hypothesis based on the available evidence. [7] In 1988 Edwin Jaynes presented a framework for using Bayesian probability to model mental processes. [8] It was thus realized early on that the Bayesian statistical framework holds the potential to lead to insights into the function of the nervous system.
This idea was taken up in research on unsupervised learning, in particular the analysis-by-synthesis approach, a branch of machine learning. [9] [10] In 1983 Geoffrey Hinton and colleagues proposed that the brain could be seen as a machine making decisions based on the uncertainties of the outside world. [11] During the 1990s researchers including Peter Dayan, Geoffrey Hinton and Richard Zemel proposed that the brain represents knowledge of the world in terms of probabilities and made specific proposals for tractable neural processes that could manifest such a Helmholtz machine. [12] [13] [14]
A wide range of studies interpret the results of psychophysical experiments in light of Bayesian perceptual models. Many aspects of human perceptual and motor behavior can be modeled with Bayesian statistics. This approach, with its emphasis on behavioral outcomes as the ultimate expressions of neural information processing, is also known for modeling sensory and motor decisions using Bayesian decision theory. Examples are the work of Landy, [15] [16] Jacobs, [17] [18] Jordan, Knill, [19] [20] Kording and Wolpert, [21] [22] and Goldreich. [23] [24] [25]
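For illustration, a common form of such perceptual models combines two noisy Gaussian estimates (for example, a visual and a haptic cue) by inverse-variance weighting, which is the Bayes-optimal rule under Gaussian assumptions. The following minimal Python sketch uses purely illustrative numbers and function names; it is not drawn from any of the cited studies.

```python
import numpy as np

def combine_gaussian_cues(mu_a, sigma_a, mu_b, sigma_b):
    """Optimally combine two Gaussian estimates (e.g., visual and haptic cues)
    by inverse-variance weighting, as in standard Bayesian cue-integration models."""
    w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_b**2)
    mu_combined = w_a * mu_a + (1 - w_a) * mu_b
    sigma_combined = np.sqrt(1 / (1 / sigma_a**2 + 1 / sigma_b**2))
    return mu_combined, sigma_combined

# Illustrative values only: a reliable visual cue and a noisier haptic cue.
mu, sigma = combine_gaussian_cues(mu_a=10.0, sigma_a=1.0, mu_b=12.0, sigma_b=2.0)
print(mu, sigma)  # the combined estimate is pulled towards the more reliable cue
```

Psychophysical studies in this tradition typically test whether observers' judgements shift with cue reliability in the way such a weighting predicts.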
Many theoretical studies ask how the nervous system could implement Bayesian algorithms. Examples are the work of Pouget, Zemel, Deneve, Latham, Hinton and Dayan. George and Hawkins published a paper that establishes a model of cortical information processing called hierarchical temporal memory that is based on a Bayesian network of Markov chains. They further map this mathematical model onto the existing knowledge about the architecture of the cortex and show how neurons could recognize patterns by hierarchical Bayesian inference. [26]
A number of recent electrophysiological studies focus on the representation of probabilities in the nervous system. Examples are the work of Shadlen and Schultz.
Predictive coding is a neurobiologically plausible scheme for inferring the causes of sensory input based on minimizing prediction error. [27] These schemes are formally related to Kalman filtering and other Bayesian update schemes.
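A minimal sketch of this formal relationship, assuming a one-dimensional Gaussian model: the posterior estimate is the prediction corrected by a precision-weighted prediction error, which is exactly the scalar Kalman filter update. The function name and numbers are illustrative.

```python
def predictive_update(mu_prior, var_prior, observation, var_obs):
    """One-dimensional Bayesian update written in predictive-coding style:
    the posterior is the prior corrected by a precision-weighted prediction error,
    which is also the scalar form of the Kalman filter update."""
    prediction_error = observation - mu_prior          # mismatch between prediction and input
    gain = var_prior / (var_prior + var_obs)           # Kalman gain (relative precision)
    mu_post = mu_prior + gain * prediction_error       # error-corrected estimate
    var_post = (1 - gain) * var_prior                  # uncertainty shrinks after the update
    return mu_post, var_post

# Illustrative numbers: an uncertain prediction updated by a noisy sensory sample.
print(predictive_update(mu_prior=0.0, var_prior=4.0, observation=1.0, var_obs=1.0))
```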
During the 1990s some researchers, such as Geoffrey Hinton and Karl Friston, began examining the concept of free energy as a computationally tractable measure of the discrepancy between actual features of the world and the representations of those features captured by neural network models. [28] A synthesis has recently been attempted [29] by Karl Friston, in which the Bayesian brain emerges from a general principle of free energy minimisation. [30] In this framework, both action and perception are seen as consequences of suppressing free energy, leading to perceptual [31] and active inference [32] and a more embodied (enactive) view of the Bayesian brain. Using variational Bayesian methods, it can be shown how internal models of the world are updated by sensory information to minimize free energy, or the discrepancy between sensory input and predictions of that input. This can be cast (in neurobiologically plausible terms) as predictive coding or, more generally, Bayesian filtering.
According to Friston: [33]
"The free-energy considered here represents a bound on the surprise inherent in any exchange with the environment, under expectations encoded by its state or configuration. A system can minimise free energy by changing its configuration to change the way it samples the environment, or to change its expectations. These changes correspond to action and perception, respectively, and lead to an adaptive exchange with the environment that is characteristic of biological systems. This treatment implies that the system’s state and structure encode an implicit and probabilistic model of the environment." [33]
This area of research was summarized in terms accessible to the layperson in a 2008 article in New Scientist that offered a unifying theory of brain function. [34] Friston makes the following claims about the explanatory power of the theory:
"This model of brain function can explain a wide range of anatomical and physiological aspects of brain systems; for example, the hierarchical deployment of cortical areas, recurrent architectures using forward and backward connections and functional asymmetries in these connections. In terms of synaptic physiology, it predicts associative plasticity and, for dynamic models, spike-timing-dependent plasticity. In terms of electrophysiology it accounts for classical and extra-classical receptive field effects and long-latency or endogenous components of evoked cortical responses. It predicts the attenuation of responses encoding prediction error with perceptual learning and explains many phenomena like repetition suppression, mismatch negativity and the P300 in electroencephalography. In psychophysical terms, it accounts for the behavioural correlates of these physiological phenomena, e.g., priming, and global precedence." [33]
"It is fairly easy to show that both perceptual inference and learning rest on a minimisation of free energy or suppression of prediction error." [33]
Unsupervised learning is a framework in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. Other frameworks in the spectrum of supervision include weak or semi-supervision, where a small portion of the data is labeled, and self-supervision. Some researchers consider self-supervised learning a form of unsupervised learning.
Computational neuroscience is a branch of neuroscience which employs mathematics, computer science, theoretical analysis and abstractions of the brain to understand the principles that govern the development, structure, physiology and cognitive abilities of the nervous system.
The memory-prediction framework is a theory of brain function created by Jeff Hawkins and described in his 2004 book On Intelligence. This theory concerns the role of the mammalian neocortex and its associations with the hippocampi and the thalamus in matching sensory inputs to stored memory patterns and how this process leads to predictions of what will happen in the future.
The Helmholtz machine is a type of artificial neural network that can account for the hidden structure of a set of data by being trained to create a generative model of that data. The hope is that by learning economical representations of the data, the underlying structure of the generative model should reasonably approximate the hidden structure of the data set. A Helmholtz machine contains two networks: a bottom-up recognition network that takes the data as input and produces a distribution over hidden variables, and a top-down "generative" network that generates values of the hidden variables and the data itself. At the time of their introduction in the 1990s, Helmholtz machines were one of a handful of learning architectures that used feedback as well as feedforward connections to ensure the quality of learned models.
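A minimal structural sketch of this two-network arrangement, assuming a single hidden layer of binary units and illustrative layer sizes (real Helmholtz machines typically stack several such layers):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

class HelmholtzMachine:
    """Minimal sketch: a bottom-up recognition network mapping data to a
    distribution over binary hidden units, and a top-down generative network
    mapping hidden units back to the data. Sizes are illustrative."""

    def __init__(self, n_visible=6, n_hidden=3):
        self.W_rec = np.zeros((n_hidden, n_visible))   # recognition weights (bottom-up)
        self.W_gen = np.zeros((n_visible, n_hidden))   # generative weights (top-down)
        self.b_hidden = np.zeros(n_hidden)             # generative prior over hidden units

    def recognize(self, data):
        """Probability of each hidden unit being on, given the data."""
        return sigmoid(self.W_rec @ data)

    def generate(self):
        """Sample hidden causes from the generative prior, then sample data."""
        hidden = (rng.random(self.b_hidden.size) < sigmoid(self.b_hidden)).astype(float)
        visible_probs = sigmoid(self.W_gen @ hidden)
        return hidden, (rng.random(visible_probs.size) < visible_probs).astype(float)
```

Training such a pair of networks is the job of the wake-sleep algorithm described below.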
Stephen Grossberg is a cognitive scientist, theoretical and computational psychologist, neuroscientist, mathematician, biomedical engineer, and neuromorphic technologist. He is the Wang Professor of Cognitive and Neural Systems and a Professor Emeritus of Mathematics & Statistics, Psychological & Brain Sciences, and Biomedical Engineering at Boston University.
The kappa effect or perceptual time dilation is a temporal perceptual illusion that can arise when observers judge the elapsed time between sensory stimuli applied sequentially at different locations. In perceiving a sequence of consecutive stimuli, subjects tend to overestimate the elapsed time between two successive stimuli when the distance between the stimuli is sufficiently large, and to underestimate the elapsed time when the distance is sufficiently small.
Peter Dayan is a British neuroscientist and computer scientist who is a director at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, along with Ivan De Araujo. He is co-author of Theoretical Neuroscience, an influential textbook on computational neuroscience. He is known for applying Bayesian methods from machine learning and artificial intelligence to understand neural function and is particularly recognized for relating neurotransmitter levels to prediction errors and Bayesian uncertainties. He helped pioneer the field of reinforcement learning (RL), where he contributed to the development of the Q-learning algorithm, and he has made contributions to unsupervised learning, including the wake-sleep algorithm for neural networks and the Helmholtz machine.
The cutaneous rabbit illusion is a tactile illusion evoked by tapping two or more separate regions of the skin in rapid succession. The illusion is most readily evoked on regions of the body surface that have relatively poor spatial acuity, such as the forearm. A rapid sequence of taps delivered first near the wrist and then near the elbow creates the sensation of sequential taps hopping up the arm from the wrist towards the elbow, although no physical stimulus was applied between the two actual stimulus locations. Similarly, stimuli delivered first near the elbow then near the wrist evoke the illusory perception of taps hopping from elbow towards wrist. The illusion was discovered by Frank Geldard and Carl Sherrick of Princeton University, in the early 1970s, and further characterized by Geldard (1982) and in many subsequent studies. Geldard and Sherrick likened the perception to that of a rabbit hopping along the skin, giving the phenomenon its name. While the rabbit illusion has been most extensively studied in the tactile domain, analogous sensory saltation illusions have been observed in audition and vision. The word "saltation" refers to the leaping or jumping nature of the percept.
Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for anomaly detection in streaming data. The technology is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian brain.
Neurorobotics is the combined study of neuroscience, robotics, and artificial intelligence. It is the science and technology of embodied autonomous neural systems. Neural systems include brain-inspired algorithms, computational models of biological neural networks and actual biological systems. Such neural systems can be embodied in machines with mechanical or other forms of physical actuation. This includes robots, prosthetic or wearable systems, but also, at smaller scales, micro-machines and, at larger scales, furniture and infrastructure.
In physiology, an efference copy or efferent copy is an internal copy of an outflowing (efferent), movement-producing signal generated by an organism's motor system. It can be collated with the (reafferent) sensory input that results from the agent's movement, enabling a comparison of actual movement with desired movement, and a shielding of perception from particular self-induced effects on the sensory input to achieve perceptual stability. Together with internal models, efference copies can serve to enable the brain to predict the effects of an action.
Common coding theory is a cognitive psychology theory describing how perceptual representations and motor representations are linked. The theory claims that there is a shared representation for both perception and action. More importantly, seeing an event activates the action associated with that event, and performing an action activates the associated perceptual event.
The Troland Research Awards are an annual prize given by the United States National Academy of Sciences to two researchers in recognition of psychological research on the relationship between consciousness and the physical world. The award funds may be spent in areas including, but not limited to, experimental psychology and the topics of sensation, perception, motivation, emotion, learning, memory, cognition, language, and action. Preference is given to experimental work with a quantitative approach or experimental research seeking physiological explanations.
A Bayesian Confidence Propagation Neural Network (BCPNN) is an artificial neural network inspired by Bayes' theorem, which regards neural computation and processing as probabilistic inference. Neural unit activations represent probability ("confidence") in the presence of input features or categories, synaptic weights are based on estimated correlations, and the spread of activation corresponds to calculating posterior probabilities. It was originally proposed by Anders Lansner and Örjan Ekeberg at KTH Royal Institute of Technology. This probabilistic neural network model can also be run in generative mode to produce spontaneous activations and temporal sequences.
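A hedged sketch of the kind of weight rule usually associated with BCPNN, in which weights are logarithms of the ratio between estimated joint and marginal probabilities, so that summing weighted inputs approximates a naive-Bayes log posterior; the counting scheme, smoothing and function names below are illustrative assumptions rather than the published model:

```python
import numpy as np

def bcpnn_train(X, Y, eps=1e-6):
    """Sketch of BCPNN-style learning from binary input patterns X (samples x features)
    and one-hot class patterns Y (samples x classes). Weights are log ratios of
    estimated joint to marginal probabilities; the eps smoothing is illustrative."""
    n = X.shape[0]
    p_x = (X.sum(axis=0) + eps) / n                    # P(x_i = 1)
    p_y = (Y.sum(axis=0) + eps) / n                    # P(y_j = 1)
    p_xy = (X.T @ Y + eps) / n                         # P(x_i = 1, y_j = 1)
    weights = np.log(p_xy / np.outer(p_x, p_y))        # w_ij = log P(x,y) / (P(x) P(y))
    bias = np.log(p_y)                                 # beta_j = log P(y_j)
    return weights, bias

def bcpnn_infer(x, weights, bias):
    """Unit activations as normalised 'confidences': exponentiated support values."""
    support = bias + x @ weights                       # log posterior up to a constant
    confidence = np.exp(support - support.max())
    return confidence / confidence.sum()
```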
The wake-sleep algorithm is an unsupervised learning algorithm for deep generative models, especially Helmholtz machines. The algorithm is similar to the expectation-maximization algorithm and optimizes the model likelihood for observed data. The name of the algorithm derives from its use of two learning phases, the "wake" phase and the "sleep" phase, which are performed alternately. It can be conceived as a model for learning in the brain, but it is also applied in machine learning.
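A minimal sketch of the two alternating phases for a single-hidden-layer Helmholtz machine, using simple delta-rule updates; the data, layer sizes and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

# Toy single-hidden-layer Helmholtz machine; real uses stack several layers.
n_visible, n_hidden, lr = 6, 3, 0.05
W_rec = np.zeros((n_hidden, n_visible))       # recognition (bottom-up) weights
W_gen = np.zeros((n_visible, n_hidden))       # generative (top-down) weights
b_gen = np.zeros(n_hidden)                    # generative prior over hidden units
data = sample(np.full((50, n_visible), 0.3))  # illustrative binary training patterns

for epoch in range(100):
    for x in data:
        # Wake phase: recognise hidden causes, then train the generative model
        # to reconstruct the data from them (a simple delta rule).
        h = sample(sigmoid(W_rec @ x))
        W_gen += lr * np.outer(x - sigmoid(W_gen @ h), h)
        b_gen += lr * (h - sigmoid(b_gen))

        # Sleep phase: dream data from the generative model, then train the
        # recognition network to recover the dreamed hidden causes.
        h_dream = sample(sigmoid(b_gen))
        x_dream = sample(sigmoid(W_gen @ h_dream))
        W_rec += lr * np.outer(h_dream - sigmoid(W_rec @ x_dream), x_dream)
```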
The free energy principle is a theoretical framework suggesting that the brain reduces surprise or uncertainty by making predictions based on internal models and updating them using sensory input. It highlights the brain's objective of aligning its internal model with the external world to enhance prediction accuracy. This principle integrates Bayesian inference with active inference, where actions are guided by predictions and sensory feedback refines them. It has wide-ranging implications for comprehending brain function, perception, and action.
Radford M. Neal is a professor emeritus at the Department of Statistics and Department of Computer Science at the University of Toronto, where he holds a research chair in statistics and machine learning.
In neuroscience, predictive coding is a theory of brain function which postulates that the brain is constantly generating and updating a "mental model" of the environment. According to the theory, such a mental model is used to predict input signals from the senses, which are then compared with the actual input signals from those senses. Predictive coding is a member of a wider set of theories that follow the Bayesian brain hypothesis.
Dynamic causal modeling (DCM) is a framework for specifying models, fitting them to data and comparing their evidence using Bayesian model comparison. It uses nonlinear state-space models in continuous time, specified using stochastic or ordinary differential equations. DCM was initially developed for testing hypotheses about neural dynamics. In this setting, differential equations describe the interaction of neural populations, which directly or indirectly give rise to functional neuroimaging data e.g., functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG) or electroencephalography (EEG). Parameters in these models quantify the directed influences or effective connectivity among neuronal populations, which are estimated from the data using Bayesian statistical methods.
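For illustration, the neuronal part of a DCM for fMRI is often written as the bilinear state equation dz/dt = (A + Σ_j u_j B_j) z + C u, where A encodes intrinsic connectivity, the B_j encode input-dependent modulation and C encodes driving inputs. The sketch below integrates this equation for a toy two-region model with illustrative parameters, omitting the haemodynamic model and the Bayesian parameter estimation step.

```python
import numpy as np

def simulate_dcm_states(A, B, C, u, dt=0.1):
    """Sketch of the bilinear neuronal state equation used in DCM for fMRI:
    dz/dt = (A + sum_j u_j B_j) z + C u. Haemodynamics and Bayesian
    estimation are omitted; all numbers below are illustrative."""
    z = np.zeros(A.shape[0])
    states = []
    for u_t in u:                                             # u: (time, n_inputs) inputs
        J = A + sum(u_t[j] * B[j] for j in range(len(B)))     # input-modulated connectivity
        z = z + dt * (J @ z + C @ u_t)                        # forward Euler integration
        states.append(z.copy())
    return np.array(states)

# Two regions, one driving input that also modulates the region 1 -> 2 connection.
A = np.array([[-1.0, 0.0], [0.4, -1.0]])            # intrinsic (effective) connectivity
B = [np.array([[0.0, 0.0], [0.3, 0.0]])]            # modulation of connectivity by input 1
C = np.array([[1.0], [0.0]])                        # direct driving influence of input 1
u = np.zeros((100, 1))
u[10:30, 0] = 1.0                                   # a brief boxcar stimulus
print(simulate_dcm_states(A, B, C, u)[-1])
```

In a full DCM analysis, the parameters in A, B and C would be estimated from measured fMRI, MEG or EEG data by Bayesian inversion, and competing connectivity models compared by their evidence.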