The efficient coding hypothesis was proposed by Horace Barlow in 1961 as a theoretical model of sensory coding in the brain. [1] Within the brain, neurons communicate with one another by sending electrical impulses referred to as action potentials or spikes. One goal of sensory neuroscience is to decipher the meaning of these spikes in order to understand how the brain represents and processes information about the outside world.
Barlow hypothesized that the spikes in the sensory system formed a neural code for efficiently representing sensory information. By efficient, Barlow meant that the code minimizes the number of spikes needed to transmit a given signal. This is somewhat analogous to transmitting information across the internet, where different file formats can be used to transmit a given image. Different file formats require different numbers of bits for representing the same image at a given distortion level, and some are better suited for representing certain classes of images than others. According to this model, the brain is thought to use a code which is suited for representing visual and auditory information representative of an organism's natural environment.
The development of Barlow's hypothesis was influenced by information theory, introduced by Claude Shannon only a decade earlier. Information theory provides a mathematical framework for analyzing communication systems. It formally defines concepts such as information, channel capacity, and redundancy. Barlow's model treats the sensory pathway as a communication channel in which neuronal spiking is an efficient code for representing sensory signals. The spiking code aims to maximize available channel capacity by minimizing the redundancy between representational units. Barlow was not the first to introduce the idea; it already appears in a 1954 article written by F. Attneave. [2]
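Redundancy in this information-theoretic sense can be made concrete: if two representational units carry mutual information about each other, part of the channel capacity is wasted on repeated signal. The following is a minimal illustrative sketch (assuming only numpy; the two joint distributions are invented for illustration, not drawn from neural data):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits: the redundancy
    between two units with the given joint response distribution."""
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint.ravel())

# Two binary "neurons": joint distributions over (spike, no-spike)
# for each unit.  The first pair is independent, the second redundant.
independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])  # responses share no structure
redundant   = np.array([[0.45, 0.05],
                        [0.05, 0.45]])  # responses tend to agree

print(mutual_information(independent))  # 0.0 bits shared
print(mutual_information(redundant))    # ~0.53 bits shared
```

An efficient code in Barlow's sense pushes the population toward the first case, where no unit's spikes can be predicted from another's.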
A key prediction of the efficient coding hypothesis is that sensory processing in the brain should be adapted to natural stimuli. Neurons in the visual (or auditory) system should be optimized for coding images (or sounds) representative of those found in nature. Researchers have shown that filters optimized for coding natural images resemble the receptive fields of simple cells in V1. [3] In the auditory domain, optimizing a network for coding natural sounds leads to filters which resemble the impulse response of cochlear filters found in the inner ear. [4]
Due to constraints on the visual system such as the number of neurons and the metabolic energy required for "neural activities", the visual processing system must have an efficient strategy for transmitting as much information as possible. [5] Information must be compressed as it travels from the retina back to the visual cortex. While the retinal receptors can receive information at 10^9 bit/s, the optic nerve, which is composed of 1 million ganglion cells transmitting at 1 bit/s each, has a transmission capacity of only 10^6 bit/s. [5] Further reduction limits overall transmission to roughly 40 bit/s, a bottleneck which results in inattentional blindness. [5] Thus, the hypothesis states that neurons should encode information as efficiently as possible in order to make the most of limited neural resources. [6] For example, it has been shown that visual data can be compressed up to 20 fold without noticeable information loss. [5]
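The compression factors implied by these figures follow from simple arithmetic (the rates below are the order-of-magnitude estimates quoted from [5], not measurements):

```python
# Back-of-the-envelope compression along the visual pathway,
# using the order-of-magnitude rates quoted in the text.
retina_in   = 1e9      # bit/s arriving at the retinal receptors
optic_nerve = 1e6 * 1  # 1 million ganglion cells x ~1 bit/s each
perception  = 40       # bit/s estimated to survive the final bottleneck

print(retina_in / optic_nerve)   # 1000.0-fold: retina -> optic nerve
print(optic_nerve / perception)  # 25000.0-fold: optic nerve -> awareness
```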
Evidence suggests that our visual processing system engages in bottom-up selection. For example, inattentional blindness suggests that there must be data deletion early on in the visual pathway. [5] This bottom-up approach allows us to respond to unexpected and salient events more quickly and is often directed by attentional selection. This also gives our visual system the property of being goal-directed. [5] Many have suggested that the visual system is able to work efficiently by breaking images down into distinct components. [6] Additionally, it has been argued that the visual system takes advantage of redundancies in inputs in order to transmit as much information as possible while using the fewest resources. [5]
Simoncelli and Olshausen outline three major concepts that are assumed to be involved in the development of systems neuroscience.
One assumption used in testing the efficient coding hypothesis is that neurons must be evolutionarily and developmentally adapted to the natural signals in their environment. [7] The idea is that perceptual systems will respond most quickly to "environmental stimuli", and that the visual system should therefore cut out any redundancies in the sensory input. [8]
Central to Barlow's hypothesis is information theory, which when applied to neuroscience, argues that an efficiently coding neural system "should match the statistics of the signals they represent". [9] Therefore, it is important to be able to determine the statistics of the natural images that are producing these signals. Researchers have looked at various components of natural images including luminance contrast, color, and how images are registered over time. [8] They can analyze the properties of natural scenes via digital cameras, spectrophotometers, and range finders. [10]
Researchers look at how luminance contrasts are spatially distributed in an image: luminance contrasts are highly correlated the closer the pixels are in measurable distance and less correlated the farther apart they are. [8] Independent component analysis (ICA) is an algorithm that attempts to "linearly transform given (sensory) inputs into independent outputs (synaptic currents)". [11] ICA eliminates redundancy by decorrelating the pixels in a natural image, so that the individual components that make up the image are rendered statistically independent. [8] However, some researchers consider ICA limited because it assumes that the neural response is linear, and therefore insufficiently describes the complexity of natural images. They argue that, despite what is assumed under ICA, the components of the natural image have a "higher-order structure" that involves correlations among components. [8] Instead, researchers have developed temporal independent component analysis (TICA), which better represents the complex correlations that occur between components in a natural image. [8] Additionally, a "hierarchical covariance model" developed by Karklin and Lewicki expands on sparse coding methods and can represent additional components of natural images such as "object location, scale, and texture". [8]
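The decorrelation at the heart of this approach can be sketched in a few lines. The example below is a hypothetical illustration assuming only numpy: two "pixel" signals share a common luminance component (mimicking nearby pixels in a natural image), and a whitening transform (the linear decorrelation stage on which ICA builds) removes the correlation between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated "pixels": nearby pixels share luminance structure.
n = 10_000
common = rng.normal(size=n)
x = np.stack([common + 0.3 * rng.normal(size=n),
              common + 0.3 * rng.normal(size=n)])   # shape (2, n)

# Whitening: rotate into the eigenbasis of the covariance matrix
# and equalize the variances, leaving decorrelated outputs.
x = x - x.mean(axis=1, keepdims=True)
cov = x @ x.T / n
eigval, eigvec = np.linalg.eigh(cov)
whitener = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
y = whitener @ x

print(np.round(np.corrcoef(x), 2))  # strong off-diagonal correlation
print(np.round(np.corrcoef(y), 2))  # ~identity: outputs decorrelated
```

Full ICA goes one step further than whitening, rotating the decorrelated outputs to make them as statistically independent as possible, not merely uncorrelated.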
The chromatic spectrum of natural light, and also as it is reflected off of "natural materials", can be easily characterized with principal component analysis (PCA). [10] Because the cones absorb a specific number of photons from the natural image, researchers can use cone responses as a way of describing the natural image. Researchers have found that the three classes of cone receptors in the retina can accurately code natural images and that color is already decorrelated in the LGN. [8] [10] Temporal statistics have also been modeled: natural images transform over time, and these transformations can be used to see how the visual input changes over time. [8]
A pedagogical review of efficient coding in visual processing (efficient spatial coding, color coding, temporal/motion coding, stereo coding, and their combination) appears in chapter 3 of the book Understanding Vision: Theory, Models, and Data. [12] It explains how efficient coding is realized when input noise makes redundancy reduction no longer adequate, and how efficient coding methods in different situations relate to and differ from each other.
If neurons encode according to the efficient coding hypothesis, then individual neurons must be expressing their full output capacity. [6] Before testing this hypothesis it is necessary to define what counts as a neural response. [6] Simoncelli and Olshausen suggest that an efficient neuron needs to be assigned a maximal response value so that we can measure whether a neuron efficiently meets that maximum level. [7] Secondly, a population of neurons must not be redundant in transmitting signals and must be statistically independent. [6] If the efficient coding hypothesis is accurate, researchers should observe sparsity in the neuron responses: that is, only a few neurons at a time should fire for an input. [8]
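Sparsity can be quantified. One common measure is a Treves-Rolls-style sparseness index, of the kind used in similar form in the sparse coding literature; the sketch below (assuming only numpy, with invented response vectors) shows how it separates dense from sparse population activity.

```python
import numpy as np

def sparseness(r):
    """Treves-Rolls sparseness index: 0 when all units respond
    equally, approaching 1 when few units carry all the activity."""
    r = np.asarray(r, dtype=float)
    n = r.size
    a = (r.sum() / n) ** 2 / (np.sum(r ** 2) / n)  # "activity ratio"
    return (1 - a) / (1 - 1 / n)

dense  = np.ones(100)                     # every neuron fires equally
sparse = np.zeros(100); sparse[:3] = 1.0  # only 3 of 100 neurons fire

print(sparseness(dense))   # 0.0
print(sparseness(sparse))  # ~0.98
```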
One approach is to design a model for early sensory processing based on the statistics of a natural image and then compare this predicted model to how real neurons actually respond to the natural image. [6] The second approach is to measure a neural system responding to a natural environment, and analyze the results to see if there are any statistical properties to this response. [6] A third approach is to derive the necessary and sufficient conditions under which an observed neural computation is efficient, and test whether empirical stimulus statistics satisfy them. [13]
1. Predicted model approach
In one study by Doi et al. in 2012, the researchers created a predicted response model of the retinal ganglion cells based on the statistics of the natural images used, while considering noise and biological constraints. [14] They then compared the actual information transmission observed in real retinal ganglion cells to this optimal model to determine the efficiency. They found that the information transmission in the retinal ganglion cells had an overall efficiency of about 80% and concluded that "the functional connectivity between cones and retinal ganglion cells exhibits unique spatial structure...consistent with coding efficiency". [14]
A study by van Hateren and Ruderman in 1998 used ICA to analyze video-sequences and compared how a computer analyzed the independent components of the image to data for visual processing obtained from a cat in DeAngelis et al. 1993. The researchers described the independent components obtained from a video sequence as the "basic building blocks of a signal", with the independent component filter (ICF) measuring "how strongly each building block is present". [15] They hypothesized that if simple cells are organized to pick out the "underlying structure" of images over time then cells should act like the independent component filters. [15] They found that the ICFs determined by the computer were similar to the "receptive fields" that were observed in actual neurons. [15]
2. Analyzing an actual neural system in response to natural images
In a report in Science from 2000, William E. Vinje and Jack Gallant outlined a series of experiments used to test elements of the efficient coding hypothesis, including a theory that the non-classical receptive field (nCRF) decorrelates projections from the primary visual cortex. To test this, they recorded from V1 neurons in awake macaques during "free viewing of natural images and conditions" that simulated natural vision. [16] The researchers hypothesized that V1 uses sparse code, which is minimally redundant and "metabolically more efficient". [16]
They also hypothesized that interactions between the classical receptive field (CRF), defined as the circular area surrounding the locations where stimuli evoked action potentials, and the nCRF produced this pattern of sparse coding during the viewing of natural scenes. To test this, they created eye-scan paths and extracted patches ranging in size from 1 to 4 times the diameter of the CRF. They found that the sparseness of the coding increased with patch size: larger patches encompassed more of the nCRF, indicating that interactions between the two regions created sparse code, and suggesting that V1 uses sparse code when natural images span the entire visual field. They also tested whether stimulation of the nCRF increased the independence of the responses from V1 neurons by randomly selecting pairs of neurons, and found that the neurons were indeed more decoupled upon stimulation of the nCRF.
In conclusion, the experiments of Vinje and Gallant showed that V1 uses sparse code by employing both the CRF and nCRF when viewing natural images, with the nCRF showing a definitive decorrelating effect on neurons which may increase their efficiency by increasing the amount of independent information they carry. They propose that the cells may represent the individual components of a given natural scene, which may contribute to pattern recognition. [16]
Another study, by Baddeley et al., showed that firing-rate distributions of cat visual area V1 neurons and monkey inferotemporal (IT) neurons were exponential under naturalistic conditions, which implies optimal information transmission for a fixed average rate of firing. A subsequent study of monkey IT neurons found that only a minority were well described by an exponential firing distribution. De Polavieja later argued that this discrepancy arose because the exponential solution is correct only for the noise-free case, and showed that taking noise into consideration could account for the observed results. [6]
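The optimality claim behind these studies can be stated compactly: among all firing-rate distributions on rates r ≥ 0 with a fixed mean rate, the exponential has maximal differential entropy, and hence transmits the most information for a given average spike budget. This is the standard maximum-entropy argument (the notation below is generic, not taken from the cited papers):

```latex
% Maximize differential entropy over densities p(r) on r >= 0
% with unit normalization and fixed mean firing rate \bar{r}:
\max_{p}\; H[p] = -\int_{0}^{\infty} p(r)\,\ln p(r)\,dr
\quad\text{s.t.}\quad
\int_{0}^{\infty} p(r)\,dr = 1,
\qquad
\int_{0}^{\infty} r\,p(r)\,dr = \bar{r}.
% Lagrange multipliers for the two constraints force
% p(r) \propto e^{-\lambda r}, so the maximizer is exponential:
p^{*}(r) = \frac{1}{\bar{r}}\,e^{-r/\bar{r}},
\qquad
H[p^{*}] = 1 + \ln \bar{r}.
```

Any other distribution with the same mean rate carries strictly less entropy, which is why an observed exponential firing distribution is read as evidence of efficient transmission at fixed metabolic cost.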
A study by Dan, Atick, and Reid in 1996 used natural images to test the hypothesis that, early in the visual pathway, incoming visual signals will be decorrelated to optimize efficiency. This decorrelation can be observed as the "whitening" of the temporal and spatial power spectra of the neuronal signals. [17] The researchers played natural image movies in front of cats and used a multielectrode array to record neural signals; this was achieved by refracting the cats' eyes and then fitting them with contact lenses. They found that in the LGN the natural images were decorrelated, and concluded that "the early visual pathway has specifically adapted for efficient coding of natural visual information during evolution and/or development". [17]
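The "whitening" being tested can be sketched numerically. Natural signals have power that falls off with frequency; a filter whose gain grows in proportion to frequency flattens ("whitens") that spectrum, equalizing the information carried per frequency band. The toy 1-D version below assumes only numpy and a synthetic 1/f signal; it is not the analysis pipeline of the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthesize a 1-D "natural" signal with a 1/f amplitude spectrum,
# the falloff that whitening is meant to undo.
n = 4096
freqs = np.fft.rfftfreq(n, d=1.0)
amp = np.zeros_like(freqs)
amp[1:] = 1.0 / freqs[1:]
phase = rng.uniform(0, 2 * np.pi, size=freqs.size)
signal = np.fft.irfft(amp * np.exp(1j * phase), n=n)

# Whitening filter: gain proportional to frequency cancels the falloff.
spectrum = np.fft.rfft(signal)
white = np.fft.irfft(spectrum * freqs, n=n)

# Power of the filtered signal is now roughly flat across bands.
power = np.abs(np.fft.rfft(white)) ** 2
print(power[1:100].mean(), power[-100:].mean())  # both close to 1
```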
One of the implications of the efficient coding hypothesis is that the neural coding depends upon the statistics of the sensory signals. These statistics are a function of not only the environment (e.g., the statistics of the natural environment), but also the organism's behavior (e.g., how it moves within that environment). However, perception and behavior are closely intertwined in the perception-action cycle. For example, the process of vision involves various kinds of eye movements. An extension to the efficient coding hypothesis called active efficient coding (AEC) extends efficient coding to active perception. It hypothesizes that biological agents optimize not only their neural coding, but also their behavior to contribute to an efficient sensory representation of the environment. Along these lines, models for the development of active binocular vision, active visual tracking, and accommodation control have been proposed. [18] [19] [20] [21] [22]
The brain has limited resources to process information; in vision this is manifested as the visual attentional bottleneck. [23] The bottleneck forces the brain to select only a small fraction of visual input information for further processing, as merely coding information efficiently is no longer sufficient. A subsequent theory, the V1 Saliency Hypothesis, was developed to address the exogenous attentional selection of visual input information for further processing, guided by a bottom-up saliency map in the primary visual cortex. [24]
Researchers should consider how the visual information is used: The hypothesis does not explain how the information from a visual scene is used—which is the main purpose of the visual system. It seems necessary to understand why we are processing image statistics from the environment because this may be relevant to how this information is ultimately processed. However, some researchers may see the irrelevance of the purpose of vision in Barlow's theory as an advantage for designing experiments. [6]
Some experiments show correlations between neurons: When considering multiple neurons at a time, recordings "show correlation, synchronization, or other forms of statistical dependency between neurons". [6] However, it is relevant to note that most of these experiments did not use natural stimuli to provoke these responses: this may not fit in directly to the efficient coding hypothesis because this hypothesis is concerned with natural image statistics. [6] In his review article Simoncelli notes that perhaps we can interpret redundancy in the Efficient Coding Hypothesis a bit differently: he argues that statistical dependency could be reduced over "successive stages of processing", and not just in one area of the sensory pathway. [6] Yet, recordings by Hung et al. at the end of the visual pathway also show strong layer-dependent correlations to naturalistic objects and in ongoing activity. [25] They showed that redundancy of neighboring neurons (i.e. a 'manifold' representation) benefits learning of complex shape features and that network anisotropy/inhomogeneity is a stronger predictor than noise redundancy of encoding/decoding efficiency. [26]
Observed redundancy: A comparison of the number of retinal ganglion cells to the number of neurons in the primary visual cortex shows an increase in the number of sensory neurons in the cortex as compared to the retina. Simoncelli notes that one major argument of critics is that higher up in the sensory pathway there are greater numbers of neurons that handle the processing of sensory information, which would seem to produce redundancy. [6] However, this observation may not be fully relevant because neurons have different neural coding. In his review, Simoncelli notes "cortical neurons tend to have lower firing rates and may use a different form of code as compared to retinal neurons". [6] Cortical neurons may also have the ability to encode information over longer periods of time than their retinal counterparts. Experiments done in the auditory system have confirmed that redundancy is decreased. [6]
Difficult to test: Estimation of information-theoretic quantities requires enormous amounts of data, and is thus impractical for experimental verification. Additionally, informational estimators are known to be biased. However, some experimental success has occurred. [6]
Need well-defined criteria for what to measure: This criticism illustrates one of the most fundamental issues of the hypothesis. Here, assumptions are made about the definitions of both the inputs and the outputs of the system. [6] The inputs into the visual system are not completely defined, but they are assumed to be encompassed in a collection of natural images. The output must be defined to test the hypothesis, but variability can occur here too based on the choice of which type of neurons to measure, where they are located and what type of responses, such as firing rate or spike times are chosen to be measured. [6]
How to take noise into account: Some argue that experiments that ignore noise, or other physical constraints on the system are too simplistic. [6] However, some researchers have been able to incorporate these elements into their analyses, thus creating more sophisticated systems. [6]
However, with appropriate formulations, [27] efficient coding can also address some of these issues raised above. For example, some quantifiable degree of redundancies in neural representations of sensory inputs (manifested as correlations in neural responses) is predicted to occur when efficient coding is applied to noisy sensory inputs. [27] Falsifiable theoretical predictions can also be made, [27] and some of them subsequently tested. [28] [29] [30]
Possible applications of the efficient coding hypothesis include cochlear implant design. These neuroprosthetic devices stimulate the auditory nerve with electrical impulses, which allows some hearing to return to people who have hearing impairments or are even deaf. The implants are considered to be successful and efficient, and are the only such devices currently in use. Using frequency-place mappings in the efficient coding algorithm may benefit the use of cochlear implants in the future. [9] Changes in design based on this hypothesis could increase speech intelligibility in hearing-impaired patients. Research using vocoded speech processed by different filters showed that humans had greater accuracy in deciphering the speech when it was processed using an efficient-code filter as opposed to a cochleotropic filter or a linear filter. [9] This shows that efficient coding of noisy data offered perceptual benefits and provided the listeners with more information. [9] More research is needed to apply current findings into medically relevant changes to cochlear implant design. [9]
The visual cortex of the brain is the area of the cerebral cortex that processes visual information. It is located in the occipital lobe. Sensory input originating from the eyes travels through the lateral geniculate nucleus in the thalamus and then reaches the visual cortex. The area of the visual cortex that receives the sensory input from the lateral geniculate nucleus is the primary visual cortex, also known as visual area 1 (V1), Brodmann area 17, or the striate cortex. The extrastriate areas consist of visual areas 2, 3, 4, and 5.
Computational neuroscience is a branch of neuroscience which employs mathematics, computer science, theoretical analysis and abstractions of the brain to understand the principles that govern the development, structure, physiology and cognitive abilities of the nervous system.
The visual system is the physiological basis of visual perception. The system detects, transduces and interprets information concerning light within the visible range to construct an image and build a mental model of the surrounding environment. The visual system is associated with the eye and functionally divided into the optical system and the neural system.
Horace Basil Barlow FRS was a British vision scientist.
In neuroanatomy, the superior colliculus is a structure lying on the roof of the mammalian midbrain. In non-mammalian vertebrates, the homologous structure is known as the optic tectum or optic lobe. The adjective form tectal is commonly used for both structures.
Multisensory integration, also known as multimodal integration, is the study of how information from the different sensory modalities may be integrated by the nervous system. A coherent representation of objects combining modalities enables animals to have meaningful perceptual experiences. Indeed, multisensory integration is central to adaptive behavior because it allows animals to perceive a world of coherent perceptual entities. Multisensory integration also deals with how different sensory modalities interact with one another and alter each other's processing.
The two-streams hypothesis is a model of the neural processing of vision as well as hearing. The hypothesis, given its initial characterisation in a paper by David Milner and Melvyn A. Goodale in 1992, argues that humans possess two distinct visual systems. Recently there seems to be evidence of two distinct auditory systems as well. As visual information exits the occipital lobe, and as sound leaves the phonological network, it follows two main pathways, or "streams". The ventral stream leads to the temporal lobe, which is involved with object and visual identification and recognition. The dorsal stream leads to the parietal lobe, which is involved with processing the object's spatial location relative to the viewer and with speech repetition.
The normalization model is an influential model of responses of neurons in primary visual cortex. David Heeger developed the model in the early 1990s, and later refined it together with Matteo Carandini and J. Anthony Movshon. The model involves a divisive stage: the numerator is the output of the classical receptive field, and the denominator is a constant plus a measure of local stimulus contrast. Although the normalization model was initially developed to explain responses in the primary visual cortex, normalization is now thought to operate throughout the visual system, and in many other sensory modalities and brain regions, including the representation of odors in the olfactory bulb, the modulatory effects of visual attention, the encoding of value, and the integration of multisensory information. It has also been observed at subthreshold potentials in the hippocampus. Its presence in such a diversity of neural systems in multiple species, from invertebrates to mammals, suggests that normalization serves as a canonical neural computation. Divisive normalization reduces the redundancy in natural stimulus statistics and is sometimes viewed as an implementation of the efficient coding principle. Formally, divisive normalization is an information-maximizing code for stimuli following a multivariate Pareto distribution.
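The divisive stage described above can be sketched in a few lines. The parameter values and response vectors below are hypothetical illustrations (assuming only numpy), not fits to data: each unit's driven response is divided by a constant plus the summed squared drive of the local population, so responses saturate as overall stimulus contrast grows.

```python
import numpy as np

def normalize(drive, sigma=1.0):
    """Divisive normalization sketch: squared drive of each unit,
    divided by a constant plus the pooled squared drive of the
    local population (sigma is a hypothetical semi-saturation term)."""
    drive = np.asarray(drive, dtype=float)
    return drive ** 2 / (sigma ** 2 + np.sum(drive ** 2))

weak   = normalize([1.0, 0.1, 0.1])   # low-contrast stimulus
strong = normalize([10.0, 1.0, 1.0])  # same pattern at 10x contrast

print(np.round(weak, 3))
print(np.round(strong, 3))  # responses saturate rather than scale 100x
```

Although the drive to the first unit grows 100-fold in squared terms, its normalized response less than doubles, which is the contrast saturation the model was built to capture.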
In vision, filling-in phenomena are those responsible for the completion of missing information across the physiological blind spot, and across natural and artificial scotomata. There is also evidence for similar mechanisms of completion in normal visual analysis. Classical demonstrations of perceptual filling-in involve filling in at the blind spot in monocular vision, and images stabilized on the retina either by means of special lenses, or under certain conditions of steady fixation. For example, naturally in monocular vision at the physiological blind spot, the percept is not a hole in the visual field, but the content is "filled in" based on information from the surrounding visual field. When a textured stimulus is presented centered on but extending beyond the region of the blind spot, a continuous texture is perceived. This partially inferred percept is paradoxically considered more reliable than a percept based on external input.
Neural coding is a neuroscience field concerned with characterising the hypothetical relationship between the stimulus and the neuronal responses, and the relationship among the electrical activities of the neurons in the ensemble. Based on the theory that sensory and other information is represented in the brain by networks of neurons, it is believed that neurons can encode both digital and analog information.
In neuroanatomy, topographic map is the ordered projection of a sensory surface or an effector system to one or more structures of the central nervous system. Topographic maps can be found in all sensory systems and in many motor systems.
Infomax is an optimization principle for artificial neural networks and other information processing systems. It prescribes that a function that maps a set of input values I to a set of output values O should be chosen or learned so as to maximize the average Shannon mutual information between I and O, subject to a set of specified constraints and/or noise processes. Infomax algorithms are learning algorithms that perform this optimization process. The principle was described by Linsker in 1988.
Scene statistics is a discipline within the field of perception. It is concerned with the statistical regularities related to scenes. It is based on the premise that a perceptual system is designed to interpret scenes.
Feature detection is a process by which the nervous system sorts or filters complex natural stimuli in order to extract behaviorally relevant cues that have a high probability of being associated with important objects or organisms in their environment, as opposed to irrelevant background or noise.
In the human brain, the nucleus basalis, also known as the nucleus basalis of Meynert or nucleus basalis magnocellularis, is a group of neurons located mainly in the substantia innominata of the basal forebrain. Most neurons of the nucleus basalis are rich in the neurotransmitter acetylcholine, and they have widespread projections to the neocortex and other brain structures.
Neural decoding is a neuroscience field concerned with the hypothetical reconstruction of sensory and other stimuli from information that has already been encoded and represented in the brain by networks of neurons. Reconstruction refers to the ability of the researcher to predict what sensory stimuli the subject is receiving based purely on neuron action potentials. Therefore, the main goal of neural decoding is to characterize how the electrical activity of neurons elicits activity and responses in the brain.
Surround suppression is the phenomenon in which the relative firing rate of a neuron may under certain conditions decrease when a particular stimulus is enlarged. It has been observed in electrophysiology studies of the brain and has been noted in many sensory neurons, most notably in the early visual system. Surround suppression is defined as a reduction in the activity of a neuron in response to a stimulus outside its classical receptive field.
In neuroscience, predictive coding is a theory of brain function which postulates that the brain is constantly generating and updating a "mental model" of the environment. According to the theory, such a mental model is used to predict input signals from the senses that are then compared with the actual input signals from those senses. Predictive coding is a member of a wider set of theories that follow the Bayesian brain hypothesis.
Laura Busse is a German neuroscientist and professor of Systemic Neuroscience within the Division of Neurobiology at the Ludwig Maximilian University of Munich. Busse's lab studies context-dependent visual processing in mouse models by performing large scale in vivo electrophysiological recordings in the thalamic and cortical circuits of awake and behaving mice.
The V1 Saliency Hypothesis, or V1SH, is a theory about the primary visual cortex (V1). It proposes that V1 in primates creates a saliency map of the visual field to guide visual attention or gaze shifts exogenously.