Metalearning is a neuroscientific term proposed by Kenji Doya [1] as a theory of how neurotransmitters facilitate distributed learning mechanisms in the basal ganglia. The theory primarily concerns the role of neurotransmitters in dynamically adjusting the way computational learning algorithms [2] interact to produce the kind of robust learning behaviour currently unique to biological life forms. [3] The term 'metalearning' has previously been applied in social psychology and computer science, but in this context it denotes an entirely new concept.
The theory of Metalearning builds on earlier work by Doya on the learning algorithms of supervised learning, reinforcement learning and unsupervised learning in the cerebellum, basal ganglia and cerebral cortex respectively. [2] The theory emerged from efforts to unify the dynamic selection among these three learning algorithms under a regulatory mechanism reducible to individual neurotransmitters.
Dopamine is proposed to act as a "global learning" signal, critical to the prediction of rewards and the reinforcement of actions. In this way, dopamine is involved in a learning algorithm in which Actor, Environment and Critic are bound in a dynamic interplay that ultimately seeks to maximise the sum of future rewards by producing an optimal action selection policy. In this context, Critic and Actor are characterised as independent network edges that together form a single complex Agent. This Agent collectively influences the information state of the Environment, which is fed back to the Agent for future computations. Through a separate pathway, the Environment also feeds back to the Critic the reward gained through the given action, so that an equilibrium can be reached between the predicted reward of a given policy for a given state and the evolving prospect of future rewards.
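The Actor-Critic interplay described above can be illustrated with a minimal temporal-difference sketch. It assumes, as a common reading in the reinforcement learning literature rather than something stated in this article, that the dopamine-like signal corresponds to the reward prediction error `delta`. The toy environment `step`, the parameters `alpha` and `gamma`, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random

TERMINAL = 2  # hypothetical terminal state of the toy task


def step(state, action):
    """Toy environment (an assumption of this sketch): returns (next_state, reward)."""
    if state == 0:
        # Action 1 moves on to state 1 with no immediate reward;
        # action 0 ends the episode with a small immediate reward.
        return (1, 0.0) if action == 1 else (TERMINAL, 0.2)
    # From state 1, either action ends the episode with a large reward.
    return (TERMINAL, 1.0)


def action_probabilities(preferences):
    """Actor policy: softmax over action preferences."""
    top = max(preferences)
    exps = [math.exp(p - top) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    value = [0.0, 0.0]                # Critic: predicted future reward per state
    prefs = [[0.0, 0.0], [0.0, 0.0]]  # Actor: action preferences per state
    for _ in range(episodes):
        state = 0
        while state != TERMINAL:
            probs = action_probabilities(prefs[state])
            action = rng.choices([0, 1], weights=probs)[0]
            next_state, reward = step(state, action)
            next_value = 0.0 if next_state == TERMINAL else value[next_state]
            # Reward prediction error, shared by Critic and Actor.
            delta = reward + gamma * next_value - value[state]
            value[state] += alpha * delta          # Critic update
            prefs[state][action] += alpha * delta  # Actor update
            state = next_state
    return value, prefs


value, prefs = run_actor_critic()
print("state values:", value)
print("preferences in state 0:", prefs[0])
```

In this sketch the same error term drives both the Critic's value estimate and the Actor's preferences, which is one way to picture a single "global" signal shaping prediction and action reinforcement together.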
Serotonin is proposed to control the balance between short-term and long-term reward prediction, essentially by variably "discounting" expected future reward sums that may require too much expenditure to achieve. In this way, serotonin may facilitate the expectation of reward at a quasi-emotional level, and thus either encourage or discourage persistence in reward-seeking behaviour depending on the demands of the task and the duration of persistence required. Because global reward prediction would theoretically result from serotonin-modulated computations reaching a steady state with the computations similarly modulated by dopamine, high serotonergic signalling may override the dopamine-modulated computations and produce a divergent paradigm of reward that is not mathematically viable through the dopamine-modulated computations alone.
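The effect of "discounting" on the choice between short-term and long-term rewards can be shown with a small worked example. The sketch below assumes the standard exponential discount factor `gamma` from reinforcement learning as a stand-in for the serotonin-controlled balance. The two reward schedules and the function names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its delay in steps."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical options: a small reward now, or a larger one after five steps of waiting.
SMALL_SOON = [1.0]
LARGE_LATE = [0.0, 0.0, 0.0, 0.0, 0.0, 3.0]


def preferred_option(gamma):
    """Return which option has the larger discounted return for this gamma."""
    soon = discounted_return(SMALL_SOON, gamma)
    late = discounted_return(LARGE_LATE, gamma)
    return "large_late" if late > soon else "small_soon"


for gamma in (0.5, 0.95):
    print(gamma, preferred_option(gamma))
```

With a low discount factor the delayed reward is worth less than the immediate one, so persistence is discouraged; with a discount factor close to 1 the delayed reward dominates and waiting becomes the preferred choice.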
Norepinephrine is proposed to facilitate "wide exploration" through stochastic action selection. The choice between focusing on known, effective strategies and selecting new, experimental ones is known in probability theory as the exploration-exploitation problem. [4] An interplay between situational urgency and the effectiveness of known strategies thus influences the dilemma between reliable selection for the largest predicted reward and exploratory selection outside known parameters. Since neuronal firing cascades (such as those required to perfectly swing a golf club) are by definition unstable and prone to variation, norepinephrine at higher levels selects for the most reliable known execution pattern, and at lower levels allows for more random and unreliable selection, with the purpose of potentially discovering more efficient strategies in the process.
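Stochastic action selection of this kind is often modelled with a softmax rule. The sketch below assumes that the norepinephrine level plays the role of the inverse temperature `beta`: high values concentrate choice on the best-known strategy, low values spread it across alternatives. The predicted reward values and the function name are hypothetical.

```python
import math


def selection_probabilities(values, beta):
    """Softmax over predicted rewards; beta is the inverse temperature."""
    top = max(values)
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


predicted_rewards = [1.0, 0.5, 0.2]  # hypothetical values of three known strategies

# Low beta: nearly uniform choice, so weaker strategies are still tried (exploration).
exploratory = selection_probabilities(predicted_rewards, beta=0.1)
# High beta: choice concentrates on the highest predicted reward (exploitation).
reliable = selection_probabilities(predicted_rewards, beta=20.0)

print("low beta: ", [round(p, 3) for p in exploratory])
print("high beta:", [round(p, 3) for p in reliable])
```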
Acetylcholine is proposed to facilitate the balance between memory storage and memory renewal, [5] finding an optimal balance between stability and effectiveness of learning algorithms for the specific environmental task. Acetylcholine thus modulates plasticity in the hippocampus, cerebral cortex and striatum to facilitate ideal learning conditions in the brain. High levels of acetylcholine would allow for very rapid learning and remodelling of synaptic connections, with the consequence that existing learning may become undone. Likewise, the learning of states that takes place over an extended temporal resolution may be overridden before it reaches a functional level, and thus learning may occur too quickly to actually be performed efficiently. At lower levels of acetylcholine, plastic changes are proposed to occur much more slowly, potentially protecting against unhelpful learning conditions or allowing information changes to embody a much broader temporal resolution.
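The trade-off between rapid remodelling and stable storage can be sketched with a simple incremental update rule. The example assumes that the acetylcholine level acts like a learning rate `alpha`. The two settings and the function names are hypothetical.

```python
def update(estimate, sample, alpha):
    """Move the stored estimate toward a new sample by a fraction alpha."""
    return estimate + alpha * (sample - estimate)


def steps_to_adapt(alpha, target=1.0, tolerance=0.1):
    """Count updates needed for an estimate starting at 0 to come within tolerance of target."""
    estimate, steps = 0.0, 0
    while abs(target - estimate) >= tolerance:
        estimate = update(estimate, target, alpha)
        steps += 1
    return steps


fast, slow = 0.5, 0.05  # hypothetical high and low plasticity settings

# Speed: how many samples it takes to learn a new value.
steps_fast = steps_to_adapt(fast)
steps_slow = steps_to_adapt(slow)

# Stability: how far one misleading sample (0.0) pulls a well-learned estimate (1.0).
after_outlier_fast = update(1.0, 0.0, fast)
after_outlier_slow = update(1.0, 0.0, slow)

print("steps to adapt:", steps_fast, steps_slow)
print("estimate after one outlier:", after_outlier_fast, after_outlier_slow)
```

The high setting adapts in far fewer steps but loses much of what it had learned after a single misleading sample, while the low setting is slow to adapt but barely disturbed.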
Central to the idea of Metalearning is that global learning can be modelled as a function of the efficient selection of these four neuromodulators. While no mechanistic model is put forward for where Metalearning ultimately exists in the hierarchy of agency, the model has thus far demonstrated the dynamics necessary to infer the existence of such an agent in biological learning as a whole. While computational models and information systems are still far from approaching the complexity of human learning, Metalearning provides a promising path forward for the future evolution of such systems as they increasingly approach the complexity of the biological world.
The investigation of Metalearning as a neuroscientific concept has potential benefits for both the understanding and the treatment of psychiatric disease, as well as for bridging the gaps between neural networks, computer science and machine learning. [1]