PVLV

Last updated October 21, 2020

The primary value learned value (PVLV) model is a possible explanation for the reward-predictive firing properties of dopamine (DA) neurons.^[1] It simulates behavioral and neural data on Pavlovian conditioning and the midbrain dopaminergic neurons that fire in proportion to unexpected rewards. It is an alternative to the temporal-difference (TD) algorithm.^[2]

It is used as part of Leabra.

Related Research Articles

Artificial neural networks (ANNs), usually simply called neural networks (NNs), are computing systems vaguely inspired by the biological neural networks that constitute animal brains.

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
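The linear predictor and its training can be illustrated with a minimal sketch. The function names, the learning rate, and the choice of logical AND as training data are assumptions made for this example, not part of the description above.

```python
def predict(weights, bias, x):
    """Return 1 if the linear predictor w.x + b is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge the weights by the prediction error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Illustrative data (an assumption): logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

Because AND is linearly separable, the rule settles on weights that classify all four inputs correctly; on data that is not linearly separable it would keep oscillating.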

Classical conditioning refers to a learning procedure in which a biologically potent stimulus is paired with a previously neutral stimulus. It also refers to the learning process that results from this pairing, through which the neutral stimulus comes to elicit a response that is usually similar to the one elicited by the potent stimulus. It was first studied by Ivan Pavlov in 1897.

Social learning theory is a theory of learning process and social behavior which proposes that new behaviors can be acquired by observing and imitating others. It states that learning is a cognitive process that takes place in a social context and can occur purely through observation or direct instruction, even in the absence of motor reproduction or direct reinforcement. In addition to the observation of behavior, learning also occurs through the observation of rewards and punishments, a process known as vicarious reinforcement. When a particular behavior is rewarded regularly, it will most likely persist; conversely, if a particular behavior is constantly punished, it will most likely desist. The theory expands on traditional behavioral theories, in which behavior is governed solely by reinforcements, by placing emphasis on the important roles of various internal processes in the learning individual.

An artificial neuron is a mathematical function conceived as a model of the biological neurons in a neural network. Artificial neurons are elementary units in an artificial neural network. The artificial neuron receives one or more inputs and sums them to produce an output. Usually each input is separately weighted, and the sum is passed through a non-linear function known as an activation function or transfer function. The transfer functions usually have a sigmoid shape, but they may also take the form of other non-linear functions, piecewise linear functions, or step functions. They are also often monotonically increasing, continuous, differentiable and bounded. The thresholding function has inspired the building of logic gates referred to as threshold logic, which is applicable to building logic circuits resembling brain processing. For example, new devices such as memristors have been extensively used to develop such logic in recent times.
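The weighted sum followed by a transfer function can be sketched in a few lines. The names `neuron_output`, `sigmoid`, and `step`, and the example weights, are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Smooth, bounded, monotonically increasing transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """Threshold transfer function, as used in threshold logic."""
    return 1.0 if z >= 0 else 0.0


def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Weight each input, sum the results, and apply the transfer function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# The weighted sum here is 0.5*1.0 + (-0.25)*2.0 = 0.0, so the sigmoid gives 0.5.
smooth = neuron_output([1.0, 2.0], [0.5, -0.25])
# With a step function and bias -1.5, two active inputs exceed the threshold.
thresholded = neuron_output([1.0, 1.0], [1.0, 1.0], bias=-1.5, activation=step)
```

Swapping the `activation` argument is enough to move between the sigmoid and the step behaviour, since the weighted sum is the same in both cases.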

Pavlovian fear conditioning is a behavioral paradigm in which organisms learn to predict aversive events. It is a form of learning in which an aversive stimulus is associated with a particular neutral context or neutral stimulus, resulting in the expression of fear responses to the originally neutral stimulus or context. This can be done by pairing the neutral stimulus with an aversive stimulus. Eventually, the neutral stimulus alone can elicit the state of fear. In the vocabulary of classical conditioning, the neutral stimulus or context is the "conditional stimulus" (CS), the aversive stimulus is the "unconditional stimulus" (US), and the fear is the "conditional response" (CR).

The nucleus accumbens is a region in the basal forebrain rostral to the preoptic area of the hypothalamus. The nucleus accumbens and the olfactory tubercle collectively form the ventral striatum. The ventral striatum and dorsal striatum collectively form the striatum, which is the main component of the basal ganglia. The dopaminergic neurons of the mesolimbic pathway project onto the GABAergic medium spiny neurons of the nucleus accumbens and olfactory tubercle. Each cerebral hemisphere has its own nucleus accumbens, which can be divided into two structures: the nucleus accumbens core and the nucleus accumbens shell. These substructures have different morphology and functions.

Motivational salience is a cognitive process and a form of attention that motivates or propels an individual's behavior towards or away from a particular object, perceived event or outcome. Motivational salience regulates the intensity of behaviors that facilitate the attainment of a particular goal, the amount of time and energy that an individual is willing to expend to attain a particular goal, and the amount of risk that an individual is willing to accept while working to attain a particular goal.

Temporal difference (TD) learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, like Monte Carlo methods, and perform updates based on current estimates, like dynamic programming methods.
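Bootstrapping from the current estimate can be made concrete with a tabular TD(0) update. This is a minimal sketch: the state names, the step size `alpha`, and the discount factor `gamma` are assumptions chosen for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Move V(state) toward the bootstrapped target reward + gamma * V(next_state).

    Returns the TD error, the quantity often compared with phasic dopamine firing.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical three-state trial: a cue, a delay, then a rewarded terminal state.
values = {"cue": 0.0, "delay": 0.0, "end": 0.0}
for _ in range(200):
    td0_update(values, "cue", 0.0, "delay")
    td0_update(values, "delay", 1.0, "end")
```

After repeated trials the estimate for `delay` approaches 1.0 and the estimate for `cue` approaches 0.9, so the prediction propagates backward from the reward to the earlier cue.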

A feedforward neural network is an artificial neural network wherein connections between the nodes do not form a cycle. As such, it is different from its descendant: recurrent neural networks.

A neural network is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus a neural network is either a biological neural network, made up of real biological neurons, or an artificial neural network, for solving artificial intelligence (AI) problems. The connections of the biological neuron are modeled as weights. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be −1 and 1.

The habenular nuclei act as regulators of key central nervous system neurotransmitters, connecting the forebrain and midbrain within the epithalamus. Although they were predominantly studied for their demonstration of asymmetrical brain development and function, in recent years many scientists have begun to examine the habenular nuclei's role in motivation and behavior as it relates to an understanding of the physiology of addiction.

An avoidance response is a natural adaptive behavior performed in response to danger. Excessive avoidance has been suggested to contribute to anxiety disorders, leading psychologists and neuroscientists to study how avoidance behaviors are learned using rat or mouse models. Avoidance learning is a type of operant conditioning.

Neural coding is a neuroscience field concerned with characterising the hypothetical relationship between the stimulus and the individual or ensemble neuronal responses and the relationship among the electrical activity of the neurons in the ensemble. Based on the theory that sensory and other information is represented in the brain by networks of neurons, it is thought that neurons can encode both digital and analog information.

Leabra stands for local, error-driven and associative, biologically realistic algorithm. It is a model of learning which is a balance between Hebbian and error-driven learning with other network-derived characteristics. This model is used to mathematically predict outcomes based on inputs and previous learning influences. This model is heavily influenced by and contributes to neural network designs and models. This algorithm is the default algorithm in emergent when making a new project, and is extensively used in various simulations.

The reward system is a group of neural structures responsible for incentive salience, associative learning, and positively-valenced emotions, particularly ones which involve pleasure as a core component. Reward is the attractive and motivational property of a stimulus that induces appetitive behavior, also known as approach behavior, and consummatory behavior. In its description of a rewarding stimulus, a review on reward neuroscience noted, "any stimulus, object, event, activity, or situation that has the potential to make us approach and consume it is by definition a reward". In operant conditioning, rewarding stimuli function as positive reinforcers; however, the converse statement also holds true: positive reinforcers are rewarding.

Neurorobotics, a combined study of neuroscience, robotics, and artificial intelligence, is the science and technology of embodied autonomous neural systems. Neural systems include brain-inspired algorithms, computational models of biological neural networks and actual biological systems. Such neural systems can be embodied in machines with mechanic or any other forms of physical actuation. This includes robots, prosthetic or wearable systems but also, at smaller scale, micro-machines and, at the larger scales, furniture and infrastructures.

The basolateral amygdala (BLA), or basolateral complex, consists of the lateral, basal and accessory-basal nuclei of the amygdala. The lateral nucleus receives the majority of sensory information, which arrives directly from the temporal lobe structures, including the hippocampus and primary auditory cortex. The information is then processed by the basolateral complex and is sent as output to the central nucleus of the amygdala. This is how most emotional arousal is formed in mammals.

Prefrontal cortex basal ganglia working memory (PBWM) is an algorithm that models working memory in the prefrontal cortex and the basal ganglia. It can be compared to long short-term memory (LSTM) in functionality, but is more biologically explainable.

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weights and bias levels of a network when the network is simulated in a specific data environment. A learning rule may accept existing conditions of the network and will compare the expected result and actual result of the network to give new and improved values for weights and bias. Depending on the complexity of the actual model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.

References

[1] O'Reilly, R.C.; Frank, M.J.; Hazy, T.E. & Watz, B. (2007). "PVLV: The Primary Value and Learned Value Pavlovian Learning Algorithm". Behavioral Neuroscience. 121 (1): 31–4. CiteSeerX 10.1.1.67.6739. doi:10.1037/0735-7044.121.1.31. PMID 17324049.
[2] "Leabra PBWM". CCNLab.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.