Dopamine reward prediction error coding
Schultz, Wolfram
2016-03-01
Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware. PMID:27069377
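Read computationally, the definition above is the delta rule of reinforcement learning. The following is a minimal illustrative sketch, not code from the paper; the learning rate is an assumed constant.

```python
# Minimal sketch of the reward prediction error described above:
# delta = received reward - predicted reward.
# Positive delta strengthens the prediction, zero leaves it unchanged,
# and negative delta weakens it.

def update(predicted, received, learning_rate=0.1):
    delta = received - predicted                 # reward prediction error
    return predicted + learning_rate * delta, delta

prediction = 0.0
for _ in range(30):
    prediction, delta = update(prediction, received=1.0)

print(round(prediction, 2), round(delta, 2))     # prediction -> 1.0, delta -> 0.0
```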
Dopamine prediction error responses integrate subjective value from different reward dimensions
Lak, Armin; Stauffer, William R.; Schultz, Wolfram
2014-01-01
Prediction error signals enable us to learn through experience. These experiences include economic choices between different rewards that vary along multiple dimensions. Therefore, an ideal way to reinforce economic choice is to encode a prediction error that reflects the subjective value integrated across these reward dimensions. Previous studies demonstrated that dopamine prediction error responses reflect the value of singular reward attributes that include magnitude, probability, and delay. Obviously, preferences between rewards that vary along one dimension are completely determined by the manipulated variable. However, it is unknown whether dopamine prediction error responses reflect the subjective value integrated from different reward dimensions. Here, we measured the preferences between rewards that varied along multiple dimensions, and as such could not be ranked according to objective metrics. Monkeys chose between rewards that differed in amount, risk, and type. Because their choices were complete and transitive, the monkeys chose “as if” they integrated different rewards and attributes into a common scale of value. The prediction error responses of single dopamine neurons reflected the integrated subjective value inferred from the choices, rather than the singular reward attributes. Specifically, amount, risk, and reward type modulated dopamine responses exactly to the extent that they influenced economic choices, even when rewards were vastly different, such as liquid and food. This prediction error response could provide a direct updating signal for economic values. PMID:24453218
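One way to make the "common scale of value" concrete is to integrate amount, risk, and reward type into a single subjective value before the error is computed. A toy sketch; the type preferences and risk penalty below are invented for illustration, not estimates from the study.

```python
# Toy integration of reward dimensions into one subjective value; the
# prediction error is computed on the integrated value rather than on any
# single attribute. Weights are illustrative assumptions.

TYPE_PREFERENCE = {"juice": 1.0, "food": 0.8}    # assumed relative preference

def subjective_value(amount, risk, reward_type, risk_weight=0.5):
    return TYPE_PREFERENCE[reward_type] * amount - risk_weight * risk

def prediction_error(outcome, expectation):
    return subjective_value(*outcome) - subjective_value(*expectation)

# Expected 0.4 ml of safe juice, received 0.6 ml of risky food:
print(prediction_error(outcome=(0.6, 0.3, "food"),
                       expectation=(0.4, 0.0, "juice")))
```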
Dissociable effects of surprising rewards on learning and memory.
Rouhani, Nina; Norman, Kenneth A; Niv, Yael
2018-03-19
Reward-prediction errors track the extent to which rewards deviate from expectations, and aid in learning. How do such errors in prediction interact with memory for the rewarding episode? Existing findings point to both cooperative and competitive interactions between learning and memory mechanisms. Here, we investigated whether learning about rewards in a high-risk context, with frequent, large prediction errors, would give rise to higher-fidelity memory traces for rewarding events than learning in a low-risk context. Experiment 1 showed that recognition was better for items associated with larger absolute prediction errors during reward learning. Larger prediction errors also led to higher rates of learning about rewards. Interestingly, we did not find a relationship between learning rate for reward and recognition-memory accuracy for items, suggesting that these two effects of prediction errors were caused by separate underlying mechanisms. In Experiment 2, we replicated these results with a longer task that posed stronger memory demands and allowed for more learning. We also showed improved source and sequence memory for items within the high-risk context. In Experiment 3, we controlled for the difficulty of reward learning in the risk environments, again replicating the previous results. Moreover, this control revealed that the high-risk context enhanced item-recognition memory beyond the effect of prediction errors. In summary, our results show that prediction errors boost both episodic item memory and incremental reward learning, but the two effects are likely mediated by distinct underlying systems.
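The dissociation reported above can be summarized in two lines of arithmetic: the signed error updates value incrementally, while its absolute magnitude acts on episodic encoding. A speculative toy sketch; the encoding terms are assumptions for illustration, not the authors' model.

```python
# Two roles for one prediction error: the signed PE drives incremental
# learning; its unsigned magnitude is treated here as a boost to episodic
# encoding strength (base_encoding and memory_gain are assumed constants).

def trial(value, reward, alpha=0.2, base_encoding=0.5, memory_gain=0.3):
    pe = reward - value
    value += alpha * pe                               # learning: signed PE
    encoding = base_encoding + memory_gain * abs(pe)  # memory: unsigned PE
    return value, encoding

value = 0.5
for reward in [1.0, 0.0, 1.0, 1.0]:
    value, encoding = trial(value, reward)
    print(f"value={value:.2f}  encoding_strength={encoding:.2f}")
```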
Takahashi, Yuji K.; Langdon, Angela J.; Niv, Yael; Schoenbaum, Geoffrey
2016-01-01
Dopamine neurons signal reward prediction errors. This requires accurate reward predictions. It has been suggested that the ventral striatum provides these predictions. Here we tested this hypothesis by recording from putative dopamine neurons in the ventral tegmental area (VTA) of rats performing a task in which prediction errors were induced by shifting reward timing or number. In controls, the neurons exhibited error signals in response to both manipulations. However, dopamine neurons in rats with ipsilateral ventral striatal lesions exhibited errors only to changes in number and failed to respond to changes in timing of reward. These results, supported by computational modeling, indicate that predictions about the temporal specificity and the number of expected rewards are dissociable, and that dopaminergic prediction-error signals rely on the ventral striatum for the former but not the latter. PMID:27292535
Reward positivity: Reward prediction error or salience prediction error?
Heydari, Sepideh; Holroyd, Clay B
2016-08-01
The reward positivity is a component of the human ERP elicited by feedback stimuli in trial-and-error learning and guessing tasks. A prominent theory holds that the reward positivity reflects a reward prediction error signal that is sensitive to outcome valence, being larger for unexpected positive events relative to unexpected negative events (Holroyd & Coles, 2002). Although the theory has found substantial empirical support, most of these studies have utilized either monetary or performance feedback to test the hypothesis. However, in apparent contradiction to the theory, a recent study found that unexpected physical punishments also elicit the reward positivity (Talmi, Atkinson, & El-Deredy, 2013). The authors of this report argued that the reward positivity reflects a salience prediction error rather than a reward prediction error. To investigate this finding further, in the present study participants navigated a virtual T maze and received feedback on each trial under two conditions. In a reward condition, the feedback indicated that they would either receive a monetary reward or not, and in a punishment condition the feedback indicated that they would receive a small shock or not. We found that the feedback stimuli elicited a typical reward positivity in the reward condition and an apparently delayed reward positivity in the punishment condition. Importantly, this signal was more positive to the stimuli that predicted the omission of a possible punishment relative to stimuli that predicted a forthcoming punishment, which is inconsistent with the salience hypothesis.
Neural dynamics of reward probability coding: a Magnetoencephalographic study in humans
Thomas, Julie; Vanni-Mercier, Giovanna; Dreher, Jean-Claude
2013-01-01
Prediction of future rewards and discrepancy between actual and expected outcomes (prediction error) are crucial signals for adaptive behavior. In humans, a number of fMRI studies demonstrated that reward probability modulates these two signals in a large brain network. Yet, the spatio-temporal dynamics underlying the neural coding of reward probability remain unknown. Here, using magnetoencephalography, we investigated the neural dynamics of prediction and reward prediction error computations while subjects learned to associate cues of slot machines with monetary rewards with different probabilities. We showed that event-related magnetic fields (ERFs) arising from the visual cortex coded the expected reward value 155 ms after the cue, demonstrating that reward value signals emerge early in the visual stream. Moreover, a prediction error was reflected in an ERF peaking 300 ms after the rewarded outcome, whose amplitude decreased with higher reward probability. This prediction error signal was generated in a network including the anterior and posterior cingulate cortex. These findings pinpoint the spatio-temporal characteristics underlying reward probability coding. Together, our results provide insights into the neural dynamics underlying the ability to learn probabilistic stimulus-reward contingencies. PMID:24302894
DeGuzman, Marisa; Shott, Megan E; Yang, Tony T; Riederer, Justin; Frank, Guido K W
2017-06-01
Anorexia nervosa is a psychiatric disorder of unknown etiology. Understanding associations between behavior and neurobiology is important in treatment development. Using a novel monetary reward task during functional magnetic resonance brain imaging, the authors tested how brain reward learning in adolescent anorexia nervosa changes with weight restoration. Female adolescents with anorexia nervosa (N=21; mean age, 16.4 years [SD=1.9]) underwent functional MRI (fMRI) before and after treatment; similarly, healthy female control adolescents (N=21; mean age, 15.2 years [SD=2.4]) underwent fMRI on two occasions. Brain function was tested using the reward prediction error construct, a computational model for reward receipt and omission related to motivation and neural dopamine responsiveness. Compared with the control group, the anorexia nervosa group exhibited greater brain response 1) for prediction error regression within the caudate, ventral caudate/nucleus accumbens, and anterior and posterior insula, 2) to unexpected reward receipt in the anterior and posterior insula, and 3) to unexpected reward omission in the caudate body. Prediction error and unexpected reward omission response tended to normalize with treatment, while unexpected reward receipt response remained significantly elevated. Greater caudate prediction error response when underweight was associated with lower weight gain during treatment. Punishment sensitivity correlated positively with ventral caudate prediction error response. Reward system responsiveness is elevated in adolescent anorexia nervosa when underweight and after weight restoration. Heightened prediction error activity in brain reward regions may represent a phenotype of adolescent anorexia nervosa that does not respond well to treatment. Prediction error response could be a neurobiological marker of illness severity that can indicate individual treatment needs. PMID:28231717
The Dopamine Prediction Error: Contributions to Associative Models of Reward Learning
Nasser, Helen M.; Calu, Donna J.; Schoenbaum, Geoffrey; Sharpe, Melissa J.
2017-01-01
Phasic activity of midbrain dopamine neurons is currently thought to encapsulate the prediction-error signal described in Sutton and Barto’s (1981) model-free reinforcement learning algorithm. This phasic signal is thought to contain information about the quantitative value of reward, which transfers to the reward-predictive cue after learning. This is argued to endow the reward-predictive cue with the value inherent in the reward, motivating behavior toward cues signaling the presence of reward. Yet theoretical and empirical research has implicated prediction-error signaling in learning that extends far beyond a transfer of quantitative value to a reward-predictive cue. Here, we review the research which demonstrates the complexity of how dopaminergic prediction errors facilitate learning. After briefly discussing the literature demonstrating that phasic dopaminergic signals can act in the manner described by Sutton and Barto (1981), we consider how these signals may also influence attentional processing across multiple attentional systems in distinct brain circuits. Then, we discuss how prediction errors encode and promote the development of context-specific associations between cues and rewards. Finally, we consider recent evidence that shows dopaminergic activity contains information about causal relationships between cues and rewards that reflect information garnered from rich associative models of the world that can be adapted in the absence of direct experience. In discussing this research we hope to support the expansion of how dopaminergic prediction errors are thought to contribute to the learning process beyond the traditional concept of transferring quantitative value. PMID:28275359
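The transfer of the error signal from reward to cue, the core phenomenon in Sutton and Barto's account, can be reproduced with a few lines of tabular TD(0). Parameters are illustrative; the cue is treated as unpredicted, so its onset error simply equals its learned value.

```python
# Tabular TD(0) sketch of value transfer: early in training the prediction
# error occurs at reward delivery; with learning it migrates to cue onset.
alpha, gamma = 0.2, 1.0
v_cue = 0.0
for episode in range(50):
    pe_cue = gamma * v_cue - 0.0       # error at (unpredicted) cue onset
    pe_reward = 1.0 - v_cue            # error at reward delivery
    v_cue += alpha * pe_reward
    if episode in (0, 9, 49):
        print(f"trial {episode + 1:2d}: "
              f"PE at cue = {pe_cue:.2f}, PE at reward = {pe_reward:.2f}")
```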
Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C
2014-03-01
Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors: discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component with a timing and topography similar to those of the feedback error-related negativity; this component increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
Dopamine reward prediction-error signalling: a two-component response
Schultz, Wolfram
2017-01-01
Environmental stimuli and objects, including rewards, are often processed sequentially in the brain. Recent work suggests that the phasic dopamine reward prediction-error response follows a similar sequential pattern. An initial brief, unselective and highly sensitive increase in activity unspecifically detects a wide range of environmental stimuli, then quickly evolves into the main response component, which reflects subjective reward value and utility. This temporal evolution allows the dopamine reward prediction-error signal to optimally combine speed and accuracy. PMID:26865020
Competition between learned reward and error outcome predictions in anterior cingulate cortex.
Alexander, William H; Brown, Joshua W
2010-02-15
The anterior cingulate cortex (ACC) is implicated in performance monitoring and cognitive control. Non-human primate studies of ACC show prominent reward signals, but these are elusive in human studies, which instead show mainly conflict and error effects. Here we demonstrate distinct appetitive and aversive activity in human ACC. The error likelihood hypothesis suggests that ACC activity increases in proportion to the likelihood of an error, and ACC is also sensitive to the consequence magnitude of the predicted error. Previous work further showed that error likelihood effects reach a ceiling as the potential consequences of an error increase, possibly due to reductions in the average reward. We explored this issue by independently manipulating reward magnitude of task responses and error likelihood while controlling for potential error consequences in an Incentive Change Signal Task. The fMRI results ruled out a modulatory effect of expected reward on error likelihood effects in favor of a competition effect between expected reward and error likelihood. Dynamic causal modeling showed that error likelihood and expected reward signals are intrinsic to the ACC rather than received from elsewhere. These findings agree with interpretations of ACC activity as signaling both perceptions of risk and predicted reward.
Lateral habenula neurons signal errors in the prediction of reward information
Bromberg-Martin, Ethan S.; Hikosaka, Okihide
2011-01-01
Humans and animals have a remarkable ability to predict future events, which they achieve by persistently searching their environment for sources of predictive information. Yet little is known about the neural systems that motivate this behavior. We hypothesized that information-seeking is assigned value by the same circuits that support reward-seeking, so that neural signals encoding conventional “reward prediction errors” include analogous “information prediction errors”. To test this we recorded from neurons in the lateral habenula, a nucleus which encodes reward prediction errors, while monkeys chose between cues that provided different amounts of information about upcoming rewards. We found that a subpopulation of lateral habenula neurons transmitted signals resembling information prediction errors, responding when reward information was unexpectedly cued, delivered, or denied. Their signals evaluated information sources reliably even when the animal’s decisions did not. These neurons could provide a common instructive signal for reward-seeking and information-seeking behavior. PMID:21857659
Dopamine neurons share common response function for reward prediction error
Eshel, Neir; Tian, Ju; Bukwich, Michael; Uchida, Naoshige
2016-01-01
Dopamine neurons are thought to signal reward prediction error, or the difference between actual and predicted reward. How dopamine neurons jointly encode this information, however, remains unclear. One possibility is that different neurons specialize in different aspects of prediction error; another is that each neuron calculates prediction error in the same way. We recorded from optogenetically-identified dopamine neurons in the lateral ventral tegmental area (VTA) while mice performed classical conditioning tasks. Our tasks allowed us to determine the full prediction error functions of dopamine neurons and compare them to each other. We found striking homogeneity among individual dopamine neurons: their responses to both unexpected and expected rewards followed the same function, just scaled up or down. As a result, we could describe both individual and population responses using just two parameters. Such uniformity ensures robust information coding, allowing each dopamine neuron to contribute fully to the prediction error signal. PMID:26854803
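The two-parameter description reported here can be illustrated on synthetic data: one shared response function, multiplied by a neuron-specific gain and shifted by an offset, then recovered by linear regression. The square-root form below is an arbitrary stand-in for the measured function.

```python
# Sketch of the 'common response function' idea: each neuron's response is the
# same function of reward size, scaled by a per-neuron gain plus an offset, so
# two parameters per neuron suffice. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
reward = np.linspace(0, 1, 20)
common = np.sqrt(reward) - 0.3                     # assumed shared function

for neuron in range(3):
    gain, offset = rng.uniform(0.5, 2.0), rng.uniform(-0.1, 0.1)
    response = gain * common + offset + rng.normal(0, 0.02, reward.size)
    # recover the two parameters by regressing onto the common function
    fit_gain, fit_offset = np.polyfit(common, response, 1)
    print(f"neuron {neuron}: gain={fit_gain:.2f} (true {gain:.2f}), "
          f"offset={fit_offset:.2f} (true {offset:.2f})")
```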
Disambiguating ventral striatum fMRI-related bold signal during reward prediction in schizophrenia
Morris, R W; Vercammen, A; Lenroot, R; Moore, L; Langton, J M; Short, B; Kulkarni, J; Curtis, J; O'Donnell, M; Weickert, C S; Weickert, T W
2012-01-01
Reward detection, surprise detection and prediction-error signaling have all been proposed as roles for the ventral striatum (vStr). Previous neuroimaging studies of striatal function in schizophrenia have found attenuated neural responses to reward-related prediction errors; however, as prediction errors represent a discrepancy in mesolimbic neural activity between expected and actual events, it is critical to examine responses to both expected and unexpected rewards in conjunction with expected and unexpected reward omissions in order to clarify the nature of ventral striatal dysfunction in schizophrenia. In the present study, healthy adults and people with schizophrenia were tested with a reward-related prediction-error task during functional magnetic resonance imaging to determine whether schizophrenia is associated with altered neural responses in the vStr to rewards, to surprise, to prediction errors, or to all three factors. In healthy adults, we found that neural responses in the vStr were correlated more specifically with prediction errors than with surprising events or reward stimuli alone. People with schizophrenia did not display the normal differential activation between expected and unexpected rewards, which was partially due to exaggerated ventral striatal responses to expected rewards (right vStr) but also included blunted responses to unexpected outcomes (left vStr). This finding shows that neural responses that are typically elicited by surprise can also occur to well-predicted events in schizophrenia, and identifies aberrant activity in the vStr as a key node of dysfunction in the neural circuitry used to differentiate expected and unexpected feedback in schizophrenia. PMID:21709684
Aberg, Kristoffer C; Müller, Julia; Schwartz, Sophie
2017-01-01
Anticipation and delivery of rewards improves memory formation, but little effort has been made to disentangle their respective contributions to memory enhancement. Moreover, it has been suggested that the effects of reward on memory are mediated by dopaminergic influences on hippocampal plasticity. Yet, evidence linking memory improvements to actual reward computations reflected in the activity of the dopaminergic system, i.e., prediction errors and expected values, is scarce and inconclusive. For example, different previous studies reported that the magnitude of prediction errors during a reinforcement learning task was a positive, negative, or non-significant predictor of successfully encoding simultaneously presented images. Individual sensitivities to reward and punishment have been found to influence the activation of the dopaminergic reward system and could therefore help explain these seemingly discrepant results. Here, we used a novel associative memory task combined with computational modeling and showed independent effects of reward-delivery and reward-anticipation on memory. Strikingly, the computational approach revealed positive influences from both reward delivery, as mediated by prediction error magnitude, and reward anticipation, as mediated by magnitude of expected value, even in the absence of behavioral effects when analyzed using standard methods, i.e., by collapsing memory performance across trials within conditions. We additionally measured trait estimates of reward and punishment sensitivity and found that individuals with increased reward (vs. punishment) sensitivity had better memory for associations encoded during positive (vs. negative) prediction errors when tested after 20 min, but a negative trend when tested after 24 h. In conclusion, modeling trial-by-trial fluctuations in the magnitude of reward, as we did here for prediction errors and expected value computations, provides a comprehensive and biologically plausible description of the dynamic interplay between reward, dopamine, and associative memory formation. Our results also underline the importance of considering individual traits when assessing reward-related influences on memory.
Gu, Xiaosi; Kirk, Ulrich; Lohrenz, Terry M; Montague, P Read
2014-08-01
Computational models of reward processing suggest that foregone or fictive outcomes serve as important information sources for learning and augment those generated by experienced rewards (e.g. reward prediction errors). An outstanding question is how these learning signals interact with top-down cognitive influences, such as cognitive reappraisal strategies. Using a sequential investment task and functional magnetic resonance imaging, we show that the reappraisal strategy selectively attenuates the influence of fictive, but not reward prediction error signals on investment behavior; this behavioral effect is accompanied by changes in neural activity and connectivity in the anterior insular cortex, a brain region thought to integrate subjective feelings with high-order cognition. Furthermore, individuals differ in the extent to which their behaviors are driven by fictive errors versus reward prediction errors, and the reappraisal strategy interacts with such individual differences; a finding also accompanied by distinct underlying neural mechanisms. These findings suggest that the variable interaction of cognitive strategies with two important classes of computational learning signals (fictive, reward prediction error) represents one contributing substrate for the variable capacity of individuals to control their behavior based on foregone rewards. These findings also expose important possibilities for understanding the lack of control in addiction based on possibly foregone rewarding outcomes.
Watanabe, Noriya; Sakagami, Masamichi; Haruno, Masahiko
2013-03-06
Learning does not only depend on rationality, because real-life learning cannot be isolated from emotion or social factors. Therefore, it is intriguing to determine how emotion changes learning, and to identify which neural substrates underlie this interaction. Here, we show that the task-independent presentation of an emotional face before a reward-predicting cue increases the speed of cue-reward association learning in human subjects compared with trials in which a neutral face is presented. This phenomenon was attributable to an increase in the learning rate, which regulates reward prediction errors. Parallel to these behavioral findings, functional magnetic resonance imaging demonstrated that presentation of an emotional face enhanced reward prediction error (RPE) signal in the ventral striatum. In addition, we also found a functional link between this enhanced RPE signal and increased activity in the amygdala following presentation of an emotional face. Thus, this study revealed an acceleration of cue-reward association learning by emotion, and underscored a role of striatum-amygdala interactions in the modulation of the reward prediction errors by emotion.
Bissonette, Gregory B; Roesch, Matthew R
2016-01-01
Many brain areas are activated by the possibility and receipt of reward. Are all of these brain areas reporting the same information about reward? Or are these signals related to other functions that accompany reward-guided learning and decision-making? Through carefully controlled behavioral studies, it has been shown that reward-related activity can represent reward expectations related to future outcomes, errors in those expectations, motivation, and signals related to goal- and habit-driven behaviors. These dissociations have been accomplished by manipulating the predictability of positively and negatively valued events. Here, we review single neuron recordings in behaving animals that have addressed this issue. We describe data showing that several brain areas, including the orbitofrontal cortex, anterior cingulate, and basolateral amygdala, signal reward prediction. In addition, anterior cingulate, basolateral amygdala, and dopamine neurons also signal errors in reward prediction, but in different ways. For these areas, we describe how unexpected manipulations of positive and negative value can dissociate signed from unsigned reward prediction errors. All of these signals feed into the striatum to modify signals that motivate behavior in ventral striatum and guide responding via associative encoding in dorsolateral striatum. PMID:26276036
Model-free and model-based reward prediction errors in EEG.
Sambrook, Thomas D; Hardwick, Ben; Wills, Andy J; Goslin, Jeremy
2018-05-24
Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign candidate actions with an expected value. Model-free learning is ignorant of the world's structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has been previously assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors questions a long-held assumption that model-free and model-based learning are dissociated in the brain.
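A compact way to see how the two errors can dissociate in the same task: the model-free error compares an outcome against a cached action value, while the model-based error compares it against a value computed from a transition model. All quantities below are assumed for illustration.

```python
# Model-free vs. model-based prediction errors in a toy two-stage task.
import random

q_cached = {"a1": 0.0}                        # model-free (cached) action value
v_stage2 = {"s1": 0.2, "s2": 0.8}             # second-stage state values
p_trans = {"a1": {"s1": 0.7, "s2": 0.3}}      # learned transition model

def prediction_errors(action, landed_state):
    mb_value = sum(p * v_stage2[s] for s, p in p_trans[action].items())
    mf_pe = v_stage2[landed_state] - q_cached[action]  # vs. cached value
    mb_pe = v_stage2[landed_state] - mb_value          # vs. model-derived value
    return mf_pe, mb_pe

landed = random.choices(["s1", "s2"], weights=[0.7, 0.3])[0]
print(prediction_errors("a1", landed))
```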
Greenberg, Tsafrir; Chase, Henry W; Almeida, Jorge R; Stiffler, Richelle; Zevallos, Carlos R; Aslam, Haris A; Deckersbach, Thilo; Weyandt, Sarah; Cooper, Crystal; Toups, Marisa; Carmody, Thomas; Kurian, Benji; Peltier, Scott; Adams, Phillip; McInnis, Melvin G; Oquendo, Maria A; McGrath, Patrick J; Fava, Maurizio; Weissman, Myrna; Parsey, Ramin; Trivedi, Madhukar H; Phillips, Mary L
2015-09-01
Anhedonia, disrupted reward processing, is a core symptom of major depressive disorder. Recent findings demonstrate altered reward-related ventral striatal reactivity in depressed individuals, but the extent to which this is specific to anhedonia remains poorly understood. The authors examined the effect of anhedonia on reward expectancy (expected outcome value) and prediction error- (discrepancy between expected and actual outcome) related ventral striatal reactivity, as well as the relationship between these measures. A total of 148 unmedicated individuals with major depressive disorder and 31 healthy comparison individuals recruited for the multisite EMBARC (Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care) study underwent functional MRI during a well-validated reward task. Region of interest and whole-brain data were examined in the first- (N=78) and second- (N=70) recruited cohorts, as well as the total sample, of depressed individuals, and in healthy individuals. Healthy, but not depressed, individuals showed a significant inverse relationship between reward expectancy and prediction error-related right ventral striatal reactivity. Across all participants, and in depressed individuals only, greater anhedonia severity was associated with a reduced reward expectancy-prediction error inverse relationship, even after controlling for other symptoms. The normal reward expectancy and prediction error-related ventral striatal reactivity inverse relationship concords with conditioning models, predicting a shift in ventral striatal responding from reward outcomes to reward cues. This study shows, for the first time, an absence of this relationship in two cohorts of unmedicated depressed individuals and a moderation of this relationship by anhedonia, suggesting reduced reward-contingency learning with greater anhedonia. These findings help elucidate neural mechanisms of anhedonia, as a step toward identifying potential biosignatures of treatment response. PMID:26183698
García-García, Isabel; Zeighami, Yashar; Dagher, Alain
2017-06-01
Surprises are important sources of learning. Cognitive scientists often refer to surprises as "reward prediction errors," a parameter that captures discrepancies between expectations and actual outcomes. Here, we integrate neurophysiological and functional magnetic resonance imaging (fMRI) results addressing the processing of reward prediction errors and how they might be altered in drug addiction and Parkinson's disease. By increasing phasic dopamine responses, drugs might accentuate prediction error signals, causing increases in fMRI activity in mesolimbic areas in response to drugs. Chronic substance dependence, by contrast, has been linked with compromised dopaminergic function, which might be associated with blunted fMRI responses to pleasant non-drug stimuli in mesocorticolimbic areas. In Parkinson's disease, dopamine replacement therapies seem to induce impairments in learning from negative outcomes. The present review provides a holistic overview of reward prediction errors across different pathologies and might inform future clinical strategies targeting impulsive/compulsive disorders.
Toward isolating the role of dopamine in the acquisition of incentive salience attribution.
Chow, Jonathan J; Nickell, Justin R; Darna, Mahesh; Beckmann, Joshua S
2016-10-01
Stimulus-reward learning has been heavily linked to the reward-prediction error learning hypothesis and dopaminergic function. However, some evidence suggests dopaminergic function may not strictly underlie reward-prediction error learning, but may be specific to incentive salience attribution. Utilizing a Pavlovian conditioned approach procedure consisting of two stimuli that were equally reward-predictive (both undergoing reward-prediction error learning) but functionally distinct in regard to incentive salience (levers that elicited sign-tracking and tones that elicited goal-tracking), we tested the differential role of D1 and D2 dopamine receptors and nucleus accumbens dopamine in the acquisition of sign- and goal-tracking behavior and their associated conditioned reinforcing value within individuals. Overall, the results revealed that both D1 and D2 inhibition disrupted performance of sign- and goal-tracking. However, D1 inhibition specifically prevented the acquisition of sign-tracking to a lever, instead promoting goal-tracking and decreasing its conditioned reinforcing value, while neither D1 nor D2 signaling was required for goal-tracking in response to a tone. Likewise, nucleus accumbens dopaminergic lesions disrupted acquisition of sign-tracking to a lever, while leaving goal-tracking in response to a tone unaffected. Collectively, these results are the first evidence of an intraindividual dissociation of dopaminergic function in incentive salience attribution from reward-prediction error learning, indicating that incentive salience, reward-prediction error, and their associated dopaminergic signaling exist within individuals and are stimulus-specific. Thus, individual differences in incentive salience attribution may be reflective of a differential balance in dopaminergic function that may bias toward the attribution of incentive salience, relative to reward-prediction error learning only.
Dopamine Modulates Adaptive Prediction Error Coding in the Human Midbrain and Striatum.
Diederen, Kelly M J; Ziauddeen, Hisham; Vestergaard, Martin D; Spencer, Tom; Schultz, Wolfram; Fletcher, Paul C
2017-02-15
Learning to optimally predict rewards requires agents to account for fluctuations in reward value. Recent work suggests that individuals can efficiently learn about variable rewards through adaptation of the learning rate, and coding of prediction errors relative to reward variability. Such adaptive coding has been linked to midbrain dopamine neurons in nonhuman primates, and evidence in support of a similar role of the dopaminergic system in humans is emerging from fMRI data. Here, we sought to investigate the effect of dopaminergic perturbations on adaptive prediction error coding in humans, using a between-subject, placebo-controlled pharmacological fMRI study with a dopaminergic agonist (bromocriptine) and antagonist (sulpiride). Participants performed a previously validated task in which they predicted the magnitude of upcoming rewards drawn from distributions with varying SDs. After each prediction, participants received a reward, yielding trial-by-trial prediction errors. Under placebo, we replicated previous observations of adaptive coding in the midbrain and ventral striatum. Treatment with sulpiride attenuated adaptive coding in both midbrain and ventral striatum, and was associated with a decrease in performance, whereas bromocriptine did not have a significant impact. Although we observed no differential effect of SD on performance between the groups, computational modeling suggested decreased behavioral adaptation in the sulpiride group. These results suggest that normal dopaminergic function is critical for adaptive prediction error coding, a key property of the brain thought to facilitate efficient learning in variable environments. Crucially, these results also offer potential insights for understanding the impact of disrupted dopamine function in mental illness. Significance statement: To choose optimally, we have to learn what to expect. Humans dampen learning when there is a great deal of variability in reward outcome, and two brain regions that are modulated by the brain chemical dopamine are sensitive to reward variability. Here, we aimed to directly relate dopamine to learning about variable rewards, and the neural encoding of associated teaching signals. We perturbed dopamine in healthy individuals using dopaminergic medication and asked them to predict variable rewards while we made brain scans. Dopamine perturbations impaired learning and the neural encoding of reward variability, thus establishing a direct link between dopamine and adaptation to reward variability. These results aid our understanding of clinical conditions associated with dopaminergic dysfunction, such as psychosis.
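Adaptive coding of this kind can be sketched as dividing the raw error by a running estimate of reward variability, so the teaching signal spans a similar range in low- and high-SD contexts and the raw update shrinks as variability grows. A simplified sketch with illustrative parameters, not the authors' model.

```python
# Prediction error coded relative to reward variability: the normalized error
# drives learning, so a given raw error moves the prediction less when the
# reward distribution is wide (a deliberate simplification for illustration).
import random, statistics

def adaptive_learning(rewards, alpha=0.5):
    history = list(rewards[:5])
    prediction = statistics.mean(history)
    for r in rewards[5:]:
        sd = max(statistics.stdev(history), 1e-6)
        normalized_pe = (r - prediction) / sd   # variability-scaled error
        prediction += alpha * normalized_pe     # larger SD -> smaller raw step
        history.append(r)
    return prediction

random.seed(0)
low = [random.gauss(10, 1) for _ in range(60)]
high = [random.gauss(10, 5) for _ in range(60)]
print(round(adaptive_learning(low), 1), round(adaptive_learning(high), 1))
```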
Decodability of Reward Learning Signals Predicts Mood Fluctuations.
Eldar, Eran; Roth, Charlotte; Dayan, Peter; Dolan, Raymond J
2018-05-07
Our mood often fluctuates without warning. Recent accounts propose that these fluctuations might be preceded by changes in how we process reward. According to this view, the degree to which reward improves our mood reflects not only characteristics of the reward itself (e.g., its magnitude) but also how receptive to reward we happen to be. Differences in receptivity to reward have been suggested to play an important role in the emergence of mood episodes in psychiatric disorders [1-16]. However, despite substantial theory, the relationship between reward processing and daily fluctuations of mood has yet to be tested directly. In particular, it is unclear whether the extent to which people respond to reward changes from day to day and whether such changes are followed by corresponding shifts in mood. Here, we use a novel mobile-phone platform with dense data sampling and wearable heart-rate and electroencephalographic sensors to examine mood and reward processing over an extended period of one week. Subjects regularly performed a trial-and-error choice task in which different choices were probabilistically rewarded. Subjects' choices revealed two complementary learning processes, one fast and one slow. Reward prediction errors [17, 18] indicative of these two processes were decodable from subjects' physiological responses. Strikingly, more accurate decodability of prediction-error signals reflective of the fast process predicted improvement in subjects' mood several hours later, whereas more accurate decodability of the slow process' signals predicted better mood a whole day later. We conclude that real-life mood fluctuations follow changes in responsivity to reward at multiple timescales.
Curiosity and reward: Valence predicts choice and information prediction errors enhance learning.
Marvin, Caroline B; Shohamy, Daphna
2016-03-01
Curiosity drives many of our daily pursuits and interactions; yet, we know surprisingly little about how it works. Here, we harness an idea implied in many conceptualizations of curiosity: that information has value in and of itself. Reframing curiosity as the motivation to obtain reward, where the reward is information, allows one to leverage major advances in theoretical and computational mechanisms of reward-motivated learning. We provide new evidence supporting two predictions that emerge from this framework. First, we find an asymmetric effect of positive versus negative information, with positive information enhancing both curiosity and long-term memory for information. Second, we find that it is not the absolute value of information that drives learning but, rather, the gap between the reward expected and reward received, an "information prediction error." These results support the idea that information functions as a reward, much like money or food, guiding choices and driving learning in systematic ways.
Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca
2013-03-01
An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions.
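The hypothesis can be written as a TD-like error whose reinforcement term sums a permanent extrinsic reward and a temporary intrinsic reinforcement that decays as an event becomes familiar. The constants below are illustrative assumptions.

```python
# TD-like error driven by extrinsic reward plus a novelty-based intrinsic
# reinforcement that fades with repeated exposure to the same event.

familiarity = {}

def intrinsic_reward(event, scale=1.0, decay=0.8):
    n = familiarity.get(event, 0)
    familiarity[event] = n + 1
    return scale * (decay ** n)     # novel events reinforce strongly at first

def td_error(extrinsic, event, v_next, v_now, gamma=0.95):
    reinforcement = extrinsic + intrinsic_reward(event)
    return reinforcement + gamma * v_next - v_now

for _ in range(5):
    print(round(td_error(0.0, "light_on", v_next=0.0, v_now=0.0), 3))
# the same unexpected, unrewarded event yields a shrinking learning signal
```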
Oyama, Kei; Tateyama, Yukina; Hernádi, István; Tobler, Philippe N; Iijima, Toshio; Tsutsui, Ken-Ichiro
2015-11-01
To investigate how the striatum integrates sensory information with reward information for behavioral guidance, we recorded single-unit activity in the dorsal striatum of head-fixed rats participating in a probabilistic Pavlovian conditioning task with auditory conditioned stimuli (CSs) in which reward probability was fixed for each CS but parametrically varied across CSs. We found that the activity of many neurons was linearly correlated with the reward probability indicated by the CSs. The recorded neurons could be classified according to their firing patterns into functional subtypes coding reward probability in different forms such as stimulus value, reward expectation, and reward prediction error. These results suggest that several functional subgroups of dorsal striatal neurons represent different kinds of information formed through extensive prior exposure to CS-reward contingencies. PMID:26378201
Menegas, William; Babayan, Benedicte M; Uchida, Naoshige; Watabe-Uchida, Mitsuko
2017-01-01
Dopamine neurons are thought to encode novelty in addition to reward prediction error (the discrepancy between actual and predicted values). In this study, we compared dopamine activity across the striatum using fiber fluorometry in mice. During classical conditioning, we observed opposite dynamics in dopamine axon signals in the ventral striatum (‘VS dopamine’) and the posterior tail of the striatum (‘TS dopamine’). TS dopamine showed strong excitation to novel cues, whereas VS dopamine showed no responses to novel cues until they had been paired with a reward. TS dopamine cue responses decreased over time, depending on what the cue predicted. Additionally, TS dopamine showed excitation to several types of stimuli including rewarding, aversive, and neutral stimuli whereas VS dopamine showed excitation only to reward or reward-predicting cues. Together, these results demonstrate that dopamine novelty signals are localized in TS along with general salience signals, while VS dopamine reliably encodes reward prediction error. DOI: http://dx.doi.org/10.7554/eLife.21886.001 PMID:28054919
Phasic dopamine signals: from subjective reward value to formal economic utility
Schultz, Wolfram; Carelli, Regina M; Wightman, R Mark
2015-01-01
Although rewards are physical stimuli and objects, their value for survival and reproduction is subjective. The phasic, neurophysiological, and voltammetric dopamine reward prediction error response signals subjective reward value. The signal incorporates crucial reward aspects such as amount, probability, type, risk, delay, and effort. Differences in dopamine release dynamics with temporal delay and effort in rodents may derive from methodological issues and require further study. Recent designs using concepts and behavioral tools from experimental economics allow us to formally characterize the subjective value signal as economic utility and thus to establish a neuronal value function. With these properties, the dopamine response constitutes a utility prediction error signal. PMID:26719853
Prediction-error in the context of real social relationships modulates reward system activity.
Poore, Joshua C; Pfeifer, Jennifer H; Berkman, Elliot T; Inagaki, Tristen K; Welborn, Benjamin L; Lieberman, Matthew D
2012-01-01
The human reward system is sensitive to both social (e.g., validation) and non-social rewards (e.g., money) and is likely integral for relationship development and reputation building. However, data are sparse on the question of whether implicit social reward processing meaningfully contributes to explicit social representations such as trust and attachment security in pre-existing relationships. This event-related fMRI experiment examined reward system prediction-error activity in response to a potent social reward (social validation) and this activity's relation to both attachment security and trust in the context of real romantic relationships. During the experiment, participants' expectations for their romantic partners' positive regard of them were confirmed (validated) or violated, in either positive or negative directions. Primary analyses were conducted using predefined regions of interest, the locations of which were taken from previously published research. Results indicate that activity for mid-brain and striatal reward system regions of interest was modulated by social reward expectation violation in ways consistent with prior research on reward prediction-error. Additionally, activity in the striatum during viewing of disconfirmatory information was associated with both increases in post-scan reports of attachment anxiety and decreases in post-scan trust, a finding that follows directly from representational models of attachment and trust.
Erdeniz, Burak; Rohe, Tim; Done, John; Seidler, Rachael D
2013-01-01
Conventional neuroimaging techniques provide information about condition-related changes of the BOLD (blood-oxygen-level dependent) signal, indicating only where and when the underlying cognitive processes occur. Recently, with the help of a new approach called "model-based" functional neuroimaging (fMRI), researchers are able to visualize changes in the internal variables of a time varying learning process, such as the reward prediction error or the predicted reward value of a conditional stimulus. However, despite being extremely beneficial to the imaging community in understanding the neural correlates of decision variables, a model-based approach to brain imaging data is also methodologically challenging due to the multicollinearity problem in statistical analysis. There are multiple sources of multicollinearity in functional neuroimaging including investigations of closely related variables and/or experimental designs that do not account for this. The source of multicollinearity discussed in this paper occurs due to correlation between different subjective variables that are calculated very close in time. Here, we review methodological approaches to analyzing such data by discussing the special case of separating the reward prediction error signal from reward outcomes.
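To make the multicollinearity issue concrete, here is a minimal Python sketch (not from the paper): it measures the correlation between a prediction-error regressor and a reward-outcome regressor, then residualizes the former against the latter so that only the PE variance not shared with the outcome remains. The toy data and all names are hypothetical.

    import numpy as np

    def correlate_and_orthogonalize(pe, outcome):
        # Collinearity diagnostic plus Gram-Schmidt residualization,
        # one common (and debated) remedy in model-based fMRI designs.
        pe = pe - pe.mean()
        outcome = outcome - outcome.mean()
        r = np.corrcoef(pe, outcome)[0, 1]
        beta = pe.dot(outcome) / outcome.dot(outcome)
        pe_orth = pe - beta * outcome   # PE variance unique of the outcome
        return r, pe_orth

    # Toy example: PE and outcome correlate because PE = outcome - expectation.
    rng = np.random.default_rng(0)
    outcome = rng.choice([0.0, 1.0], size=200)       # reward delivered or not
    expectation = rng.uniform(0.2, 0.8, size=200)    # model-derived expectation
    pe = outcome - expectation
    r, pe_orth = correlate_and_orthogonalize(pe, outcome)
    print(f"corr(PE, outcome) = {r:.2f}")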
Dopamine reward prediction error responses reflect marginal utility.
Stauffer, William R; Lak, Armin; Schultz, Wolfram
2014-11-03
Optimal choices require an accurate neuronal representation of economic value. In economics, utility functions are mathematical representations of subjective value that can be constructed from choices under risk. Utility usually exhibits a nonlinear relationship to physical reward value that corresponds to risk attitudes and reflects the increasing or decreasing marginal utility obtained with each additional unit of reward. Accordingly, neuronal reward responses coding utility should robustly reflect this nonlinearity. In two monkeys, we measured utility as a function of physical reward value from meaningful choices under risk (that adhered to first- and second-order stochastic dominance). The resulting nonlinear utility functions predicted the certainty equivalents for new gambles, indicating that the functions' shapes were meaningful. The monkeys were risk seeking (convex utility function) for low reward and risk avoiding (concave utility function) with higher amounts. Critically, the dopamine prediction error responses at the time of reward itself reflected the nonlinear utility functions measured at the time of choices. In particular, the reward response magnitude depended on the first derivative of the utility function and thus reflected the marginal utility. Furthermore, dopamine responses recorded outside of the task reflected the marginal utility of unpredicted reward. Accordingly, these responses were sufficient to train reinforcement learning models to predict the behaviorally defined expected utility of gambles. These data suggest a neuronal manifestation of marginal utility in dopamine neurons and indicate a common neuronal basis for fundamental explanatory constructs in animal learning theory (prediction error) and economic decision theory (marginal utility).
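As a worked illustration of the relationship described above, the following sketch uses a hypothetical S-shaped utility function (a logistic, standing in for the convex-then-concave functions estimated from risky choices) and shows how the increment in utility per unit of reward, the marginal utility, varies with reward size. The functional form and parameters are assumptions, not the authors' fitted functions.

    import numpy as np

    # Hypothetical S-shaped utility: convex for small rewards, concave for
    # large ones, with an inflection at 0.5 ml of juice (assumed).
    def utility(ml):
        return 1.0 / (1.0 + np.exp(-8.0 * (ml - 0.5)))

    def marginal_utility(ml, eps=1e-4):
        # Numerical first derivative of the utility function.
        return (utility(ml + eps) - utility(ml - eps)) / (2 * eps)

    # For an unpredicted reward, the response increment per extra unit of
    # reward tracks the marginal utility, peaking at the inflection point.
    for ml in (0.1, 0.5, 0.9):
        print(f"{ml:.1f} ml: utility={utility(ml):.2f}, marginal={marginal_utility(ml):.2f}")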
Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.
Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp
2017-04-01
According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder.
Dopamine prediction errors in reward learning and addiction: from theory to neural circuitry
Keiflin, Ronald; Janak, Patricia H.
2015-01-01
Midbrain dopamine (DA) neurons are proposed to signal reward prediction error (RPE), a fundamental parameter in associative learning models. This RPE hypothesis provides a compelling theoretical framework for understanding DA function in reward learning and addiction. New studies support a causal role for DA-mediated RPE activity in promoting learning about natural reward; however, this question has not been explicitly tested in the context of drug addiction. In this review, we integrate theoretical models with experimental findings on the activity of DA systems, and on the causal role of specific neuronal projections and cell types, to provide a circuit-based framework for probing DA-RPE function in addiction. By examining error-encoding DA neurons in the neural network in which they are embedded, hypotheses regarding circuit-level adaptations that possibly contribute to pathological error-signaling and addiction can be formulated and tested. PMID:26494275
Schlagenhauf, Florian; Rapp, Michael A.; Huys, Quentin J. M.; Beck, Anne; Wüstenberg, Torsten; Deserno, Lorenz; Buchholz, Hans-Georg; Kalbitzer, Jan; Buchert, Ralph; Kienast, Thorsten; Cumming, Paul; Plotkin, Michail; Kumakura, Yoshitaka; Grace, Anthony A.; Dolan, Raymond J.; Heinz, Andreas
2013-01-01
Fluid intelligence represents the capacity for flexible problem solving and rapid behavioral adaptation. Rewards drive flexible behavioral adaptation, in part via a teaching signal expressed as reward prediction errors in the ventral striatum, which has been associated with phasic dopamine release in animal studies. We examined a sample of 28 healthy male adults using multimodal imaging and biological parametric mapping with 1) functional magnetic resonance imaging during a reversal learning task and 2) in a subsample of 17 subjects also with positron emission tomography using 6-[18F]fluoro-L-DOPA to assess dopamine synthesis capacity. Fluid intelligence was measured using a battery of nine standard neuropsychological tests. Ventral striatal BOLD correlates of reward prediction errors were positively correlated with fluid intelligence and, in the right ventral striatum, also inversely correlated with dopamine synthesis capacity (FDOPA K_in^app). When exploring aspects of fluid intelligence, we observed that prediction error signaling correlates with complex attention and reasoning. These findings indicate that individual differences in the capacity for flexible problem solving may be driven by ventral striatal activation during reward-related learning, which in turn proved to be inversely associated with ventral striatal dopamine synthesis capacity. PMID:22344813
When is an error not a prediction error? An electrophysiological investigation.
Holroyd, Clay B; Krigolson, Olave E; Baker, Robert; Lee, Seung; Gibson, Jessica
2009-03-01
A recent theory holds that the anterior cingulate cortex (ACC) uses reinforcement learning signals conveyed by the midbrain dopamine system to facilitate flexible action selection. According to this position, the impact of reward prediction error signals on ACC modulates the amplitude of a component of the event-related brain potential called the error-related negativity (ERN). The theory predicts that ERN amplitude is monotonically related to the expectedness of the event: It is larger for unexpected outcomes than for expected outcomes. However, a recent failure to confirm this prediction has called the theory into question. In the present article, we investigated this discrepancy in three trial-and-error learning experiments. All three experiments provided support for the theory, but the effect sizes were largest when an optimal response strategy could actually be learned. This observation suggests that ACC utilizes dopamine reward prediction error signals for adaptive decision making when the optimal behavior is, in fact, learnable.
A Role for the Lateral Dorsal Tegmentum in Memory and Decision Neural Circuitry
Redila, Van; Kinzel, Chantelle; Jo, Yong Sang; Puryear, Corey B.; Mizumori, Sheri J.Y.
2017-01-01
A role for the hippocampus in memory is clear, although the mechanism for its contribution remains a matter of debate. Converging evidence suggests that hippocampus evaluates the extent to which context-defining features of events occur as expected. The consequences of mismatch, or prediction error, signals from the hippocampus are discussed in terms of their impact on neural circuitry that evaluates the significance of prediction errors: Ventral tegmental area (VTA) dopamine cells burst fire to rewards or cues that predict rewards (Schultz et al., 1997). Although the lateral dorsal tegmentum (LDTg) importantly controls dopamine cell burst firing (Lodge & Grace, 2006), the behavioral significance of the LDTg control is not known. Therefore, we evaluated LDTg functional activity as rats performed a spatial memory task that generates task-dependent reward codes in VTA (Jo et al., 2013; Puryear et al., 2010) and another VTA afferent, the pedunculopontine nucleus (PPTg, Norton et al., 2011). Reversible inactivation of the LDTg significantly impaired choice accuracy. LDTg neurons coded primarily egocentric information in the form of movement velocity, turning behaviors, and behaviors leading up to expected reward locations. A subset of the velocity-tuned LDTg cells also showed high frequency bursts shortly before or after reward encounters, after which they showed tonic elevated firing during consumption of small, but not large, rewards. Cells that fired before reward encounters showed stronger correlations with velocity as rats moved toward, rather than away from, rewarded sites. LDTg neural activity was more strongly regulated by egocentric behaviors than that observed for PPTg or VTA cells that were recorded by Puryear et al. and Norton et al. While PPTg activity was uniquely sensitive to ongoing sensory input, all three regions encoded reward magnitude (although in different ways), reward expectation, and reward encounters. Only VTA encoded reward prediction errors. LDTg may inform VTA about learned goal-directed movement that reflects the current motivational state, and this in turn may guide VTA determination of expected subjective goal values. When combined, it is clear that the LDTg and PPTg provide only a portion of the information that dopamine cells need to assess the value of prediction errors, a process that is essential to future adaptive decisions and switches of cognitive (i.e. memorial) strategies and behavioral responses. PMID:24910282
Parvin, Darius E; McDougle, Samuel D; Taylor, Jordan A; Ivry, Richard B
2018-05-09
Failures to obtain reward can occur from errors in action selection or action execution. Recently, we observed marked differences in choice behavior when the failure to obtain a reward was attributed to errors in action execution compared with errors in action selection (McDougle et al., 2016). Specifically, participants appeared to solve this credit assignment problem by discounting outcomes in which the absence of reward was attributed to errors in action execution. Building on recent evidence indicating relatively direct communication between the cerebellum and basal ganglia, we hypothesized that cerebellar-dependent sensory prediction errors (SPEs), a signal indicating execution failure, could attenuate value updating within a basal ganglia-dependent reinforcement learning system. Here we compared the SPE hypothesis to an alternative, "top-down" hypothesis in which changes in choice behavior reflect participants' sense of agency. In two experiments with male and female human participants, we manipulated the strength of SPEs, along with the participants' sense of agency in the second experiment. The results showed that, whereas the strength of SPE had no effect on choice behavior, participants were much more likely to discount the absence of rewards under conditions in which they believed the reward outcome depended on their ability to produce accurate movements. These results provide strong evidence that SPEs do not directly influence reinforcement learning. Instead, a participant's sense of agency appears to play a significant role in modulating choice behavior when unexpected outcomes can arise from errors in action execution. SIGNIFICANCE STATEMENT When learning from the outcome of actions, the brain faces a credit assignment problem: Failures of reward can be attributed to poor choice selection or poor action execution. Here, we test a specific hypothesis that execution errors are implicitly signaled by cerebellar-based sensory prediction errors. We evaluate this hypothesis and compare it with a more "top-down" hypothesis in which the modulation of choice behavior from execution errors reflects participants' sense of agency. We find that sensory prediction errors have no significant effect on reinforcement learning. Instead, instructions influencing participants' belief of causal outcomes appear to be the main factor influencing their choice behavior.
A computational substrate for incentive salience.
McClure, Samuel M; Daw, Nathaniel D; Montague, P Read
2003-08-01
Theories of dopamine function are at a crossroads. Computational models derived from single-unit recordings capture changes in dopaminergic neuron firing rate as a prediction error signal. These models employ the prediction error signal in two roles: learning to predict future rewarding events and biasing action choice. Conversely, pharmacological inhibition or lesion of dopaminergic neuron function diminishes the ability of an animal to motivate behaviors directed at acquiring rewards. These lesion experiments have raised the possibility that dopamine release encodes a measure of the incentive value of a contemplated behavioral act. The most complete psychological idea that captures this notion frames the dopamine signal as carrying 'incentive salience'. On the surface, these two competing accounts of dopamine function seem incommensurate. To the contrary, we demonstrate that both of these functions can be captured in a single computational model of the involvement of dopamine in reward prediction for the purpose of reward seeking.
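The two roles attributed to the prediction error above, training reward predictions and biasing action choice, can be sketched in a few lines of Python. This is a generic schematic under assumed parameters (learning rate, softmax temperature, reward probabilities), not the McClure et al. model itself.

    import numpy as np

    rng = np.random.default_rng(1)
    V = np.zeros(2)                      # learned values of two options
    alpha, temperature = 0.1, 0.2
    p_reward = np.array([0.8, 0.2])      # hypothetical reward probabilities

    for trial in range(500):
        # Role 2: learned values bias action choice (softmax).
        p = np.exp(V / temperature) / np.exp(V / temperature).sum()
        a = rng.choice(2, p=p)
        r = float(rng.random() < p_reward[a])
        # Role 1: the same prediction error trains the value estimates.
        delta = r - V[a]
        V[a] += alpha * delta

    print("learned values:", V.round(2))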
Pourtois, Gilles
2017-01-01
Positive mood broadens attention and builds additional mental resources. However, its effects on performance monitoring and reward prediction errors remain unclear. To examine this issue, we used a standard mood induction procedure (based on guided imagery) and asked 45 participants to complete a gambling task suited to study reward prediction errors by means of the feedback-related negativity (FRN) and mid-frontal theta band power. Results showed a larger FRN for negative feedback as well as a lack of reward expectation modulation for positive feedback at the theta level with positive mood, relative to a neutral mood condition. A control analysis showed that this latter result could not be explained by the mere superposition of the event-related brain potential component on the theta oscillations. Moreover, these neurophysiological effects were evidenced in the absence of impairments at the behavioral level or increase in autonomic arousal with positive mood, suggesting that this mood state reliably altered brain mechanisms of reward prediction errors during performance monitoring. We interpret these new results as reflecting a genuine mood congruency effect, whereby reward is anticipated as the default outcome with positive mood and therefore processed as unsurprising (even when it is unlikely), while negative feedback is perceived as unexpected. PMID:28199707
Episodic Memory Encoding Interferes with Reward Learning and Decreases Striatal Prediction Errors
Braun, Erin Kendall; Daw, Nathaniel D.
2014-01-01
Learning is essential for adaptive decision making. The striatum and its dopaminergic inputs are known to support incremental reward-based learning, while the hippocampus is known to support encoding of single events (episodic memory). Although traditionally studied separately, in even simple experiences, these two types of learning are likely to co-occur and may interact. Here we sought to understand the nature of this interaction by examining how incremental reward learning is related to concurrent episodic memory encoding. During the experiment, human participants made choices between two options (colored squares), each associated with a drifting probability of reward, with the goal of earning as much money as possible. Incidental, trial-unique object pictures, unrelated to the choice, were overlaid on each option. The next day, participants were given a surprise memory test for these pictures. We found that better episodic memory was related to a decreased influence of recent reward experience on choice, both within and across participants. fMRI analyses further revealed that during learning the canonical striatal reward prediction error signal was significantly weaker when episodic memory was stronger. This decrease in reward prediction error signals in the striatum was associated with enhanced functional connectivity between the hippocampus and striatum at the time of choice. Our results suggest a mechanism by which memory encoding may compete for striatal processing and provide insight into how interactions between different forms of learning guide reward-based decision making. PMID:25378157
Social and monetary reward learning engage overlapping neural substrates.
Lin, Alice; Adolphs, Ralph; Rangel, Antonio
2012-03-01
Learning to make choices that yield rewarding outcomes requires the computation of three distinct signals: stimulus values that are used to guide choices at the time of decision making, experienced utility signals that are used to evaluate the outcomes of those decisions and prediction errors that are used to update the values assigned to stimuli during reward learning. Here we investigated whether monetary and social rewards involve overlapping neural substrates during these computations. Subjects engaged in two probabilistic reward learning tasks that were identical except that rewards were either social (pictures of smiling or angry people) or monetary (gaining or losing money). We found substantial overlap between the two types of rewards for all components of the learning process: a common area of ventromedial prefrontal cortex (vmPFC) correlated with stimulus value at the time of choice and another common area of vmPFC correlated with reward magnitude and common areas in the striatum correlated with prediction errors. Taken together, the findings support the hypothesis that shared anatomical substrates are involved in the computation of both monetary and social rewards.
Kumar, Poornima; Eickhoff, Simon B.; Dombrovski, Alexandre Y.
2015-01-01
Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments – prediction error – is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies suggest that mesostriatal prediction error signals control behavior through synaptic modification of cortico-striato-thalamic networks. Signals in the ventromedial prefrontal and orbitofrontal cortex are implicated in representing expected value. To obtain unbiased maps of these representations in the human brain, we performed a meta-analysis of functional magnetic resonance imaging studies that employed algorithmic reinforcement learning models, across a variety of experimental paradigms. We found that the ventral striatum (medial and lateral) and midbrain/thalamus represented reward prediction errors, consistent with animal studies. Prediction error signals were also seen in the frontal operculum/insula, particularly for social rewards. In Pavlovian studies, striatal prediction error signals extended into the amygdala, while instrumental tasks engaged the caudate. Prediction error maps were sensitive to the model-fitting procedure (fixed or individually-estimated) and to the extent of spatial smoothing. A correlate of expected value was found in a posterior region of the ventromedial prefrontal cortex, caudal and medial to the orbitofrontal regions identified in animal studies. These findings highlight a reproducible motif of reinforcement learning in the cortico-striatal loops and identify methodological dimensions that may influence the reproducibility of activation patterns across studies. PMID:25665667
Graf, Heiko; Metzger, Coraline D; Walter, Martin; Abler, Birgit
2016-01-06
Investigating the effects of serotonergic antidepressants on neural correlates of visual erotic stimulation revealed decreased reactivity within the dopaminergic reward network along with decreased subjective sexual functioning compared with placebo. However, a global dampening of the reward system under serotonergic drugs is not intuitive considering clinical observations of their beneficial effects in the treatment of depression. Particularly, learning signals as coded in prediction error processing within the dopaminergic reward system can be assumed to be rather enhanced as antidepressant drugs have been demonstrated to facilitate the efficacy of psychotherapeutic interventions relying on learning processes. Within the same study sample, we now explored the effects of serotonergic and dopaminergic/noradrenergic antidepressants on prediction error signals compared with placebo by functional MRI. A total of 17 healthy male participants (mean age: 25.4 years) were investigated under the administration of paroxetine, bupropion and placebo for 7 days each within a randomized, double-blind, within-subject cross-over design. During functional MRI, we used an established monetary incentive task to explore neural prediction error signals within the bilateral nucleus accumbens as region of interest within the dopaminergic reward system. In contrast to the diminished neural activations and subjective sexual functioning observed under the serotonergic agent paroxetine during visual erotic stimulation, we revealed unaffected or even enhanced neural prediction error processing within the nucleus accumbens under this antidepressant along with unaffected behavioural processing. Our study provides evidence that serotonergic antidepressants facilitate prediction error signalling and may support suggestions of beneficial effects of these agents on reinforcement learning as an essential element in behavioural psychotherapy.
When theory and biology differ: The relationship between reward prediction errors and expectancy.
Williams, Chad C; Hassall, Cameron D; Trska, Robert; Holroyd, Clay B; Krigolson, Olave E
2017-10-01
Comparisons between expectations and outcomes are critical for learning. Termed prediction errors, the violations of expectancy that occur when outcomes differ from expectations are used to modify value and shape behaviour. In the present study, we examined how a wide range of expectancy violations impacted neural signals associated with feedback processing. Participants performed a time estimation task in which they had to guess the duration of one second while their electroencephalogram was recorded. In a key manipulation, we varied task difficulty across the experiment to create a range of different feedback expectancies: reward feedback was either very expected, expected, 50/50, unexpected, or very unexpected. As predicted, the amplitude of the reward positivity, a component of the human event-related brain potential associated with feedback processing, scaled inversely with expectancy (e.g., unexpected feedback yielded a larger reward positivity than expected feedback). Interestingly, the scaling of the reward positivity to outcome expectancy was not linear as would be predicted by some theoretical models. Specifically, we found that the amplitude of the reward positivity was about equivalent for very expected and expected feedback, and for very unexpected and unexpected feedback. As such, our results demonstrate a sigmoidal relationship between reward expectancy and the amplitude of the reward positivity, with interesting implications for theories of reinforcement learning.
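A hypothetical numerical illustration of the contrast drawn above: a strictly linear mapping from reward expectancy to amplitude versus a sigmoidal one that saturates at the extremes, so that very expected and expected (and very unexpected and unexpected) outcomes yield near-identical amplitudes. The slope parameter and units are arbitrary, not fitted to the reported data.

    import numpy as np

    # Five expectancy levels, from very unexpected to very expected.
    expectancy = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

    linear = 1.0 - expectancy                                   # strictly monotone
    sigmoid = 1.0 / (1.0 + np.exp(10.0 * (expectancy - 0.5)))   # saturates at extremes

    for e, l, s in zip(expectancy, linear, sigmoid):
        print(f"p(reward)={e:.1f}  linear={l:.2f}  sigmoidal={s:.2f}")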
Balasubramani, Pragathi P.; Chakravarthy, V. Srinivasa; Ravindran, Balaraman; Moustafa, Ahmed A.
2014-01-01
Although empirical and neural studies show that serotonin (5HT) plays many functional roles in the brain, prior computational models mostly focus on its role in behavioral inhibition. In this study, we present a model of risk based decision making in a modified Reinforcement Learning (RL)-framework. The model depicts the roles of dopamine (DA) and serotonin (5HT) in Basal Ganglia (BG). In this model, the DA signal is represented by the temporal difference error (δ), while the 5HT signal is represented by a parameter (α) that controls risk prediction error. This formulation that accommodates both 5HT and DA reconciles some of the diverse roles of 5HT particularly in connection with the BG system. We apply the model to different experimental paradigms used to study the role of 5HT: (1) Risk-sensitive decision making, where 5HT controls risk assessment, (2) Temporal reward prediction, where 5HT controls time-scale of reward prediction, and (3) Reward/Punishment sensitivity, in which the punishment prediction error depends on 5HT levels. Thus the proposed integrated RL model reconciles several existing theories of 5HT and DA in the BG. PMID:24795614
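A minimal sketch of the kind of risk-sensitive reinforcement learning the abstract describes: the temporal difference error delta stands in for the DA signal, and a parameter weighting a learned risk estimate stands in for 5HT. The utility form, task, and parameters below are assumptions for illustration, not the published model equations.

    import numpy as np

    rng = np.random.default_rng(2)
    alpha_risk = 0.5      # 5HT-like weight on the risk term (assumed form)
    lr = 0.1
    Q = np.zeros(2)       # expected reward of each option
    h = np.zeros(2)       # running estimate of reward variance (risk)

    def utility(i):
        # Risk-sensitive utility: expected value penalized by risk,
        # with the 5HT-like parameter controlling the penalty.
        return Q[i] - alpha_risk * np.sqrt(h[i])

    for trial in range(2000):
        if rng.random() < 0.1:                  # occasional exploration
            a = int(rng.integers(0, 2))
        else:
            a = int(utility(1) > utility(0))    # greedy on utility
        # Option 1 is risky: same mean reward, higher variance.
        r = rng.normal(0.5, 0.1 if a == 0 else 0.5)
        delta = r - Q[a]                        # DA-like reward prediction error
        Q[a] += lr * delta
        h[a] += lr * (delta**2 - h[a])          # risk prediction error update

    print("Q:", Q.round(2), "risk:", h.round(2))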
A universal role of the ventral striatum in reward-based learning: Evidence from human studies
Daniel, Reka; Pollmann, Stefan
2014-01-01
Reinforcement learning enables organisms to adjust their behavior in order to maximize rewards. Electrophysiological recordings of dopaminergic midbrain neurons have shown that they code the difference between actual and predicted rewards, i.e., the reward prediction error, in many species. This error signal is conveyed to both the striatum and cortical areas and is thought to play a central role in learning to optimize behavior. However, in human daily life rewards are diverse and often only indirect feedback is available. Here we explore the range of rewards that are processed by the dopaminergic system in human participants, and examine whether it is also involved in learning in the absence of explicit rewards. While results from electrophysiological recordings in humans are sparse, evidence linking dopaminergic activity to the metabolic signal recorded from the midbrain and striatum with functional magnetic resonance imaging (fMRI) is available. Results from fMRI studies suggest that the human ventral striatum (VS) receives valuation information for a diverse set of rewarding stimuli. These range from simple primary reinforcers such as juice rewards, through abstract social rewards, to internally generated signals of perceived correctness, suggesting that the VS is involved in learning from trial-and-error irrespective of the specific nature of provided rewards. In addition, we summarize evidence that the VS can also be implicated when learning from observing others, and in tasks that go beyond simple stimulus-action-outcome learning, indicating that the reward system is also recruited in more complex learning tasks. PMID:24825620
Nicotine Withdrawal Induces Neural Deficits in Reward Processing.
Oliver, Jason A; Evans, David E; Addicott, Merideth A; Potts, Geoffrey F; Brandon, Thomas H; Drobes, David J
2017-06-01
Nicotine withdrawal reduces neurobiological responses to nonsmoking rewards. Insight into these reward deficits could inform the development of targeted interventions. This study examined the effect of withdrawal on neural and behavioral responses during a reward prediction task. Smokers (N = 48) attended two laboratory sessions following overnight abstinence. Withdrawal was manipulated by having participants smoke three regular nicotine (0.6 mg yield; satiation) or very low nicotine (0.05 mg yield; withdrawal) cigarettes. Electrophysiological recordings of neural activity were obtained while participants completed a reward prediction task that involved viewing four combinations of predictive and reward-determining stimuli: (1) Unexpected Reward; (2) Predicted Reward; (3) Predicted Punishment; (4) Unexpected Punishment. The task evokes a medial frontal negativity that mimics the phasic pattern of dopaminergic firing in ventral tegmental regions associated with reward prediction errors. Nicotine withdrawal decreased the amplitude of the medial frontal negativity equally across all trial types (p < .001). Exploratory analyses indicated withdrawal increased time to initiate the next trial following unexpected punishment trials (p < .001) and response time on reward trials during withdrawal was positively related to nicotine dependence (p < .001). Nicotine withdrawal had an equivalent impact across trial types, suggesting reward processing deficits are unlikely to stem from changes in phasic dopaminergic activity during prediction errors. Effects on tonic activity may be more pronounced. Pharmacological interventions directly targeting the dopamine system and behavioral interventions designed to increase reward motivation and responsiveness (e.g., behavioral activation) may aid in mitigating withdrawal symptoms and potentially improving smoking cessation outcomes. Findings from this study indicate nicotine withdrawal impacts reward processing signals that are observable in smokers' neural activity. This may play a role in the subjective aversive experience of nicotine withdrawal and potentially contribute to smoking relapse. Interventions that address abnormal responding to both pleasant and unpleasant stimuli may be particularly effective for alleviating nicotine withdrawal.
Time of Day Differences in Neural Reward Functioning in Healthy Young Men.
Byrne, Jamie E M; Hughes, Matthew E; Rossell, Susan L; Johnson, Sheri L; Murray, Greg
2017-09-13
Reward function appears to be modulated by the circadian system, but little is known about the neural basis of this interaction. Previous research suggests that the neural reward response may be different in the afternoon; however, the direction of this effect is contentious. Reward response may follow the diurnal rhythm in self-reported positive affect, peaking in the early afternoon. An alternative is that daily reward response represents a type of prediction error, with neural reward activation relatively high at times of day when rewards are unexpected (i.e., early and late in the day). The present study measured neural reward activation in the context of a validated reward task at 10.00 h, 14.00 h, and 19.00 h in healthy human males. A region of interest BOLD fMRI protocol was used to investigate the diurnal waveform of activation in reward-related brain regions. Multilevel modeling found, as expected, a highly significant quadratic time-of-day effect focusing on the left putamen (p < 0.001). Consistent with the "prediction error" hypothesis, activation was significantly higher at 10.00 h and 19.00 h compared with 14.00 h. It is provisionally concluded that the putamen may be particularly important in endogenous priming of reward motivation at different times of day, with the pattern of activation consistent with circadian-modulated reward expectancies in neural pathways (i.e., greater activation to reward stimuli at unexpected times of day). This study encourages further research into circadian modulation of reward and underscores the methodological importance of accounting for time of day in fMRI protocols. SIGNIFICANCE STATEMENT This is one of the first studies to use a repeated-measures imaging procedure to explore the diurnal rhythm of reward activation. Although self-reported reward (most often operationalized as positive affect) peaks in the afternoon, the present findings indicate that neural activation is lowest at this time. We conclude that the diurnal neural activation pattern may reflect a prediction error of the brain, where rewards at unexpected times (10.00 h and 19.00 h) elicit higher activation in reward brain regions than at expected (14.00 h) times. These data also have methodological significance, suggesting that there may be a time of day influence, which should be accounted for in neural reward studies.
Altered neural reward and loss processing and prediction error signalling in depression
Ubl, Bettina; Kuehner, Christine; Kirsch, Peter; Ruttorf, Michaela
2015-01-01
Dysfunctional processing of reward and punishment may play an important role in depression. However, functional magnetic resonance imaging (fMRI) studies have shown heterogeneous results for reward processing in fronto-striatal regions. We examined neural responsivity associated with the processing of reward and loss during anticipation and receipt of incentives and related prediction error (PE) signalling in depressed individuals. Thirty medication-free depressed persons and 28 healthy controls performed an fMRI reward paradigm. Regions of interest analyses focused on neural responses during anticipation and receipt of gains and losses and related PE-signals. Additionally, we assessed the relationship between neural responsivity during gain/loss processing and hedonic capacity. When compared with healthy controls, depressed individuals showed reduced fronto-striatal activity during anticipation of gains and losses. The groups did not significantly differ in response to reward and loss outcomes. In depressed individuals, activity increases in the orbitofrontal cortex and nucleus accumbens during reward anticipation were associated with hedonic capacity. Depressed individuals showed an absence of reward-related PEs but encoded loss-related PEs in the ventral striatum. Depression seems to be linked to blunted responsivity in fronto-striatal regions associated with limited motivational responses for rewards and losses. Alterations in PE encoding might mirror blunted reward- and enhanced loss-related associative learning in depression. PMID:25567763
Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris
2017-01-01
Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment resistant MDD patients and 44 age matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potentials (ERPs) data. MDD patients showed learning rates comparable to those of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged.
Neuronal Reward and Decision Signals: From Theories to Data
Schultz, Wolfram
2015-01-01
Rewards are crucial objects that induce learning, approach behavior, choices, and emotions. Whereas emotions are difficult to investigate in animals, the learning function is mediated by neuronal reward prediction error signals which implement basic constructs of reinforcement learning theory. These signals are found in dopamine neurons, which emit a global reward signal to striatum and frontal cortex, and in specific neurons in striatum, amygdala, and frontal cortex projecting to select neuronal populations. The approach and choice functions involve subjective value, which is objectively assessed by behavioral choices eliciting internal, subjective reward preferences. Utility is the formal mathematical characterization of subjective value and a prime decision variable in economic choice theory. It is coded as utility prediction error by phasic dopamine responses. Utility can incorporate various influences, including risk, delay, effort, and social interaction. Appropriate for formal decision mechanisms, rewards are coded as object value, action value, difference value, and chosen value by specific neurons. Although all reward, reinforcement, and decision variables are theoretical constructs, their neuronal signals constitute measurable physical implementations and as such confirm the validity of these concepts. The neuronal reward signals provide guidance for behavior while constraining the free will to act. PMID:26109341
Vassena, Eliana; Deraeve, James; Alexander, William H
2017-10-01
Human behavior is strongly driven by the pursuit of rewards. In daily life, however, benefits mostly come at a cost, often requiring that effort be exerted to obtain potential benefits. Medial PFC (MPFC) and dorsolateral PFC (DLPFC) are frequently implicated in the expectation of effortful control, showing increased activity as a function of predicted task difficulty. Such activity partially overlaps with expectation of reward and has been observed both during decision-making and during task preparation. Recently, novel computational frameworks have been developed to explain activity in these regions during cognitive control, based on the principle of prediction and prediction error (predicted response-outcome [PRO] model [Alexander, W. H., & Brown, J. W. Medial prefrontal cortex as an action-outcome predictor. Nature Neuroscience, 14, 1338-1344, 2011], hierarchical error representation [HER] model [Alexander, W. H., & Brown, J. W. Hierarchical error representation: A computational model of anterior cingulate and dorsolateral prefrontal cortex. Neural Computation, 27, 2354-2410, 2015]). Despite the broad explanatory power of these models, it is not clear whether they can also accommodate effects related to the expectation of effort observed in MPFC and DLPFC. Here, we propose a translation of these computational frameworks to the domain of effort-based behavior. First, we discuss how the PRO model, based on prediction error, can explain effort-related activity in MPFC, by reframing effort-based behavior in a predictive context. We propose that MPFC activity reflects monitoring of motivationally relevant variables (such as effort and reward), by coding expectations and discrepancies from such expectations. Moreover, we derive behavioral and neural model-based predictions for healthy controls and clinical populations with impairments of motivation. Second, we illustrate the possible translation to effort-based behavior of the HER model, an extended version of PRO model based on hierarchical error prediction, developed to explain MPFC-DLPFC interactions. We derive behavioral predictions that describe how effort and reward information is coded in PFC and how changing the configuration of such environmental information might affect decision-making and task performance involving motivation.
Mesolimbic Dopamine Signals the Value of Work
Hamid, Arif A.; Pettibone, Jeffrey R.; Mabrouk, Omar S.; Hetrick, Vaughn L.; Schmidt, Robert; Vander Weele, Caitlin M.; Kennedy, Robert T.; Aragona, Brandon J.; Berke, Joshua D.
2015-01-01
Dopamine cell firing can encode errors in reward prediction, providing a learning signal to guide future behavior. Yet dopamine is also a key modulator of motivation, invigorating current behavior. Existing theories propose that fast (“phasic”) dopamine fluctuations support learning, while much slower (“tonic”) dopamine changes are involved in motivation. We examined dopamine release in the nucleus accumbens across multiple time scales, using complementary microdialysis and voltammetric methods during adaptive decision-making. We first show that minute-by-minute dopamine levels covary with reward rate and motivational vigor. We then show that second-by-second dopamine release encodes an estimate of temporally-discounted future reward (a value function). We demonstrate that changing dopamine immediately alters willingness to work, and reinforces preceding action choices by encoding temporal-difference reward prediction errors. Our results indicate that dopamine conveys a single, rapidly-evolving decision variable, the available reward for investment of effort, that is employed for both learning and motivational functions. PMID:26595651
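The "estimate of temporally-discounted future reward (a value function)" described above is the quantity that temporal-difference learning computes. Below is a generic TD(0) sketch on a toy chain of states leading to a terminal reward; it illustrates the value signal and its prediction error, not the authors' analysis pipeline.

    import numpy as np

    # Minimal TD(0): V(s) estimates discounted future reward, and the TD
    # error delta is the learning signal that reinforces preceding choices.
    n_states, gamma, lr = 5, 0.9, 0.1
    V = np.zeros(n_states)               # values along a chain ending in reward

    for episode in range(200):
        for s in range(n_states):
            s_next = s + 1
            r = 1.0 if s_next == n_states else 0.0        # reward at the end
            v_next = 0.0 if s_next == n_states else V[s_next]
            delta = r + gamma * v_next - V[s]             # TD prediction error
            V[s] += lr * delta

    print(V.round(2))   # ramps toward the reward, like a discounted value signal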
Frontal Theta Links Prediction Errors to Behavioral Adaptation in Reinforcement Learning
Cavanagh, James F.; Frank, Michael J.; Klein, Theresa J.; Allen, John J.B.
2009-01-01
Investigations into action monitoring have consistently detailed a fronto-central voltage deflection in the Event-Related Potential (ERP) following the presentation of negatively valenced feedback, sometimes termed the Feedback Related Negativity (FRN). The FRN has been proposed to reflect a neural response to prediction errors during reinforcement learning, yet the single trial relationship between neural activity and the quanta of expectation violation remains untested. Although ERP methods are not well suited to single trial analyses, the FRN has been associated with theta band oscillatory perturbations in the medial prefrontal cortex. Medio-frontal theta oscillations have been previously associated with expectation violation and behavioral adaptation and are well suited to single trial analysis. Here, we recorded EEG activity during a probabilistic reinforcement learning task and fit the performance data to an abstract computational model (Q-learning) for calculation of single-trial reward prediction errors. Single-trial theta oscillatory activities following feedback were investigated within the context of expectation (prediction error) and adaptation (subsequent reaction time change). Results indicate that interactive medial and lateral frontal theta activities reflect the degree of negative and positive reward prediction error in the service of behavioral adaptation. These different brain areas use prediction error calculations for different behavioral adaptations: with medial frontal theta reflecting the utilization of prediction errors for reaction time slowing (specifically following errors), but lateral frontal theta reflecting prediction errors leading to working memory-related reaction time speeding for the correct choice. PMID:19969093
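The modeling step described above, replaying behavior through a Q-learning model to obtain trial-wise reward prediction errors as regressors for single-trial EEG analysis, can be sketched as follows. The learning rate and the toy session are assumptions for illustration.

    import numpy as np

    def qlearning_prediction_errors(choices, rewards, alpha=0.2):
        # Replay a session through a Q-learning model and return the
        # trial-by-trial reward prediction errors (one per feedback event).
        Q = np.zeros(int(choices.max()) + 1)
        pes = np.empty(len(choices))
        for t, (c, r) in enumerate(zip(choices, rewards)):
            pes[t] = r - Q[c]
            Q[c] += alpha * pes[t]
        return pes

    # Toy session: two stimuli with different reward probabilities.
    rng = np.random.default_rng(3)
    choices = rng.integers(0, 2, size=100)
    rewards = (rng.random(100) < np.where(choices == 0, 0.8, 0.3)).astype(float)
    print(qlearning_prediction_errors(choices, rewards)[:5].round(2))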
Prediction error and somatosensory insula activation in women recovered from anorexia nervosa.
Frank, Guido K W; Collier, Shaleise; Shott, Megan E; O'Reilly, Randall C
2016-08-01
Previous research in patients with anorexia nervosa showed heightened brain response during a taste reward conditioning task and heightened sensitivity to rewarding and punishing stimuli. Here we tested the hypothesis that individuals recovered from anorexia nervosa would also experience greater brain activation during this task as well as higher sensitivity to salient stimuli than controls. Women recovered from restricting-type anorexia nervosa and healthy control women underwent fMRI during application of a prediction error taste reward learning paradigm. Twenty-four women recovered from anorexia nervosa (mean age 30.3 ± 8.1 yr) and 24 control women (mean age 27.4 ± 6.3 yr) took part in this study. The recovered anorexia nervosa group showed greater left posterior insula activation for the prediction error model analysis than the control group (family-wise error- and small volume-corrected p < 0.05). A group × condition analysis found greater posterior insula response in women recovered from anorexia nervosa than controls for unexpected stimulus omission, but not for unexpected receipt. Sensitivity to punishment was elevated in women recovered from anorexia nervosa. This was a cross-sectional study, and the sample size was modest. Anorexia nervosa after recovery is associated with heightened prediction error-related brain response in the posterior insula as well as greater response to unexpected reward stimulus omission. This finding, together with behaviourally increased sensitivity to punishment, could indicate that individuals recovered from anorexia nervosa are particularly responsive to punishment. The posterior insula processes somatosensory stimuli, including unexpected bodily states, and greater response could indicate altered perception or integration of unexpected or maybe unwanted bodily feelings. Whether those findings develop during the ill state or whether they are biological traits requires further study.
Aberg, Kristoffer Carl; Doell, Kimberly C; Schwartz, Sophie
2015-10-28
Some individuals are better at learning about rewarding situations, whereas others are inclined to avoid punishments (i.e., enhanced approach or avoidance learning, respectively). In reinforcement learning, action values are increased when outcomes are better than predicted (positive prediction errors [PEs]) and decreased for worse than predicted outcomes (negative PEs). Because actions with high and low values are approached and avoided, respectively, individual differences in the neural encoding of PEs may influence the balance between approach-avoidance learning. Recent correlational approaches also indicate that biases in approach-avoidance learning involve hemispheric asymmetries in dopamine function. However, the computational and neural mechanisms underpinning such learning biases remain unknown. Here we assessed hemispheric reward asymmetry in striatal activity in 34 human participants who performed a task involving rewards and punishments. We show that the relative difference in reward response between hemispheres relates to individual biases in approach-avoidance learning. Moreover, using a computational modeling approach, we demonstrate that better encoding of positive (vs negative) PEs in dopaminergic midbrain regions is associated with better approach (vs avoidance) learning, specifically in participants with larger reward responses in the left (vs right) ventral striatum. Thus, individual dispositions or traits may be determined by neural processes acting to constrain learning about specific aspects of the world. Copyright © 2015 the authors 0270-6474/15/3514491-10$15.00/0.
Dopaminergic dysfunction in schizophrenia: salience attribution revisited.
Heinz, Andreas; Schlagenhauf, Florian
2010-05-01
A dysregulation of the mesolimbic dopamine system in schizophrenia patients may lead to aberrant attribution of incentive salience and contribute to the emergence of psychopathological symptoms like delusions. The dopaminergic signal has been conceptualized to represent a prediction error that indicates the difference between received and predicted reward. The incentive salience hypothesis states that dopamine mediates the attribution of "incentive salience" to conditioned cues that predict reward. This hypothesis was initially applied in the context of drug addiction and then transferred to schizophrenic psychosis. It was hypothesized that increased firing (chaotic or stress associated) of dopaminergic neurons in the striatum of schizophrenia patients attributes incentive salience to otherwise irrelevant stimuli. Here, we review recent neuroimaging studies directly addressing this hypothesis. They suggest that neuronal functions associated with dopaminergic signaling, such as the attribution of salience to reward-predicting stimuli and the computation of prediction errors, are indeed altered in schizophrenia patients and that this impairment appears to contribute to delusion formation.
Freeman, Scott M; Aron, Adam R
2016-02-01
Controlling an inappropriate response tendency in the face of a reward-predicting stimulus likely depends on the strength of the reward-driven activation, the strength of a putative top-down control process, and their relative timing. We developed a rewarded go/no-go paradigm to investigate such dynamics. Participants made rapid responses (on go trials) to high versus low reward-predicting stimuli and sometimes had to withhold responding (on no-go trials) in the face of the same stimuli. Behaviorally, for high versus low reward stimuli, responses were faster on go trials, and there were more errors of commission on no-go trials. We used single-pulse TMS to map out the corticospinal excitability dynamics, especially on no-go trials where control is needed. For successful no-go trials, there was an early rise in motor activation that was then sharply reduced beneath baseline. This activation-reduction pattern was more pronounced for high- versus low-reward trials and in individuals with greater motivational drive for reward. A follow-on experiment showed that, when participants were fatigued by an effortful task, they made more errors on no-go trials for high versus low reward stimuli. Together, these studies show that, when a response is inappropriate, reward-predicting stimuli induce early motor activation, followed by a top-down effortful control process (which we interpret as response suppression) that depends on the strength of the preceding activation. Our findings provide novel information about the activation-suppression dynamics during control over reward-driven actions, and they illustrate how fatigue or depletion leads to control failures in the face of reward.
Dopamine reward prediction errors reflect hidden state inference across time
Starkweather, Clara Kwon; Babayan, Benedicte M.; Uchida, Naoshige; Gershman, Samuel J.
2017-01-01
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference. PMID:28263301
The modulation of savouring by prediction error and its effects on choice
Iigaya, Kiyohito; Story, Giles W; Kurth-Nelson, Zeb; Dolan, Raymond J; Dayan, Peter
2016-01-01
When people anticipate uncertain future outcomes, they often prefer to know their fate in advance. Inspired by an idea in behavioral economics that the anticipation of rewards is itself attractive, we hypothesized that this preference for advance information arises because reward prediction errors carried by such information can boost the level of anticipation. We designed new empirical behavioral studies to test this proposal, and confirmed that subjects preferred advance reward information more strongly when they had to wait for rewards for a longer time. We formulated our proposal in a reinforcement-learning model, and we showed that our model could account for a wide range of existing neuronal and behavioral data, without appealing to ambiguous notions such as an explicit value for information. We suggest that such boosted anticipation significantly drives risk-seeking behaviors, most pertinently in gambling. DOI: http://dx.doi.org/10.7554/eLife.13747.001 PMID:27101365
Hedging Your Bets by Learning Reward Correlations in the Human Brain
Wunderlich, Klaus; Symmonds, Mkael; Bossaerts, Peter; Dolan, Raymond J.
2011-01-01
Human subjects are proficient at tracking the mean and variance of rewards and updating these via prediction errors. Here, we addressed whether humans can also learn about higher-order relationships between distinct environmental outcomes, a defining ecological feature of contexts where multiple sources of rewards are available. By manipulating the degree to which distinct outcomes are correlated, we show that subjects implemented an explicit model-based strategy to learn the associated outcome correlations and were adept in using that information to dynamically adjust their choices in a task that required a minimization of outcome variance. Importantly, the experimentally generated outcome correlations were explicitly represented neuronally in right midinsula with a learning prediction error signal expressed in rostral anterior cingulate cortex. Thus, our data show that the human brain represents higher-order correlation structures between rewards, a core adaptive ability whose immediate benefit is optimized sampling. PMID:21943609
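For intuition about the variance-minimization objective above, the two-outcome case has a closed form: the allocation weight that minimizes the variance of a mixture w*X1 + (1-w)*X2 depends on the learned correlation between the outcomes. The numbers below are hypothetical.

    import numpy as np

    def min_variance_weight(s1, s2, rho):
        # Fraction allocated to outcome 1 that minimizes the variance of
        # w*X1 + (1-w)*X2, given std devs s1, s2 and correlation rho.
        return (s2**2 - rho * s1 * s2) / (s1**2 + s2**2 - 2 * rho * s1 * s2)

    # Anticorrelated outcomes reward hedging (near-even split); positively
    # correlated outcomes shift the optimal allocation toward the safer one.
    for rho in (-0.8, 0.0, 0.8):
        print(rho, round(min_variance_weight(1.0, 1.2, rho), 2))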
Dopamine reward prediction errors reflect hidden-state inference across time.
Starkweather, Clara Kwon; Babayan, Benedicte M; Uchida, Naoshige; Gershman, Samuel J
2017-04-01
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a 'belief state'). Here we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling showed a notable difference between two tasks that differed only with respect to whether reward was delivered in a deterministic manner. Our results favor an associative learning rule that combines cached values with hidden-state inference.
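In the spirit of the framework tested above, here is a minimal belief-state TD sketch in Python: a Bayesian posterior over two hidden states is updated from ambiguous observations, and values are learned as linear weights on the belief vector. The transition matrix, observation likelihoods, and reward rule are all invented for illustration, not taken from the task.

    import numpy as np

    rng = np.random.default_rng(4)
    gamma, lr = 0.95, 0.05
    w = np.zeros(2)                      # linear value weights over the belief
    T = np.array([[0.9, 0.1],            # hypothetical hidden-state transitions
                  [0.2, 0.8]])
    lik = np.array([[0.7, 0.3],          # P(observation | hidden state), assumed
                    [0.3, 0.7]])

    b = np.array([0.5, 0.5])             # belief over the two hidden states
    s = 0                                # true hidden state (never observed)
    for t in range(5000):
        s = rng.choice(2, p=T[s])                     # world evolves
        obs = rng.choice(2, p=lik[s])                 # ambiguous observation
        r = 1.0 if (s == 1 and rng.random() < 0.5) else 0.0   # invented reward rule
        prior = T.T @ b                               # predict next hidden state
        post = lik[:, obs] * prior
        b_next = post / post.sum()                    # Bayesian belief update
        delta = r + gamma * (w @ b_next) - w @ b      # TD error over belief states
        w += lr * delta * b
        b = b_next

    print("value weights per hidden state:", w.round(2))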
Schelp, Scott A.; Pultorak, Katherine J.; Rakowski, Dylan R.; Gomez, Devan M.; Krzystyniak, Gregory; Das, Raibatak; Oleson, Erik B.
2017-01-01
The mesolimbic dopamine system is strongly implicated in motivational processes. Currently accepted theories suggest that transient mesolimbic dopamine release events energize reward seeking and encode reward value. During the pursuit of reward, critical associations are formed between the reward and cues that predict its availability. Conditioned by these experiences, dopamine neurons begin to fire upon the earliest presentation of a cue, and again at the receipt of reward. The resulting dopamine concentration scales proportionally to the value of the reward. In this study, we used a behavioral economics approach to quantify how transient dopamine release events scale with price and causally alter price sensitivity. We presented sucrose to rats across a range of prices and modeled the resulting demand curves to estimate price sensitivity. Using fast-scan cyclic voltammetry, we determined that the concentration of accumbal dopamine time-locked to cue presentation decreased with price. These data confirm and extend the notion that dopamine release events originating in the ventral tegmental area encode subjective value. Using optogenetics to augment dopamine concentration, we found that enhancing dopamine release at cue made demand more sensitive to price and decreased dopamine concentration at reward delivery. From these observations, we infer that value is decreased because of a negative reward prediction error (i.e., the animal receives less than expected). Conversely, enhancing dopamine at reward made demand less sensitive to price. We attribute this finding to a positive reward prediction error, whereby the animal perceives they received a better value than anticipated. PMID:29109253
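For readers unfamiliar with the demand-curve analysis, the sketch below fits the standard Hursh-Silberberg exponential demand equation to made-up consumption data. Only the functional form follows the behavioral-economics convention this kind of study uses; the data, starting values, and range constant are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hursh-Silberberg exponential demand: log10 Q = log10 Q0 + k*(exp(-alpha*Q0*C) - 1),
# where Q is consumption, C is price, Q0 is demand at zero price, and alpha
# indexes price sensitivity. Data below are invented for illustration.
price = np.array([1, 2, 4, 8, 16, 32], dtype=float)
consumption = np.array([95, 90, 78, 55, 28, 9], dtype=float)

k = 2.0  # range constant, often held fixed across fits

def log_demand(C, Q0, alpha):
    return np.log10(Q0) + k * (np.exp(-alpha * Q0 * C) - 1.0)

(Q0_hat, alpha_hat), _ = curve_fit(log_demand, price, np.log10(consumption),
                                   p0=(100.0, 0.001))
print(f"Q0 ~ {Q0_hat:.1f}, alpha (price sensitivity) ~ {alpha_hat:.4f}")
```

A larger fitted alpha corresponds to demand that collapses faster as price rises, the quantity the optogenetic manipulations in the study are described as shifting.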
Morphological elucidation of basal ganglia circuits contributing reward prediction
Fujiyama, Fumino; Takahashi, Susumu; Karube, Fuyuki
2015-01-01
Electrophysiological studies in monkeys have shown that dopaminergic neurons respond to the reward prediction error. In addition, striatal neurons alter their responsiveness to cortical or thalamic inputs in response to the dopamine signal, via the mechanism of dopamine-regulated synaptic plasticity. These findings have led to the hypothesis that the striatum exhibits synaptic plasticity under the influence of the reward prediction error and conducts reinforcement learning throughout the basal ganglia circuits. The reinforcement learning model is useful; however, the mechanism by which such a process emerges in the basal ganglia needs to be anatomically explained. The actor–critic model has been previously proposed and extended by positing role sharing within the striatum, focusing on the striosome/matrix compartments. However, this hypothesis has been difficult to confirm morphologically, partly because of the complex structure of the striosome/matrix compartments. Here, we review recent morphological studies that elucidate the input/output organization of the striatal compartments. PMID:25698913
Punishment sensitivity modulates the processing of negative feedback but not error-induced learning.
Unger, Kerstin; Heintz, Sonja; Kray, Jutta
2012-01-01
Accumulating evidence suggests that individual differences in punishment and reward sensitivity are associated with functional alterations in neural systems underlying error and feedback processing. In particular, individuals highly sensitive to punishment have been found to be characterized by larger mediofrontal error signals as reflected in the error negativity/error-related negativity (Ne/ERN) and the feedback-related negativity (FRN). By contrast, reward sensitivity has been shown to relate to the error positivity (Pe). Given that Ne/ERN, FRN, and Pe have been functionally linked to flexible behavioral adaptation, the aim of the present research was to examine how these electrophysiological reflections of error and feedback processing vary as a function of punishment and reward sensitivity during reinforcement learning. We applied a probabilistic learning task that involved three different conditions of feedback validity (100%, 80%, and 50%). In contrast to prior studies using response competition tasks, we did not find reliable correlations between punishment sensitivity and the Ne/ERN. Instead, higher punishment sensitivity predicted larger FRN amplitudes, irrespective of feedback validity. Moreover, higher reward sensitivity was associated with a larger Pe. However, only reward sensitivity was related to better overall learning performance and higher post-error accuracy, whereas highly punishment sensitive participants showed impaired learning performance, suggesting that larger negative feedback-related error signals were not beneficial for learning or even reflected maladaptive information processing in these individuals. Thus, although our findings indicate that individual differences in reward and punishment sensitivity are related to electrophysiological correlates of error and feedback processing, we found less evidence for influences of these personality characteristics on the relation between performance monitoring and feedback-based learning.
Neural evidence for description dependent reward processing in the framing effect.
Yu, Rongjun; Zhang, Ping
2014-01-01
Human decision making can be influenced by emotionally valenced contexts, known as the framing effect. We used event-related brain potentials to investigate how framing influences the encoding of reward. We found that the feedback related negativity (FRN), which indexes the "worse than expected" negative prediction error in the anterior cingulate cortex (ACC), was more negative for the negative frame than for the positive frame in the win domain. Consistent with previous findings that the FRN is not sensitive to "better than expected" positive prediction error, the FRN did not differentiate the positive and negative frame in the loss domain. Our results provide neural evidence that the description invariance principle which states that reward representation and decision making are not influenced by how options are presented is violated in the framing effect.
Belief state representation in the dopamine system.
Babayan, Benedicte M; Uchida, Naoshige; Gershman, Samuel J
2018-05-14
Learning to predict future outcomes is critical for driving appropriate behaviors. Reinforcement learning (RL) models have successfully accounted for such learning, relying on reward prediction errors (RPEs) signaled by midbrain dopamine neurons. It has been proposed that when sensory data provide only ambiguous information about which state an animal is in, it can predict reward based on a set of probabilities assigned to hypothetical states (called the belief state). Here we examine how dopamine RPEs and subsequent learning are regulated under state uncertainty. Mice are first trained in a task with two potential states defined by different reward amounts. During testing, intermediate-sized rewards are given in rare trials. Dopamine activity is a non-monotonic function of reward size, consistent with RL models operating on belief states. Furthermore, the magnitude of dopamine responses quantitatively predicts changes in behavior. These results establish the critical role of state inference in RL.
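The belief-state computation can be illustrated with a toy version of the task: two hidden states defined by trained reward sizes, a fixed prior belief (as on rare probe trials), and intermediate rewards. The Gaussian likelihood and all parameters below are our assumptions for illustration, not the authors' fitted model.

```python
import numpy as np

# Two hidden states defined by trained reward amounts mu, a prior belief b,
# and a probe reward r of intermediate size.
mu = np.array([1.0, 10.0])   # trained reward sizes for states 1 and 2
sigma = 2.0                  # assumed reward/sensory noise
b = np.array([0.5, 0.5])     # prior belief over hidden states

def gauss(r, m, s):
    return np.exp(-0.5 * ((r - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for r in [1.0, 4.0, 5.5, 7.0, 10.0]:
    expected = b @ mu                       # belief-weighted reward prediction
    rpe = r - expected                      # prediction error on this trial
    posterior = b * gauss(r, mu, sigma)     # infer which state generated r
    posterior /= posterior.sum()
    print(f"r={r:4.1f}  RPE={rpe:+5.2f}  P(state 2 | r)={posterior[1]:.2f}")
```

Because intermediate rewards shift the inferred state as well as the immediate error, learning driven by such a system can depend non-monotonically on reward size, which is the signature reported here.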
From conflict management to reward-based decision making: actors and critics in primate medial frontal cortex
Silvetti, Massimo; Alexander, William; Verguts, Tom; Brown, Joshua W
2014-10-01
The role of the medial prefrontal cortex (mPFC) and especially the anterior cingulate cortex has been the subject of intense debate for the last decade. A number of theories have been proposed to account for its function. Broadly speaking, some emphasize cognitive control, whereas others emphasize value processing; specific theories concern reward processing, conflict detection, error monitoring, and volatility detection, among others. Here we survey and evaluate them relative to experimental results from neurophysiological, anatomical, and cognitive studies. We argue for a new conceptualization of mPFC, arising from recent computational modeling work. Based on reinforcement learning theory, these new models propose that mPFC is an Actor-Critic system. This system is aimed to predict future events including rewards, to evaluate errors in those predictions, and finally, to implement optimal skeletal-motor and visceromotor commands to obtain reward. This framework provides a comprehensive account of mPFC function, accounting for and predicting empirical results across different levels of analysis, including monkey neurophysiology, human ERP, human neuroimaging, and human behavior. Copyright © 2013 Elsevier Ltd. All rights reserved.
Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons
Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram
2013-01-01
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. PMID:23592970
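The continuous-time TD error at the heart of this framework, following the Doya (2000) formulation cited in the abstract, is delta(t) = r(t) - V(t)/tau + dV/dt. The discretized sketch below uses a plain linear value function as a stand-in for the spiking critic; the toy task and all parameters are illustrative, not the authors' network.

```python
import numpy as np

# Discretized continuous-time TD error, delta(t) = r(t) - V(t)/tau + dV/dt,
# with a linear value function over radial basis features of a 1-D state.
dt, tau, alpha = 0.01, 1.0, 0.1
centers = np.linspace(0.0, 1.0, 50)

def features(s):
    return np.exp(-((s - centers) ** 2) / (2 * 0.02 ** 2))

w = np.zeros(centers.size)
s, V_prev = 0.0, 0.0
for step in range(5000):
    s = min(1.0, s + 0.1 * dt)        # toy trajectory drifting toward a goal
    r = 1.0 if s >= 1.0 else 0.0      # reward while at the goal
    x = features(s)
    V = w @ x
    delta = r - V / tau + (V - V_prev) / dt   # continuous-time TD error
    w += alpha * delta * x * dt               # critic weight update
    V_prev = V

print("value at goal after training ~", features(1.0) @ w)
```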
Reinforcement learning using a continuous time actor-critic framework with spiking neurons.
Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram
2013-04-01
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
Neural evidence for description dependent reward processing in the framing effect
Yu, Rongjun; Zhang, Ping
2014-01-01
Human decision making can be influenced by emotionally valenced contexts, known as the framing effect. We used event-related brain potentials to investigate how framing influences the encoding of reward. We found that the feedback related negativity (FRN), which indexes the “worse than expected” negative prediction error in the anterior cingulate cortex (ACC), was more negative for the negative frame than for the positive frame in the win domain. Consistent with previous findings that the FRN is not sensitive to “better than expected” positive prediction error, the FRN did not differentiate the positive and negative frame in the loss domain. Our results provide neural evidence that the description invariance principle which states that reward representation and decision making are not influenced by how options are presented is violated in the framing effect. PMID:24733998
Reward Pays the Cost of Noise Reduction in Motor and Cognitive Control.
Manohar, Sanjay G; Chong, Trevor T-J; Apps, Matthew A J; Batla, Amit; Stamelou, Maria; Jarman, Paul R; Bhatia, Kailash P; Husain, Masud
2015-06-29
Speed-accuracy trade-off is an intensively studied law governing almost all behavioral tasks across species. Here we show that motivation by reward breaks this law, by simultaneously invigorating movement and improving response precision. We devised a model to explain this paradoxical effect of reward by considering a new factor: the cost of control. Exerting control to improve response precision might itself come at a cost: a cost to attenuate a proportion of intrinsic neural noise. Applying a noise-reduction cost to optimal motor control predicted that reward can increase both velocity and accuracy. Similarly, application to decision-making predicted that reward reduces reaction times and errors in cognitive control. We used a novel saccadic distraction task to quantify the speed and accuracy of both movements and decisions under varying reward. Both faster speeds and smaller errors were observed with higher incentives, with the results best fitted by a model including a precision cost. Recent theories consider dopamine to be a key neuromodulator in mediating motivational effects of reward. We therefore examined how Parkinson's disease (PD), a condition associated with dopamine depletion, alters the effects of reward. Individuals with PD showed reduced reward sensitivity in their speed and accuracy, consistent in our model with higher noise-control costs. Including a cost of control over noise explains how reward may allow apparent performance limits to be surpassed. On this view, the pattern of reduced reward sensitivity in PD patients can specifically be accounted for by a higher cost for controlling noise. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
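A minimal numeric sketch of the cost-of-control idea: choose the motor noise level that maximizes expected reward minus a precision cost. The functional forms and constants below are our assumptions, not the authors' fitted model; they reproduce only the qualitative prediction that higher reward buys lower noise.

```python
from scipy.stats import norm
from scipy.optimize import minimize_scalar

# Endpoint error is Gaussian with SD sigma; reward R is earned if the error
# lands within a tolerance; reducing sigma below a baseline sigma0 costs
# effort proportional to the added precision. All constants are hypothetical.
tolerance, sigma0, c = 1.0, 2.0, 0.5

def expected_utility(sigma, R):
    p_hit = norm.cdf(tolerance / sigma) - norm.cdf(-tolerance / sigma)
    control_cost = c * (1.0 / sigma ** 2 - 1.0 / sigma0 ** 2)
    return R * p_hit - control_cost

for R in [0.5, 1.0, 2.0, 4.0]:
    res = minimize_scalar(lambda s: -expected_utility(s, R),
                          bounds=(0.1, sigma0), method="bounded")
    print(f"reward = {R:3.1f} -> chosen noise SD ~ {res.x:.2f}")
```

In this toy setting a low incentive leaves noise at baseline, while larger incentives make it worth paying to suppress noise, so speed and accuracy can improve together.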
Surprise beyond prediction error
Chumbley, Justin R; Burke, Christopher J; Stephan, Klaas E; Friston, Karl J; Tobler, Philippe N; Fehr, Ernst
2014-01-01
Surprise drives learning. Various neural “prediction error” signals are believed to underpin surprise-based reinforcement learning. Here, we report a surprise signal that reflects reinforcement learning but is neither un/signed reward prediction error (RPE) nor un/signed state prediction error (SPE). To exclude these alternatives, we measured surprise responses in the absence of RPE and accounted for a host of potential SPE confounds. This new surprise signal was evident in ventral striatum, primary sensory cortex, frontal poles, and amygdala. We interpret these findings via a normative model of surprise. PMID:24700400
Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis
Huys, Quentin J. M.; Pizzagalli, Diego A.; Bogdan, Ryan; Dayan, Peter
2013-01-01
Background: Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods: Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results: MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion: Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
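The two quantities the meta-analysis dissociates can be written down directly: a reward sensitivity rho that scales the impact of each reinforcer, and a learning rate epsilon that sets the update speed, as in Q <- Q + epsilon * (rho * r - Q). A minimal sketch with illustrative parameters follows; it is not the authors' full Bayesian model.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, epsilon = 0.6, 0.15         # e.g., anhedonia would lower rho
beta = 3.0                       # softmax inverse temperature
Q = np.zeros(2)                  # action values for the two responses
p_reward = np.array([0.3, 0.6])  # asymmetric reinforcement schedule

for trial in range(1000):
    p = np.exp(beta * Q) / np.exp(beta * Q).sum()   # softmax choice rule
    a = rng.choice(2, p=p)
    r = float(rng.random() < p_reward[a])
    Q[a] += epsilon * (rho * r - Q[a])   # prediction error on scaled reward

print("learned values:", Q)
```

Lowering rho compresses the asymptotic values (and hence choice bias) without slowing learning, whereas lowering epsilon slows learning without changing the asymptote; this is the behavioral dissociation the meta-analysis exploits.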
A reward prediction error for charitable donations reveals outcome orientation of donators
Kuss, Katarina; Falk, Armin; Trautner, Peter; Elger, Christian E.; Weber, Bernd
2013-01-01
The motives underlying prosocial behavior, like charitable donations, can be related either to actions or to outcomes. To address the neural basis of outcome orientation in charitable giving, we asked 33 subjects to make choices affecting their own payoffs and payoffs to a charity organization, while being scanned by functional magnetic resonance imaging (fMRI). We experimentally induced a reward prediction error (RPE) by subsequently discarding some of the chosen outcomes. Co-localized to a nucleus accumbens BOLD signal corresponding to the RPE for the subject's own payoff, we observed an equivalent RPE signal for the charity's payoff in those subjects who were willing to donate. This unique demonstration of a neuronal RPE signal for outcomes exclusively affecting unrelated others indicates common brain processes during outcome evaluation for selfish, individual and nonselfish, social rewards and strongly suggests the effectiveness of outcome-oriented motives in charitable giving. PMID:22198972
Reward abundance interferes with error-based learning in a visuomotor adaptation task
Oostwoud Wijdenes, Leonie; Rigterink, Tessa; Overvliet, Krista E.; Smeets, Jeroen B. J.
2018-01-01
The brain rapidly adapts reaching movements to changing circumstances by using visual feedback about errors. Providing reward in addition to error feedback facilitates the adaptation but the underlying mechanism is unknown. Here, we investigate whether the proportion of trials rewarded (the 'reward abundance') influences how much participants adapt to their errors. We used a 3D multi-target pointing task in which reward alone is insufficient for motor adaptation. Participants (N = 423) performed the pointing task with feedback based on a shifted hand-position. On a proportion of trials we gave them rewarding feedback that their hand hit the target. Half of the participants only received this reward feedback. The other half also received feedback about endpoint errors. In different groups, we varied the proportion of trials that was rewarded. As expected, participants who received feedback about their errors did adapt, but participants who only received reward feedback did not. Critically, participants who received abundant rewards adapted less to their errors than participants who received less reward. Thus, reward abundance negatively influences how much participants learn from their errors. Participants probably relied more heavily on the reward feedback when rewards were abundant; because they could not adapt on the basis of reward alone, this reliance interfered with adaptation to errors. PMID:29513681
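One way to express this interpretation computationally, purely as an illustration: attenuate the error-based update in proportion to an assumed reliance on reward feedback that grows with reward abundance. The mixing rule and constants below are our assumptions, not the authors' model.

```python
# Error-based adaptation attenuated by reliance on (uninformative) reward feedback.
shift = 3.0      # imposed hand-position shift (arbitrary units)
eta = 0.2        # error-based learning rate
n_trials = 20

for p_reward in [0.0, 0.5, 1.0]:          # reward abundance conditions
    w_reward = 0.8 * p_reward             # assumed reliance on reward feedback
    x = 0.0                               # adaptive compensation
    for trial in range(n_trials):
        error = shift - x                 # endpoint error feedback
        x += (1.0 - w_reward) * eta * error   # attenuated error-based update
    print(f"P(reward) = {p_reward:.1f} -> adaptation {x:.2f} of {shift}")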
An Integrative Perspective on the Role of Dopamine in Schizophrenia
Maia, Tiago V.; Frank, Michael J.
2017-01-01
We propose that schizophrenia involves a combination of decreased phasic dopamine responses for relevant stimuli and increased spontaneous phasic dopamine release. Using insights from computational reinforcement-learning models and basic-science studies of the dopamine system, we show that each of these two disturbances contributes to a specific symptom domain and explains a large set of experimental findings associated with that domain. Reduced phasic responses for relevant stimuli help to explain negative symptoms and provide a unified explanation for the following experimental findings in schizophrenia, most of which have been shown to correlate with negative symptoms: reduced learning from rewards; blunted activation of the ventral striatum, midbrain, and other limbic regions for rewards and positive prediction errors; blunted activation of the ventral striatum during reward anticipation; blunted autonomic responding for relevant stimuli; blunted neural activation for aversive outcomes and aversive prediction errors; reduced willingness to expend effort for rewards; and psychomotor slowing. Increased spontaneous phasic dopamine release helps to explain positive symptoms and provides a unified explanation for the following experimental findings in schizophrenia, most of which have been shown to correlate with positive symptoms: aberrant learning for neutral cues (assessed with behavioral and autonomic responses), and aberrant, increased activation of the ventral striatum, midbrain, and other limbic regions for neutral cues, neutral outcomes, and neutral prediction errors. Taken together, then, these two disturbances explain many findings in schizophrenia. We review evidence supporting their co-occurrence and consider their differential implications for the treatment of positive and negative symptoms. PMID:27452791
Model-based learning and the contribution of the orbitofrontal cortex to the model-free world
McDannald, Michael A.; Takahashi, Yuji K.; Lopatina, Nina; Pietras, Brad W.; Jones, Josh L.; Schoenbaum, Geoffrey
2012-01-01
Learning is proposed to occur when there is a discrepancy between reward prediction and reward receipt. At least two separate systems are thought to exist: one in which predictions are proposed to be based on model-free or cached values; and another in which predictions are model-based. A basic neural circuit for model-free reinforcement learning has already been described. In the model-free circuit the ventral striatum (VS) is thought to supply a common-currency reward prediction to midbrain dopamine neurons that compute prediction errors and drive learning. In a model-based system, predictions can include more information about an expected reward, such as its sensory attributes or current, unique value. This detailed prediction allows for both behavioral flexibility and learning driven by changes in sensory features of rewards alone. Recent evidence from animal learning and human imaging suggests that, in addition to model-free information, the VS also signals model-based information. Further, there is evidence that the orbitofrontal cortex (OFC) signals model-based information. Here we review these data and suggest that the OFC provides model-based information to this traditional model-free circuitry and offer possibilities as to how this interaction might occur. PMID:22487030
Signed reward prediction errors drive declarative learning
De Loof, Esther; Ergo, Kate; Naert, Lien; Janssens, Clio; Talsma, Durk; Van Opstal, Filip; Verguts, Tom
2018-01-01
Reward prediction errors (RPEs) are thought to drive learning. This has been established in procedural learning (e.g., classical and operant conditioning). However, empirical evidence on whether RPEs drive declarative learning–a quintessentially human form of learning–remains surprisingly absent. We therefore coupled RPEs to the acquisition of Dutch-Swahili word pairs in a declarative learning paradigm. Signed RPEs (SRPEs; “better-than-expected” signals) during declarative learning improved recognition in a follow-up test, with increasingly positive RPEs leading to better recognition. In addition, classic declarative memory mechanisms such as time-on-task failed to explain recognition performance. The beneficial effect of SRPEs on recognition was subsequently affirmed in a replication study with visual stimuli. PMID:29293493
Signed reward prediction errors drive declarative learning.
De Loof, Esther; Ergo, Kate; Naert, Lien; Janssens, Clio; Talsma, Durk; Van Opstal, Filip; Verguts, Tom
2018-01-01
Reward prediction errors (RPEs) are thought to drive learning. This has been established in procedural learning (e.g., classical and operant conditioning). However, empirical evidence on whether RPEs drive declarative learning-a quintessentially human form of learning-remains surprisingly absent. We therefore coupled RPEs to the acquisition of Dutch-Swahili word pairs in a declarative learning paradigm. Signed RPEs (SRPEs; "better-than-expected" signals) during declarative learning improved recognition in a follow-up test, with increasingly positive RPEs leading to better recognition. In addition, classic declarative memory mechanisms such as time-on-task failed to explain recognition performance. The beneficial effect of SRPEs on recognition was subsequently affirmed in a replication study with visual stimuli.
Electrophysiological evidence of atypical motivation and reward processing in children with attention-deficit/hyperactivity disorder
Holroyd, Clay B.; Baker, Travis E.; Kerns, Kimberly A.; Müller, Ulrich
2008-01-01
Behavioral and neurophysiological evidence suggest that attention-deficit hyperactivity disorder (ADHD) is characterized by the impact of abnormal reward prediction error signals carried by the midbrain dopamine system on frontal brain areas that implement cognitive control. To investigate this issue, we recorded the event-related brain potential…
Interactions of timing and prediction error learning.
Kirkpatrick, Kimberly
2014-01-01
Timing and prediction error learning have historically been treated as independent processes, but growing evidence has indicated that they are not orthogonal. Timing emerges at the earliest time point when conditioned responses are observed, and temporal variables modulate prediction error learning in both simple conditioning and cue competition paradigms. In addition, prediction errors, through changes in reward magnitude or value, alter the timing of behavior. Thus, there appears to be a bi-directional interaction between timing and prediction error learning. Modern theories have attempted to integrate the two processes with mixed success. A neurocomputational approach to theory development is espoused, which draws on neurobiological evidence to guide and constrain computational model development. Heuristics for future model development are presented with the goal of sparking new approaches to theory development in the timing and prediction error fields. Copyright © 2013 Elsevier B.V. All rights reserved.
Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia
Diuk, Carlos; Tsai, Karin; Wallis, Jonathan; Botvinick, Matthew; Niv, Yael
2013-03-27
Studies suggest that dopaminergic neurons report a unitary, global reward prediction error signal. However, learning in complex real-life tasks, in particular tasks that show hierarchical structure, requires multiple prediction errors that may coincide in time. We used functional neuroimaging to measure prediction error signals in humans performing such a hierarchical task involving simultaneous, uncorrelated prediction errors. Analysis of signals in a priori anatomical regions of interest in the ventral striatum and the ventral tegmental area indeed evidenced two simultaneous, but separable, prediction error signals corresponding to the two levels of hierarchy in the task. This result suggests that suitably designed tasks may reveal a more intricate pattern of firing in dopaminergic neurons. Moreover, the need for downstream separation of these signals implies possible limitations on the number of different task levels that we can learn about simultaneously.
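The core idea, two prediction errors computed at the same moment against predictions at different levels of the hierarchy, can be sketched schematically. The task structure and names below ("casino", "slot") are made up for illustration and do not reproduce the authors' paradigm.

```python
# Two simultaneous but separable prediction errors in a two-level task:
# one at the level of the chosen option and one at the level of the
# subtask within it. Schematic only.
alpha = 0.1
V_option = {"casino_A": 0.0, "casino_B": 0.0}   # high-level values
V_slot = {("casino_A", "slot_1"): 0.0,
          ("casino_A", "slot_2"): 0.0}          # low-level values

def update(option, slot, reward):
    # Low-level PE: outcome versus the slot's own prediction.
    pe_low = reward - V_slot[(option, slot)]
    V_slot[(option, slot)] += alpha * pe_low
    # High-level PE: outcome versus the option's aggregate prediction.
    pe_high = reward - V_option[option]
    V_option[option] += alpha * pe_high
    return pe_low, pe_high   # the two can differ in size and sign on one trial

print(update("casino_A", "slot_1", 1.0))
```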
Separate Populations of Neurons in Ventral Striatum Encode Value and Motivation
Gentry, Ronny N.; Goldstein, Brandon L.; Hearn, Taylor N.; Barnett, Brian R.; Kashtelyan, Vadim; Roesch, Matthew R.
2013-01-01
Neurons in the ventral striatum (VS) fire to cues that predict differently valued rewards. It is unclear whether this activity represents the value associated with the expected reward or the level of motivation induced by reward anticipation. To distinguish between the two, we trained rats on a task in which we varied value independently from motivation by manipulating the size of the reward expected on correct trials and the threat of punishment expected upon errors. We found that separate populations of neurons in VS encode expected value and motivation. PMID:23724077
Midbrain dopamine neurons signal aversion in a reward-context-dependent manner
Matsumoto, Hideyuki; Tian, Ju; Uchida, Naoshige; Watabe-Uchida, Mitsuko
2016-01-01
Dopamine is thought to regulate learning from appetitive and aversive events. Here we examined how optogenetically-identified dopamine neurons in the lateral ventral tegmental area of mice respond to aversive events in different conditions. In low reward contexts, most dopamine neurons were exclusively inhibited by aversive events, and expectation reduced dopamine neurons’ responses to reward and punishment. When a single odor predicted both reward and punishment, dopamine neurons’ responses to that odor reflected the integrated value of both outcomes. Thus, in low reward contexts, dopamine neurons signal value prediction errors (VPEs) integrating information about both reward and aversion in a common currency. In contrast, in high reward contexts, dopamine neurons acquired a short-latency excitation to aversive events that masked their VPE signaling. Our results demonstrate the importance of considering the contexts to examine the representation in dopamine neurons and uncover different modes of dopamine signaling, each of which may be adaptive for different environments. DOI: http://dx.doi.org/10.7554/eLife.17328.001 PMID:27760002
Prediction error and somatosensory insula activation in women recovered from anorexia nervosa
Frank, Guido K.W.; Collier, Shaleise; Shott, Megan E.; O’Reilly, Randall C.
2016-01-01
Background: Previous research in patients with anorexia nervosa showed heightened brain response during a taste reward conditioning task and heightened sensitivity to rewarding and punishing stimuli. Here we tested the hypothesis that individuals recovered from anorexia nervosa would also experience greater brain activation during this task as well as higher sensitivity to salient stimuli than controls. Methods: Women recovered from restricting-type anorexia nervosa and healthy control women underwent fMRI during application of a prediction error taste reward learning paradigm. Results: Twenty-four women recovered from anorexia nervosa (mean age 30.3 ± 8.1 yr) and 24 control women (mean age 27.4 ± 6.3 yr) took part in this study. The recovered anorexia nervosa group showed greater left posterior insula activation for the prediction error model analysis than the control group (family-wise error– and small volume–corrected p < 0.05). A group × condition analysis found greater posterior insula response in women recovered from anorexia nervosa than controls for unexpected stimulus omission, but not for unexpected receipt. Sensitivity to punishment was elevated in women recovered from anorexia nervosa. Limitations: This was a cross-sectional study, and the sample size was modest. Conclusion: Anorexia nervosa after recovery is associated with heightened prediction error–related brain response in the posterior insula as well as greater response to unexpected reward stimulus omission. This finding, together with behaviourally increased sensitivity to punishment, could indicate that individuals recovered from anorexia nervosa are particularly responsive to punishment. The posterior insula processes somatosensory stimuli, including unexpected bodily states, and greater response could indicate altered perception or integration of unexpected or maybe unwanted bodily feelings. Whether those findings develop during the ill state or whether they are biological traits requires further study. PMID:26836623
Model-based predictions for dopamine.
Langdon, Angela J; Sharpe, Melissa J; Schoenbaum, Geoffrey; Niv, Yael
2018-04-01
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning. Copyright © 2017. Published by Elsevier Ltd.
Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia
Diuk, Carlos; Tsai, Karin; Wallis, Jonathan; Botvinick, Matthew; Niv, Yael
2013-01-01
Studies suggest that dopaminergic neurons report a unitary, global reward prediction error signal. However, learning in complex real-life tasks, in particular tasks that show hierarchical structure, requires multiple prediction errors that may coincide in time. We used functional neuroimaging to measure prediction error signals in humans performing such a hierarchical task involving simultaneous, uncorrelated prediction errors. Analysis of signals in a priori anatomical regions of interest in the ventral striatum and the ventral tegmental area indeed evidenced two simultaneous, but separable, prediction error signals corresponding to the two levels of hierarchy in the task. This result suggests that suitably designed tasks may reveal a more intricate pattern of firing in dopaminergic neurons. Moreover, the need for downstream separation of these signals implies possible limitations on the number of different task levels that we can learn about simultaneously. PMID:23536092
Reward Processing, Neuroeconomics, and Psychopathology.
Zald, David H; Treadway, Michael T
2017-05-08
Abnormal reward processing is a prominent transdiagnostic feature of psychopathology. The present review provides a framework for considering the different aspects of reward processing and their assessment, and highlights recent insights from the field of neuroeconomics that may aid in understanding these processes. Although altered reward processing in psychopathology has often been treated as a general hypo- or hyperresponsivity to reward, increasing data indicate that a comprehensive understanding of reward dysfunction requires characterization within more specific reward-processing domains, including subjective valuation, discounting, hedonics, reward anticipation and facilitation, and reinforcement learning. As such, more nuanced models of the nature of these abnormalities are needed. We describe several processing abnormalities capable of producing the types of selective alterations in reward-related behavior observed in different forms of psychopathology, including (mal)adaptive scaling and anchoring, dysfunctional weighting of reward and cost variables, competition between valuation systems, and reward prediction error signaling.
An Integrative Perspective on the Role of Dopamine in Schizophrenia.
Maia, Tiago V; Frank, Michael J
2017-01-01
We propose that schizophrenia involves a combination of decreased phasic dopamine responses for relevant stimuli and increased spontaneous phasic dopamine release. Using insights from computational reinforcement-learning models and basic-science studies of the dopamine system, we show that each of these two disturbances contributes to a specific symptom domain and explains a large set of experimental findings associated with that domain. Reduced phasic responses for relevant stimuli help to explain negative symptoms and provide a unified explanation for the following experimental findings in schizophrenia, most of which have been shown to correlate with negative symptoms: reduced learning from rewards; blunted activation of the ventral striatum, midbrain, and other limbic regions for rewards and positive prediction errors; blunted activation of the ventral striatum during reward anticipation; blunted autonomic responding for relevant stimuli; blunted neural activation for aversive outcomes and aversive prediction errors; reduced willingness to expend effort for rewards; and psychomotor slowing. Increased spontaneous phasic dopamine release helps to explain positive symptoms and provides a unified explanation for the following experimental findings in schizophrenia, most of which have been shown to correlate with positive symptoms: aberrant learning for neutral cues (assessed with behavioral and autonomic responses), and aberrant, increased activation of the ventral striatum, midbrain, and other limbic regions for neutral cues, neutral outcomes, and neutral prediction errors. Taken together, then, these two disturbances explain many findings in schizophrenia. We review evidence supporting their co-occurrence and consider their differential implications for the treatment of positive and negative symptoms. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making
Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P
2007-11-21
The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
Tonic or phasic stimulation of dopaminergic projections to prefrontal cortex causes mice to maintain or deviate from previously learned behavioral strategies
Ellwood, Ian T.; Patel, Tosha; Wadia, Varun; Lee, Anthony T.; Liptak, Alayna T.; Bender, Kevin J.; Sohal, Vikaas S.
2017-01-01
Dopamine neurons in the ventral tegmental area (VTA) encode reward prediction errors and can drive reinforcement learning through their projections to striatum, but much less is known about their projections to prefrontal cortex (PFC). Here, we studied these projections and observed phasic VTA–PFC fiber photometry signals after the delivery of rewards. Next, we studied how optogenetic stimulation of these projections affects behavior using conditioned place preference and a task in which mice learn associations between cues and food rewards and then use those associations to make choices. Neither phasic nor tonic stimulation of dopaminergic VTA–PFC projections elicited place preference. Furthermore, substituting phasic VTA–PFC stimulation for food rewards was not sufficient to reinforce new cue–reward associations nor maintain previously learned ones. However, the same patterns of stimulation that failed to reinforce place preference or cue–reward associations were able to modify behavior in other ways. First, continuous tonic stimulation maintained previously learned cue–reward associations even after they ceased being valid. Second, delivering phasic stimulation either continuously or after choices not previously associated with reward induced mice to make choices that deviated from previously learned associations. In summary, despite the fact that dopaminergic VTA–PFC projections exhibit phasic increases in activity that are time locked to the delivery of rewards, phasic activation of these projections does not necessarily reinforce specific actions. Rather, dopaminergic VTA–PFC activity can control whether mice maintain or deviate from previously learned cue–reward associations. SIGNIFICANCE STATEMENT Dopaminergic inputs from ventral tegmental area (VTA) to striatum encode reward prediction errors and reinforce specific actions; however, it is currently unknown whether dopaminergic inputs to prefrontal cortex (PFC) play similar or distinct roles. Here, we used bulk Ca2+ imaging to show that unexpected rewards or reward-predicting cues elicit phasic increases in the activity of dopaminergic VTA–PFC fibers. However, in multiple behavioral paradigms, we failed to observe reinforcing effects after stimulation of these fibers. In these same experiments, we did find that tonic or phasic patterns of stimulation caused mice to maintain or deviate from previously learned cue–reward associations, respectively. Therefore, although they may exhibit similar patterns of activity, dopaminergic inputs to striatum and PFC can elicit divergent behavioral effects. PMID:28739583
Error-related negativities elicited by monetary loss and cues that predict loss.
Dunning, Jonathan P; Hajcak, Greg
2007-11-19
Event-related potential studies have reported error-related negativity following both error commission and feedback indicating errors or monetary loss. The present study examined whether error-related negativities could be elicited by a predictive cue presented prior to both the decision and subsequent feedback in a gambling task. Participants were presented with a cue that indicated the probability of reward on the upcoming trial (0, 50, and 100%). Results showed a negative deflection in the event-related potential in response to loss cues compared with win cues; this waveform shared a similar latency and morphology with the traditional feedback error-related negativity.
Neural substrates of updating the prediction through prediction error during decision making.
Wang, Ying; Ma, Ning; He, Xiaosong; Li, Nan; Wei, Zhengde; Yang, Lizhuang; Zha, Rujing; Han, Long; Li, Xiaoming; Zhang, Daren; Liu, Ying; Zhang, Xiaochu
2017-08-15
Learning of prediction error (PE), including reward PE and risk PE, is crucial for updating the prediction in reinforcement learning (RL). Neurobiological and computational models of RL have reported extensive brain activations related to PE. However, the occurrence of PE does not necessarily predict updating the prediction, e.g., in a probability-known event. Therefore, the brain regions specifically engaged in updating the prediction remain unknown. Here, we conducted two functional magnetic resonance imaging (fMRI) experiments, the probability-unknown Iowa Gambling Task (IGT) and the probability-known risk decision task (RDT). Behavioral analyses confirmed that PEs occurred in both tasks but were only used for updating the prediction in the IGT. By comparing PE-related brain activations between the two tasks, we found that the rostral anterior cingulate cortex/ventral medial prefrontal cortex (rACC/vmPFC) and the posterior cingulate cortex (PCC) activated only during the IGT and were related to both reward and risk PE. Moreover, the responses in the rACC/vmPFC and the PCC were modulated by uncertainty and were associated with reward prediction-related brain regions. Electric brain stimulation over these regions lowered the performance in the IGT but not in the RDT. Our findings of a distributed neural circuit of PE processing suggest that the rACC/vmPFC and the PCC play a key role in updating the prediction through PE processing during decision making. Copyright © 2017 Elsevier Inc. All rights reserved.
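The two error signals tracked here have standard definitions in the literature (as formalized by Preuschoff and colleagues): the reward PE is the outcome minus its expected value, and the risk PE is the squared reward PE minus the predicted variance. A worked numeric example for a simple 50/50 gamble:

```python
# Reward PE and risk PE for a Bernoulli gamble. Numbers are illustrative.
p_win, win, lose = 0.5, 1.0, 0.0

expected = p_win * win + (1 - p_win) * lose          # E[r] = 0.5
variance = (p_win * (win - expected) ** 2
            + (1 - p_win) * (lose - expected) ** 2)  # Var[r] = 0.25

r = 1.0                                  # observed outcome
reward_pe = r - expected                 # +0.5: better than expected
risk_pe = reward_pe ** 2 - variance      # 0.0: deviation matched predicted risk
print(reward_pe, risk_pe)
```

Here the risk PE is zero because the outcome deviates from the mean by exactly the predicted amount; an outcome closer to the mean would yield a negative risk PE.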
Reward Processing, Neuroeconomics, and Psychopathology
Zald, David H.; Treadway, Michael T.
2018-01-01
Abnormal reward processing is a prominent transdiagnostic feature of psychopathology. The present review provides a framework for considering the different aspects of reward processing and their assessment, and highlights recent insights from the field of neuroeconomics that may aid in understanding these processes. Although altered reward processing in psychopathology has often been treated as a general hypo- or hyper-responsivity to reward, increasing data indicate that a comprehensive understanding of reward dysfunction requires characterization within more specific reward processing domains, including subjective valuation, discounting, hedonics, reward anticipation and facilitation, and reinforcement learning. As such, more nuanced models of the nature of these abnormalities are needed. We describe several processing abnormalities capable of producing the types of selective alterations in reward-related behavior observed in different forms of psychopathology, including (mal)adaptive scaling and anchoring, dysfunctional weighting of reward and cost variables, competition between valuation systems, and reward prediction error signaling. PMID:28301764
Neuroscientific Model of Motivational Process
Kim, Sung-il
2013-01-01
Considering the neuroscientific findings on reward, learning, value, decision-making, and cognitive control, motivation can be parsed into three subprocesses: a process of generating motivation, a process of maintaining motivation, and a process of regulating motivation. I propose a tentative neuroscientific model of motivational processes which consists of three distinct but continuous subprocesses, namely reward-driven approach, value-based decision-making, and goal-directed control. Reward-driven approach is the process in which motivation is generated by reward anticipation and selective approach behaviors toward reward. This process recruits the ventral striatum (reward area) in which basic stimulus-action association is formed, and is classified as an automatic motivation to which relatively less attention is assigned. By contrast, value-based decision-making is the process of evaluating various outcomes of actions, learning through positive prediction error, and calculating the value continuously. The striatum and the orbitofrontal cortex (valuation area) play crucial roles in sustaining motivation. Lastly, goal-directed control is the process of regulating motivation through cognitive control to achieve goals. This consciously controlled motivation is associated with higher-level cognitive functions such as planning, retaining the goal, monitoring the performance, and regulating action. The anterior cingulate cortex (attention area) and the dorsolateral prefrontal cortex (cognitive control area) are the main neural circuits related to regulation of motivation. These three subprocesses interact with each other by sending reward prediction error signals through the dopaminergic pathway from the striatum to the prefrontal cortex. The neuroscientific model of motivational process suggests several educational implications with regard to the generation, maintenance, and regulation of motivation to learn in the learning environment. PMID:23459598
Neuroscientific model of motivational process.
Kim, Sung-Il
2013-01-01
Considering the neuroscientific findings on reward, learning, value, decision-making, and cognitive control, motivation can be parsed into three subprocesses: a process of generating motivation, a process of maintaining motivation, and a process of regulating motivation. I propose a tentative neuroscientific model of motivational processes which consists of three distinct but continuous subprocesses, namely reward-driven approach, value-based decision-making, and goal-directed control. Reward-driven approach is the process in which motivation is generated by reward anticipation and selective approach behaviors toward reward. This process recruits the ventral striatum (reward area) in which basic stimulus-action association is formed, and is classified as an automatic motivation to which relatively less attention is assigned. By contrast, value-based decision-making is the process of evaluating various outcomes of actions, learning through positive prediction error, and calculating the value continuously. The striatum and the orbitofrontal cortex (valuation area) play crucial roles in sustaining motivation. Lastly, goal-directed control is the process of regulating motivation through cognitive control to achieve goals. This consciously controlled motivation is associated with higher-level cognitive functions such as planning, retaining the goal, monitoring the performance, and regulating action. The anterior cingulate cortex (attention area) and the dorsolateral prefrontal cortex (cognitive control area) are the main neural circuits related to regulation of motivation. These three subprocesses interact with each other by sending reward prediction error signals through the dopaminergic pathway from the striatum to the prefrontal cortex. The neuroscientific model of motivational process suggests several educational implications with regard to the generation, maintenance, and regulation of motivation to learn in the learning environment.
Stimulus-Response-Outcome Coding in the Pigeon Nidopallium Caudolaterale
Starosta, Sarah; Güntürkün, Onur; Stüttgen, Maik C.
2013-01-01
A prerequisite for adaptive goal-directed behavior is that animals constantly evaluate action outcomes and relate them to both their antecedent behavior and to stimuli predictive of reward or non-reward. Here, we investigate whether single neurons in the avian nidopallium caudolaterale (NCL), a multimodal associative forebrain structure and a presumed analogue of mammalian prefrontal cortex, represent information useful for goal-directed behavior. We subjected pigeons to a go-nogo task, in which responding to one visual stimulus (S+) was partially reinforced, responding to another stimulus (S–) was punished, and responding to test stimuli from the same physical dimension (spatial frequency) was inconsequential. The birds responded most intensely to S+, and their response rates decreased monotonically as stimuli became progressively dissimilar to S+; thereby, response rates provided a behavioral index of reward expectancy. We found that many NCL neurons' responses were modulated in the stimulus discrimination phase, the outcome phase, or both. A substantial fraction of neurons increased firing for cues predicting non-reward or decreased firing for cues predicting reward. Interestingly, the same neurons also responded when reward was expected but not delivered, and could thus provide a negative reward prediction error or, alternatively, signal negative value. In addition, many cells showed motor-related response modulation. In summary, NCL neurons represent information about the reward value of specific stimuli, instrumental actions as well as action outcomes, and therefore provide signals useful for adaptive behavior in dynamically changing environments. PMID:23437383
Adaptive scaling of reward in episodic memory: a replication study.
Mason, Alice; Ludwig, Casimir; Farrell, Simon
2017-11-01
Reward is thought to enhance episodic memory formation via dopaminergic consolidation. Bunzeck, Dayan, Dolan, and Duzel [(2010). A common mechanism for adaptive scaling of reward and novelty. Human Brain Mapping, 31, 1380-1394] provided functional magnetic resonance imaging (fMRI) and behavioural evidence that reward and episodic memory systems are sensitive to the contextual value of a reward (whether it is relatively higher or lower) as opposed to absolute value or prediction error. We carried out a direct replication of their behavioural study and did not replicate their finding that memory performance associated with reward follows this pattern of adaptive scaling. An effect of reward outcome was in the opposite direction to that in the original study, with lower reward outcomes leading to better memory than higher outcomes. There was a marginal effect of reward context, suggesting that expected value affected memory performance. We discuss the robustness of the reward-memory relationship to variations in reward context, and whether other reward-related factors have a more reliable influence on episodic memory.
Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI
Colas, Jaron T; Pauli, Wolfgang M; Larsen, Tobias; Tyszka, J Michael; O'Doherty, John P
2017-10-01
Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models-namely, "actor/critic" models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
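The two computational signatures the paradigm dissociates reduce to two update rules: a state-value prediction error (SVPE) that ignores which action was taken, and an action-value prediction error (AVPE) as in Q-learning. A minimal sketch with illustrative parameters, not the study's fitted model:

```python
import numpy as np

alpha, gamma = 0.1, 0.95
V = np.zeros(3)        # state values (critic)
Q = np.zeros((3, 2))   # action values

def step(s, a, r, s_next):
    svpe = r + gamma * V[s_next] - V[s]            # actor/critic signature
    V[s] += alpha * svpe
    avpe = r + gamma * Q[s_next].max() - Q[s, a]   # Q-learning signature
    Q[s, a] += alpha * avpe
    return svpe, avpe

print(step(s=0, a=1, r=1.0, s_next=2))
```

Because the SVPE pools over actions while the AVPE is conditioned on the chosen one, the two signals diverge whenever the actions available in a state have unequal values, which is what lets the fMRI design tell them apart.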
Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI
Colas, Jaron T.; Pauli, Wolfgang M.; Larsen, Tobias; Tyszka, J. Michael; O’Doherty, John P.
2017-01-01
Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models—namely, “actor/critic” models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning. PMID:29049406
Model-based learning and the contribution of the orbitofrontal cortex to the model-free world.
McDannald, Michael A; Takahashi, Yuji K; Lopatina, Nina; Pietras, Brad W; Jones, Josh L; Schoenbaum, Geoffrey
2012-04-01
Learning is proposed to occur when there is a discrepancy between reward prediction and reward receipt. At least two separate systems are thought to exist: one in which predictions are proposed to be based on model-free or cached values; and another in which predictions are model-based. A basic neural circuit for model-free reinforcement learning has already been described. In the model-free circuit the ventral striatum (VS) is thought to supply a common-currency reward prediction to midbrain dopamine neurons that compute prediction errors and drive learning. In a model-based system, predictions can include more information about an expected reward, such as its sensory attributes or current, unique value. This detailed prediction allows for both behavioral flexibility and learning driven by changes in sensory features of rewards alone. Recent evidence from animal learning and human imaging suggests that, in addition to model-free information, the VS also signals model-based information. Further, there is evidence that the orbitofrontal cortex (OFC) signals model-based information. Here we review these data and suggest that the OFC provides model-based information to this traditional model-free circuitry and offer possibilities as to how this interaction might occur. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
Tan, Can Ozan; Bullock, Daniel
2008-10-01
Recently, dopamine (DA) neurons of the substantia nigra pars compacta (SNc) were found to exhibit sustained responses related to reward uncertainty, in addition to the phasic responses related to reward-prediction errors (RPEs). Thus, cue-dependent anticipations of the timing, magnitude, and uncertainty of rewards are learned and reflected in components of DA signals. Here we simulate a local circuit model to show how learned uncertainty responses are generated, along with phasic RPE responses, on single trials. Both types of simulated DA responses exhibit the empirically observed dependencies on conditional probability, expected value of reward, and time since onset of the reward-predicting cue. The model's three major pathways compute expected values of cues, timed predictions of reward magnitudes, and uncertainties associated with these predictions. The first two pathways' computations refine those modeled by Brown et al. (1999). The third, newly modeled, pathway involves medium spiny projection neurons (MSPNs) of the striatal matrix, whose axons corelease GABA and substance P, both at synapses with GABAergic neurons in the substantia nigra pars reticulata (SNr) and with distal dendrites (in SNr) of DA neurons whose somas are located in ventral SNc. Corelease enables efficient computation of uncertainty responses that are a nonmonotonic function of the conditional probability of reward, and variability in striatal cholinergic transmission can explain observed individual differences in the amplitudes of uncertainty responses. The involvement of matricial MSPNs and cholinergic transmission within the striatum implies a relation between uncertainty in cue-reward contingencies and action-selection functions of the basal ganglia.
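The nonmonotonic dependence on reward probability described here has a simple statistical reading: for a reward of magnitude m delivered with probability p, the outcome variance is p(1-p)m^2, which peaks at p = 0.5. A hedged one-liner illustrating only that profile (not the authors' circuit model, which derives the sustained response from striatonigral corelease):

    def reward_variance(p, m=1.0):
        # variance of a Bernoulli reward of magnitude m; maximal at p = 0.5
        return p * (1.0 - p) * m ** 2

    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, reward_variance(p))  # 0.0, 0.1875, 0.25, 0.1875, 0.0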
Larsen, Tobias; Collette, Sven; Tyszka, Julian M.; Seymour, Ben; O'Doherty, John P.
2015-01-01
The role of neurons in the substantia nigra (SN) and ventral tegmental area (VTA) of the midbrain in contributing to the elicitation of reward prediction errors during appetitive learning has been well established. Less is known about the differential contribution of these midbrain regions to appetitive versus aversive learning, especially in humans. Here we scanned human participants with high-resolution fMRI focused on the SN and VTA while they participated in a sequential Pavlovian conditioning paradigm involving an appetitive outcome (a pleasant juice), as well as an aversive outcome (an unpleasant bitter and salty flavor). We found a degree of regional specialization within the SN: Whereas a region of ventromedial SN correlated with a temporal difference reward prediction error during appetitive Pavlovian learning, a dorsolateral area correlated instead with an aversive expected value signal in response to the most distal cue, and to a reward prediction error in response to the most proximal cue to the aversive outcome. Furthermore, participants' affective reactions to the appetitive and aversive conditioned stimuli, measured more than 1 year after the fMRI experiment was conducted, correlated respectively with activation in the ventromedial and dorsolateral SN obtained during the experiment. These findings suggest that, whereas the human ventromedial SN contributes to long-term learning about rewards, the dorsolateral SN may be particularly important for long-term learning in aversive contexts. SIGNIFICANCE STATEMENT The role of the substantia nigra (SN) and ventral tegmental area (VTA) in appetitive learning is well established, but less is known about their contribution to aversive compared with appetitive learning, especially in humans. We used high-resolution fMRI to measure activity in the SN and VTA while participants underwent higher-order Pavlovian learning. We found a regional specialization within the SN: a ventromedial area was selectively engaged during appetitive learning, and a dorsolateral area during aversive learning. Activity in these areas predicted affective reactions to appetitive and aversive conditioned stimuli over 1 year later. These findings suggest that, whereas the human ventromedial SN contributes to long-term learning about rewards, the dorsolateral SN may be particularly important for long-term learning in aversive contexts. PMID:26490862
Reward-based training of recurrent neural networks for cognitive and value-based tasks
Song, H Francis; Yang, Guangyu R; Wang, Xiao-Jing
2017-01-01
Trained neural network models, which exhibit features of neural activity recorded from behaving animals, may provide insights into the circuit mechanisms of cognitive functions through systematic analysis of network activity and connectivity. However, in contrast to the graded error signals commonly used to train networks through supervised learning, animals learn from reward feedback on definite actions through reinforcement learning. Reward maximization is particularly relevant when optimal behavior depends on an animal’s internal judgment of confidence or subjective preferences. Here, we implement reward-based training of recurrent neural networks in which a value network guides learning by using the activity of the decision network to predict future reward. We show that such models capture behavioral and electrophysiological findings from well-known experimental paradigms. Our work provides a unified framework for investigating diverse cognitive and value-based computations, and predicts a role for value representation that is essential for learning, but not executing, a task. DOI: http://dx.doi.org/10.7554/eLife.21492.001 PMID:28084991
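A minimal sketch of the general training scheme, in which a learned value estimate serves as a baseline for reward-based (policy-gradient) updates of the decision policy. This is a drastic simplification of the paper's recurrent decision/value networks; the toy task and all names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)    # stand-in for the decision network (policy logits)
    w = 0.0                # stand-in for the value network (baseline estimate)
    alpha_pi, alpha_v = 0.1, 0.1

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    for trial in range(1000):
        p = softmax(theta)
        a = rng.choice(2, p=p)
        r = float(a == 1)                  # toy task: action 1 is rewarded
        advantage = r - w                  # value estimate acts as reward baseline
        grad_log = -p
        grad_log[a] += 1.0                 # gradient of log pi(a)
        theta += alpha_pi * advantage * grad_log  # reward-based policy update
        w += alpha_v * (r - w)             # value learning

Consistent with the paper's conclusion, the value estimate here matters for learning (it centers the update) but plays no role once the policy is fixed.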
Putting reward in art: A tentative prediction error account of visual art
Van de Cruys, Sander; Wagemans, Johan
2011-01-01
The predictive coding model is increasingly and fruitfully used to explain a wide range of findings in perception. Here we discuss the potential of this model in explaining the mechanisms underlying aesthetic experiences. Traditionally art appreciation has been associated with concepts such as harmony, perceptual fluency, and the so-called good Gestalt. We observe that more often than not great artworks blatantly violate these characteristics. Using the concept of prediction error from the predictive coding approach, we attempt to resolve this contradiction. We argue that artists often destroy predictions that they have first carefully built up in their viewers, and thus highlight the importance of negative affect in aesthetic experience. However, the viewer often succeeds in recovering the predictable pattern, sometimes on a different level. The ensuing rewarding effect is derived from this transition from a state of uncertainty to a state of increased predictability. We illustrate our account with several example paintings and with a discussion of art movements and individual differences in preference. On a more fundamental level, our theorizing leads us to consider the affective implications of prediction confirmation and violation. We compare our proposal to other influential theories on aesthetics and explore its advantages and limitations. PMID:23145260
Marsden, Karen E; Ma, Wei Ji; Deci, Edward L; Ryan, Richard M; Chiu, Pearl H
2015-06-01
The duration and quality of human performance depend on both intrinsic motivation and external incentives. However, little is known about the neuroscientific basis of this interplay between internal and external motivators. Here, we used functional magnetic resonance imaging to examine the neural substrates of intrinsic motivation, operationalized as the free-choice time spent on a task when this was not required, and tested the neural and behavioral effects of external reward on intrinsic motivation. We found that increased duration of free-choice time was predicted by generally diminished neural responses in regions associated with cognitive and affective regulation. By comparison, the possibility of additional reward improved task accuracy, and specifically increased neural and behavioral responses following errors. Those individuals with the smallest neural responses associated with intrinsic motivation exhibited the greatest error-related neural enhancement under the external contingency of possible reward. Together, these data suggest that human performance is guided by a "tonic" and "phasic" relationship between the neural substrates of intrinsic motivation (tonic) and the impact of external incentives (phasic).
Failure analysis and modeling of a multicomputer system. M.S. Thesis
NASA Technical Reports Server (NTRS)
Subramani, Sujatha Srinivasan
1990-01-01
This thesis describes the results of an extensive measurement-based analysis of real error data collected from a 7-machine DEC VaxCluster multicomputer system. In addition to evaluating basic system error and failure characteristics, we develop reward models to analyze the impact of failures and errors on the system. The results show that, although 98 percent of errors in the shared resources recover, they result in 48 percent of all system failures. The analysis of rewards shows that the expected reward rate for the VaxCluster decreases to 0.5 in 100 days for a 3-out-of-7 model, which is well over 100 times that for a 7-out-of-7 model. A comparison of the reward rates for a range of k-out-of-n models indicates that the maximum increase in reward rate (0.25) occurs in going from the 6-out-of-7 model to the 5-out-of-7 model. The analysis also shows that software errors have the lowest reward (0.2 vs. 0.91 for network errors). The large loss in reward rate for software errors is due to the fact that a large proportion (94 percent) of software errors lead to failure. In comparison, the high reward rate for network errors is due to fast recovery from a majority of these errors (median recovery duration is 0 seconds).
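For orientation, a k-out-of-n reward model treats the system as delivering reward while at least k of its n machines are operational. A hedged sketch of that structural idea, assuming independent and identical per-machine availability (the thesis itself estimates these quantities from measured error data rather than from this closed form):

    from math import comb

    def k_out_of_n_availability(n, k, a):
        # probability that at least k of n independent machines are up,
        # each with availability a; a proxy for the model's reward rate
        return sum(comb(n, j) * a**j * (1 - a)**(n - j) for j in range(k, n + 1))

    for k in range(7, 2, -1):
        print(f"{k}-out-of-7:", round(k_out_of_n_availability(7, k, 0.9), 4))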
Knowledge acquisition is governed by striatal prediction errors.
Pine, Alex; Sadeh, Noa; Ben-Yakov, Aya; Dudai, Yadin; Mendelsohn, Avi
2018-04-26
Discrepancies between expectations and outcomes, or prediction errors, are central to trial-and-error learning based on reward and punishment, and their neurobiological basis is well characterized. It is not known, however, whether the same principles apply to declarative memory systems, such as those supporting semantic learning. Here, we demonstrate with fMRI that the brain parametrically encodes the degree to which new factual information violates expectations based on prior knowledge and beliefs, most prominently in the ventral striatum and in cortical regions supporting declarative memory encoding. These semantic prediction errors determine the extent to which information is incorporated into long-term memory, such that learning is superior when incoming information counters strong incorrect recollections, thereby eliciting large prediction errors. Paradoxically, by the same account, strong accurate recollections are more amenable to being supplanted by misinformation, engendering false memories. These findings highlight a commonality in brain mechanisms and computational rules that govern declarative and nondeclarative learning, traditionally deemed dissociable.
Kishida, Kenneth T.; Saez, Ignacio; Lohrenz, Terry; Witcher, Mark R.; Laxton, Adrian W.; Tatter, Stephen B.; White, Jason P.; Ellis, Thomas L.; Phillips, Paul E. M.; Montague, P. Read
2016-01-01
In the mammalian brain, dopamine is a critical neuromodulator whose actions underlie learning, decision-making, and behavioral control. Degeneration of dopamine neurons causes Parkinson’s disease, whereas dysregulation of dopamine signaling is believed to contribute to psychiatric conditions such as schizophrenia, addiction, and depression. Experiments in animal models suggest the hypothesis that dopamine release in human striatum encodes reward prediction errors (RPEs) (the difference between actual and expected outcomes) during ongoing decision-making. Blood oxygen level-dependent (BOLD) imaging experiments in humans support the idea that RPEs are tracked in the striatum; however, BOLD measurements cannot be used to infer the action of any one specific neurotransmitter. We monitored dopamine levels with subsecond temporal resolution in humans (n = 17) with Parkinson’s disease while they executed a sequential decision-making task. Participants placed bets and experienced monetary gains or losses. Contrary to what a large body of work in model organisms would predict, dopamine fluctuations in the striatum do not encode RPEs. Instead, subsecond dopamine fluctuations encode an integration of RPEs with counterfactual prediction errors, the latter defined by how much better or worse the experienced outcome could have been. How dopamine fluctuations combine the actual and counterfactual is unknown. One possibility is that this process is the normal behavior of reward processing dopamine neurons, which previously had not been tested by experiments in animal models. Alternatively, this superposition of error terms may result from an additional yet-to-be-identified subclass of dopamine neurons. PMID:26598677
Kishida, Kenneth T; Saez, Ignacio; Lohrenz, Terry; Witcher, Mark R; Laxton, Adrian W; Tatter, Stephen B; White, Jason P; Ellis, Thomas L; Phillips, Paul E M; Montague, P Read
2016-01-05
In the mammalian brain, dopamine is a critical neuromodulator whose actions underlie learning, decision-making, and behavioral control. Degeneration of dopamine neurons causes Parkinson's disease, whereas dysregulation of dopamine signaling is believed to contribute to psychiatric conditions such as schizophrenia, addiction, and depression. Experiments in animal models suggest the hypothesis that dopamine release in human striatum encodes reward prediction errors (RPEs) (the difference between actual and expected outcomes) during ongoing decision-making. Blood oxygen level-dependent (BOLD) imaging experiments in humans support the idea that RPEs are tracked in the striatum; however, BOLD measurements cannot be used to infer the action of any one specific neurotransmitter. We monitored dopamine levels with subsecond temporal resolution in humans (n = 17) with Parkinson's disease while they executed a sequential decision-making task. Participants placed bets and experienced monetary gains or losses. Contrary to what a large body of work in model organisms would predict, dopamine fluctuations in the striatum do not encode RPEs. Instead, subsecond dopamine fluctuations encode an integration of RPEs with counterfactual prediction errors, the latter defined by how much better or worse the experienced outcome could have been. How dopamine fluctuations combine the actual and counterfactual is unknown. One possibility is that this process is the normal behavior of reward processing dopamine neurons, which previously had not been tested by experiments in animal models. Alternatively, this superposition of error terms may result from an additional yet-to-be-identified subclass of dopamine neurons.
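One simple way to write down the reported integration, hedged because, as the abstract itself notes, the actual combination rule is unknown: a counterfactual prediction error (how the obtained outcome compares with the best forgone alternative) superposed on the classic RPE.

    def combined_error(r_obtained, r_expected, r_best_alternative):
        rpe = r_obtained - r_expected          # classic reward prediction error
        cpe = r_obtained - r_best_alternative  # counterfactual prediction error
        return rpe + cpe                       # one candidate superposition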
Rutledge, Robb B.; Zaehle, Tino; Schmitt, Friedhelm C.; Kopitzki, Klaus; Kowski, Alexander B.; Voges, Jürgen; Heinze, Hans-Jochen; Dolan, Raymond J.
2015-01-01
Functional magnetic resonance imaging (fMRI), cyclic voltammetry, and single-unit electrophysiology studies suggest that signals measured in the nucleus accumbens (Nacc) during value-based decision making represent reward prediction errors (RPEs), the difference between actual and predicted rewards. Here, we studied the precise temporal and spectral pattern of reward-related signals in the human Nacc. We recorded local field potentials (LFPs) from the Nacc of six epilepsy patients during an economic decision-making task. On each trial, patients decided whether to accept or reject a gamble with equal probabilities of a monetary gain or loss. The behavior of four patients was consistent with choices being guided by value expectations. Expected value signals before outcome onset were observed in three of those patients, at varying latencies and with nonoverlapping spectral patterns. Signals after outcome onset were correlated with RPE regressors in all subjects. However, further analysis revealed that these signals were better explained as outcome valence rather than RPE signals, with gamble gains and losses differing in the power of beta oscillations and in evoked response amplitudes. Taken together, our results do not support the idea that postsynaptic potentials in the Nacc represent an RPE that unifies outcome magnitude and prior value expectation. We discuss the generalizability of our findings to healthy individuals and the relation of our results to measurements of RPE signals obtained from the Nacc with other methods. PMID:26019312
Stenner, Max-Philipp; Rutledge, Robb B; Zaehle, Tino; Schmitt, Friedhelm C; Kopitzki, Klaus; Kowski, Alexander B; Voges, Jürgen; Heinze, Hans-Jochen; Dolan, Raymond J
2015-08-01
Functional magnetic resonance imaging (fMRI), cyclic voltammetry, and single-unit electrophysiology studies suggest that signals measured in the nucleus accumbens (Nacc) during value-based decision making represent reward prediction errors (RPEs), the difference between actual and predicted rewards. Here, we studied the precise temporal and spectral pattern of reward-related signals in the human Nacc. We recorded local field potentials (LFPs) from the Nacc of six epilepsy patients during an economic decision-making task. On each trial, patients decided whether to accept or reject a gamble with equal probabilities of a monetary gain or loss. The behavior of four patients was consistent with choices being guided by value expectations. Expected value signals before outcome onset were observed in three of those patients, at varying latencies and with nonoverlapping spectral patterns. Signals after outcome onset were correlated with RPE regressors in all subjects. However, further analysis revealed that these signals were better explained as outcome valence rather than RPE signals, with gamble gains and losses differing in the power of beta oscillations and in evoked response amplitudes. Taken together, our results do not support the idea that postsynaptic potentials in the Nacc represent an RPE that unifies outcome magnitude and prior value expectation. We discuss the generalizability of our findings to healthy individuals and the relation of our results to measurements of RPE signals obtained from the Nacc with other methods. Copyright © 2015 the American Physiological Society.
Dynamic shaping of dopamine signals during probabilistic Pavlovian conditioning.
Hart, Andrew S; Clark, Jeremy J; Phillips, Paul E M
2015-01-01
Cue- and reward-evoked phasic dopamine activity during Pavlovian and operant conditioning paradigms is well correlated with reward-prediction errors from formal reinforcement learning models, which feature teaching signals in the form of discrepancies between actual and expected reward outcomes. Additionally, in learning tasks where conditioned cues probabilistically predict rewards, dopamine neurons show sustained cue-evoked responses that are correlated with the variance of reward and are maximal to cues predicting rewards with a probability of 0.5. Therefore, it has been suggested that sustained dopamine activity after cue presentation encodes the uncertainty of impending reward delivery. In the current study we examined the acquisition and maintenance of these neural correlates using fast-scan cyclic voltammetry in rats implanted with carbon fiber electrodes in the nucleus accumbens core during probabilistic Pavlovian conditioning. The advantage of this technique is that we can sample from the same animal and recording location throughout learning with single trial resolution. We report that dopamine release in the nucleus accumbens core contains correlates of both expected value and variance. A quantitative analysis of these signals throughout learning, and during the ongoing updating process after learning in probabilistic conditions, demonstrates that these correlates are dynamically encoded during these phases. Peak CS-evoked responses are correlated with expected value and predominate during early learning while a variance-correlated sustained CS signal develops during the post-asymptotic updating phase. Copyright © 2014 Elsevier Inc. All rights reserved.
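The two correlates described here can both be maintained online with delta-rule running estimates; a generic sketch (not the authors' voltammetry analysis) in which the same prediction error updates an expected-value estimate and, through its square, a variance estimate:

    alpha = 0.1
    value_hat, var_hat = 0.0, 0.0

    def update(r):
        # delta-rule running estimates of expected value and variance
        global value_hat, var_hat
        delta = r - value_hat                    # reward prediction error
        value_hat += alpha * delta               # correlate of peak CS response
        var_hat += alpha * (delta**2 - var_hat)  # correlate of sustained CS signal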
The computational neurobiology of learning and reward.
Daw, Nathaniel D; Doya, Kenji
2006-04-01
Following the suggestion that midbrain dopaminergic neurons encode a signal, known as a 'reward prediction error', used by artificial intelligence algorithms for learning to choose advantageous actions, the study of the neural substrates for reward-based learning has been strongly influenced by computational theories. In recent work, such theories have been increasingly integrated into experimental design and analysis. Such hybrid approaches have offered detailed new insights into the function of a number of brain areas, especially the cortex and basal ganglia. In part this is because these approaches enable the study of neural correlates of subjective factors (such as a participant's beliefs about the reward to be received for performing some action) that the computational theories purport to quantify.
Social learning through prediction error in the brain
NASA Astrophysics Data System (ADS)
Joiner, Jessica; Piva, Matthew; Turrin, Courtney; Chang, Steve W. C.
2017-06-01
Learning about the world is critical to survival and success. In social animals, learning about others is a necessary component of navigating the social world, ultimately contributing to increasing evolutionary fitness. How humans and nonhuman animals represent the internal states and experiences of others has long been a subject of intense interest in the developmental psychology tradition, and, more recently, in studies of learning and decision making involving self and other. In this review, we explore how psychology conceptualizes the process of representing others, and how neuroscience has uncovered correlates of reinforcement learning signals to explore the neural mechanisms underlying social learning from the perspective of representing reward-related information about self and other. In particular, we discuss self-referenced and other-referenced types of reward prediction errors across multiple brain structures that effectively allow reinforcement learning algorithms to mediate social learning. Prediction-based computational principles in the brain may be strikingly conserved between self-referenced and other-referenced information.
Daniel, Reka; Pollmann, Stefan
2010-01-06
The dopaminergic system is known to play a central role in reward-based learning (Schultz, 2006), yet it was also observed to be involved when only cognitive feedback is given (Aron et al., 2004). Within the domain of information-integration category learning, in which information from several stimulus dimensions has to be integrated predecisionally (Ashby and Maddox, 2005), the importance of contingent feedback is well established (Maddox et al., 2003). We examined the common neural correlates of reward anticipation and prediction error in this task. Sixteen subjects performed two parallel information-integration tasks within a single event-related functional magnetic resonance imaging session but received a monetary reward only for one of them. Similar functional areas including basal ganglia structures were activated in both task versions. In contrast, a single structure, the nucleus accumbens, showed higher activation during monetary reward anticipation compared with the anticipation of cognitive feedback in information-integration learning. Additionally, this activation was predicted by measures of intrinsic motivation in the cognitive feedback task and by measures of extrinsic motivation in the rewarded task. Our results indicate that, although all other structures implicated in category learning are not significantly affected by altering the type of reward, the nucleus accumbens responds to the positive incentive properties of an expected reward depending on the specific type of the reward.
Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna
2016-10-05
Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.
Goal or gold: overlapping reward processes in soccer players upon scoring and winning money.
Häusler, Alexander Niklas; Becker, Benjamin; Bartling, Marcel; Weber, Bernd
2015-01-01
Social rewards are important incentives for human behavior. This is especially true in team sports such as the most popular one worldwide: soccer. We investigated reward processing upon scoring a soccer goal in a standard two-versus-one situation and in comparison to winning in a monetary incentive task. The results show a strong overlap in brain activity between the two conditions in established reward regions of the mesolimbic dopaminergic system, including the ventral striatum and ventromedial prefrontal cortex. The three main components of reward-associated learning, i.e., reward probability (RP), reward reception (RR), and reward prediction errors (RPE), showed highly similar activation in both contexts, with only the RR and RPE components displaying overlapping reward activity. Passing and shooting behavior did not correlate with individual egoism scores, but we observed a positive correlation between egoism and activity in the left middle frontal gyrus upon scoring after a pass versus a direct shot. Our findings suggest that rewards in the context of soccer and monetary incentives are based on similar neural processes.
Goal or Gold: Overlapping Reward Processes in Soccer Players upon Scoring and Winning Money
Häusler, Alexander Niklas; Becker, Benjamin; Bartling, Marcel; Weber, Bernd
2015-01-01
Social rewards are important incentives for human behavior. This is especially true in team sports such as the most popular one worldwide: soccer. We investigated reward processing upon scoring a soccer goal in a standard two-versus-one situation and in comparison to winning in a monetary incentive task. The results show a strong overlap in brain activity between the two conditions in established reward regions of the mesolimbic dopaminergic system, including the ventral striatum and ventromedial prefrontal cortex. The three main components of reward-associated learning, i.e., reward probability (RP), reward reception (RR), and reward prediction errors (RPE), showed highly similar activation in both contexts, with only the RR and RPE components displaying overlapping reward activity. Passing and shooting behavior did not correlate with individual egoism scores, but we observed a positive correlation between egoism and activity in the left middle frontal gyrus upon scoring after a pass versus a direct shot. Our findings suggest that rewards in the context of soccer and monetary incentives are based on similar neural processes. PMID:25875594
Zhu, Lusha; Mathewson, Kyle E.; Hsu, Ming
2012-01-01
Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents’ beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs. PMID:22307594
Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming
2012-01-31
Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.
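A toy sketch of the hybrid characterization in a matching-pennies-like competitive game, with a reinforcement prediction error updating own action values and a belief prediction error updating beliefs about the opponent (all parameters and the payoff convention are illustrative):

    import numpy as np

    nA = 2
    Q = np.zeros(nA)            # reinforcement values of own actions
    belief = np.full(nA, 0.5)   # beliefs about opponent's action frequencies
    alpha, eta = 0.2, 0.2

    def step(a_self, a_opp, r):
        rl_pe = r - Q[a_self]                    # reinforcement prediction error
        Q[a_self] += alpha * rl_pe
        belief_pe = np.eye(nA)[a_opp] - belief   # belief-based prediction error
        belief[:] += eta * belief_pe
        return rl_pe, belief_pe

    rl_pe, belief_pe = step(a_self=0, a_opp=1, r=0.0)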
Effects of monetary reward and punishment on information checking behaviour.
Li, Simon Y W; Cox, Anna L; Or, Calvin; Blandford, Ann
2016-03-01
Two experiments were conducted to examine whether checking one's own work can be motivated by monetary reward and punishment. Participants were randomly assigned to one of three conditions: a flat-rate payment for completing the task (Control); payment increased for error-free performance (Reward); payment decreased for erroneous performance (Punishment). Experiment 1 (N = 90) was conducted with liberal arts students, using a general data-entry task. Experiment 2 (N = 90) replicated Experiment 1 with clinical students and a safety-critical 'cover story' for the task. In both studies, the Reward and Punishment conditions resulted in significantly fewer errors and in more frequent and longer checking than the Control condition. No such differences were obtained between the Reward and Punishment conditions. It is concluded that error consequences in terms of monetary reward and punishment can result in more accurate task performance and more rigorous checking behaviour than errors without consequences. However, whether punishment is more effective than reward, or vice versa, remains inconclusive. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.
From prediction error to incentive salience: mesolimbic computation of reward motivation
Berridge, Kent C.
2011-01-01
Reward contains separable psychological components of learning, incentive motivation and pleasure. Most computational models have focused only on the learning component of reward, but the motivational component is equally important in reward circuitry, and even more directly controls behavior. Modeling the motivational component requires recognition of additional control factors besides learning. Here I will discuss how mesocorticolimbic mechanisms generate the motivation component of incentive salience. Incentive salience takes Pavlovian learning and memory as one input and as an equally important input takes neurobiological state factors (e.g., drug states, appetite states, satiety states) that can vary independently of learning. Neurobiological state changes can produce unlearned fluctuations or even reversals in the ability of a previously-learned reward cue to trigger motivation. Such fluctuations in cue-triggered motivation can dramatically depart from all previously learned values about the associated reward outcome. Thus a consequence of the difference between incentive salience and learning can be to decouple cue-triggered motivation of the moment from previously learned values of how good the associated reward has been in the past. Another consequence can be to produce irrationally strong motivation urges that are not justified by any memories of previous reward values (and without distorting associative predictions of future reward value). Such irrationally strong motivation may be especially problematic in addiction. To comprehend these phenomena, future models of mesocorticolimbic reward function should address the neurobiological state factors that participate to control generation of incentive salience. PMID:22487042
Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
La Camera, Giancarlo; Richmond, Barry J.
2008-01-01
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys. PMID:18688266
Modeling the violation of reward maximization and invariance in reinforcement schedules.
La Camera, Giancarlo; Richmond, Barry J
2008-08-08
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as the "schedule length effect"). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: "framing," wherein equivalent options are treated differently depending on the context in which they are presented, and the "sunk cost" effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.
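For contrast, here is the standard temporal-difference valuation that the paper argues against: if states are indexed only by distance to reward, the discounted value, and hence the predicted error rate, is identical for trials equally distant from reward, whatever the schedule length. A minimal illustration with an arbitrary discount factor:

    gamma = 0.9

    def td_value(schedule_length, trial_index, r=1.0):
        # standard TD value depends only on distance to reward,
        # not on the length of the schedule the trial belongs to
        return r * gamma ** (schedule_length - trial_index)

    # trial 1 of a 3-trial schedule and trial 2 of a 4-trial schedule are
    # both 2 trials from reward, so their values are identical (0.81):
    print(td_value(3, 1), td_value(4, 2))  # no schedule length effect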
Real and hypothetical monetary rewards modulate risk taking in the brain.
Xu, Sihua; Pan, Yu; Wang, You; Spaeth, Andrea M; Qu, Zhe; Rao, Hengyi
2016-07-07
Both real and hypothetical monetary rewards are widely used as reinforcers in risk taking and decision making studies. However, whether real and hypothetical monetary rewards modulate risk taking and decision making in the same manner remains controversial. In this study, we used event-related potentials (ERP) with a balloon analogue risk task (BART) paradigm to examine the effects of real and hypothetical monetary rewards on risk taking in the brain. Behavioral data showed reduced risk taking after negative feedback (money loss) during the BART with real rewards compared to those with hypothetical rewards, suggesting increased loss aversion with real monetary rewards. The ERP data demonstrated a larger feedback-related negativity (FRN) in response to money loss during risk taking with real rewards compared to those with hypothetical rewards, which may reflect greater prediction error or regret emotion after real monetary losses. These findings demonstrate differential effects of real versus hypothetical monetary rewards on risk taking behavior and brain activity, suggesting caution when drawing conclusions about real choices from hypothetical studies of intended behavior, especially when large rewards are used. The results have implications for future utility of real and hypothetical monetary rewards in studies of risk taking and decision making.
Morita, Kenji; Morishima, Mieko; Sakai, Katsuyuki; Kawaguchi, Yasuo
2013-05-15
Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
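The model's key assumption can be stated compactly: if the indirect pathway carries the value of the state or action at the previous time point, the dopamine response approximates a temporal-difference error. A hedged one-line reading, heavily simplified from the circuit model:

    def dopamine_signal(r, v_current, v_previous, gamma=1.0):
        # TD-error reading: reward plus (discounted) current value, carried by
        # the direct-pathway side, minus the previous-time-point value, which
        # the model assumes the indirect pathway represents
        return r + gamma * v_current - v_previous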
Neural Reward and Punishment Sensitivity in Cigarette Smokers
Potts, Geoffrey F.; Bloom, Erika; Evans, David E.; Drobes, David J.
2014-01-01
Background: Nicotine addiction remains a major public health problem but the neural substrates of addictive behavior remain unknown. One characteristic of smoking behavior is impulsive choice, selecting the immediate reward of smoking despite the potential long-term negative consequences. This suggests that drug users, including cigarette smokers, may be more sensitive to rewards and less sensitive to punishment. Methods: We used event-related potentials (ERPs) to test the hypothesis that smokers are more responsive to reward signals and less responsive to punishment, potentially predisposing them to risky behavior. We conducted two experiments, one using a reward prediction design to elicit a Medial Frontal Negativity (MFN) and one using a reward- and punishment-motivated flanker task to elicit an Error Related Negativity (ERN), ERP components thought to index activity in the cortical projection of the dopaminergic reward system. Results and Conclusions: The smokers had a greater MFN response to unpredicted rewards, and non-smokers, but not smokers, had a larger ERN on punishment motivated trials indicating that smokers are more reward sensitive and less punishment sensitive than nonsmokers, overestimating the appetitive value and underestimating aversive outcomes of stimuli and actions. PMID:25292454
Neurocomputational mechanisms of prosocial learning and links to empathy.
Lockwood, Patricia L; Apps, Matthew A J; Valton, Vincent; Viding, Essi; Roiser, Jonathan P
2016-08-30
Reinforcement learning theory powerfully characterizes how we learn to benefit ourselves. In this theory, prediction errors, the difference between a predicted and actual outcome of a choice, drive learning. However, we do not operate in a social vacuum. To behave prosocially we must learn the consequences of our actions for other people. Empathy, the ability to vicariously experience and understand the affect of others, is hypothesized to be a critical facilitator of prosocial behaviors, but the link between empathy and prosocial behavior is still unclear. During functional magnetic resonance imaging (fMRI) participants chose between different stimuli that were probabilistically associated with rewards for themselves (self), another person (prosocial), or no one (control). Using computational modeling, we show that people can learn to obtain rewards for others but do so more slowly than when learning to obtain rewards for themselves. fMRI revealed that activity in a posterior portion of the subgenual anterior cingulate cortex/basal forebrain (sgACC) drives learning only when we are acting in a prosocial context and signals a prosocial prediction error conforming to classical principles of reinforcement learning theory. However, there is also substantial variability in the neural and behavioral efficiency of prosocial learning, which is predicted by trait empathy. More empathic people learn more quickly when benefitting others, and their sgACC response is the most selective for prosocial learning. We thus reveal a computational mechanism driving prosocial learning in humans. This framework could provide insights into atypical prosocial behavior in those with disorders of social cognition.
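The reported self/prosocial asymmetry amounts to the same delta-rule update applied with a slower learning rate when the reward accrues to another person; a trivial sketch with made-up rates:

    def delta_update(q, r, alpha):
        # generic delta-rule update driven by a prediction error
        return q + alpha * (r - q)

    q_self = delta_update(0.5, 1.0, alpha=0.30)   # learning for oneself
    q_other = delta_update(0.5, 1.0, alpha=0.15)  # slower prosocial learning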
Treadway, Michael T; Admon, Roee; Arulpragasam, Amanda R; Mehta, Malavika; Douglas, Samuel; Vitaliano, Gordana; Olson, David P; Cooper, Jessica A; Pizzagalli, Diego A
2017-10-15
Stress is widely known to alter behavioral responses to rewards and punishments. It is believed that stress may precipitate these changes through modulation of corticostriatal circuitry involved in reinforcement learning and motivation, although the intervening mechanisms remain unclear. One candidate is inflammation, which can rapidly increase following stress and can disrupt dopamine-dependent reward pathways. Here, in a sample of 88 healthy female participants, we first assessed the effect of an acute laboratory stress paradigm on levels of plasma interleukin-6 (IL-6), a cytokine known to be both responsive to stress and elevated in depression. In a second laboratory session, we examined the effects of a second laboratory stress paradigm on reward prediction error (RPE) signaling in the ventral striatum. We show that individual differences in stress-induced increases in IL-6 (session 1) were associated with decreased ventral striatal RPE signaling during reinforcement learning (session 2), though there was no main effect of stress on RPE. Furthermore, changes in IL-6 following stress predicted intraindividual variability in perceived stress during a 4-month follow-up period. Taken together, these data identify a novel link between IL-6 and striatal RPEs during reinforcement learning in the context of acute psychological stress, as well as future appraisal of stressful life events. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Working Memory Load Strengthens Reward Prediction Errors.
Collins, Anne G E; Ciullo, Brittany; Frank, Michael J; Badre, David
2017-04-19
Reinforcement learning (RL) in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors (RPEs) are used to update expected values of choice options. This modeling ignores the different contributions of different memory and decision-making systems thought to contribute even to simple learning. In an fMRI experiment, we investigated how working memory (WM) and incremental RL processes interact to guide human learning. WM load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive WM process together with slower RL. Model-based analysis of fMRI data showed that striatum and lateral prefrontal cortex were sensitive to RPE, as shown previously, but, critically, these signals were reduced when the learning problem was within capacity of WM. The degree of this neural interaction related to individual differences in the use of WM to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning. SIGNIFICANCE STATEMENT Reinforcement learning (RL) theory has been remarkably productive at improving our understanding of instrumental learning as well as dopaminergic and striatal network function across many mammalian species. However, this neural network is only one contributor to human learning and other mechanisms such as prefrontal cortex working memory also play a key role. Our results also show that these other players interact with the dopaminergic RL system, interfering with its key computation of reward prediction errors. Copyright © 2017 the authors.
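A compact sketch of the two-system mixture the modeling supports: a fast, one-shot, capacity-limited working-memory store alongside slow incremental RL, with a mixture weight standing in for WM reliability (which should fall as set size exceeds capacity). All names, parameters, and the mixture form are illustrative, not the authors' fitted model.

    import numpy as np

    n_actions, alpha, beta = 3, 0.1, 5.0

    def softmax(x):
        e = np.exp(beta * (x - x.max()))
        return e / e.sum()

    class WMRL:
        def __init__(self, n_stim, w=0.8):
            self.Q = np.full((n_stim, n_actions), 1.0 / n_actions)   # slow RL values
            self.wm = np.full((n_stim, n_actions), 1.0 / n_actions)  # fast WM store
            self.w = w  # WM weight; should shrink as WM load grows

        def policy(self, s):
            return self.w * softmax(self.wm[s]) + (1 - self.w) * softmax(self.Q[s])

        def update(self, s, a, r):
            self.Q[s, a] += alpha * (r - self.Q[s, a])  # incremental RPE update
            self.wm[s] = 0.0
            self.wm[s, a] = r                           # one-shot WM overwrite

    model = WMRL(n_stim=6, w=0.5)  # heavier load: rely less on WM

The reduced striatal RPE signal under low load corresponds here to the WM term carrying most of the prediction, leaving a smaller residual error for the RL module.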
From prediction error to incentive salience: mesolimbic computation of reward motivation.
Berridge, Kent C
2012-04-01
Reward contains separable psychological components of learning, incentive motivation and pleasure. Most computational models have focused only on the learning component of reward, but the motivational component is equally important in reward circuitry, and even more directly controls behavior. Modeling the motivational component requires recognition of additional control factors besides learning. Here I discuss how mesocorticolimbic mechanisms generate the motivation component of incentive salience. Incentive salience takes Pavlovian learning and memory as one input and as an equally important input takes neurobiological state factors (e.g. drug states, appetite states, satiety states) that can vary independently of learning. Neurobiological state changes can produce unlearned fluctuations or even reversals in the ability of a previously learned reward cue to trigger motivation. Such fluctuations in cue-triggered motivation can dramatically depart from all previously learned values about the associated reward outcome. Thus, one consequence of the difference between incentive salience and learning can be to decouple cue-triggered motivation of the moment from previously learned values of how good the associated reward has been in the past. Another consequence can be to produce irrationally strong motivation urges that are not justified by any memories of previous reward values (and without distorting associative predictions of future reward value). Such irrationally strong motivation may be especially problematic in addiction. To understand these phenomena, future models of mesocorticolimbic reward function should address the neurobiological state factors that participate to control generation of incentive salience. © 2012 The Author. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
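The core proposal, that cue-triggered motivation is the learned cue value dynamically modulated by current physiological state, can be caricatured in one line. This is an illustrative reading, not Berridge's published equation; kappa stands for the state factor (appetite, satiety, drug state) that can change without any new learning.

    def incentive_salience(learned_cue_value, kappa=1.0):
        # kappa > 1 amplifies cue-triggered 'wanting', kappa < 1 attenuates it,
        # and the modulation acts at the moment of cue re-encounter,
        # independently of previously learned reward values
        return learned_cue_value * kappa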
Becker, Michael P I; Nitsch, Alexander M; Hewig, Johannes; Miltner, Wolfgang H R; Straube, Thomas
2016-12-01
Several regions of the frontal cortex interact with striatal and amygdala regions to mediate the evaluation of reward-related information and subsequent adjustment of response choices. Recent theories discuss the particular relevance of dorsal anterior cingulate cortex (dACC) for switching behavior; subsequently, ventromedial prefrontal cortex (VMPFC) is involved in mediating exploitative behaviors by tracking reward values unfolding after the behavioral switch. Amygdala, on the other hand, has been implicated in coding the valence of stimulus-outcome associations and the ventral striatum (VS) has consistently been shown to code a reward prediction error (RPE). Here, we used fMRI data acquired in humans during a reversal task to parametrically model different sequences of positive feedback in order to unravel differential contributions of these brain regions to the tracking and exploitation of rewards. Parameters from an Optimal Bayesian Learner accurately predicted the divergent involvement of dACC and VMPFC during feedback processing: dACC signaled the first, but not later, presentations of positive feedback, while VMPFC coded trial-by-trial accumulations in reward value. Our results confirm that dACC carries a prominent confirmatory signal during processing of first positive feedback. Amygdala coded positive feedbacks more uniformly, while striatal regions were associated with RPE. Copyright © 2016 Elsevier Inc. All rights reserved.
Prospect theory does not describe the feedback-related negativity value function.
Sambrook, Thomas D; Roser, Matthew; Goslin, Jeremy
2012-12-01
Humans handle uncertainty poorly. Prospect theory accounts for this with a value function in which possible losses are overweighted compared to possible gains, and the marginal utility of rewards decreases with size. fMRI studies have explored the neural basis of this value function. A separate body of research claims that prediction errors are calculated by midbrain dopamine neurons. We investigated whether the prospect theoretic effects shown in behavioral and fMRI studies were present in midbrain prediction error coding by using the feedback-related negativity, an ERP component believed to reflect midbrain prediction errors. Participants' stated satisfaction with outcomes followed prospect theory but their feedback-related negativity did not, instead showing no effect of marginal utility and greater sensitivity to potential gains than losses. Copyright © 2012 Society for Psychophysiological Research.
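For reference, the prospect-theoretic value function that participants' satisfaction ratings followed, but the FRN did not, has the standard form below, shown with the conventional Tversky-Kahneman (1992) parameter estimates (alpha = beta = 0.88, lambda = 2.25); the code is a textbook form included only for orientation:

    def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
        # concave for gains, convex and steeper (loss-averse) for losses
        return x ** alpha if x >= 0 else -lam * (-x) ** beta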
Cooper, Jessica A.; Gorlick, Marissa A.; Denny, Taylor; Worthy, Darrell A.; Beevers, Christopher G.; Maddox, W. Todd
2013-01-01
Depression is often characterized by attentional biases toward negative items and away from positive items, which likely affects reward and punishment processing. Recent work reported that training attention away from negative stimuli reduced this bias and reduced depressive symptoms. However, the effect of attention training on subsequent learning has yet to be explored. In the current study, participants were required to learn to maximize reward during decision-making. Undergraduates with elevated self-reported depressive symptoms received attention training toward positive stimuli prior to performing the decision-making task (n=20; active training). The active training group was compared to two groups: undergraduates with elevated self-reported depressive symptoms who received placebo training (n=22; placebo training) and control subjects with low levels of depressive symptoms (n=33; non-depressive control). The placebo-training depressive group performed worse and switched between options more than non-depressive controls on the reward maximization task. However, depressives that received active training performed as well as non-depressive controls. Computational modeling indicated that the placebo-trained group learned more from negative than from positive prediction errors, leading to more frequent switching. The non-depressive control and active training depressive groups showed similar learning from positive and negative prediction errors, leading to less frequent switching and better performance. Our results indicate that individuals with elevated depressive symptoms are impaired at reward maximization, but that the deficit can be improved with attention training toward positive stimuli. PMID:24197612
Cooper, Jessica A; Gorlick, Marissa A; Denny, Taylor; Worthy, Darrell A; Beevers, Christopher G; Maddox, W Todd
2014-06-01
Depression is often characterized by attentional biases toward negative items and away from positive items, which likely affects reward and punishment processing. Recent work has reported that training attention away from negative stimuli reduced this bias and reduced depressive symptoms. However, the effect of attention training on subsequent learning has yet to be explored. In the present study, participants were required to learn to maximize reward during decision making. Undergraduates with elevated self-reported depressive symptoms received attention training toward positive stimuli prior to performing the decision-making task (n = 20; active training). The active-training group was compared to two other groups: undergraduates with elevated self-reported depressive symptoms who received placebo training (n = 22; placebo training) and a control group with low levels of depressive symptoms (n = 33; nondepressive control). The placebo-training depressive group performed worse and switched between options more than did the nondepressive controls on the reward maximization task. However, depressives who received active training performed as well as the nondepressive controls. Computational modeling indicated that the placebo-trained group learned more from negative than from positive prediction errors, leading to more frequent switching. The nondepressive control and active-training depressive groups showed similar learning from positive and negative prediction errors, leading to less-frequent switching and better performance. Our results indicate that individuals with elevated depressive symptoms are impaired at reward maximization, but that the deficit can be improved with attention training toward positive stimuli.
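The asymmetric-learning-rate account described in both versions of this abstract can be sketched as a two-option reinforcement learner with separate rates for positive and negative prediction errors. The Python below is a minimal sketch; all names and parameter values are illustrative assumptions, not the paper's fitted model.

    import numpy as np

    def simulate_learner(payoffs, alpha_pos=0.3, alpha_neg=0.3, beta=5.0, seed=0):
        """payoffs: array of shape (n_trials, 2) giving each option's outcome."""
        rng = np.random.default_rng(seed)
        q = np.zeros(2)                  # value estimates for the two options
        choices = []
        for trial in payoffs:
            p0 = 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))  # softmax choice
            c = 0 if rng.random() < p0 else 1
            delta = trial[c] - q[c]      # reward prediction error
            alpha = alpha_pos if delta > 0 else alpha_neg
            q[c] += alpha * delta        # asymmetric update
            choices.append(c)
        return np.array(choices)

Setting alpha_neg > alpha_pos reproduces the placebo group's pattern described above: values are dragged down sharply after losses, producing frequent switching between options.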
People newly in love are more responsive to positive feedback.
Brown, Cassandra L; Beninger, Richard J
2012-06-01
Passionate love is associated with increased activity in dopamine-rich regions of the brain. Increased dopamine in these regions is associated with a greater tendency to learn from reward in trial-and-error learning tasks. This study examined the prediction that individuals who were newly in love would be better at responding to reward (positive feedback). In test trials, people who were newly in love selected positive outcomes significantly more often than their single (not in love) counterparts but were no better at the task overall. This suggests that people who are newly in love show a bias toward responding to positive feedback, which may reflect a general bias towards reward-seeking.
Neural reward and punishment sensitivity in cigarette smokers.
Potts, Geoffrey F; Bloom, Erika L; Evans, David E; Drobes, David J
2014-11-01
Nicotine addiction remains a major public health problem, but the neural substrates of addictive behavior remain incompletely understood. One characteristic of smoking behavior is impulsive choice: selecting the immediate reward of smoking despite the potential long-term negative consequences. This suggests that drug users, including cigarette smokers, may be more sensitive to rewards and less sensitive to punishment. We used event-related potentials (ERPs) to test the hypothesis that smokers are more responsive to reward signals and less responsive to punishment, potentially predisposing them to risky behavior. We conducted two experiments, one using a reward prediction design to elicit a Medial Frontal Negativity (MFN) and one using a reward- and punishment-motivated flanker task to elicit an Error Related Negativity (ERN), ERP components thought to index activity in the cortical projection of the dopaminergic reward system. The smokers had a greater MFN response to unpredicted rewards, and non-smokers, but not smokers, had a larger ERN on punishment-motivated trials, indicating that smokers are more reward-sensitive and less punishment-sensitive than non-smokers, overestimating the appetitive value of stimuli and actions and underestimating their aversive outcomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Saddoris, Michael P.; Cacciapaglia, Fabio; Wightman, R. Mark; Carelli, Regina M.
2015-01-01
Mesolimbic dopamine (DA) is phasically released during appetitive behaviors, though there is substantive disagreement about the specific purpose of these DA signals. For example, prediction error (PE) models suggest a role of learning, while incentive salience (IS) models argue that the DA signal imbues stimuli with value and thereby stimulates motivated behavior. However, within the nucleus accumbens (NAc) patterns of DA release can strikingly differ between subregions, and as such, it is possible that these patterns differentially contribute to aspects of PE and IS. To assess this, we measured DA release in subregions of the NAc during a behavioral task that spatiotemporally separated sequential goal-directed stimuli. Electrochemical methods were used to measure subsecond NAc dopamine release in the core and shell during a well learned instrumental chain schedule in which rats were trained to press one lever (seeking; SL) to gain access to a second lever (taking; TL) linked with food delivery, and again during extinction. In the core, phasic DA release was greatest following initial SL presentation, but minimal for the subsequent TL and reward events. In contrast, phasic shell DA showed robust release at all task events. Signaling decreased between the beginning and end of sessions in the shell, but not core. During extinction, peak DA release in the core showed a graded decrease for the SL and pauses in release during omitted expected rewards, whereas shell DA release decreased predominantly during the TL. These release dynamics suggest parallel DA signals capable of supporting distinct theories of appetitive behavior. SIGNIFICANCE STATEMENT Dopamine signaling in the brain is important for a variety of cognitive functions, such as learning and motivation. Typically, it is assumed that a single dopamine signal is sufficient to support these cognitive functions, though competing theories disagree on how dopamine contributes to reward-based behaviors. Here, we have found that real-time dopamine release within the nucleus accumbens (a primary target of midbrain dopamine neurons) strikingly varies between core and shell subregions. In the core, dopamine dynamics are consistent with learning-based theories (such as reward prediction error) whereas in the shell, dopamine is consistent with motivation-based theories (e.g., incentive salience). These findings demonstrate that dopamine plays multiple and complementary roles based on discrete circuits that help animals optimize rewarding behaviors. PMID:26290234
Effects of monetary reward and punishment on information checking behaviour: An eye-tracking study.
Li, Simon Y W; Cox, Anna L; Or, Calvin; Blandford, Ann
2018-07-01
The aim of the present study was to investigate the effect of error consequence, as reward or punishment, on individuals' checking behaviour following data entry. The study comprised two eye-tracking experiments that replicate and extend the investigation of Li et al. (2016) into the effect of monetary reward and punishment on data-entry performance. The first experiment adopted the same experimental setup as Li et al. (2016) but additionally used an eye tracker. It validated Li et al.'s (2016) finding that, compared to no error consequence, both reward and punishment led to improved data-entry performance in terms of reducing errors, and that no performance difference was found between reward and punishment. The second experiment extended the earlier study by attaching the error consequence to each individual trial, providing immediate performance feedback to participants. Gradual increments of monetary reward (i.e., trial-by-trial reward feedback) also led to significantly more accurate performance than no error consequence. Because of the small sample size tested, it remains unclear whether gradual increments are more effective than gradual decrements. Nevertheless, this study reasserts the effectiveness of reward on data-entry performance. Copyright © 2018 Elsevier Ltd. All rights reserved.
Credit assignment in movement-dependent reinforcement learning
McDougle, Samuel D.; Boggess, Matthew J.; Crossley, Matthew J.; Parvin, Darius; Ivry, Richard B.; Taylor, Jordan A.
2016-01-01
When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants’ explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem. PMID:27247404
Credit assignment in movement-dependent reinforcement learning.
McDougle, Samuel D; Boggess, Matthew J; Crossley, Matthew J; Parvin, Darius; Ivry, Richard B; Taylor, Jordan A
2016-06-14
When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants' explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.
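The gating hypothesis favored by both versions of this abstract lends itself to a one-line modification of the usual value update: a motor execution error attenuates the reward prediction error before any learning occurs. A minimal Python sketch, with the gate strength an illustrative assumption:

    def update_value(v, reward, motor_error, alpha=0.2, gate=0.1):
        delta = reward - v                     # reward prediction error
        weight = gate if motor_error else 1.0  # attenuate the RPE after a miss
        return v + alpha * weight * delta

With gate near zero, an unrewarded but poorly executed reach is effectively "forgiven", so the chosen option's value is not penalized for a motor failure; this is the pattern the cerebellar patients above failed to show.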
Explanatory pluralism: An unrewarding prediction error for free energy theorists.
Colombo, Matteo; Wright, Cory
2017-03-01
Courtesy of its free energy formulation, the hierarchical predictive processing theory of the brain (PTB) is often claimed to be a grand unifying theory. To test this claim, we examine a central case: activity of mesocorticolimbic dopaminergic (DA) systems. After reviewing the three most prominent hypotheses of DA activity (the anhedonia, incentive salience, and reward prediction error hypotheses), we conclude that the evidence currently vindicates explanatory pluralism. This vindication implies that the grand unifying claims of advocates of PTB are unwarranted. More generally, we suggest that the form of scientific progress in the cognitive sciences is unlikely to be a single overarching grand unifying theory. Copyright © 2016 Elsevier Inc. All rights reserved.
The Attraction Effect Modulates Reward Prediction Errors and Intertemporal Choices.
Gluth, Sebastian; Hotaling, Jared M; Rieskamp, Jörg
2017-01-11
Classical economic theory contends that the utility of a choice option should be independent of other options. This view is challenged by the attraction effect, in which the relative preference between two options is altered by the addition of a third, asymmetrically dominated option. Here, we leveraged the attraction effect in the context of intertemporal choices to test whether both decisions and reward prediction errors (RPE) in the absence of choice violate the independence of irrelevant alternatives principle. We first demonstrate that intertemporal decision making is prone to the attraction effect in humans. In an independent group of participants, we then investigated how this affects the neural and behavioral valuation of outcomes using a novel intertemporal lottery task and fMRI. Participants' behavioral responses (i.e., satisfaction ratings) were modulated systematically by the attraction effect and this modulation was correlated across participants with the respective change of the RPE signal in the nucleus accumbens. Furthermore, we show that, because exponential and hyperbolic discounting models are unable to account for the attraction effect, recently proposed sequential sampling models might be more appropriate to describe intertemporal choices. Our findings demonstrate for the first time that the attraction effect modulates subjective valuation even in the absence of choice. The findings also challenge the prospect of using neuroscientific methods to measure utility in a context-free manner and have important implications for theories of reinforcement learning and delay discounting. Many theories of value-based decision making assume that people first assess the attractiveness of each option independently of each other and then pick the option with the highest subjective value. The attraction effect, however, shows that adding a new option to a choice set can change the relative value of the existing options, which is a violation of the independence principle. Using an intertemporal choice framework, we tested whether such violations also occur when the brain encodes the difference between expected and received rewards (i.e., the reward prediction error). Our results suggest that neither intertemporal choice nor valuation without choice adhere to the independence principle. Copyright © 2017 the authors.
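For reference, the exponential and hyperbolic discounting models that the abstract reports as unable to account for the attraction effect assign a reward of amount A delayed by D the values

    V_{\mathrm{exp}} = A\,e^{-kD} \qquad\text{and}\qquad V_{\mathrm{hyp}} = \frac{A}{1 + kD}.

Both score each option independently of the rest of the choice set, which is exactly the independence assumption the attraction effect violates; sequential sampling models relax it by letting options compete during deliberation.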
Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E
2014-03-01
Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.
Multi-layer network utilizing rewarded spike time dependent plasticity to learn a foraging task
2017-01-01
Neural networks with a single plastic layer employing reward-modulated spike-timing-dependent plasticity (STDP) are capable of learning simple foraging tasks. Here we demonstrate advanced pattern discrimination and continuous learning in a network of spiking neurons with multiple plastic layers. The network utilized both reward-modulated and non-reward-modulated STDP and implemented multiple mechanisms for homeostatic regulation of synaptic efficacy, including heterosynaptic plasticity, gain control, output balancing, activity normalization of rewarded STDP, and hard limits on synaptic strength. We found that adding a hidden layer of neurons employing non-rewarded STDP created neurons that responded to specific combinations of inputs and thus performed basic classification of the input patterns. When combined with a following layer of neurons implementing rewarded STDP, the network was able to learn, despite the absence of labeled training data, to discriminate rewarding patterns from patterns designated as punishing. Synaptic noise allowed for trial-and-error learning that helped to identify the goal-oriented strategies that were effective in task solving. The study predicts a critical set of properties of a spiking neuronal network with STDP that is sufficient to solve a complex foraging task involving pattern classification and decision making. PMID:28961245
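One common formalization of the rewarded STDP described above combines pair-based STDP with an eligibility trace that a reward signal converts into an actual weight change. The Python below is a generic sketch under that assumption; the paper's exact rule, constants, and homeostatic terms may differ.

    import numpy as np

    def rstdp_step(w, x_pre, x_post, elig, pre, post, reward,
                   a_plus=0.01, a_minus=0.012,
                   d_pre=0.9, d_post=0.9, d_elig=0.95, w_max=1.0):
        # Exponentially decaying traces of recent pre- and postsynaptic spikes
        x_pre = d_pre * x_pre + pre
        x_post = d_post * x_post + post
        # Pair-based STDP: potentiate on post-after-pre, depress on pre-after-post
        stdp = a_plus * x_pre * post - a_minus * x_post * pre
        # Store the candidate change in an eligibility trace; reward gates it
        elig = d_elig * elig + stdp
        # Hard limits on synaptic strength, as in the network described above
        w = float(np.clip(w + reward * elig, 0.0, w_max))
        return w, x_pre, x_post, elig

Setting reward to 0 recovers the non-rewarded STDP used in the hidden layer; a nonzero reward converts accumulated eligibility into lasting change in the output layer.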
An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning
Potjans, Wiebke; Diesmann, Markus; Morrison, Abigail
2011-01-01
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards. PMID:21589888
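The discrete-time TD error against which the dopaminergic signal is compared in this work takes the standard textbook form; a minimal Python sketch of the actor-critic scheme (learning rates illustrative):

    def td_error(r, v_s, v_s_next, gamma=0.9):
        # delta = r + gamma * V(s') - V(s)
        return r + gamma * v_s_next - v_s

    # The same delta trains both modules:
    #   critic:  V(s)      <- V(s)      + alpha_critic * delta
    #   actor:   pref(s,a) <- pref(s,a) + alpha_actor  * delta

One reading of the asymmetry result above is that a realistic dopamine signal has far less range below baseline than above it, so negative deltas are transmitted with compressed resolution, which is harmless when learning is driven by positive rewards but fatal when it must be driven by purely negative ones.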
Chudasama, Y; Robbins, Trevor W
2003-09-24
To examine possible heterogeneity of function within the ventral regions of the rodent frontal cortex, the present study compared the effects of excitotoxic lesions of the orbitofrontal cortex (OFC) and the infralimbic cortex (ILC) on pavlovian autoshaping and discrimination reversal learning. During the pavlovian autoshaping task, in which rats learn to approach a stimulus predictive of reward [conditional stimulus (CS+)], only the OFC group failed to acquire discriminated approach but was unimpaired when preoperatively trained. In the visual discrimination learning and reversal task, rats were initially required to discriminate a stimulus positively associated with reward. There was no effect of either OFC or ILC lesions on discrimination learning. When the stimulus-reward contingencies were reversed, both groups of animals committed more errors, but only the OFC-lesioned animals were unable to suppress responding to the previously rewarded stimulus, committing more "stimulus perseverative" errors. In contrast, the ILC group showed a pattern of errors that was more attributable to "learning" than to perseveration. These findings suggest two types of dissociation between the effects of OFC and ILC lesions: (1) OFC lesions impaired the learning processes implicated in pavlovian autoshaping but not instrumental simultaneous discrimination learning, whereas ILC-lesioned rats were unimpaired at autoshaping and showed a reversal learning deficit that did not reflect perseveration; and (2) OFC lesions induced perseverative responding in reversal learning but did not disinhibit responses to the pavlovian CS-. In contrast, the ILC lesion had no effect on response inhibitory control in either of these settings. The findings are discussed in the context of dissociable executive functions in ventral sectors of the rat prefrontal cortex.
Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex.
Takahashi, Yuji K; Roesch, Matthew R; Wilson, Robert C; Toreson, Kathy; O'Donnell, Patricio; Niv, Yael; Schoenbaum, Geoffrey
2011-10-30
The orbitofrontal cortex has been hypothesized to carry information regarding the value of expected rewards. Such information is essential for associative learning, which relies on comparisons between expected and obtained reward for generating instructive error signals. These error signals are thought to be conveyed by dopamine neurons. To test whether orbitofrontal cortex contributes to these error signals, we recorded from dopamine neurons in orbitofrontal-lesioned rats performing a reward learning task. Lesions caused marked changes in dopaminergic error signaling. However, the effect of lesions was not consistent with a simple loss of information regarding expected value. Instead, without orbitofrontal input, dopaminergic error signals failed to reflect internal information about the impending response that distinguished externally similar states leading to differently valued future rewards. These results are consistent with current conceptualizations of orbitofrontal cortex as supporting model-based behavior and suggest an unexpected role for this information in dopaminergic error signaling.
Probability differently modulating the effects of reward and punishment on visuomotor adaptation.
Song, Yanlong; Smiley-Oyen, Ann L
2017-12-01
Recent human motor learning studies revealed that punishment seemingly accelerated motor learning whereas reward enhanced consolidation of motor memory. It is not evident how intrinsic properties of reward and punishment modulate these potentially dissociable effects on motor learning and motor memory, nor what causes the dissociation. By manipulating the probability of delivery, a critical property of reward and punishment, the present study demonstrated that probability modulates the effects of reward and punishment distinctly, both in adapting to a sudden visual rotation and in consolidating the adaptation memory. Specifically, two probabilities of monetary reward and punishment delivery, 50 and 100%, were applied while young adult participants adapted to a sudden visual rotation. Punishment and reward showed distinct effects on motor adaptation and motor memory: the group that received punishment on 100% of the adaptation trials adapted significantly faster than the other three groups, but the group that received reward on 100% of the adaptation trials showed marked savings in re-adapting to the same rotation. In addition, the group that received punishment on a randomly selected 50% of the adaptation trials also showed savings in re-adapting to the same rotation. Differences in sensitivity to sensory prediction error, or in the explicit processes induced by reward and punishment, may contribute to these distinct effects.
A Cerebellar Framework for Predictive Coding and Homeostatic Regulation in Depressive Disorder.
Schutter, Dennis J L G
2016-02-01
Depressive disorder is associated with abnormalities in the processing of reward and punishment signals and disturbances in homeostatic regulation. These abnormalities are proposed to impair error minimization routines for reducing uncertainty. Several lines of research point towards a role of the cerebellum in reward- and punishment-related predictive coding and homeostatic regulatory function in depressive disorder. Available functional and anatomical evidence suggests that, in addition to the cortico-limbic networks, the cerebellum is part of the dysfunctional brain circuit in depressive disorder as well. It is proposed that impaired cerebellar function contributes to abnormalities in predictive coding and homeostatic dysregulation in depressive disorder. Further research on the role of the cerebellum in depressive disorder may extend our knowledge of the functional and neural mechanisms of depressive disorder and support the development of novel antidepressant treatment strategies targeting the cerebellum.
Berthet, Pierre; Lansner, Anders
2014-01-01
Optogenetic stimulation of specific types of medium spiny neurons (MSNs) in the striatum has been shown to bias the selection of mice in a two-choice task. This shift depends on the localisation and intensity of the stimulation, but also on the recent reward history. We have implemented a way to simulate this increased activity, produced by the optical flash, in our computational model of the basal ganglia (BG). This abstract model features the direct and indirect pathways commonly described in biology, and a reward prediction pathway (RP). The framework is similar to Actor-Critic methods and to the ventral/dorsal distinction in the striatum. We thus investigated how added stimulation in each of the three pathways affected action selection. We were able to reproduce in our model the bias in action selection observed in mice. Our results also showed that biasing the reward prediction is sufficient to modify action selection. However, we had to increase the percentage of trials with stimulation relative to that in the experiments in order to affect selection. We found that increasing only the reward prediction had a different effect depending on whether the stimulation in RP was action dependent (only for a specific action) or not. We further examined how the weight changes evolved with the stage of learning within a block: a bias in RP affects plasticity differently depending on that stage, but also on the outcome. It remains to be tested experimentally how dopaminergic neurons are affected by specific stimulation of striatal neurons, and to relate those data to the predictions of our model.
Planning activity for internally generated reward goals in monkey amygdala neurons
Schultz, Wolfram
2015-01-01
The best rewards are often distant and can only be achieved by planning and decision-making over several steps. We designed a multi-step choice task in which monkeys followed internal plans to save rewards towards self-defined goals. During this self-controlled behavior, amygdala neurons showed future-oriented activity that reflected the animal’s plan to obtain specific rewards several trials ahead. This prospective activity encoded crucial components of the animal’s plan, including value and length of the planned choice sequence. It began on initial trials when a plan would be formed, reappeared step-by-step until reward receipt, and readily updated with a new sequence. It predicted performance, including errors, and typically disappeared during instructed behavior. Such prospective activity could underlie the formation and pursuit of internal plans characteristic for goal-directed behavior. The existence of neuronal planning activity in the amygdala suggests an important role for this structure in guiding behavior towards internally generated, distant goals. PMID:25622146
Dopamine, reward learning, and active inference
FitzGerald, Thomas H. B.; Dolan, Raymond J.; Friston, Karl
2015-01-01
Temporal difference learning models propose phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on an hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behavior. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings. PMID:26581305
Dopamine, reward learning, and active inference.
FitzGerald, Thomas H B; Dolan, Raymond J; Friston, Karl
2015-01-01
Temporal difference learning models propose phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on an hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behavior. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings.
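Under the precision hypothesis described in both copies of this abstract, dopamine does not report a teaching signal but sets how deterministically beliefs about action values drive choice. A minimal sketch of that role, cast as a softmax inverse temperature; this mapping is illustrative and far simpler than the paper's full active inference scheme:

    import numpy as np

    def action_probabilities(values, precision):
        z = precision * np.asarray(values, dtype=float)
        z -= z.max()                   # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    # Low precision (simulated dopamine depletion) yields near-uniform,
    # outcome-insensitive choice, impairing performance while leaving the
    # learned values intact; high precision yields sharply value-driven choice.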
Aberg, Kristoffer Carl; Doell, Kimberly Crystal; Schwartz, Sophie
2016-08-01
Orienting biases refer to a consistent, trait-like tendency to direct attention or locomotion toward one side of space. Recent studies suggest that such hemispatial biases may determine how well people memorize information presented in the left or right hemifield. Moreover, lesion studies indicate that learning rewarded stimuli in one hemispace depends on the integrity of the contralateral striatum. However, the exact neural and computational mechanisms underlying the influence of individual orienting biases on reward learning remain unclear. Because reward-based behavioural adaptation depends on the dopaminergic system and prediction error (PE) encoding in the ventral striatum, we hypothesized that hemispheric asymmetries in dopamine (DA) function may determine individual spatial biases in reward learning. To test this prediction, we acquired fMRI in 33 healthy human participants while they performed a lateralized reward task. Learning differences between hemispaces were assessed by presenting stimuli, assigned to different reward probabilities, to the left or right of central fixation, i.e. presented in the left or right visual hemifield. Hemispheric differences in DA function were estimated through differential fMRI responses to positive vs. negative feedback in the left vs. right ventral striatum, and a computational approach was used to identify the neural correlates of PEs. Our results show that spatial biases favoring reward learning in the right (vs. left) hemifield were associated with increased reward responses in the left hemisphere and relatively better neural encoding of PEs for stimuli presented in the right (vs. left) hemifield. These findings demonstrate that trait-like spatial biases implicate hemisphere-specific learning mechanisms, with individual differences between hemispheres contributing to reinforcing spatial biases. Copyright © 2016 Elsevier Ltd. All rights reserved.
2018-01-01
Dopamine has been suggested to be crucially involved in effort-related choices. Key findings are that dopamine depletion (i) changed preference for a high-cost, large-reward option to a low-cost, small-reward option, (ii) but not when the large-reward option was also low-cost or the small-reward option gave no reward, (iii) while increasing the latency in all the cases but only transiently, and (iv) that antagonism of either dopamine D1 or D2 receptors also specifically impaired selection of the high-cost, large-reward option. The underlying neural circuit mechanisms remain unclear. Here we show that findings i–iii can be explained by the dopaminergic representation of temporal-difference reward-prediction error (TD-RPE), whose mechanisms have now become clarified, if (1) the synaptic strengths storing the values of actions mildly decay in time and (2) the obtained-reward-representing excitatory input to dopamine neurons increases after dopamine depletion. The former is potentially caused by background neural activity–induced weak synaptic plasticity, and the latter is assumed to occur through post-depletion increase of neural activity in the pedunculopontine nucleus, where neurons representing obtained reward exist and presumably send excitatory projections to dopamine neurons. We further show that finding iv, which is nontrivial given the suggested distinct functions of the D1 and D2 corticostriatal pathways, can also be explained if we additionally assume a proposed mechanism of TD-RPE calculation, in which the D1 and D2 pathways encode the values of actions with a temporal difference. These results suggest a possible circuit mechanism for the involvements of dopamine in effort-related choices and, simultaneously, provide implications for the mechanisms of TD-RPE calculation. PMID:29468191
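Mechanism (1) above, the mild decay of stored action values, can be sketched compactly as a standard TD update followed by a small multiplicative decay of all stored values. The Python below is illustrative; the rates are assumed, not the paper's.

    import numpy as np

    def q_step_with_decay(q, action, r, q_next_max,
                          alpha=0.3, gamma=0.95, decay=0.01):
        delta = r + gamma * q_next_max - q[action]  # TD reward prediction error
        q[action] += alpha * delta                  # dopamine-dependent update
        q *= (1.0 - decay)                          # mild decay of all values
        return q, delta

In the account above, the decay keeps learned values from saturating, so options must keep "earning" their value through repeated positive TD-RPEs; this is what lets dopamine manipulations reshape high-cost/large-reward preferences.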
An MEG signature corresponding to an axiomatic model of reward prediction error.
Talmi, Deborah; Fuentemilla, Lluis; Litvak, Vladimir; Duzel, Emrah; Dolan, Raymond J
2012-01-02
Optimal decision-making is guided by evaluating the outcomes of previous decisions. Prediction errors are theoretical teaching signals which integrate two features of an outcome: its inherent value and prior expectation of its occurrence. To uncover the magnetic signature of prediction errors in the human brain we acquired magnetoencephalographic (MEG) data while participants performed a gambling task. Our primary objective was to use formal criteria, based upon an axiomatic model (Caplin and Dean, 2008a), to determine the presence and timing profile of MEG signals that express prediction errors. We report analyses at the sensor level, implemented in SPM8, time locked to outcome onset. We identified, for the first time, a MEG signature of prediction error, which emerged approximately 320 ms after an outcome and expressed as an interaction between outcome valence and probability. This signal followed earlier, separate signals for outcome valence and probability, which emerged approximately 200 ms after an outcome. Strikingly, the time course of the prediction error signal, as well as the early valence signal, resembled the Feedback-Related Negativity (FRN). In simultaneously acquired EEG data we obtained a robust FRN, but the win and loss signals that comprised this difference wave did not comply with the axiomatic model. Our findings motivate an explicit examination of the critical issue of timing embodied in computational models of prediction errors as seen in human electrophysiological data. Copyright © 2011 Elsevier Inc. All rights reserved.
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.
Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane
2017-01-01
Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning
Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane
2017-01-01
Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004
Das, Ravi K.; Gale, Grace; Hennessy, Vanessa; Kamboj, Sunjeev K.
2018-01-01
Maladaptive reward memories (MRMs) can become unstable following retrieval under certain conditions, allowing their modification by subsequent new learning. However, robust (well-rehearsed) and chronologically old MRMs, such as those underlying substance use disorders, do not destabilize easily when retrieved. A key determinant of memory destabilization during retrieval is prediction error (PE). We describe a retrieval procedure for alcohol MRMs in hazardous drinkers that specifically aims to maximize the generation of PE and therefore the likelihood of MRM destabilization. The procedure requires explicitly generating the expectancy of alcohol consumption and then violating this expectancy (withholding alcohol) following the presentation of a brief set of prototypical alcohol cue images (retrieval + PE). Control procedures involve presenting the same cue images, but allow alcohol to be consumed, generating minimal PE (retrieval-no PE), or generate PE without retrieval of alcohol MRMs, by presenting orange juice cues (no retrieval + PE). Subsequently, we describe a multisensory disgust-based counterconditioning procedure to probe MRM destabilization by re-writing alcohol cue-reward associations prior to reconsolidation. This procedure pairs alcohol cues with images invoking pathogen disgust and an extremely bitter-tasting solution (denatonium benzoate), generating gustatory disgust. Following retrieval + PE, but not no retrieval + PE or retrieval-no PE, counterconditioning produces evidence of MRM rewriting as indexed by lasting reductions in alcohol cue valuation, attentional capture, and alcohol craving. PMID:29364255
ERIC Educational Resources Information Center
Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias
2011-01-01
Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model
Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P.; Terrace, Herbert S.
2015-01-01
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort’s success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models. PMID:26407227
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.
Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P; Terrace, Herbert S
2015-01-01
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
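A minimal sketch of the betasort representation described above: each stimulus carries beta-distributed position estimates on the unit span, choices compare sampled positions, and negative feedback shifts more evidence than positive feedback. The published algorithm additionally updates every stimulus (visible or not) on each trial via consistency and relaxation steps, which this illustrative Python omits; all update magnitudes are assumptions.

    import numpy as np

    class BetasortSketch:
        def __init__(self, n_stimuli):
            self.u = np.ones(n_stimuli)   # evidence the stimulus ranks high
            self.l = np.ones(n_stimuli)   # evidence the stimulus ranks low

        def sample_position(self, i, rng):
            return rng.beta(self.u[i] + 1, self.l[i] + 1)

        def choose(self, i, j, rng):
            # Pick whichever stimulus samples the higher position estimate
            return i if self.sample_position(i, rng) > self.sample_position(j, rng) else j

        def feedback(self, chosen, other, correct):
            if correct:                   # symmetric confirmation of the ordering
                self.u[chosen] += 1
                self.l[other] += 1
            else:                         # asymmetric: errors shift more evidence
                self.u[other] += 2
                self.l[chosen] += 2

Because positions, not pairwise associations, are updated, training on adjacent pairs (B > C, C > D) automatically separates non-adjacent pairs such as B and D, which is the transitive behavior RPE models fail to produce.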
Computational approaches to schizophrenia: A perspective on negative symptoms.
Deserno, Lorenz; Heinz, Andreas; Schlagenhauf, Florian
2017-08-01
Schizophrenia is a heterogeneous spectrum disorder often associated with detrimental negative symptoms. In recent years, computational approaches to psychiatry have attracted growing attention. Negative symptoms have shown some overlap with general cognitive impairments and were also linked to impaired motivational processing in brain circuits implementing reward prediction. In this review, we outline how computational approaches may help to provide a better understanding of negative symptoms in terms of the potentially underlying behavioural and biological mechanisms. First, we describe the idea that negative symptoms could arise from a failure to represent reward expectations to enable flexible behavioural adaptation. It has been proposed that these impairments arise from a failure to use prediction errors to update expectations. Important previous studies focused on processing of so-called model-free prediction errors where learning is determined by past rewards only. However, learning and decision-making arise from multiple cognitive mechanisms functioning simultaneously, and dissecting them via well-designed tasks in conjunction with computational modelling is a promising avenue. Second, we move on to a proof-of-concept example on how generative models of functional imaging data from a cognitive task enable the identification of subgroups of patients mapping on different levels of negative symptoms. Combining the latter approach with behavioural studies regarding learning and decision-making may allow the identification of key behavioural and biological parameters distinctive for different dimensions of negative symptoms versus a general cognitive impairment. We conclude with an outlook on how this computational framework could, at some point, enrich future clinical studies. Copyright © 2016. Published by Elsevier B.V.
Hester, Robert; Murphy, Kevin; Brown, Felicity L; Skilleter, Ashley J
2010-11-17
Punishing an error to shape subsequent performance is a major tenet of individual and societal level behavioral interventions. Recent work examining error-related neural activity has identified that the magnitude of activity in the posterior medial frontal cortex (pMFC) is predictive of learning from an error, whereby greater activity in this region predicts adaptive changes in future cognitive performance. It remains unclear how punishment influences error-related neural mechanisms to effect behavior change, particularly in key regions such as pMFC, which previous work has demonstrated to be insensitive to punishment. Using an associative learning task that provided monetary reward and punishment for recall performance, we observed that when recall errors were categorized by subsequent performance (whether the failure to accurately recall a number-location association was corrected at the next presentation of the same trial), the magnitude of error-related pMFC activity predicted future correction. However, the pMFC region was insensitive to the magnitude of punishment an error received, and it was the left insula cortex that predicted learning from the most aversive outcomes. These findings add further evidence to the hypothesis that error-related pMFC activity may reflect more than a prediction error in representing the value of an outcome. The novel role identified here for the insular cortex in learning from punishment appears particularly compelling for our understanding of psychiatric and neurologic conditions that feature both insular cortex dysfunction and a diminished capacity for learning from negative feedback or punishment.
A temporal basis for Weber's law in value perception.
Namboodiri, Vijay Mohan K; Mihalas, Stefan; Hussain Shuler, Marshall G
2014-01-01
Weber's law, the observation that the smallest perceivable change in the magnitude of a stimulus is proportional to that magnitude, is a widely observed psychophysical phenomenon. It is also believed to underlie the perception of reward magnitudes and the passage of time. Since many ecological theories state that animals attempt to maximize reward rates, errors in the perception of reward magnitudes and delays must affect decision-making. Using an ecological theory of decision-making (TIMERR), we analyze the effect of multiple sources of noise (sensory noise, time estimation noise, and integration noise) on reward magnitude and subjective value perception. We show that the precision of reward magnitude perception is correlated with the precision of time perception and that Weber's law in time estimation can lead to Weber's law in value perception. The strength of this correlation is predicted to depend on the reward history of the animal. Subsequently, we show that sensory integration noise (either alone or in combination with time estimation noise) also leads to Weber's law in reward magnitude perception in an accumulator model, if it has balanced Poisson feedback. We then demonstrate that the noise in the subjective value of a delayed reward, due to the combined effect of noise in the perception of both reward magnitude and delay, also abides by Weber's law. Thus, in our theory we prove analytically that the perception of reward magnitude, time, and subjective value all approximately obey Weber's law.
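For reference, Weber's law as used above says the just-noticeable change \Delta I in a magnitude I satisfies

    \frac{\Delta I}{I} = k,

or, equivalently, that the internal estimate of a magnitude m has standard deviation proportional to the magnitude itself (scalar variability), \sigma(\hat{m}) = k\,m. The abstract's claim is that this proportional noise in time estimation propagates through the decision variable into proportional noise in perceived reward magnitude and subjective value.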
Saddoris, Michael P; Cacciapaglia, Fabio; Wightman, R Mark; Carelli, Regina M
2015-08-19
Mesolimbic dopamine (DA) is phasically released during appetitive behaviors, though there is substantive disagreement about the specific purpose of these DA signals. For example, prediction error (PE) models suggest a role of learning, while incentive salience (IS) models argue that the DA signal imbues stimuli with value and thereby stimulates motivated behavior. However, within the nucleus accumbens (NAc) patterns of DA release can strikingly differ between subregions, and as such, it is possible that these patterns differentially contribute to aspects of PE and IS. To assess this, we measured DA release in subregions of the NAc during a behavioral task that spatiotemporally separated sequential goal-directed stimuli. Electrochemical methods were used to measure subsecond NAc dopamine release in the core and shell during a well learned instrumental chain schedule in which rats were trained to press one lever (seeking; SL) to gain access to a second lever (taking; TL) linked with food delivery, and again during extinction. In the core, phasic DA release was greatest following initial SL presentation, but minimal for the subsequent TL and reward events. In contrast, phasic shell DA showed robust release at all task events. Signaling decreased between the beginning and end of sessions in the shell, but not core. During extinction, peak DA release in the core showed a graded decrease for the SL and pauses in release during omitted expected rewards, whereas shell DA release decreased predominantly during the TL. These release dynamics suggest parallel DA signals capable of supporting distinct theories of appetitive behavior. Dopamine signaling in the brain is important for a variety of cognitive functions, such as learning and motivation. Typically, it is assumed that a single dopamine signal is sufficient to support these cognitive functions, though competing theories disagree on how dopamine contributes to reward-based behaviors. Here, we have found that real-time dopamine release within the nucleus accumbens (a primary target of midbrain dopamine neurons) strikingly varies between core and shell subregions. In the core, dopamine dynamics are consistent with learning-based theories (such as reward prediction error) whereas in the shell, dopamine is consistent with motivation-based theories (e.g., incentive salience). These findings demonstrate that dopamine plays multiple and complementary roles based on discrete circuits that help animals optimize rewarding behaviors. Copyright © 2015 the authors.
Relief as a Reward: Hedonic and Neural Responses to Safety from Pain
Leknes, Siri; Lee, Michael; Berna, Chantal; Andersson, Jesper; Tracey, Irene
2011-01-01
Relief fits the definition of a reward. Unlike other reward types the pleasantness of relief depends on the violation of a negative expectation, yet this has not been investigated using neuroimaging approaches. We hypothesized that the degree of negative expectation depends on state (dread) and trait (pessimism) sensitivity. Of the brain regions that are involved in mediating pleasure, the nucleus accumbens also signals unexpected reward and positive prediction error. We hypothesized that accumbens activity reflects the level of negative expectation and subsequent pleasant relief. Using fMRI and two purpose-made tasks, we compared hedonic and BOLD responses to relief with responses during an appetitive reward task in 18 healthy volunteers. We expected some similarities in task responses, reflecting common neural substrates implicated across reward types. However, we also hypothesized that relief responses would differ from appetitive rewards in the nucleus accumbens, since only relief pleasantness depends on negative expectations. The results confirmed these hypotheses. Relief and appetitive reward task activity converged in the ventromedial prefrontal cortex, which also correlated with appetitive reward pleasantness ratings. In contrast, dread and pessimism scores correlated with relief but not with appetitive reward hedonics. Moreover, only relief pleasantness covaried with accumbens activation. Importantly, the accumbens signal appeared to specifically reflect individual differences in anticipation of the adverse event (dread, pessimism) but was uncorrelated to appetitive reward hedonics. In conclusion, relief differs from appetitive rewards due to its reliance on negative expectations, the violation of which is reflected in relief-related accumbens activation. PMID:21490964
Failure analysis and modeling of a VAXcluster system
NASA Technical Reports Server (NTRS)
Tang, Dong; Iyer, Ravishankar K.; Subramani, Sujatha S.
1990-01-01
This paper discusses the results of a measurement-based analysis of real error data collected from a DEC VAXcluster multicomputer system. In addition to evaluating basic system dependability characteristics such as error and failure distributions and hazard rates for both individual machines and for the VAXcluster, reward models were developed to analyze the impact of failures on the system as a whole. The results show that more than 46 percent of all failures were due to errors in shared resources, despite the fact that such errors have a recovery probability greater than 0.99. The hazard rate calculations show that not only errors but also failures occur in bursts: approximately 40 percent of all failures occurred in bursts and involved multiple machines, indicating that correlated failures are significant. Analysis of rewards shows that software errors have the lowest reward (0.05 vs 0.74 for disk errors). The expected reward rate (reliability measure) of the VAXcluster drops to 0.5 in 18 hours for the 7-out-of-7 model and in 80 days for the 3-out-of-7 model.
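The k-out-of-n figures can be illustrated with a short calculation. The sketch below is hypothetical: it assumes independent machines with exponential lifetimes and an invented MTBF, and it omits the error recovery that the paper's Markov reward model includes, so it cannot reproduce both quoted figures at once.

```python
from math import comb, exp

def machine_reliability(t_hours: float, mtbf_hours: float) -> float:
    """Reliability of one machine at time t, assuming exponential lifetimes."""
    return exp(-t_hours / mtbf_hours)

def k_out_of_n(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent machines are still up."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))

# MTBF invented so the 7-of-7 value roughly matches the paper's 0.5 at 18 h;
# without repair, the 3-of-7 value at 80 days will not match the paper.
mtbf = 182.0
for t in (18.0, 80 * 24.0):
    r = machine_reliability(t, mtbf)
    print(f"t={t:7.1f} h  7-of-7={k_out_of_n(7, 7, r):.3f}  "
          f"3-of-7={k_out_of_n(3, 7, r):.3f}")
```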
Mattfeld, Aaron T.; Gluck, Mark A.; Stark, Craig E.L.
2011-01-01
The goal of the present study was to elucidate the role of the human striatum in learning via reward and punishment during an associative learning task. Previous studies have identified the striatum as a critical component in the neural circuitry of reward-related learning. It remains unclear, however, under what task conditions, and to what extent, the striatum is modulated by punishment during an instrumental learning task. Using high-resolution functional magnetic resonance imaging (fMRI) during a reward- and punishment-based probabilistic associative learning task, we observed activity in the ventral putamen for stimuli learned via reward regardless of whether participants were correct or incorrect (i.e., outcome). In contrast, activity in the dorsal caudate was modulated on trials that received feedback (either rewarded correct trials or punished incorrect trials). We also identified an anterior/posterior dissociation reflecting reward and punishment prediction error estimates. Additionally, differences in patterns of activity that correlated with the amount of training were identified along the anterior/posterior axis of the striatum. We suggest that unique subregions of the striatum, separated along both a dorsal/ventral and an anterior/posterior axis, differentially participate in the learning of associations through reward and punishment. PMID:22021252
Unexpected but Incidental Positive Outcomes Predict Real-World Gambling.
Otto, A Ross; Fleming, Stephen M; Glimcher, Paul W
2016-03-01
Positive mood can affect a person's tendency to gamble, possibly because positive mood fosters unrealistic optimism. At the same time, unexpected positive outcomes, often called prediction errors, influence mood. However, a linkage between positive prediction errors (the difference between obtained and expected outcomes) and consequent risk taking has yet to be demonstrated. Using a large data set of New York City lottery gambling and a model inspired by computational accounts of reward learning, we found that people gamble more when incidental outcomes in the environment (e.g., local sporting events and sunshine) are better than expected. When local sports teams performed better than expected, or a sunny day followed a streak of cloudy days, residents gambled more. The observed relationship between prediction errors and gambling was ubiquitous across the city's socioeconomically diverse neighborhoods and was specific to sports and weather events occurring locally in New York City. Our results suggest that unexpected but incidental positive outcomes influence risk taking. © The Author(s) 2016.
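A minimal sketch of the kind of reward-learning account the authors draw on: an expectation updated by a delta rule, with the prediction error defined as outcome minus expectation. The function name, learning rate, and toy weather series are illustrative assumptions, not the paper's fitted model.

```python
def prediction_errors(outcomes, alpha=0.3, v0=0.0):
    """Delta-rule tracker: the expectation v moves toward each outcome,
    and the prediction error is the outcome minus the current expectation."""
    v, errors = v0, []
    for r in outcomes:
        pe = r - v          # positive when the outcome beats expectations
        errors.append(pe)
        v += alpha * pe     # update expectation toward the observed outcome
    return errors

# e.g., a sunny day (1) after a streak of cloudy days (0) yields a large positive PE
print(prediction_errors([0, 0, 0, 0, 1]))
```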
Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf
2016-02-01
Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) with model-based, reward-related input. Using the concept of the reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network, the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.
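One simple way the proposed VS-centered integration could be formalized is a weighted combination of a model-free value (updated by a reward prediction error) and a model-based expectation. The function, weighting form, and learning rate below are illustrative assumptions, not the authors' model.

```python
def integrated_value(mf_value, mb_expectation, reward, alpha=0.1, w=0.6):
    """Sketch of VS-style integration: a model-free value is updated by an
    RPE, then combined with a model-based expectation into one value signal."""
    rpe = reward - mf_value            # model-free reward prediction error
    mf_value = mf_value + alpha * rpe
    value = w * mb_expectation + (1 - w) * mf_value
    return value, mf_value, rpe
```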
Beyond reward prediction errors: the role of dopamine in movement kinematics
Barter, Joseph W.; Li, Suellen; Lu, Dongye; Bartholomew, Ryan A.; Rossi, Mark A.; Shoemaker, Charles T.; Salas-Meza, Daniel; Gaidis, Erin; Yin, Henry H.
2015-01-01
We recorded activity of dopamine (DA) neurons in the substantia nigra pars compacta in unrestrained mice while monitoring their movements with video tracking. Our approach allows an unbiased examination of the continuous relationship between single unit activity and behavior. Although DA neurons show characteristic burst firing following cue or reward presentation, as previously reported, their activity can be explained by the representation of actual movement kinematics. Unlike neighboring pars reticulata GABAergic output neurons, which can represent vector components of position, DA neurons represent vector components of velocity or acceleration. We found neurons related to movements in four directions—up, down, left, right. For horizontal movements, there is significant lateralization of neurons: the left nigra contains more rightward neurons, whereas the right nigra contains more leftward neurons. The relationship between DA activity and movement kinematics was found on both appetitive trials using sucrose and aversive trials using air puff, showing that these neurons belong to a velocity control circuit that can be used for any number of purposes, whether to seek reward or to avoid harm. In support of this conclusion, mimicry of the phasic activation of DA neurons with selective optogenetic stimulation could also generate movements. Contrary to the popular hypothesis that DA neurons encode reward prediction errors, our results suggest that nigrostriatal DA plays an essential role in controlling the kinematics of voluntary movements. We hypothesize that DA signaling implements gain adjustment for adaptive transition control, and describe a new model of the basal ganglia (BG) in which DA functions to adjust the gain of the transition controller. This model has significant implications for our understanding of movement disorders implicating DA and the BG. PMID:26074791
Is the encoding of Reward Prediction Error reliable during development?
Keren, Hanna; Chen, Gang; Benson, Brenda; Ernst, Monique; Leibenluft, Ellen; Fox, Nathan A; Pine, Daniel S; Stringaris, Argyris
2018-05-16
Reward Prediction Errors (RPEs), defined as the difference between received and expected outcomes, are integral to reinforcement learning models and play an important role in development and psychopathology. In humans, RPE encoding can be estimated using fMRI recordings; however, a basic measurement property of RPE signals, their test-retest reliability across different time scales, remains an open question. In this paper, we examine the 3-month and 3-year reliability of RPE encoding in youth (mean age at baseline = 10.6 ± 0.3 years), a period of developmental transitions in reward processing. We show that RPE encoding is differentially distributed by valence, with positive RPEs encoded predominantly in the striatum and negative RPEs primarily in the insula. The encoding of negative RPE values is highly reliable in the right insula, across both the long and the short time intervals. Insula reliability for RPE encoding is the most robust finding, while other regions, such as the striatum, are less consistent. Striatal reliability appeared significant as well once covarying for factors that possibly confounded the signal-to-noise ratio. By contrast, task activation during feedback in the striatum is highly reliable across both time intervals. These results demonstrate the valence-dependent differential encoding of RPE signals between the insula and striatum, and the consistency, or lack thereof, of RPE signals during childhood and into adolescence. Characterizing the regions where the RPE signal in BOLD fMRI is a reliable marker is key for estimating reward-processing alterations in longitudinal designs, such as developmental or treatment studies. Copyright © 2018 Elsevier Inc. All rights reserved.
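Test-retest reliability of the kind examined here is commonly summarized with an intraclass correlation. The sketch below is one minimal variant (a one-way random-effects ICC(1) over per-subject encoding estimates), offered as an illustration rather than the authors' exact estimator; the array layout is an assumption.

```python
import numpy as np

def icc_oneway(scores: np.ndarray) -> float:
    """One-way random-effects ICC(1) for an (n_subjects, k_sessions) array
    of per-subject RPE-encoding estimates."""
    n, k = scores.shape
    grand = scores.mean()
    subj_means = scores.mean(axis=1)
    ms_between = k * ((subj_means - grand) ** 2).sum() / (n - 1)   # between-subjects
    ms_within = ((scores - subj_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```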
Abnormal Reward System Activation in Mania
Abler, Birgit; Greenhouse, Ian; Ongur, Dost; Walter, Henrik; Heckers, Stephan
2008-01-01
Transmission of reward signals is a function of dopamine, a neurotransmitter known to be involved in the mechanism of psychosis. Using functional magnetic resonance imaging (fMRI), we investigated how expectation and receipt of monetary rewards modulate brain activation in patients with bipolar mania and schizophrenia. We studied 12 acutely manic patients with a history of bipolar disorder, 12 patients with a current episode of schizoaffective disorder or schizophrenia and 12 healthy subjects. All patients were treated with dopamine antagonists at the time of the study. Subjects performed a delayed incentive paradigm with monetary reward in the scanner that allowed for investigating effects of expectation, receipt, and omission of rewards. Patients with schizophrenia and healthy control subjects showed the expected activation of dopaminergic brain areas, that is, ventral tegmentum activation upon expectation of monetary rewards and nucleus accumbens activation during receipt vs omission of rewards. In manic patients, however, we did not find a similar pattern of brain activation and the differential signal in the nucleus accumbens upon receipt vs omission of rewards was significantly lower compared to the healthy control subjects. Our findings provide evidence for abnormal function of the dopamine system during receipt or omission of expected rewards in bipolar disorder. These deficits in prediction error processing in acute mania may help to explain symptoms of disinhibition and abnormal goal pursuit regulation. PMID:17987058
Reward salience and risk aversion underlie differential ACC activity in substance dependence
Alexander, William H.; Fukunaga, Rena; Finn, Peter; Brown, Joshua W.
2015-01-01
The medial prefrontal cortex, especially the dorsal anterior cingulate cortex (ACC), has long been implicated in cognitive control and error processing. Although the association between ACC and behavior has been established, it is less clear how ACC contributes to dysfunctional behavior such as substance dependence. Evidence from neuroimaging studies investigating ACC function in substance users is mixed, with some studies showing disengagement of ACC in substance dependent individuals (SDs), while others show increased ACC activity related to substance use. In this study, we investigate ACC function in SDs and healthy individuals performing a change signal task for monetary rewards. Using a priori predictions derived from a recent computational model of ACC, we find that ACC activity differs between SDs and healthy individuals in factors related to reward salience and risk aversion. Quantitative fits of a computational model to fMRI data reveal significant differences in best fit parameters for reward salience and risk preferences. Specifically, the ACC in SDs shows greater risk aversion, defined as concavity in the utility function, and greater attention to rewards relative to reward omission. Furthermore, across participants risk aversion and reward salience are positively correlated. The results clarify the role that ACC plays in both the reduced sensitivity to omitted rewards and greater reward valuation in SDs. Clinical implications of applying computational modeling in psychiatry are also discussed. PMID:26106528
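Risk aversion as concavity of the utility function can be illustrated with a standard power-utility form; the exponent and the 50/50 gamble below are assumptions for illustration, not the parameters fitted in the paper.

```python
import numpy as np

def power_utility(x, rho=0.7):
    """Power utility u(x) = x**rho; rho < 1 gives a concave (risk-averse) curve."""
    return np.asarray(x, dtype=float) ** rho

# A risk-averse agent values a 50/50 gamble over 0 or 100 below its expected
# value of 50: the certainty equivalent comes out lower.
expected_utility = 0.5 * power_utility(0) + 0.5 * power_utility(100)
print(expected_utility ** (1 / 0.7))   # certainty equivalent, roughly 37 < 50
```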
How glitter relates to gold: similarity-dependent reward prediction errors in the human striatum.
Kahnt, Thorsten; Park, Soyoung Q; Burke, Christopher J; Tobler, Philippe N
2012-11-14
Optimal choices benefit from previous learning. However, it is not clear how previously learned stimuli influence behavior to novel but similar stimuli. One possibility is to generalize based on the similarity between learned and current stimuli. Here, we use neuroscientific methods and a novel computational model to inform the question of how stimulus generalization is implemented in the human brain. Behavioral responses during an intradimensional discrimination task showed similarity-dependent generalization. Moreover, a peak shift occurred, i.e., the peak of the behavioral generalization gradient was displaced from the rewarded conditioned stimulus in the direction away from the unrewarded conditioned stimulus. To account for the behavioral responses, we designed a similarity-based reinforcement learning model wherein prediction errors generalize across similar stimuli and update their value. We show that this model predicts a similarity-dependent neural generalization gradient in the striatum as well as changes in responding during extinction. Moreover, across subjects, the width of generalization was negatively correlated with functional connectivity between the striatum and the hippocampus. This result suggests that hippocampus-striatal connections contribute to stimulus-specific value updating by controlling the width of generalization. In summary, our results shed light onto the neurobiology of a fundamental, similarity-dependent learning principle that allows learning the value of stimuli that have never been encountered.
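The core idea of the model, prediction errors that generalize across similar stimuli and update their values, can be sketched directly. The Gaussian kernel, its width, and the learning rate below are illustrative assumptions rather than the authors' fitted parameters; training with a rewarded CS+ and a nearby unrewarded CS- reproduces a peak shift qualitatively.

```python
import numpy as np

def similarity_rl(stimuli, rewards, grid, alpha=0.2, width=0.15):
    """Value learning in which each prediction error generalizes across the
    stimulus dimension via a Gaussian similarity kernel."""
    values = np.zeros_like(grid, dtype=float)
    for s, r in zip(stimuli, rewards):
        kernel = np.exp(-0.5 * ((grid - s) / width) ** 2)
        pe = r - values[np.argmin(np.abs(grid - s))]
        values += alpha * pe * kernel      # similarity-weighted value update
    return values

# Rewarded CS+ at 0.6, unrewarded CS- at 0.4: the value peak is displaced
# away from the CS- side of the CS+ (peak shift).
grid = np.linspace(0.0, 1.0, 101)
v = similarity_rl([0.6, 0.4] * 50, [1.0, 0.0] * 50, grid)
print(grid[np.argmax(v)])   # greater than 0.6
```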
Role of dopamine D2 receptors in human reinforcement learning.
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-09-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.
Towards a general theory of neural computation based on prediction by single neurons.
Fiorillo, Christopher D
2008-10-01
Although there has been tremendous progress in understanding the mechanics of the nervous system, there has not been a general theory of its computational function. Here I present a theory that relates the established biophysical properties of single generic neurons to principles of Bayesian probability theory, reinforcement learning and efficient coding. I suggest that this theory addresses the general computational problem facing the nervous system. Each neuron is proposed to mirror the function of the whole system in learning to predict aspects of the world related to future reward. According to the model, a typical neuron receives current information about the state of the world from a subset of its excitatory synaptic inputs, and prior information from its other inputs. Prior information would be contributed by synaptic inputs representing distinct regions of space, and by different types of non-synaptic, voltage-regulated channels representing distinct periods of the past. The neuron's membrane voltage is proposed to signal the difference between current and prior information ("prediction error" or "surprise"). A neuron would apply a Hebbian plasticity rule to select those excitatory inputs that are the most closely correlated with reward but are the least predictable, since unpredictable inputs provide the neuron with the most "new" information about future reward. To minimize the error in its predictions and to respond only when excitation is "new and surprising," the neuron selects amongst its prior information sources through an anti-Hebbian rule. The unique inputs of a mature neuron would therefore result from learning about spatial and temporal patterns in its local environment, and by extension, the external world. Thus the theory describes how the structure of the mature nervous system could reflect the structure of the external world, and how the complexity and intelligence of the system might develop from a population of undifferentiated neurons, each implementing similar learning algorithms.
Baker, Travis E; Holroyd, Clay B
2011-04-01
The reinforcement learning theory of the error-related negativity (ERN) holds that the impact of reward signals carried by the midbrain dopamine system modulates activity of the anterior cingulate cortex (ACC), alternatively disinhibiting and inhibiting the ACC following unpredicted error and reward events, respectively. According to a recent formulation of the theory, activity that is intrinsic to the ACC produces a component of the event-related brain potential (ERP) called the N200, and following unpredicted rewards, the N200 is suppressed by extrinsically applied positive dopamine reward signals, resulting in an ERP component called the feedback-ERN (fERN). Here we demonstrate that, despite extensive spatial and temporal overlap between the two ERP components, the functional processes indexed by the N200 (conflict) and the fERN (reward) are dissociable. These results point toward avenues for future investigation. Copyright © 2011 Elsevier B.V. All rights reserved.
Parietal neurons encode expected gains in instrumental information
Foley, Nicholas C.; Kelly, Simon P.; Mhatre, Himanshu; Gottlieb, Jacqueline
2017-01-01
In natural behavior, animals have access to multiple sources of information, but only a few of these sources are relevant for learning and actions. Beyond choosing an appropriate action, making good decisions entails the ability to choose the relevant information, but fundamental questions remain about the brain’s information sampling policies. Recent studies described the neural correlates of seeking information about a reward, but it remains unknown whether, and how, neurons encode choices of instrumental information, in contexts in which the information guides subsequent actions. Here we show that parietal cortical neurons involved in oculomotor decisions encode, before an information sampling saccade, the reduction in uncertainty that the saccade is expected to bring for a subsequent action. These responses were distinct from the neurons’ visual and saccadic modulations and from signals of expected reward or reward prediction errors. Therefore, even in an instrumental context when information and reward gains are closely correlated, individual cells encode decision variables that are based on informational factors and can guide the active sampling of action-relevant cues. PMID:28373569
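The quantity described here, the reduction in uncertainty a saccade is expected to bring, can be formalized as expected information gain. The sketch below is a generic Bayesian formalization assumed for illustration, not the authors' analysis; the state space and likelihoods are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_information_gain(prior, likelihoods):
    """Expected reduction in uncertainty about a hidden state from one
    informative sample; likelihoods[o][s] = P(observation o | state s)."""
    prior = np.asarray(prior, dtype=float)
    gain = entropy(prior)
    for lik in likelihoods:
        lik = np.asarray(lik, dtype=float)
        p_obs = float((lik * prior).sum())          # marginal P(observation)
        if p_obs > 0:
            gain -= p_obs * entropy(lik * prior / p_obs)
    return gain

# A fully diagnostic cue about two equiprobable states is worth 1 bit.
print(expected_information_gain([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
```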
The Error in Total Error Reduction
Witnauer, James E.; Urcelay, Gonzalo P.; Miller, Ralph R.
2013-01-01
Most models of human and animal learning assume that learning is proportional to the discrepancy between a delivered outcome and the outcome predicted by all cues present during that trial (i.e., total error across a stimulus compound). This total error reduction (TER) view has been implemented in connectionist and artificial neural network models to describe the conditions under which weights between units change. Electrophysiological work has revealed that the activity of dopamine neurons is correlated with the total error signal in models of reward learning. Similar neural mechanisms presumably support fear conditioning, human contingency learning, and other types of learning. Using a computational modelling approach, we compared several TER models of associative learning to an alternative model that rejects the TER assumption in favor of local error reduction (LER), which assumes that learning about each cue is proportional to the discrepancy between the delivered outcome and the outcome predicted by that specific cue on that trial. The LER model provided a better fit to the reviewed data than the TER models. Given the superiority of the LER model with the present data sets, acceptance of TER should be tempered. PMID:23891930
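The contrast between the two model classes comes down to their update rules. Below is a minimal sketch assuming dictionary-valued associative weights and a shared learning rate; it is a schematic of the TER/LER distinction, not the specific models fitted in the paper.

```python
def ter_update(w, cues, outcome, alpha=0.1):
    """Total error reduction (Rescorla-Wagner style): all cues share one
    error term computed from the summed prediction of the compound."""
    err = outcome - sum(w[c] for c in cues)
    for c in cues:
        w[c] += alpha * err

def ler_update(w, cues, outcome, alpha=0.1):
    """Local error reduction: each cue is trained on its own error."""
    for c in cues:
        w[c] += alpha * (outcome - w[c])

# Training a compound AB shows the signature difference: under TER the cues
# share the error and settle at 0.5 each; under LER each cue reaches 1.0.
w_ter, w_ler = {"A": 0.0, "B": 0.0}, {"A": 0.0, "B": 0.0}
for _ in range(500):
    ter_update(w_ter, "AB", 1.0)
    ler_update(w_ler, "AB", 1.0)
print(w_ter, w_ler)
```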
Neuroeconomics and the study of addiction.
Monterosso, John; Piray, Payam; Luo, Shan
2012-07-15
We review the key findings in the application of neuroeconomics to the study of addiction. Although there are not "bright line" boundaries between neuroeconomics and other areas of behavioral science, neuroeconomics coheres around the topic of the neural representations of "Value" (synonymous with the "decision utility" of behavioral economics). Neuroeconomics parameterizes distinct features of Valuation, going beyond the general construct of "reward sensitivity" widely used in addiction research. We argue that its modeling refinements might facilitate the identification of neural substrates that contribute to addiction. We highlight two areas of neuroeconomics that have been particularly productive. The first is research on neural correlates of delay discounting (reduced Valuation of rewards as a function of their delay). The second is work that models how Value is learned as a function of "prediction-error" signaling. Although both areas are part of the neuroeconomic program, delay discounting research grows directly out of behavioral economics, whereas prediction-error work is grounded in models of learning. We also consider efforts to apply neuroeconomics to the study of self-control and discuss challenges for this area. We argue that neuroeconomic work has the potential to generate breakthrough research in addiction science. Copyright © 2012 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Neural basis of decision making guided by emotional outcomes
Katahira, Kentaro; Matsuda, Yoshi-Taka; Fujimura, Tomomi; Ueno, Kenichi; Asamizuya, Takeshi; Suzuki, Chisato; Cheng, Kang; Okanoya, Kazuo; Okada, Masato
2015-01-01
Emotional events resulting from a choice influence an individual's subsequent decision making. Although the relationship between emotion and decision making has been widely discussed, previous studies have mainly investigated decision outcomes that can easily be mapped to reward and punishment, including monetary gain/loss, gustatory stimuli, and pain. These studies regard emotion as a modulator of decision making that can be made rationally in the absence of emotions. In our daily lives, however, we often encounter various emotional events that affect decisions by themselves, and mapping the events to a reward or punishment is often not straightforward. In this study, we investigated the neural substrates of how such emotional decision outcomes affect subsequent decision making. By using functional magnetic resonance imaging (fMRI), we measured brain activities of humans during a stochastic decision-making task in which various emotional pictures were presented as decision outcomes. We found that pleasant pictures differentially activated the midbrain, fusiform gyrus, and parahippocampal gyrus, whereas unpleasant pictures differentially activated the ventral striatum, compared with neutral pictures. We assumed that the emotional decision outcomes affect the subsequent decision by updating the value of the options, a process modeled by reinforcement learning models, and that the brain regions representing the prediction error that drives the reinforcement learning are involved in guiding subsequent decisions. We found that some regions of the striatum and the insula were separately correlated with the prediction error for either pleasant pictures or unpleasant pictures, whereas the precuneus was correlated with prediction errors for both pleasant and unpleasant pictures. PMID:25695644
Effects of reward and punishment on learning from errors in smokers.
Duehlmeyer, Leonie; Levis, Bianca; Hester, Robert
2018-04-30
Punishing errors facilitates adaptation in healthy individuals, while aberrant reward and punishment sensitivity in drug-dependent individuals may change this impact. Many societies have institutions that use the concept of punishing drug use behavior, making it important to understand how drug dependency mediates the effects of negative feedback on adaptive behavior. Using an associative learning task, we investigated differences in error correction rates of dependent smokers compared with controls. Two versions of the task were administered to different participant samples: one assessed the effect of varying monetary contingencies on task performance; the other, the presence of reward as compared with avoidance of punishment for correct performance. While smokers recalled associations rewarded with a higher value 11% more often than lower-rewarded locations, they did not correct more heavily punished locations more often. Controls exhibited the opposite pattern. The three-way interaction between magnitude, feedback type, and group was significant, F(1,48) = 5.288, p = .026, ηp² = .099. Neither participant group corrected locations offering reward more often than those offering avoidance of punishment. The interaction between group and feedback condition was not significant, F(1,58) = 0.0, p = .99, ηp² = .001. The present results suggest that smokers learn less well from errors when receiving negative feedback. Moreover, larger rewards reinforce smokers' behavior more strongly than smaller rewards, whereas controls made no such distinction. These findings support the hypothesis that dependent smokers may respond better to positively framed and rewarded anti-smoking programs than to those relying on negative feedback or punishment. Copyright © 2018 Elsevier B.V. All rights reserved.
Impacts of motivational valence on the error-related negativity elicited by full and partial errors.
Maruo, Yuya; Schacht, Annekathrin; Sommer, Werner; Masaki, Hiroaki
2016-02-01
Affect and motivation influence the error-related negativity (ERN) elicited by full errors; however, it is unknown whether they also influence ERNs to correct responses accompanied by covert incorrect response activation (partial errors). Here we compared a neutral condition with conditions where correct responses were rewarded or incorrect responses were punished with gains and losses of small amounts of money, respectively. Data analysis distinguished ERNs elicited by full and partial errors. In the reward and punishment conditions, ERN amplitudes to both full and partial errors were larger than in the neutral condition, confirming participants' sensitivity to the significance of errors. We also investigated the relationships between ERN amplitudes and the behavioral inhibition and activation systems (BIS/BAS). Regardless of reward/punishment condition, participants scoring higher on BAS showed smaller ERN amplitudes in full error trials. These findings provide further evidence that the ERN is related to motivational valence and that similar relationships hold for both full and partial errors. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Association of Neural and Emotional Impacts of Reward Prediction Errors With Major Depression
Moutoussis, Michael; Smittenaar, Peter; Zeidman, Peter; Taylor, Tanja; Hrynkiewicz, Louise; Lam, Jordan; Skandali, Nikolina; Siegel, Jenifer Z.; Ousdal, Olga T.; Prabhu, Gita; Dayan, Peter; Fonagy, Peter; Dolan, Raymond J.
2017-01-01
Importance: Major depressive disorder (MDD) is associated with deficits in representing reward prediction errors (RPEs), which are the difference between experienced and predicted reward. Reward prediction errors underlie learning of values in reinforcement learning models, are represented by phasic dopamine release, and are known to affect momentary mood. Objective: To combine functional neuroimaging, computational modeling, and smartphone-based large-scale data collection to test, in the absence of learning-related concerns, the hypothesis that depression attenuates the impact of RPEs. Design, Setting, and Participants: Functional magnetic resonance imaging (fMRI) data were collected on 32 individuals with moderate MDD and 20 control participants who performed a probabilistic reward task. A risky decision task with repeated happiness ratings as a measure of momentary mood was also tested in the laboratory in 74 participants and with a smartphone-based platform in 1833 participants. The study was conducted from November 20, 2012, to February 17, 2015. Main Outcomes and Measures: Blood oxygen level-dependent activity was measured in ventral striatum, a dopamine target area known to represent RPEs. Momentary mood was measured during risky decision making. Results: Of the 52 fMRI participants (mean [SD] age, 34.0 [9.1] years), 30 (58%) were women and 32 had MDD. Of the 74 participants in the laboratory risky decision task (mean age, 34.2 [10.3] years), 44 (59%) were women and 54 had MDD. Of the smartphone group, 543 (30%) had a depression history and 1290 (70%) had no depression history; 918 (50%) were women, and 593 (32%) were younger than 30 years. Contrary to previous results in reinforcement learning tasks, individuals with moderate depression showed intact RPE signals in ventral striatum (z = 3.16; P = .002) that did not differ significantly from controls (z = 0.91; P = .36). Symptom severity correlated with baseline mood parameters in laboratory (ρ = −0.54; P < 1 × 10^−6) and smartphone (ρ = −0.30; P < 1 × 10^−39) data. However, participants with depression showed an intact association between RPEs and happiness in a computational model of momentary mood dynamics (z = 4.55; P < .001) that was not attenuated compared with controls (z = −0.42; P = .67). Conclusions and Relevance: The neural and emotional impact of RPEs is intact in major depression. These results suggest that depression does not affect the expression of dopaminergic RPEs and that attenuated RPEs in previous reports may reflect downstream effects more closely related to aberrant behavior. The correlation between symptom severity and baseline mood parameters supports an association between depression and momentary mood fluctuations during cognitive tasks. These results demonstrate a potential for smartphones in large-scale computational phenotyping, which is a goal for computational psychiatry. PMID:28678984
Hypernatural Monitoring: A Social Rehearsal Account of Smartphone Addiction
Veissière, Samuel P. L.; Stendel, Moriah
2018-01-01
We present a deflationary account of smartphone addiction by situating this purportedly antisocial phenomenon within the fundamentally social dispositions of our species. While we agree with contemporary critics that the hyper-connectedness and unpredictable rewards of mobile technology can modulate negative affect, we propose to place the locus of addiction on an evolutionarily older mechanism: the human need to monitor and be monitored by others. Drawing from key findings in evolutionary anthropology and the cognitive science of religion, we articulate a hypernatural monitoring model of smartphone addiction grounded in a general social rehearsal theory of human cognition. Building on recent predictive-processing views of perception and addiction in cognitive neuroscience, we describe the role of social reward anticipation and prediction errors in mediating dysfunctional smartphone use. We conclude with insights from contemplative philosophies and harm-reduction models on finding the right rituals for honoring social connections and setting intentional protocols for the consumption of social information. PMID:29515480
Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation
Kato, Ayaka; Morita, Kenji
2016-01-01
It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards a goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, where learning and forgetting are balanced. PMID:27736881
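The decay mechanism can be illustrated in a few lines. The following is a minimal sketch, assuming a linear 'Go' chain, tabular TD values, and arbitrary parameters (not the paper's exact model): because stored values decay after every step, prediction errors remain positive even after extensive training, qualitatively reproducing the sustained-RPE claim.

```python
import numpy as np

def goal_run(values, alpha=0.5, gamma=0.97, decay=0.01):
    """One series of 'Go' steps toward the goal, with value decay applied
    after each step so that learned values are gradually forgotten."""
    n = len(values)
    pes = []
    for s in range(n):
        r = 1.0 if s == n - 1 else 0.0                 # reward at the goal
        v_next = values[s + 1] if s + 1 < n else 0.0
        pe = r + gamma * v_next - values[s]
        values[s] += alpha * pe
        values *= (1.0 - decay)                        # forgetting
        pes.append(pe)
    return pes

values = np.zeros(8)
for _ in range(1000):
    pes = goal_run(values)
print(np.round(pes, 3))   # PEs stay positive at equilibrium: sustained RPE
```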
Children on the autism spectrum update their behaviour in response to a volatile environment.
Manning, Catherine; Kilner, James; Neil, Louise; Karaminis, Themelis; Pellicano, Elizabeth
2017-09-01
Typical adults can track reward probabilities across trials to estimate the volatility of the environment and use this information to modify their learning rate (Behrens et al., 2007). In a stable environment, it is advantageous to take account of outcomes over many trials, whereas in a volatile environment, recent experience should be more strongly weighted than distant experience. Recent predictive coding accounts of autism propose that autistic individuals will demonstrate atypical updating of their behaviour in response to the statistics of the reward environment. To rigorously test this hypothesis, we administered a developmentally appropriate version of Behrens et al.'s (2007) task to 34 cognitively able children on the autism spectrum aged between 6 and 14 years, 32 age- and ability-matched typically developing children and 19 typical adults. Participants were required to choose between a green and a blue pirate chest, each associated with a randomly determined reward value between 0 and 100 points, with a combined total of 100 points. On each trial, the reward was given for one stimulus only. In the stable condition, the ratio of the blue or green response being rewarded was fixed at 75:25. In the volatile condition, the ratio alternated between 80:20 and 20:80 every 20 trials. We estimated the learning rate for each participant by fitting a delta rule model and compared this rate across conditions and groups. All groups increased their learning rate in the volatile condition compared to the stable condition. Unexpectedly, there was no effect of group and no interaction between group and condition. Thus, autistic children used information about the statistics of the reward environment to guide their decisions to a similar extent as typically developing children and adults. These results help constrain predictive coding accounts of autism by demonstrating that autism is not characterized by uniform differences in the weighting of prediction error. © 2016 The Authors. Developmental Science Published by John Wiley & Sons Ltd.
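For illustration, learning-rate estimation of the kind described can be sketched as maximum-likelihood fitting of a delta-rule learner. The softmax choice rule, fixed inverse temperature, and grid search below are assumptions for the sketch, not necessarily the authors' exact fitting procedure.

```python
import numpy as np

def neg_log_lik(alpha, choices, rewards, beta=5.0):
    """Negative log-likelihood of two-alternative choices under a delta-rule
    learner with a softmax choice rule (choices coded 0/1, rewards 0/1)."""
    v = np.array([0.5, 0.5])
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * v) / np.exp(beta * v).sum()
        nll -= np.log(p[c])
        v[c] += alpha * (r - v[c])      # delta-rule value update
    return nll

def fit_learning_rate(choices, rewards):
    """Crude grid-search estimate of the learning rate, per participant
    and condition (stable vs. volatile)."""
    grid = np.linspace(0.01, 0.99, 99)
    return grid[np.argmin([neg_log_lik(a, choices, rewards) for a in grid])]
```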
Herbort, Maike C; Soch, Joram; Wüstenberg, Torsten; Krauel, Kerstin; Pujara, Maia; Koenigs, Michael; Gallinat, Jürgen; Walter, Henrik; Roepke, Stefan; Schott, Björn H
2016-01-01
Patients with borderline personality disorder (BPD) frequently exhibit impulsive behavior, and self-reported impulsivity is typically higher in BPD patients when compared to healthy controls. Previous functional neuroimaging studies have suggested a link between impulsivity, the ventral striatal response to reward anticipation, and prediction errors. Here we investigated the striatal neural response to monetary gain and loss anticipation and their relationship with impulsivity in 21 female BPD patients and 23 age-matched female healthy controls using functional magnetic resonance imaging (fMRI). Participants performed a delayed monetary incentive task in which three categories of objects predicted a potential gain, loss, or neutral outcome. Impulsivity was assessed using the Barratt Impulsiveness Scale (BIS-11). Compared to healthy controls, BPD patients exhibited significantly reduced fMRI responses of the ventral striatum/nucleus accumbens (VS/NAcc) to both reward-predicting and loss-predicting cues. BIS-11 scores showed a significant positive correlation with the VS/NAcc reward anticipation responses in healthy controls, and this correlation, while also nominally positive, failed to reach significance in BPD patients. BPD patients, on the other hand, exhibited a significantly negative correlation between ventral striatal loss anticipation responses and BIS-11 scores, whereas this correlation was significantly positive in healthy controls. Our results suggest that patients with BPD show attenuated anticipation responses in the VS/NAcc and, furthermore, that higher impulsivity in BPD patients might be related to impaired prediction of aversive outcomes.
Somato-dendritic Synaptic Plasticity and Error-backpropagation in Active Dendrites
Schiess, Mathieu; Urbanczik, Robert; Senn, Walter
2016-01-01
In the last decade dendrites of cortical neurons have been shown to nonlinearly combine synaptic inputs by evoking local dendritic spikes. It has been suggested that these nonlinearities raise the computational power of a single neuron, making it comparable to a 2-layer network of point neurons. But how these nonlinearities can be incorporated into the synaptic plasticity to optimally support learning remains unclear. We present a theoretically derived synaptic plasticity rule for supervised and reinforcement learning that depends on the timing of the presynaptic, the dendritic and the postsynaptic spikes. For supervised learning, the rule can be seen as a biological version of the classical error-backpropagation algorithm applied to the dendritic case. When modulated by a delayed reward signal, the same plasticity is shown to maximize the expected reward in reinforcement learning for various coding scenarios. Our framework makes specific experimental predictions and highlights the unique advantage of active dendrites for implementing powerful synaptic plasticity rules that have access to downstream information via backpropagation of action potentials. PMID:26841235
Papale, Andrew E; Zielinski, Mark C; Frank, Loren M; Jadhav, Shantanu P; Redish, A David
2016-12-07
Current theories posit that memories encoded during experiences are subsequently consolidated into longer-term storage. Hippocampal sharp-wave-ripple (SWR) events have been linked to this consolidation process during sleep, but SWRs also occur during awake immobility, where their role remains unclear. We report that awake SWR rates at the reward site are inversely related to the prevalence of vicarious trial and error (VTE) behaviors, thought to be involved in deliberation processes. SWR rates were diminished immediately after VTE behaviors and an increase in the rate of SWR events at the reward site predicted a decrease in subsequent VTE behaviors at the choice point. Furthermore, SWR disruptions increased VTE behaviors. These results suggest an inverse relationship between SWRs and VTE behaviors and suggest that awake SWRs and associated planning and memory consolidation mechanisms are engaged specifically in the context of higher levels of behavioral certainty. Copyright © 2016 Elsevier Inc. All rights reserved.
An Update on the Role of Serotonin and its Interplay with Dopamine for Reward.
Fischer, Adrian G; Ullsperger, Markus
2017-01-01
The specific role of serotonin and its interplay with dopamine (DA) in adaptive, reward-guided behavior, as well as in drug dependence, remains elusive. Recently, novel methods have allowed cell-type-specific anatomical, functional, and interventional analyses of serotonergic and dopaminergic circuits, promising significant advances in understanding their functional roles. Furthermore, it is increasingly recognized that co-release of neurotransmitters is functionally relevant, and understanding it is required to interpret the results of pharmacological studies and their relationship to neural recordings. Here, we review recent animal studies employing such techniques with the aim of connecting their results to effects observed in human pharmacological studies and to the subjective effects of drugs. It appears that the additive effect of serotonin and DA conveys significant reward-related information and is subjectively highly euphorizing. Neither DA nor serotonin alone has such an effect. This coincides with optogenetically targeted recordings in mice, where the dopaminergic system codes reward prediction errors (PEs) and the serotonergic system mainly unsigned PEs. Overall, this pattern of results indicates that joint activity between both systems carries essential reward information and invites parallel investigation of both neurotransmitter systems.
Electrophysiological responses to feedback during the application of abstract rules.
Walsh, Matthew M; Anderson, John R
2013-11-01
Much research focuses on how people acquire concrete stimulus-response associations from experience; however, few neuroscientific studies have examined how people learn about and select among abstract rules. To address this issue, we recorded ERPs as participants performed an abstract rule-learning task. In each trial, they viewed a sample number and two test numbers. Participants then chose a test number using one of three abstract mathematical rules they freely selected from: greater than the sample number, less than the sample number, or equal to the sample number. No one rule was always rewarded, but some rules were rewarded more frequently than others. To maximize their earnings, participants needed to learn which rules were rewarded most frequently. All participants learned to select the best rules for repeating and novel stimulus sets that obeyed the overall reward probabilities. Participants differed, however, in the extent to which they overgeneralized those rules to repeating stimulus sets that deviated from the overall reward probabilities. The feedback-related negativity (FRN), an ERP component thought to reflect reward prediction error, paralleled behavior. The FRN was sensitive to item-specific reward probabilities in participants who detected the deviant stimulus set, and the FRN was sensitive to overall reward probabilities in participants who did not. These results show that the FRN is sensitive to the utility of abstract rules and that the individual's representation of a task's states and actions shapes behavior as well as the FRN.
Homeostatic Regulation of Memory Systems and Adaptive Decisions
Mizumori, Sheri JY; Jo, Yong Sang
2013-01-01
While it is clear that many brain areas process mnemonic information, understanding how their interactions result in continuously adaptive behaviors has been a challenge. A homeostatic-regulated prediction model of memory is presented that considers the existence of a single memory system that is based on a multilevel coordinated and integrated network (from cells to neural systems) that determines the extent to which events and outcomes occur as predicted. The “multiple memory systems of the brain” have in common output that signals errors in the prediction of events and/or their outcomes, although these signals differ in terms of what the error signal represents (e.g., hippocampus: context prediction errors vs. midbrain/striatum: reward prediction errors). The prefrontal cortex likely plays a pivotal role in the coordination of prediction analysis within and across prediction brain areas. By virtue of its widespread control and influence and its intrinsic working memory mechanisms, the prefrontal cortex supports the flexible processing needed to generate adaptive behaviors and predict future outcomes. It is proposed that prefrontal cortex continually and automatically produces adaptive responses according to homeostatic regulatory principles: prefrontal cortex may serve as a controller that is intrinsically driven to maintain in prediction areas an experience-dependent firing rate set point that ensures adaptive temporally and spatially resolved neural responses to future prediction errors. This same drive by prefrontal cortex may also restore set point firing rates after deviations (i.e., prediction errors) are detected. In this way, prefrontal cortex contributes to reducing uncertainty in prediction systems. An emergent outcome of this homeostatic view may be the flexible and adaptive control that prefrontal cortex is known to implement (i.e., working memory) in the most challenging of situations. Compromise to any of the prediction circuits should result in rigid and suboptimal decision making and memory, as seen in addiction and neurological disease. © 2013 The Authors. Hippocampus Published by Wiley Periodicals, Inc. PMID:23929788
Homeostatic regulation of memory systems and adaptive decisions.
Mizumori, Sheri J Y; Jo, Yong Sang
2013-11-01
While it is clear that many brain areas process mnemonic information, understanding how their interactions result in continuously adaptive behaviors has been a challenge. A homeostatic-regulated prediction model of memory is presented that considers the existence of a single memory system that is based on a multilevel coordinated and integrated network (from cells to neural systems) that determines the extent to which events and outcomes occur as predicted. The "multiple memory systems of the brain" have in common output that signals errors in the prediction of events and/or their outcomes, although these signals differ in terms of what the error signal represents (e.g., hippocampus: context prediction errors vs. midbrain/striatum: reward prediction errors). The prefrontal cortex likely plays a pivotal role in the coordination of prediction analysis within and across prediction brain areas. By virtue of its widespread control and influence, and its intrinsic working memory mechanisms, the prefrontal cortex supports the flexible processing needed to generate adaptive behaviors and predict future outcomes. It is proposed that prefrontal cortex continually and automatically produces adaptive responses according to homeostatic regulatory principles: prefrontal cortex may serve as a controller that is intrinsically driven to maintain in prediction areas an experience-dependent firing rate set point that ensures adaptive temporally and spatially resolved neural responses to future prediction errors. This same drive by prefrontal cortex may also restore set point firing rates after deviations (i.e. prediction errors) are detected. In this way, prefrontal cortex contributes to reducing uncertainty in prediction systems. An emergent outcome of this homeostatic view may be the flexible and adaptive control that prefrontal cortex is known to implement (i.e. working memory) in the most challenging of situations. Compromise to any of the prediction circuits should result in rigid and suboptimal decision making and memory as seen in addiction and neurological disease. Copyright © 2013 Wiley Periodicals, Inc.
Fouragnan, Elsa; Retzler, Chris; Philiastides, Marios G
2018-03-25
Learning occurs when an outcome differs from expectations, generating a reward prediction error signal (RPE). The RPE signal has been hypothesized to simultaneously embody the valence of an outcome (better or worse than expected) and its surprise (how far from expectations). Nonetheless, growing evidence suggests that separate representations of the two RPE components exist in the human brain. Meta-analyses provide an opportunity to test this hypothesis and directly probe the extent to which the valence and surprise of the error signal are encoded in separate or overlapping networks. We carried out several meta-analyses on a large set of fMRI studies investigating the neural basis of RPE, time-locked to the decision outcome. We identified two valence learning systems by pooling studies searching for differential neural activity in response to categorical positive-versus-negative outcomes. The first valence network (negative > positive) involved areas regulating alertness and switching behaviours such as the midcingulate cortex, the thalamus and the dorsolateral prefrontal cortex, whereas the second valence network (positive > negative) encompassed regions of the human reward circuitry such as the ventral striatum and the ventromedial prefrontal cortex. We also found evidence of a largely distinct surprise-encoding network including the anterior cingulate cortex, anterior insula and dorsal striatum. Together with recent animal and electrophysiological evidence this meta-analysis points to a sequential and distributed encoding of different components of the RPE signal, with potentially distinct functional roles. © 2018 Wiley Periodicals, Inc.
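For concreteness, the two RPE components distinguished by this meta-analysis can be written as the sign and magnitude of a single scalar error. The sketch below is illustrative only and is not taken from the cited study.

```python
# Illustrative sketch (not from the cited study): a scalar reward
# prediction error can be decomposed into the two components the
# meta-analysis distinguishes: valence (sign) and surprise (magnitude).
def decompose_rpe(received: float, predicted: float):
    delta = received - predicted                             # signed RPE
    valence = 1 if delta > 0 else (-1 if delta < 0 else 0)   # better/worse than expected
    surprise = abs(delta)                                    # distance from expectations
    return valence, surprise

# Example: expecting a reward of 1.0 but receiving 0.2 yields
# negative valence and a surprise of 0.8.
print(decompose_rpe(0.2, 1.0))  # (-1, 0.8)
```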
Striatal dysfunction during reversal learning in unmedicated schizophrenia patients☆
Schlagenhauf, Florian; Huys, Quentin J.M.; Deserno, Lorenz; Rapp, Michael A.; Beck, Anne; Heinze, Hans-Joachim; Dolan, Ray; Heinz, Andreas
2014-01-01
Subjects with schizophrenia are impaired at reinforcement-driven reversal learning from as early as their first episode. The neurobiological basis of this deficit is unknown. We obtained behavioral and fMRI data in 24 unmedicated, primarily first episode, schizophrenia patients and 24 age-, IQ- and gender-matched healthy controls during a reversal learning task. We supplemented our fMRI analysis, focusing on learning from prediction errors, with detailed computational modeling to probe task-solving strategy, including an ability to deploy an internal goal-directed model of the task. Patients displayed reduced functional activation in the ventral striatum (VS) elicited by prediction errors. However, modeling task performance revealed that a subgroup did not adjust their behavior according to an accurate internal model of the task structure, and these were also the more severely psychotic patients. In patients who could adapt their behavior, as well as in controls, task solving was best described by cognitive strategies according to a Hidden Markov Model. When we compared patients and controls who acted according to this strategy, patients still displayed a significant reduction in VS activation elicited by informative errors that precede salient changes of behavior (reversals). Thus, our study shows that VS dysfunction in schizophrenia patients during reward-related reversal learning remains a core deficit even when controlling for task solving strategies. This result highlights that VS dysfunction is tightly linked to a reward-related reversal learning deficit in early, unmedicated schizophrenia patients. PMID:24291614
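The Hidden Markov Model strategy referred to above can be sketched as Bayesian belief updating about which option is currently rewarded, with a hazard rate for reversals. The parameter values below are assumptions for illustration, not those fitted to the patient data.

```python
# Illustrative Hidden Markov Model strategy for a two-option reversal
# task: the agent tracks P(option A is currently good), allowing for a
# small probability that the contingencies reverse between trials.
P_REWARD_GOOD = 0.8   # P(reward | chose the good option); assumed
P_REWARD_BAD = 0.2    # P(reward | chose the bad option); assumed
HAZARD = 0.05         # P(reversal between trials); assumed

def update_belief(belief_a: float, chose_a: bool, rewarded: bool) -> float:
    # Likelihood of the observed outcome under "A is good" vs "B is good".
    p_good = P_REWARD_GOOD if rewarded else 1 - P_REWARD_GOOD
    p_bad = P_REWARD_BAD if rewarded else 1 - P_REWARD_BAD
    like_a_good = p_good if chose_a else p_bad
    like_b_good = p_bad if chose_a else p_good
    # Bayesian update of the belief that A is the good option.
    posterior = like_a_good * belief_a / (
        like_a_good * belief_a + like_b_good * (1 - belief_a))
    # Account for a possible reversal before the next trial.
    return posterior * (1 - HAZARD) + (1 - posterior) * HAZARD

belief = 0.5
belief = update_belief(belief, chose_a=True, rewarded=True)
print(round(belief, 3))  # belief in "A is good" rises above 0.5
```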
Malvaez, Melissa; Greenfield, Venuz Y.; Wang, Alice S.; Yorita, Allison M.; Feng, Lili; Linker, Kay E.; Monbouquette, Harold G.; Wassum, Kate M.
2015-01-01
Environmental stimuli have the ability to generate specific representations of the rewards they predict and in so doing alter the selection and performance of reward-seeking actions. The basolateral amygdala (BLA) participates in this process, but precisely how is unknown. To rectify this, we monitored, in near-real time, basolateral amygdala glutamate concentration changes during a test of the ability of reward-predictive cues to influence reward-seeking actions (Pavlovian-instrumental transfer). Glutamate concentration was found to be transiently elevated around instrumental reward seeking. During the Pavlovian-instrumental transfer test these glutamate transients were time-locked to and correlated with only those actions invigorated by outcome-specific motivational information provided by the reward-predictive stimulus (i.e., actions earning the same specific outcome as predicted by the presented CS). In addition, basolateral amygdala AMPA, but not NMDA, glutamate receptor inactivation abolished the selective excitatory influence of reward-predictive cues over reward seeking. These data support the hypothesis that transient glutamate release in the BLA can encode the outcome-specific motivational information provided by reward-predictive stimuli. PMID:26212790
The error in total error reduction.
Witnauer, James E; Urcelay, Gonzalo P; Miller, Ralph R
2014-02-01
Most models of human and animal learning assume that learning is proportional to the discrepancy between a delivered outcome and the outcome predicted by all cues present during that trial (i.e., total error across a stimulus compound). This total error reduction (TER) view has been implemented in connectionist and artificial neural network models to describe the conditions under which weights between units change. Electrophysiological work has revealed that the activity of dopamine neurons is correlated with the total error signal in models of reward learning. Similar neural mechanisms presumably support fear conditioning, human contingency learning, and other types of learning. Using a computational modeling approach, we compared several TER models of associative learning to an alternative model that rejects the TER assumption in favor of local error reduction (LER), which assumes that learning about each cue is proportional to the discrepancy between the delivered outcome and the outcome predicted by that specific cue on that trial. The LER model provided a better fit to the reviewed data than the TER models. Given the superiority of the LER model with the present data sets, acceptance of TER should be tempered. Copyright © 2013 Elsevier Inc. All rights reserved.
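The contrast between the two model classes reduces to where the error term is computed. A generic sketch of both update rules (not the authors' simulation code), with an assumed learning rate:

```python
# Generic sketch of the two learning rules contrasted in the abstract.
# TER (Rescorla-Wagner style): the error is computed against the summed
# prediction of all cues present on the trial. LER: each cue's error is
# computed against that cue's own prediction.
ALPHA = 0.1  # learning rate (assumed)

def ter_update(weights: dict, present: list, outcome: float) -> None:
    total_prediction = sum(weights[c] for c in present)
    delta = outcome - total_prediction        # one shared (total) error
    for c in present:
        weights[c] += ALPHA * delta

def ler_update(weights: dict, present: list, outcome: float) -> None:
    for c in present:
        delta = outcome - weights[c]          # cue-specific (local) error
        weights[c] += ALPHA * delta

w_ter = {"light": 0.0, "tone": 0.0}
w_ler = {"light": 0.0, "tone": 0.0}
for _ in range(50):                           # compound conditioning trials
    ter_update(w_ter, ["light", "tone"], 1.0)
    ler_update(w_ler, ["light", "tone"], 1.0)
# Under TER the two cues share the outcome (~0.5 each); under LER each
# cue approaches the full outcome value on its own.
print(round(w_ter["light"], 2), round(w_ler["light"], 2))  # ~0.5 ~1.0
```

Note that the classic blocking effect discussed in other entries below falls out of the TER rule, since a fully predicted outcome generates no shared error to drive learning about an added cue.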
Reinforcement active learning in the vibrissae system: optimal object localization.
Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud
2013-01-01
Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.
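The intrinsic-reward idea above can be stated in one line: the reward fed to the actor-critic is the magnitude of a perceptual prediction error, so policies that generate informative, initially poorly predicted contacts are reinforced. A schematic placeholder under that assumption (not the authors' model code):

```python
# Schematic placeholder for intrinsic-reward reinforcement learning as
# described above: the "reward" delivered to an actor-critic is the
# unsigned prediction error of a perceptual predictor, so behavior
# converges on whisking strategies that maximize learning progress.
def intrinsic_reward(predicted_location: float, sensed_location: float) -> float:
    return abs(sensed_location - predicted_location)  # prediction error as reward
```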
The neural mechanisms of learning from competitors.
Howard-Jones, Paul A; Bogacz, Rafal; Yoo, Jee H; Leonards, Ute; Demetriou, Skevi
2010-11-01
Learning from competitors poses a challenge for existing theories of reward-based learning, which assume that rewarded actions are more likely to be executed in the future. Such a learning mechanism would disadvantage a player in a competitive situation because the competitor's loss is the player's gain, so reward might become associated with an action the player should themselves avoid. Using fMRI, we investigated the neural activity of humans competing with a computer in a foraging task. We observed neural activity that represented the variables required for learning from competitors: the actions of the competitor (in the player's motor and premotor cortex) and the reward prediction error arising from the competitor's feedback. In particular, regions positively correlated with the unexpected loss of the competitor (which was beneficial to the player) included the striatum and those regions previously implicated in response inhibition. Our results suggest that learning in such contexts may involve the competitor's unexpected losses activating regions of the player's brain that subserve response inhibition, as the player learns to avoid the actions that produced them. Copyright 2010 Elsevier Inc. All rights reserved.
Double dissociation of value computations in orbitofrontal and anterior cingulate neurons
Kennerley, Steven W.; Behrens, Timothy E. J.; Wallis, Jonathan D.
2011-01-01
Damage to prefrontal cortex (PFC) impairs decision-making, but the underlying value computations that might cause such impairments remain unclear. Here we report that value computations are doubly dissociable within PFC neurons. While many PFC neurons encoded chosen value, they used opponent encoding schemes such that averaging the neuronal population eliminated value coding. However, a special population of neurons in anterior cingulate cortex (ACC) - but not orbitofrontal cortex (OFC) - multiplexed chosen value across decision parameters using a unified encoding scheme, and encoded reward prediction errors. In contrast, neurons in OFC - but not ACC - encoded chosen value relative to the recent history of choice values. Together, these results suggest complementary valuation processes across PFC areas: OFC neurons dynamically evaluate current choices relative to recent choice values, while ACC neurons encode choice predictions and prediction errors using a common valuation currency reflecting the integration of multiple decision parameters. PMID:22037498
Seo, Hyojung; Lee, Daeyeol
2008-01-01
The process of decision making in humans and other animals is adaptive and can be tuned through experience so as to optimize the outcomes of their choices in a dynamic environment. Previous studies have demonstrated that the anterior cingulate cortex plays an important role in updating the animal’s behavioral strategies when the action-outcome contingencies change. Moreover, neurons in the anterior cingulate cortex often encode the signals related to expected or actual reward. We investigated whether reward-related activity in the anterior cingulate cortex is affected by the animal’s previous reward history. This was tested in rhesus monkeys trained to make binary choices in a computer-simulated competitive zero-sum game. The animal’s choice behavior was relatively close to the optimal strategy, but also revealed small but systematic biases that are consistent with the use of a reinforcement learning algorithm. In addition, the activity of neurons in the dorsal anterior cingulate cortex that was related to the reward received by the animal in a given trial was often modulated by the rewards in the previous trials. Some of these neurons encoded the rate of rewards in previous trials, whereas others displayed activity modulations more closely related to the reward prediction errors. By contrast, signals related to the animal’s choices were only weakly represented in this cortical area. These results suggest that neurons in the dorsal anterior cingulate cortex might be involved in the subjective evaluation of choice outcomes based on the animal’s reward history. PMID:17670983
Assessing anhedonia in depression: Potentials and pitfalls
Rizvi, Sakina J.; Pizzagalli, Diego A.; Sproule, Beth A.; Kennedy, Sidney H.
2016-01-01
The resurgence of interest in anhedonia within major depression has been fuelled by clinical trials demonstrating its utility in predicting antidepressant response as well as recent conceptualizations focused on the role and manifestation of anhedonia in depression. Historically, anhedonia has been conceptualized as a “loss of pleasure”, yet neuropsychological and neurobiological studies reveal a multifaceted reconceptualization that emphasizes different facets of hedonic function, including desire, effort/motivation, anticipation and consummatory pleasure. To ensure generalizability across studies, evaluation of the available subjective and objective methods to assess anhedonia is necessary. The majority of research regarding anhedonia and its neurobiological underpinnings comes from preclinical research, which uses primary reward (e.g. food) to probe hedonic responding. In contrast, behavioural studies in humans primarily use secondary reward (e.g. money) to measure many aspects of reward responding, including delay discounting, response bias, prediction error, probabilistic reversal learning, effort, anticipation and consummatory pleasure. The development of subjective scales to measure anhedonia has also increased in the last decade. This review will assess the current methodology to measure anhedonia, with a focus on scales and behavioural tasks in humans. Limitations of current work and recommendations for future studies are discussed. PMID:26959336
Tobler, Philippe N.
2015-01-01
When we are learning to associate novel cues with outcomes, learning is more efficient if we take advantage of previously learned associations and thereby avoid redundant learning. The blocking effect represents this sort of efficiency mechanism and refers to the phenomenon in which a novel stimulus is blocked from learning when it is associated with a fully predicted outcome. Although there is sufficient evidence that this effect manifests itself when individuals learn about their own rewards, it remains unclear whether it also does when they learn about others’ rewards. We employed behavioral and neuroimaging methods to address this question. We demonstrate that blocking does indeed occur in the social domain and it does so to a similar degree as observed in the individual domain. On the neural level, activations in the medial prefrontal cortex (mPFC) show a specific contribution to blocking and learning-related prediction errors in the social domain. These findings suggest that the efficiency principle that applies to reward learning in the individual domain also applies to that in the social domain, with the mPFC playing a central role in implementing it. PMID:25326037
A Transient Dopamine Signal Represents Avoidance Value and Causally Influences the Demand to Avoid
Pultorak, Katherine J.; Schelp, Scott A.; Isaacs, Dominic P.; Krzystyniak, Gregory
2018-01-01
While an extensive literature supports the notion that mesocorticolimbic dopamine plays a role in negative reinforcement, recent evidence suggests that dopamine exclusively encodes the value of positive reinforcement. In the present study, we employed a behavioral economics approach to investigate whether dopamine plays a role in the valuation of negative reinforcement. Using rats as subjects, we first applied fast-scan cyclic voltammetry (FSCV) to determine that dopamine concentration decreases with the number of lever presses required to avoid electrical footshock (i.e., the economic price of avoidance). Analysis of the rate of decay of avoidance demand curves, which depict an inverse relationship between avoidance and increasing price, allows for inference of the worth an animal places on avoidance outcomes. Rapidly decaying demand curves indicate increased price sensitivity, or low worth placed on avoidance outcomes, while slow rates of decay indicate reduced price sensitivity, or greater worth placed on avoidance outcomes. We therefore used optogenetics to assess how inducing dopamine release causally modifies the demand to avoid electrical footshock in an economic setting. Increasing dopamine release at an avoidance-predictive cue made animals more sensitive to price, consistent with a negative reward prediction error (i.e., the animal perceives that it received a worse outcome than expected). Increasing dopamine release at avoidance made animals less sensitive to price, consistent with a positive reward prediction error (i.e., the animal perceives that it received a better outcome than expected). These data demonstrate that transient dopamine release events represent the value of avoidance outcomes and can predictably modify the demand to avoid. PMID:29766047
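Demand curves of the kind analyzed here are often fit with an exponential-demand equation (e.g., the Hursh and Silberberg form), in which a single rate parameter captures price sensitivity. The sketch below uses that common form purely for illustration; it is not necessarily the parameterization used in this study.

```python
import math

# Illustrative exponential-demand model (Hursh & Silberberg form):
# log10(Q) = log10(Q0) + k * (exp(-alpha * Q0 * price) - 1).
# Q0 is consumption at zero price; alpha indexes price sensitivity
# (larger alpha = faster decay = lower worth placed on the outcome).
def demand(price: float, q0: float, alpha: float, k: float = 2.0) -> float:
    return q0 * 10 ** (k * (math.exp(-alpha * q0 * price) - 1))

# A more price-sensitive animal (larger alpha) shows steeper decay:
for alpha in (0.01, 0.05):
    print([round(demand(p, q0=10, alpha=alpha), 2) for p in (1, 5, 10)])
```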
The orbitofrontal cortex and beyond: from affect to decision-making.
Rolls, Edmund T; Grabenhorst, Fabian
2008-11-01
The orbitofrontal cortex represents the reward or affective value of primary reinforcers including taste, touch, texture, and face expression. It learns to associate other stimuli with these to produce representations of the expected reward value for visual, auditory, and abstract stimuli including monetary reward value. The orbitofrontal cortex thus plays a key role in emotion, by representing the goals for action. The learning process is stimulus-reinforcer association learning. Negative reward prediction error neurons are related to this affective learning. Activations in the orbitofrontal cortex correlate with the subjective emotional experience of affective stimuli, and damage to the orbitofrontal cortex impairs emotion-related learning, emotional behaviour, and subjective affective state. With an origin from beyond the orbitofrontal cortex, top-down attention to affect modulates orbitofrontal cortex representations, and attention to intensity modulates representations in earlier cortical areas of the physical properties of stimuli. Top-down word-level cognitive inputs can bias affective representations in the orbitofrontal cortex, providing a mechanism for cognition to influence emotion. Whereas the orbitofrontal cortex provides a representation of reward or affective value on a continuous scale, areas beyond the orbitofrontal cortex such as the medial prefrontal cortex area 10 are involved in binary decision-making when a choice must be made. For this decision-making, the orbitofrontal cortex provides a representation of each specific reward in a common currency.
Eppinger, Ben; Walter, Maik; Li, Shu-Chen
2017-04-01
In this study, we investigated the interplay of habitual (model-free) and goal-directed (model-based) decision processes by using a two-stage Markov decision task in combination with event-related potentials (ERPs) and computational modeling. To manipulate the demands on model-based decision making, we applied two experimental conditions with different probabilities of transitioning from the first to the second stage of the task. As we expected, when the stage transitions were more predictable, participants showed greater model-based (planning) behavior. Consistent with this result, we found that stimulus-evoked parietal (P300) activity at the second stage of the task increased with the predictability of the state transitions. However, the parietal activity also reflected model-free information about the expected values of the stimuli, indicating that at this stage of the task both types of information are integrated to guide decision making. Outcome-related ERP components only reflected reward-related processes: Specifically, a medial prefrontal ERP component (the feedback-related negativity) was sensitive to negative outcomes, whereas a component that is elicited by reward (the feedback-related positivity) increased as a function of positive prediction errors. Taken together, our data indicate that stimulus-locked parietal activity reflects the integration of model-based and model-free information during decision making, whereas feedback-related medial prefrontal signals primarily reflect reward-related decision processes.
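The model-based/model-free interplay probed by two-stage tasks is commonly formalized as a weighted mixture of the two valuation systems at the first stage. A minimal sketch with assumed transition probabilities and weights (not the fitted model from this study):

```python
# Sketch of hybrid first-stage valuation in a two-stage Markov decision
# task: model-free values (q_mf) come from direct reinforcement, while
# model-based values are computed from the known transition structure
# and second-stage state values. The weight W indexes planning.
W = 0.6                                  # degree of model-based control (assumed)
TRANSITIONS = {"a1": (0.7, 0.3),         # P(stage-2 state | action); assumed
               "a2": (0.3, 0.7)}

def hybrid_value(action: str, q_mf: dict, v_stage2: tuple) -> float:
    p1, p2 = TRANSITIONS[action]
    q_mb = p1 * v_stage2[0] + p2 * v_stage2[1]   # model-based expectation
    return W * q_mb + (1 - W) * q_mf[action]

q_mf = {"a1": 0.2, "a2": 0.5}            # learned model-free values
v_stage2 = (0.9, 0.1)                    # learned second-stage values
print(round(hybrid_value("a1", q_mf, v_stage2), 3))  # 0.6*0.66 + 0.4*0.2 = 0.476
```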
Sadeh, Naomi; Spielberg, Jeffrey M; Hayes, Jasmeet P
2018-01-01
We examined current posttraumatic stress disorder (PTSD) symptoms, trait disinhibition, and affective context as contributors to impulsive and self-destructive behavior in 94 trauma-exposed Veterans. Participants completed an affective Go/No-Go task (GNG) with different emotional contexts (threat, reward, and a multidimensional threat/reward condition) and current PTSD, trait disinhibition, and risky/self-destructive behavior measures. PTSD interacted with trait disinhibition to explain recent engagement in risky/self-destructive behavior, with Veterans scoring high on trait disinhibition and current PTSD symptoms reporting the highest levels of these behaviors. On the GNG task, commission errors were also associated with the interaction of PTSD symptoms and trait disinhibition. Specifically, PTSD symptoms were associated with greater commission errors in threat vs. reward contexts for individuals who were low on trait disinhibition. In contrast, veterans high on PTSD and trait disinhibition exhibited the greatest number of commission errors in the multidimensional affective context that involved both threat and reward processing. Results highlight the interactive effects of PTSD and disinhibited personality traits, as well as threat and reward systems, as risk factors for impulsive and self-destructive behavior in trauma-exposed groups. Findings have clinical implications for understanding heterogeneity in the expression of PTSD and its association with disinhibited behavior. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hernaus, Dennis; Gold, James M; Waltz, James A; Frank, Michael J
2018-04-03
While many have emphasized impaired reward prediction error signaling in schizophrenia, multiple studies suggest that some decision-making deficits may arise from overreliance on stimulus-response systems together with a compromised ability to represent expected value. Guided by computational frameworks, we formulated and tested two scenarios in which maladaptive representations of expected value should be most evident, thereby delineating conditions that may evoke decision-making impairments in schizophrenia. In a modified reinforcement learning paradigm, 42 medicated people with schizophrenia and 36 healthy volunteers learned to select the most frequently rewarded option in a 75-25 pair: once when presented with a more deterministic (90-10) pair and once when presented with a more probabilistic (60-40) pair. Novel and old combinations of choice options were presented in a subsequent transfer phase. Computational modeling was employed to elucidate contributions from stimulus-response systems (actor-critic) and expected value (Q-learning). People with schizophrenia showed robust performance impairments with increasing value difference between two competing options, which strongly correlated with decreased contributions from expected value-based learning (Q-learning). Moreover, a subtle yet consistent contextual choice bias for the probabilistic 75 option was present in people with schizophrenia, which could be accounted for by a context-dependent reward prediction error in the actor-critic. We provide evidence that decision-making impairments in schizophrenia increase monotonically with demands placed on expected value computations. A contextual choice bias is consistent with overreliance on stimulus-response learning, which may signify a deficit secondary to the maladaptive representation of expected value. These results shed new light on conditions under which decision-making impairments may arise. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
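The two computational accounts contrasted here differ in their update rules: Q-learning tracks option-specific expected values, whereas an actor-critic reinforces action propensities with a state-level prediction error, which is what can produce the context-dependent bias described above. A generic sketch, not the authors' code:

```python
# Generic sketch of the two learner types contrasted in the abstract.
# Q-learning stores an expected value per option; the actor-critic stores
# action propensities updated by the critic's state-level prediction
# error, which depends on the context (stimulus pair) the option appears in.
ALPHA = 0.1  # learning rate (assumed)

def q_update(q: dict, option: str, reward: float) -> None:
    q[option] += ALPHA * (reward - q[option])        # option-specific RPE

def actor_critic_update(prefs: dict, v_state: list, option: str,
                        reward: float) -> None:
    delta = reward - v_state[0]                      # critic's state-level RPE
    v_state[0] += ALPHA * delta                      # critic learns state value
    prefs[option] += ALPHA * delta                   # actor reinforces the action

q = {"opt75": 0.0}
prefs = {"opt75": 0.0}
v = [0.0]                                            # critic's value for this context
q_update(q, "opt75", 1.0)
actor_critic_update(prefs, v, "opt75", 1.0)
# In a 75-25 context the critic settles near the average reward rate, so
# the same "75" option generates different deltas in different contexts,
# unlike its context-free Q-value.
```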
Reward inference by primate prefrontal and striatal neurons.
Pan, Xiaochuan; Fan, Hongwei; Sawa, Kosuke; Tsuda, Ichiro; Tsukada, Minoru; Sakagami, Masamichi
2014-01-22
The brain contains multiple yet distinct systems involved in reward prediction. To understand the nature of these processes, we recorded single-unit activity from the lateral prefrontal cortex (LPFC) and the striatum in monkeys performing a reward inference task using an asymmetric reward schedule. We found that neurons both in the LPFC and in the striatum predicted reward values for stimuli that had been previously well experienced with set reward quantities in the asymmetric reward task. Importantly, these LPFC neurons could predict the reward value of a stimulus using transitive inference even when the monkeys had not yet learned the stimulus-reward association directly; whereas these striatal neurons did not show such an ability. Nevertheless, because there were two set amounts of reward (large and small), the selected striatal neurons were able to exclusively infer the reward value (e.g., large) of one novel stimulus from a pair after directly experiencing the alternative stimulus with the other reward value (e.g., small). Our results suggest that although neurons that predict reward value for old stimuli in the LPFC could also do so for new stimuli via transitive inference, those in the striatum could only predict reward for new stimuli via exclusive inference. Moreover, the striatum showed more complex functions than was surmised previously for model-free learning.
Reward Inference by Primate Prefrontal and Striatal Neurons
Pan, Xiaochuan; Fan, Hongwei; Sawa, Kosuke; Tsuda, Ichiro; Tsukada, Minoru
2014-01-01
The brain contains multiple yet distinct systems involved in reward prediction. To understand the nature of these processes, we recorded single-unit activity from the lateral prefrontal cortex (LPFC) and the striatum in monkeys performing a reward inference task using an asymmetric reward schedule. We found that neurons both in the LPFC and in the striatum predicted reward values for stimuli that had been previously well experienced with set reward quantities in the asymmetric reward task. Importantly, these LPFC neurons could predict the reward value of a stimulus using transitive inference even when the monkeys had not yet learned the stimulus–reward association directly; whereas these striatal neurons did not show such an ability. Nevertheless, because there were two set amounts of reward (large and small), the selected striatal neurons were able to exclusively infer the reward value (e.g., large) of one novel stimulus from a pair after directly experiencing the alternative stimulus with the other reward value (e.g., small). Our results suggest that although neurons that predict reward value for old stimuli in the LPFC could also do so for new stimuli via transitive inference, those in the striatum could only predict reward for new stimuli via exclusive inference. Moreover, the striatum showed more complex functions than was surmised previously for model-free learning. PMID:24453328
Bradley, Kailyn A L; Case, Julia A C; Freed, Rachel D; Stern, Emily R; Gabbay, Vilma
2017-07-01
There has been growing interest under the Research Domain Criteria initiative to investigate behavioral constructs and their underlying neural circuitry. Abnormalities in reward processes are salient across psychiatric conditions and may precede future psychopathology in youth. However, the neural circuitry underlying such deficits has not been well defined. Therefore, in this pilot, we studied youth with diverse psychiatric symptoms and examined the neural underpinnings of reward anticipation, attainment, and positive prediction error (PPE, unexpected reward gain). Clinically, we focused on anhedonia, known to reflect deficits in reward function. Twenty-two psychotropic medication-free youth, 16 with psychiatric symptoms, exhibiting a full range of anhedonia, were scanned during the Reward Flanker Task. Anhedonia severity was quantified using the Snaith-Hamilton Pleasure Scale. Functional magnetic resonance imaging analyses were false discovery rate corrected for multiple comparisons. Anticipation activated a broad network, including the medial frontal cortex and ventral striatum, while attainment activated memory and emotion-related regions such as the hippocampus and parahippocampal gyrus, but not the ventral striatum. PPE activated a right-dominant fronto-temporo-parietal network. Anhedonia was only correlated with activation of the right angular gyrus during anticipation and the left precuneus during PPE at an uncorrected threshold. Findings are preliminary due to the small sample size. This pilot characterized the neural circuitry underlying different aspects of reward processing in youth with diverse psychiatric symptoms. These results highlight the complexity of the neural circuitry underlying reward anticipation, attainment, and PPE. Furthermore, this study underscores the importance of RDoC research in youth. Copyright © 2016 Elsevier B.V. All rights reserved.
Shapiro, Matthew L.
2017-01-01
Memory can inform goal-directed behavior by linking current opportunities to past outcomes. The orbitofrontal cortex (OFC) may guide value-based responses by integrating the history of stimulus–reward associations into expected outcomes, representations of predicted hedonic value and quality. Alternatively, the OFC may rapidly compute flexible “online” reward predictions by associating stimuli with the latest outcome. OFC neurons develop predictive codes when rats learn to associate arbitrary stimuli with outcomes, but the extent to which predictive coding depends on most recent events and the integrated history of rewards is unclear. To investigate how reward history modulates OFC activity, we recorded OFC ensembles as rats performed spatial discriminations that differed only in the number of rewarded trials between goal reversals. The firing rate of single OFC neurons distinguished identical behaviors guided by different goals. When >20 rewarded trials separated goal switches, OFC ensembles developed stable and anticorrelated population vectors that predicted overall choice accuracy and the goal selected in single trials. When <10 rewarded trials separated goal switches, OFC population vectors decorrelated rapidly after each switch, but did not develop anticorrelated firing patterns or predict choice accuracy. The results show that, whereas OFC signals respond rapidly to contingency changes, they predict choices only when reward history is relatively stable, suggesting that consecutive rewarded episodes are needed for OFC computations that integrate reward history into expected outcomes. SIGNIFICANCE STATEMENT Adapting to changing contingencies and making decisions engages the orbitofrontal cortex (OFC). Previous work shows that OFC function can either improve or impair learning depending on reward stability, suggesting that OFC guides behavior optimally when contingencies apply consistently. The mechanisms that link reward history to OFC computations remain obscure. Here, we examined OFC unit activity as rodents performed tasks controlled by contingencies that varied reward history. When contingencies were stable, OFC neurons signaled past, present, and pending events; when contingencies were unstable, past and present coding persisted, but predictive coding diminished. The results suggest that OFC mechanisms require stable contingencies across consecutive episodes to integrate reward history, represent predicted outcomes, and inform goal-directed choices. PMID:28115481
Toda, Koji; Sugase-Miyamoto, Yasuko; Mizuhiki, Takashi; Inaba, Kiyonori; Richmond, Barry J; Shidara, Munetaka
2012-01-01
The value of a predicted reward can be estimated based on the conjunction of both the intrinsic reward value and the length of time to obtain it. The question we addressed is how the two aspects, reward size and proximity to reward, influence the responses of neurons in rostral anterior cingulate cortex (rACC), a brain region thought to play an important role in reward processing. We recorded from single neurons while two monkeys performed a multi-trial reward schedule task. The monkeys performed 1-4 sequential color discrimination trials to obtain a reward of 1-3 liquid drops. There were two task conditions, a valid cue condition, where the number of trials and reward amount were associated with visual cues, and a random cue condition, where the cue was picked from the cue set at random. In the valid cue condition, the neuronal firing is strongly modulated by the predicted reward proximity during the trials. Information about the predicted reward amount is almost absent at those times. In substantial subpopulations, the neuronal responses decreased or increased gradually through schedule progress to the predicted outcome. These two gradually modulating signals could be used to calculate the effect of time on the perception of reward value. In the random cue condition, little information about the reward proximity or reward amount is encoded during the course of the trial before reward delivery, but when the reward is actually delivered the responses reflect both the reward proximity and reward amount. Our results suggest that the rACC neurons encode information about reward proximity and amount in a manner that is dependent on utility of reward information. The manner in which the information is represented could be used in the moment-to-moment calculation of the effect of time and amount on predicted outcome value.
Suboptimal choice in rats: incentive salience attribution promotes maladaptive decision-making
Chow, Jonathan J; Smith, Aaron P; Wilson, A George; Zentall, Thomas R; Beckmann, Joshua S
2016-01-01
Stimuli that are more predictive of subsequent reward also function as better conditioned reinforcers. Moreover, stimuli attributed with incentive salience function as more robust conditioned reinforcers. Some theories have suggested that conditioned reinforcement plays an important role in promoting suboptimal choice behavior, like gambling. The present experiments examined how different stimuli, those attributed with incentive salience versus those without, can function in tandem with stimulus-reward predictive utility to promote maladaptive decision-making in rats. One group of rats had lights associated with goal-tracking as the reward-predictive stimuli and another had levers associated with sign-tracking as the reward-predictive stimuli. All rats were first trained on a choice procedure in which the expected value across both alternatives was equivalent but differed in their stimulus-reward predictive utility. Next, the expected value across both alternatives was systematically changed so that the alternative with greater stimulus-reward predictive utility was suboptimal in regard to primary reinforcement. The results demonstrate that in order to obtain suboptimal choice behavior, incentive salience alongside strong stimulus-reward predictive utility may be necessary; thus, maladaptive decision-making can be driven more by the value attributed to stimuli imbued with incentive salience that reliably predict a reward rather than the reward itself. PMID:27993692
Suboptimal choice in rats: Incentive salience attribution promotes maladaptive decision-making.
Chow, Jonathan J; Smith, Aaron P; Wilson, A George; Zentall, Thomas R; Beckmann, Joshua S
2017-03-01
Stimuli that are more predictive of subsequent reward also function as better conditioned reinforcers. Moreover, stimuli attributed with incentive salience function as more robust conditioned reinforcers. Some theories have suggested that conditioned reinforcement plays an important role in promoting suboptimal choice behavior, like gambling. The present experiments examined how different stimuli, those attributed with incentive salience versus those without, can function in tandem with stimulus-reward predictive utility to promote maladaptive decision-making in rats. One group of rats had lights associated with goal-tracking as the reward-predictive stimuli and another had levers associated with sign-tracking as the reward-predictive stimuli. All rats were first trained on a choice procedure in which the expected value across both alternatives was equivalent but differed in their stimulus-reward predictive utility. Next, the expected value across both alternatives was systematically changed so that the alternative with greater stimulus-reward predictive utility was suboptimal in regard to primary reinforcement. The results demonstrate that in order to obtain suboptimal choice behavior, incentive salience alongside strong stimulus-reward predictive utility may be necessary; thus, maladaptive decision-making can be driven more by the value attributed to stimuli imbued with incentive salience that reliably predict a reward rather than the reward itself. Copyright © 2016 Elsevier B.V. All rights reserved.
Thoma, Patrizia; Norra, Christine; Juckel, Georg; Suchan, Boris; Bellebaum, Christian
2015-07-01
Previous literature established a link between major depressive disorder (MDD) and altered reward processing as well as between empathy and (observational) reward learning. The aim of the present study was to assess the effects of MDD on the electrophysiological correlates - the feedback-related negativity (FRN) and the P300 - of active and observational reward processing and to relate them to trait cognitive and affective empathy. Eighteen patients with MDD and 16 healthy controls performed an active and an observational probabilistic reward-learning task while event-related potentials were recorded. Also, participants were assessed with regard to self-reported cognitive and affective trait empathy. Relative to healthy controls, patients with MDD showed overall impaired learning and attenuated FRN amplitudes, irrespective of feedback valence and learning type (active vs. observational), but comparable P300 amplitudes. In the patient group, but not in controls, higher trait perspective-taking scores were significantly correlated with reduced FRN amplitudes. The pattern of results suggests impaired prediction error processing and a negative effect of higher trait empathy on feedback-based learning in patients with MDD. Copyright © 2015 Elsevier B.V. All rights reserved.
Seid-Fatemi, Azade; Tobler, Philippe N
2015-05-01
When we are learning to associate novel cues with outcomes, learning is more efficient if we take advantage of previously learned associations and thereby avoid redundant learning. The blocking effect represents this sort of efficiency mechanism and refers to the phenomenon in which a novel stimulus is blocked from learning when it is associated with a fully predicted outcome. Although there is sufficient evidence that this effect manifests itself when individuals learn about their own rewards, it remains unclear whether it also does when they learn about others' rewards. We employed behavioral and neuroimaging methods to address this question. We demonstrate that blocking does indeed occur in the social domain and it does so to a similar degree as observed in the individual domain. On the neural level, activations in the medial prefrontal cortex (mPFC) show a specific contribution to blocking and learning-related prediction errors in the social domain. These findings suggest that the efficiency principle that applies to reward learning in the individual domain also applies to that in the social domain, with the mPFC playing a central role in implementing it. © The Author (2014). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Social conflicts elicit an N400-like component.
Huang, Yi; Kendrick, Keith M; Yu, Rongjun
2014-12-01
When people have different opinions, they often adjust their own attitude to match that of others, known as social conformity. How social conflicts trigger subsequent conformity remains unclear. One possibility is that a conflict with the group opinion is perceived as a violation of social information, analogous to using wrong grammar, and activates conflict monitoring and adjustment mechanisms. Using event-related potential (ERP) recording combined with a face attractiveness judgment task, we investigated the neural encoding of social conflicts. We found that social conflicts elicit an N400-like negative deflection, which was more negative for conflicts with group opinions than for the no-conflict condition. The social conflict signals also had a bi-directional profile similar to reward prediction error signals: the deflection was more negative for under-estimation (i.e., one's own ratings were smaller than group ratings) than for over-estimation, and the larger the differences between ratings, the larger the N400 amplitude. The N400 effects were significantly diminished in the non-social condition. We conclude that social conflicts are encoded in a bidirectional fashion in the N400-like component, similar to the pattern of reward-based prediction error signals. Our findings also suggest that the N400, a well-established ERP component encoding semantic violation, might be involved in social conflict processing and social learning. Copyright © 2014 Elsevier Ltd. All rights reserved.
Neurocomputational mechanisms of prosocial learning and links to empathy
Apps, Matthew A. J.; Valton, Vincent; Viding, Essi; Roiser, Jonathan P.
2016-01-01
Reinforcement learning theory powerfully characterizes how we learn to benefit ourselves. In this theory, prediction errors—the difference between a predicted and actual outcome of a choice—drive learning. However, we do not operate in a social vacuum. To behave prosocially we must learn the consequences of our actions for other people. Empathy, the ability to vicariously experience and understand the affect of others, is hypothesized to be a critical facilitator of prosocial behaviors, but the link between empathy and prosocial behavior is still unclear. During functional magnetic resonance imaging (fMRI) participants chose between different stimuli that were probabilistically associated with rewards for themselves (self), another person (prosocial), or no one (control). Using computational modeling, we show that people can learn to obtain rewards for others but do so more slowly than when learning to obtain rewards for themselves. fMRI revealed that activity in a posterior portion of the subgenual anterior cingulate cortex/basal forebrain (sgACC) drives learning only when we are acting in a prosocial context and signals a prosocial prediction error conforming to classical principles of reinforcement learning theory. However, there is also substantial variability in the neural and behavioral efficiency of prosocial learning, which is predicted by trait empathy. More empathic people learn more quickly when benefitting others, and their sgACC response is the most selective for prosocial learning. We thus reveal a computational mechanism driving prosocial learning in humans. This framework could provide insights into atypical prosocial behavior in those with disorders of social cognition. PMID:27528669
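The finding that prosocial learning proceeds more slowly is naturally captured by giving the same delta rule a smaller learning rate when rewards go to another person. A minimal sketch under that assumed parameterization (not the fitted model):

```python
# Minimal sketch (assumed parameterization): the same delta rule drives
# learning for self and other, but with a smaller learning rate in the
# prosocial condition, matching the slower learning reported above.
ALPHA_SELF, ALPHA_OTHER = 0.3, 0.15      # illustrative values

def update(value: float, reward: float, recipient: str) -> float:
    alpha = ALPHA_SELF if recipient == "self" else ALPHA_OTHER
    return value + alpha * (reward - value)   # (prosocial) prediction error
```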
Mathiak, Krystyna A; Klasen, Martin; Weber, René; Ackermann, Hermann; Shergill, Sukhwinder S; Mathiak, Klaus
2011-07-12
Violent content in video games evokes many concerns but there is little research concerning its rewarding aspects. It was demonstrated that playing a video game leads to striatal dopamine release. It is unclear, however, which aspects of the game cause this reward system activation and if violent content contributes to it. We combined functional Magnetic Resonance Imaging (fMRI) with individual affect measures to address the neuronal correlates of violence in a video game. Thirteen male German volunteers played a first-person shooter game (Tactical Ops: Assault on Terror) during fMRI measurement. We defined success as eliminating opponents, and failure as being eliminated themselves. Affect was measured directly before and after game play using the Positive and Negative Affect Schedule (PANAS). Failure and success events evoked increased activity in visual cortex but only failure decreased activity in orbitofrontal cortex and caudate nucleus. A negative correlation between negative affect and responses to failure was evident in the right temporal pole (rTP). The deactivation of the caudate nucleus during failure is in accordance with its role in reward-prediction error: it occurred whenever subjects missed an expected reward (being eliminated rather than eliminating the opponent). We found no indication that violence events were directly rewarding for the players. We addressed subjective evaluations of affect change due to gameplay to study the reward system. Subjects reporting greater negative affect after playing the game had less rTP activity associated with failure. The rTP may therefore be involved in evaluating the failure events in a social context, to regulate the players' mood.
Lloyd, Kevin; Dayan, Peter
2015-01-01
Substantial evidence suggests that the phasic activity of dopamine neurons represents reinforcement learning’s temporal difference prediction error. However, recent reports of ramp-like increases in dopamine concentration in the striatum when animals are about to act, or are about to reach rewards, appear to pose a challenge to established thinking. This is because the implied activity is persistently predictable by preceding stimuli, and so cannot arise as this sort of prediction error. Here, we explore three possible accounts of such ramping signals: (a) the resolution of uncertainty about the timing of action; (b) the direct influence of dopamine over mechanisms associated with making choices; and (c) a new model of discounted vigour. Collectively, these suggest that dopamine ramps may be explained, with only minor disturbance, by standard theoretical ideas, though urgent questions remain regarding their proximal cause. We suggest experimental approaches to disentangling which of the proposed mechanisms are responsible for dopamine ramps. PMID:26699940
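For reference, the temporal-difference prediction error at issue is defined over successive state values; because a reliably predicted reward generates a vanishing error at delivery, slowly ramping signals are hard to read as this quantity. A standard formulation:

```python
# Standard temporal-difference (TD) prediction error, the quantity phasic
# dopamine activity is thought to report. Once a reward is reliably
# predicted by earlier stimuli, the error at delivery shrinks toward
# zero, which is why persistently predictable ramps are puzzling.
GAMMA = 0.95  # discount factor (illustrative)

def td_error(reward: float, v_current: float, v_next: float) -> float:
    return reward + GAMMA * v_next - v_current

# Fully predicted reward: the state value already anticipates it, so
# the error is ~0 at delivery.
print(round(td_error(1.0, v_current=1.0, v_next=0.0), 2))  # 0.0
```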
Mesolimbic confidence signals guide perceptual learning in the absence of external feedback
Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp
2016-01-01
It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
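The core of this model is that decision confidence stands in for external reward, so learning is driven by a confidence prediction error. A schematic reduction under assumed quantities (the full model in the paper is richer):

```python
# Schematic confidence-based reinforcement signal (assumed form, not the
# authors' full model): decision confidence acts as the internal
# "reward", and learning is driven by a confidence prediction error.
ALPHA = 0.1  # learning rate (assumed)

def confidence_update(expected_conf: float, observed_conf: float) -> float:
    delta = observed_conf - expected_conf      # confidence prediction error
    return expected_conf + ALPHA * delta
```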
Apps, Matthew A.J.; Roiser, Jonathan P.; Viding, Essi
2015-01-01
Empathy—the capacity to understand and resonate with the experiences of others—can depend on the ability to predict when others are likely to receive rewards. However, although a plethora of research has examined the neural basis of predictions about the likelihood of receiving rewards ourselves, very little is known about the mechanisms that underpin variability in vicarious reward prediction. Human neuroimaging and nonhuman primate studies suggest that a subregion of the anterior cingulate cortex in the gyrus (ACCg) is engaged when others receive rewards. Does the ACCg show specialization for processing predictions about others' rewards and not one's own and does this specialization vary with empathic abilities? We examined hemodynamic responses in the human brain time-locked to cues that were predictive of a high or low probability of a reward either for the subject themselves or another person. We found that the ACCg robustly signaled the likelihood of a reward being delivered to another. In addition, ACCg response significantly covaried with trait emotion contagion, a necessary foundation for empathizing with other individuals. In individuals high in emotion contagion, the ACCg was specialized for processing others' rewards exclusively, but for those low in emotion contagion, this region also responded to information about the subject's own rewards. Our results are the first to show that the ACCg signals probabilistic predictions about rewards for other people and that the substantial individual variability in the degree to which the ACCg is specialized for processing others' rewards is related to trait empathy. SIGNIFICANCE STATEMENT Successfully cooperating, competing, or empathizing with others can depend on our ability to predict when others are going to get something rewarding. Although many studies have examined how the brain processes rewards we will get ourselves, very little is known about vicarious reward processing. Here, we show that a subregion of the anterior cingulate cortex in the gyrus (ACCg) shows a degree of specialization for processing others' versus one's own rewards. However, the degree to which the ACCg is specialized varies with people's ability to empathize with others. This new insight into how vicarious rewards are processed in the brain and vary with empathy may be key for understanding disorders of social behavior, including psychopathy and autism. PMID:26446224
Reward positivity is elicited by monetary reward in the absence of response choice.
Varona-Moya, Sergio; Morís, Joaquín; Luque, David
2015-02-11
The neural responses to positive and negative feedback differ in their event-related potentials. Most often this difference is interpreted as the result of a negative voltage deflection after negative feedback. This deflection has been referred to as the feedback-related negativity component. The reinforcement learning model of the feedback-related negativity establishes that this component reflects an error-monitoring process aimed at progressively adjusting behavior. However, a recent proposal suggests that the difference observed is actually due to a positivity reflecting the rewarding value of positive feedback - that is, the reward positivity component (RewP). From this it follows that RewP could be found even in the absence of any action-monitoring processes. We tested this prediction by means of an experiment in which visual target stimuli were intermixed with nontarget stimuli. Three types of targets signaled money gains, money losses, or the absence of either money gain or money loss, respectively. No motor response was required. Event-related potential analyses showed a central positivity in a 270-370 ms time window that was elicited by target stimuli signaling money gains, as compared with both stimuli signaling losses and no-gain/no-loss neutral stimuli. This is the first evidence to show that RewP is obtained when stimuli with rewarding values are passively perceived.
Computational substrates of norms and their violations during social exchange.
Xiang, Ting; Lohrenz, Terry; Montague, P Read
2013-01-16
Social norms in humans constrain individual behaviors to establish shared expectations within a social group. Previous work has probed social norm violations and the feelings that such violations engender; however, a computational rendering of the underlying neural and emotional responses has been lacking. We probed norm violations using a two-party, repeated fairness game (ultimatum game) where proposers offer a split of a monetary resource to a responder who either accepts or rejects the offer. Using a norm-training paradigm where subject groups are preadapted to either high or low offers, we demonstrate that unpredictable shifts in expected offers create a difference in rejection rates exhibited by the two responder groups for otherwise identical offers. We constructed an ideal observer model that identified neural correlates of norm prediction errors in the ventral striatum and anterior insula, regions that also showed strong responses to variance-prediction errors generated by the same model. Subjective feelings about offers correlated with these norm prediction errors, and the two signals displayed overlapping, but not identical, neural correlates in striatum, insula, and medial orbitofrontal cortex. These results provide evidence for the hypothesis that responses in anterior insula can encode information about social norm violations that correlate with changes in overt behavior (changes in rejection rates). Together, these results demonstrate that the brain regions involved in reward prediction and risk prediction are also recruited in signaling social norm violations.
Computational Substrates of Norms and Their Violations during Social Exchange
Xiang, Ting; Lohrenz, Terry; Montague, P. Read
2013-01-01
Social norms in humans constrain individual behaviors to establish shared expectations within a social group. Previous work has probed social norm violations and the feelings that such violations engender; however, a computational rendering of the underlying neural and emotional responses has been lacking. We probed norm violations using a two-party, repeated fairness game (ultimatum game) where proposers offer a split of a monetary resource to a responder who either accepts or rejects the offer. Using a norm-training paradigm where subject groups are preadapted to either high or low offers, we demonstrate that unpredictable shifts in expected offers create a difference in rejection rates exhibited by the two responder groups for otherwise identical offers. We constructed an ideal observer model that identified neural correlates of norm prediction errors in the ventral striatum and anterior insula, regions that also showed strong responses to variance-prediction errors generated by the same model. Subjective feelings about offers correlated with these norm prediction errors, and the two signals displayed overlapping, but not identical, neural correlates in striatum, insula, and medial orbitofrontal cortex. These results provide evidence for the hypothesis that responses in anterior insula can encode information about social norm violations that correlate with changes in overt behavior (changes in rejection rates). Together, these results demonstrate that the brain regions involved in reward prediction and risk prediction are also recruited in signaling social norm violations. PMID:23325247
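A norm prediction error of the kind identified by the ideal observer model can be sketched as the deviation of an offer from a running norm that itself adapts with experience; the adaptation rate below is an illustrative assumption.

```python
# Illustrative norm-adaptation sketch: the responder tracks a running
# expectation (norm) of offers; each offer generates a norm prediction
# error that both signals the violation and updates the norm.
ETA = 0.2  # norm adaptation rate (assumed)

def observe_offer(norm: float, offer: float):
    norm_pe = offer - norm                 # norm prediction error
    new_norm = norm + ETA * norm_pe        # norm adapts toward experience
    return norm_pe, new_norm

# A group pre-adapted to high offers experiences a $4 offer as a large
# negative violation; a low-norm group experiences the same offer as
# a positive one, producing different rejection rates for identical offers.
print(observe_offer(norm=8.0, offer=4.0))  # (-4.0, 7.2)
print(observe_offer(norm=2.0, offer=4.0))  # (2.0, 2.4)
```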
BOLD responses in reward regions to hypothetical and imaginary monetary rewards
Miyapuram, Krishna P.; Tobler, Philippe N.; Gregorios-Pippas, Lucy; Schultz, Wolfram
2015-01-01
Monetary rewards are uniquely human. Because money is easy to quantify and present visually, it is the reward of choice for most fMRI studies, even though it cannot be handed over to participants inside the scanner. A typical fMRI study requires hundreds of trials and thus small amounts of monetary rewards per trial (e.g. 5p) if all trials are to be treated equally. However, small payoffs can have detrimental effects on performance due to their limited buying power. Hypothetical monetary rewards can overcome the limitations of smaller monetary rewards but it is less well known whether predictors of hypothetical rewards activate reward regions. In two experiments, visual stimuli were associated with hypothetical monetary rewards. In Experiment 1, we used stimuli predicting either visually presented or imagined hypothetical monetary rewards, together with non-rewarding control pictures. Activations to reward predictive stimuli occurred in reward regions, namely the medial orbitofrontal cortex and midbrain. In Experiment 2, we parametrically varied the amount of visually presented hypothetical monetary reward keeping constant the amount of actually received reward. Graded activation in midbrain was observed to stimuli predicting increasing hypothetical rewards. The results demonstrate the efficacy of using hypothetical monetary rewards in fMRI studies. PMID:21985912
A computational and neural model of momentary subjective well-being
Rutledge, Robb B.; Skandali, Nikolina; Dayan, Peter; Dolan, Raymond J.
2014-01-01
The subjective well-being or happiness of individuals is an important metric for societies. Although happiness is influenced by life circumstances and population demographics such as wealth, we know little about how the cumulative influence of daily life events is aggregated into subjective feelings. Using computational modeling, we show that emotional reactivity in the form of momentary happiness in response to outcomes of a probabilistic reward task is explained not by current task earnings, but by the combined influence of recent reward expectations and prediction errors arising from those expectations. The robustness of this account was evident in a large-scale replication involving 18,420 participants. Using functional MRI, we show that the very same influences account for task-dependent striatal activity in a manner akin to the influences underpinning changes in happiness. PMID:25092308
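The model described expresses momentary happiness as a weighted sum of recent certain rewards, expected values, and reward prediction errors, each decaying exponentially over trials. The sketch below follows that published form; the weights and forgetting factor are free parameters fit per participant in the study, and the default values here are illustrative only.

    def momentary_happiness(certain_rewards, expected_values, prediction_errors,
                            w0=0.0, w1=0.5, w2=0.5, w3=0.7, gamma=0.6):
        # Happiness at trial t = w0 + sum over j <= t of gamma**(t-j) times
        # (w1*CR_j + w2*EV_j + w3*RPE_j), where CR is certain reward, EV is
        # the expected value of a chosen gamble, and RPE its prediction error.
        t = len(certain_rewards) - 1
        happiness = w0
        for j in range(t + 1):
            decay = gamma ** (t - j)
            happiness += decay * (w1 * certain_rewards[j]
                                  + w2 * expected_values[j]
                                  + w3 * prediction_errors[j])
        return happiness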
The role of learning-related dopamine signals in addiction vulnerability.
Huys, Quentin J M; Tobler, Philippe N; Hasler, Gregor; Flagel, Shelly B
2014-01-01
Dopaminergic signals play a mathematically precise role in reward-related learning, and variations in dopaminergic signaling have been implicated in vulnerability to addiction. Here, we provide a detailed overview of the relationship between theoretical, mathematical, and experimental accounts of phasic dopamine signaling, with implications for the role of learning-related dopamine signaling in addiction and related disorders. We describe the theoretical and behavioral characteristics of model-free learning based on errors in the prediction of reward, including step-by-step explanations of the underlying equations. We then use recent insights from an animal model that highlights individual variation in learning during a Pavlovian conditioning paradigm to describe overlapping aspects of incentive salience attribution and model-free learning. We argue that this provides a computationally coherent account of some features of addiction.
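The step-by-step equations the review walks through reduce, in their simplest form, to the temporal-difference update below. This is a generic textbook sketch of model-free prediction error learning, not code from the review itself.

    def td_update(V, state, reward, next_state, alpha=0.1, gamma=0.95):
        # delta = r + gamma*V(s') - V(s) is the reward prediction error;
        # values are nudged along the error, as in phasic dopamine accounts.
        delta = reward + gamma * V[next_state] - V[state]
        V[state] += alpha * delta
        return delta

    V = {"cue": 0.0, "reward_state": 0.0}
    for _ in range(100):
        td_update(V, "cue", 1.0, "reward_state")
    # V["cue"] converges toward the predicted reward, and delta shrinks
    # toward zero as the reward becomes fully predicted.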
Performability modeling based on real data: A case study
NASA Technical Reports Server (NTRS)
Hsueh, M. C.; Iyer, R. K.; Trivedi, K. S.
1988-01-01
Described is a measurement-based performability model based on error and resource usage data collected on a multiprocessor system. A method for identifying the model structure is introduced and the resulting model is validated against real data. Model development from the collection of raw data to the estimation of the expected reward is described. Both normal and error behavior of the system are characterized. The measured data show that the holding times in key operational and error states are not simple exponentials and that a semi-Markov process is necessary to model system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different types of errors.
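The reward-function idea in this record can be sketched as follows: given per-state holding times and a per-state reward rate (service rate penalized by error rate), expected reward is the holding-time-weighted average over states. State names, rates, and the error cost below are hypothetical placeholders, and the published model is a full semi-Markov process rather than this steady-state shortcut.

    # Hypothetical per-state measurements (mean holding time, occupancy
    # probability, service rate, error rate); values are illustrative only.
    states = {
        "normal":     {"hold": 120.0, "prob": 0.90, "service": 1.00, "error": 0.00},
        "soft_error": {"hold": 10.0,  "prob": 0.08, "service": 0.60, "error": 0.30},
        "hard_error": {"hold": 30.0,  "prob": 0.02, "service": 0.00, "error": 1.00},
    }

    def expected_reward_rate(states, error_cost=0.5):
        # Holding-time-weighted average of per-state reward = service - cost*error.
        total = sum(s["prob"] * s["hold"] for s in states.values())
        gained = sum(s["prob"] * s["hold"] * (s["service"] - error_cost * s["error"])
                     for s in states.values())
        return gained / total

    print(expected_reward_rate(states))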
2011-01-01
Background Violent content in video games evokes many concerns but there is little research concerning its rewarding aspects. It was demonstrated that playing a video game leads to striatal dopamine release. It is unclear, however, which aspects of the game cause this reward system activation and if violent content contributes to it. We combined functional Magnetic Resonance Imaging (fMRI) with individual affect measures to address the neuronal correlates of violence in a video game. Results Thirteen male German volunteers played a first-person shooter game (Tactical Ops: Assault on Terror) during fMRI measurement. We defined success as eliminating opponents, and failure as being eliminated themselves. Affect was measured directly before and after game play using the Positive and Negative Affect Schedule (PANAS). Failure and success events evoked increased activity in visual cortex but only failure decreased activity in orbitofrontal cortex and caudate nucleus. A negative correlation between negative affect and responses to failure was evident in the right temporal pole (rTP). Conclusions The deactivation of the caudate nucleus during failure is in accordance with its role in reward-prediction error: it occurred whenever subjects missed an expected reward (being eliminated rather than eliminating the opponent). We found no indication that violence events were directly rewarding for the players. We addressed subjective evaluations of affect change due to gameplay to study the reward system. Subjects reporting greater negative affect after playing the game had less rTP activity associated with failure. The rTP may therefore be involved in evaluating the failure events in a social context, to regulate the players' mood. PMID:21749711
Martig, Adria K; Mizumori, Sheri JY
2010-01-01
Hippocampus (HPC) receives dopaminergic (DA) projections from the ventral tegmental area (VTA) and substantia nigra. These inputs appear to provide a modulatory signal that influences HPC dependent behaviors and place fields. We examined how efferent projections from VTA to HPC influence spatial working memory and place fields when the reward context changes. CA1 and CA3 process environmental context changes differently and VTA preferentially innervates CA1. Given these anatomical data and electrophysiological evidence that implicates DA in reward processing, we predicted that CA1 place fields would respond more strongly to both VTA disruption and changes in the reward context than CA3 place fields. Rats (N=9) were implanted with infusion cannula targeting VTA and recording tetrodes aimed at HPC. Then they were tested on a differential reward, win-shift working memory task. One recording session consisted of 5 baseline and 5 manipulation trials during which place cells in CA1/CA2 (N=167) and CA3 (N=94) were recorded. Prior to manipulation trials rats were infused with either baclofen or saline and then subjected to control or reward conditions during which the learned locations of large and small reward quantities were reversed. VTA disruption resulted in an increase in errors, and in CA1/CA2 place field reorganization. There were no changes in any measures of CA3 place field stability during VTA disruption. Reward manipulations did not affect performance or place field stability in CA1/CA2 or CA3; however, changes in the reward locations “rescued” performance and place field stability in CA1/CA2 when VTA activity was compromised, perhaps by triggering compensatory mechanisms. These data support the hypothesis that VTA contributes to spatial working memory performance, perhaps specifically by maintaining place field stability selectively in CA1/CA2. PMID:20082295
Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N
2015-08-01
There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis.
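The three model classes compared here differ mainly in what drives action preferences. Below is a minimal sketch of the hybrid account, an actor-critic whose preferences are augmented by Q-learned expected values; mix=0 recovers the pure actor-critic and mix=1 approaches pure Q-learning. This is a schematic reading of the model class with illustrative parameters, not the authors' fitting code.

    import math, random

    def hybrid_trial(V, W, Q, state, reward_probs, alpha=0.1, mix=0.5, beta=3.0):
        # Action preferences blend actor weights W with learned action values Q.
        actions = list(range(len(reward_probs)))
        prefs = [(1 - mix) * W[(state, a)] + mix * Q[(state, a)] for a in actions]
        weights = [math.exp(beta * p) for p in prefs]
        a = random.choices(actions, weights=weights, k=1)[0]
        reward = 1.0 if random.random() < reward_probs[a] else 0.0
        delta = reward - V[state]                          # critic prediction error
        V[state] += alpha * delta                          # critic update
        W[(state, a)] += alpha * delta                     # actor update
        Q[(state, a)] += alpha * (reward - Q[(state, a)])  # Q-learning component
        return reward

    V = {"s": 0.0}
    W = {("s", 0): 0.0, ("s", 1): 0.0}
    Q = {("s", 0): 0.0, ("s", 1): 0.0}
    for _ in range(200):
        hybrid_trial(V, W, Q, "s", reward_probs=[0.8, 0.2])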
'Proactive' use of cue-context congruence for building reinforcement learning's reward function.
Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf
2016-10-28
Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of immediate reward and of the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent's knowledge of the environment, embodied by the reward function and by hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve for these two functions, which lie outside the agent's control, either with or without a model. In the present paper, using the proactive model of reinforcement learning we offer insight into how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Clinical implications for cognitive behavioral interventions are also discussed.
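The Bellman equation invoked above can be made concrete with a few lines of value iteration over a toy Markov decision process; the states, transitions, and rewards below are placeholders, not anything from the paper.

    def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
        # Solve V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s')).
        V = {s: 0.0 for s in states}
        while True:
            biggest_change = 0.0
            for s in states:
                best = max(sum(p * (R[(s, a, s2)] + gamma * V[s2])
                               for s2, p in P[(s, a)].items())
                           for a in actions)
                biggest_change = max(biggest_change, abs(best - V[s]))
                V[s] = best
            if biggest_change < tol:
                return V

    # Toy two-state example: "go" always reaches s1, which pays off.
    S, A = ["s0", "s1"], ["stay", "go"]
    P = {(s, "stay"): {s: 1.0} for s in S}
    P.update({(s, "go"): {"s1": 1.0} for s in S})
    R = {(s, a, s2): (1.0 if s2 == "s1" else 0.0) for s in S for a in A for s2 in S}
    print(value_iteration(S, A, P, R))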
Reward skewness coding in the insula independent of probability and loss
Tobler, Philippe N.
2011-01-01
Rewards in the natural environment are rarely predicted with complete certainty. Uncertainty relating to future rewards has typically been defined as the variance of the potential outcomes. However, the asymmetry of predicted reward distributions, known as skewness, constitutes a distinct but neuroscientifically underexplored risk term that may also have an impact on preference. By changing only reward magnitudes, we study skewness processing in equiprobable ternary lotteries involving only gains and constant probabilities, thus excluding probability distortion or loss aversion as mechanisms for skewness preference formation. We show that individual preferences are sensitive to not only the mean and variance but also to the skewness of predicted reward distributions. Using neuroimaging, we show that the insula, a structure previously implicated in the processing of reward-related uncertainty, responds to the skewness of predicted reward distributions. Some insula responses increased in a monotonic fashion with skewness (irrespective of individual skewness preferences), whereas others were similarly elevated to both negative and positive as opposed to no reward skew. These data support the notion that the asymmetry of reward distributions is processed in the brain and, taken together with replicated findings of mean coding in the striatum and variance coding in the cingulate, suggest that the brain codes distinct aspects of reward distributions in a distributed fashion. PMID:21849610
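For the equiprobable ternary lotteries used in this design, the relevant risk terms are just the moments of three equally likely magnitudes. The sketch below computes mean, variance, and standardized skewness; the outcome magnitudes are illustrative, chosen so the two lotteries match on mean and variance but differ in skew.

    def lottery_moments(outcomes):
        # Mean, variance, and standardized skewness of equiprobable outcomes.
        n = float(len(outcomes))
        mean = sum(outcomes) / n
        var = sum((x - mean) ** 2 for x in outcomes) / n
        sd = var ** 0.5
        skew = 0.0 if sd == 0 else sum((x - mean) ** 3 for x in outcomes) / (n * sd ** 3)
        return mean, var, skew

    print(lottery_moments([1.0, 1.0, 4.0]))  # positively skewed
    print(lottery_moments([0.0, 3.0, 3.0]))  # same mean/variance, negative skew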
Reinforcement learning in depression: A review of computational research.
Chen, Chong; Takahashi, Taiki; Nakagawa, Shin; Inoue, Takeshi; Kusumi, Ichiro
2015-08-01
Despite being considered primarily a mood disorder, major depressive disorder (MDD) is characterized by cognitive and decision-making deficits. Recent research has employed computational models of reinforcement learning (RL) to address these deficits. The computational approach has the advantage of making explicit predictions about learning and behavior, specifying the process parameters of RL, differentiating between model-free and model-based RL, and enabling computational model-based analyses of functional magnetic resonance imaging and electroencephalography data. These merits have fostered the emerging field of computational psychiatry, and here we review specific studies that focused on MDD. Considerable evidence suggests that MDD is associated with impaired brain signals of reward prediction error and expected value ('wanting'), decreased reward sensitivity ('liking') and/or learning (be it model-free or model-based), although the causality remains unclear. These parameters may serve as valuable intermediate phenotypes of MDD, linking general clinical symptoms to underlying molecular dysfunctions. We believe future computational research at clinical, systems, and cellular/molecular/genetic levels will propel us toward a better understanding of the disease.
Diversity and Homogeneity in Responses of Midbrain Dopamine Neurons
Fiorillo, Christopher D.; Yun, Sora R.; Song, Minryung R.
2013-01-01
Dopamine neurons of the ventral midbrain have been found to signal a reward prediction error that can mediate positive reinforcement. Despite the demonstration of modest diversity at the cellular and molecular levels, there has been little analysis of response diversity in behaving animals. Here we examine response diversity in rhesus macaques to appetitive, aversive, and neutral stimuli having relative motivational values that were measured and controlled through a choice task. First, consistent with previous studies, we observed a continuum of response variability and an apparent absence of distinct clusters in scatter plots, suggesting a lack of statistically discrete subpopulations of neurons. Second, we found that a group of “sensitive” neurons tend to be more strongly suppressed by a variety of stimuli and to be more strongly activated by juice. Third, neurons in the “ventral tier” of substantia nigra were found to have greater suppression, and a subset of these had higher baseline firing rates and late “rebound” activation after suppression. These neurons could belong to a previously identified subgroup of dopamine neurons that express high levels of H-type cation channels but lack calbindin. Fourth, neurons further rostral exhibited greater suppression. Fifth, although we observed weak activation of some neurons by aversive stimuli, this was not associated with their aversiveness. In conclusion, we find a diversity of response properties, distributed along a continuum, within what may be a single functional population of neurons signaling reward prediction error. PMID:23486943
The effects of aging on the interaction between reinforcement learning and attention.
Radulescu, Angela; Daniel, Reka; Niv, Yael
2016-11-01
Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs varied systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results, suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system.
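The attention-constrained learning model referred to here can be sketched as feature-weighted value learning: value is a sum of attended feature weights, and the prediction error is credited only to attended features, so a narrower attention filter restricts learning to fewer features. The dimensions, parameters, and attention profiles below are illustrative assumptions, not the fitted model.

    def stimulus_value(features, W, attention):
        # Value = sum of feature weights, gated by attention per dimension.
        return sum(attention[d] * W[(d, f)] for d, f in features.items())

    def attention_gated_update(features, W, attention, reward, alpha=0.2):
        # Credit the reward prediction error only to attended features.
        delta = reward - stimulus_value(features, W, attention)
        for d, f in features.items():
            W[(d, f)] += alpha * attention[d] * delta
        return delta

    stim = {"color": "red", "shape": "square", "texture": "dots"}
    W = {(d, f): 0.0 for d, f in stim.items()}
    broad = {"color": 1.0, "shape": 1.0, "texture": 1.0}   # wide filter
    narrow = {"color": 1.0, "shape": 0.0, "texture": 0.0}  # more focused filter
    attention_gated_update(stim, W, narrow, reward=1.0)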
Motivated To Win: Relationship between Anticipatory and Outcome Reward-Related Neural Activity
Nusslock, Robin
2015-01-01
Reward-processing involves two temporal stages characterized by two distinct neural processes: reward-anticipation and reward-outcome. Intriguingly, very little research has examined the relationship between neural processes involved in reward-anticipation and reward-outcome. To investigate this, one needs to consider the heterogeneity of reward-processing within each stage. To identify different stages of reward processing, we adapted a reward time-estimation task. While EEG data were recorded, participants were instructed to button-press 3.5 s after the onset of an Anticipation-Cue and received monetary reward for good time-estimation on the Reward trials, but not on No-Reward trials. We first separated reward-anticipation into event-related potentials (ERPs) occurring at three sub-stages: reward/no-reward cue-evaluation, motor-preparation and feedback-anticipation. During reward/no-reward cue-evaluation, the Reward-Anticipation Cue led to a smaller N2 and larger P3. During motor-preparation, we report, for the first time, that the Reward-Anticipation Cue enhanced the Readiness Potential (RP), starting approximately 1 s before movement. At the subsequent feedback-anticipation stage, the Reward-Anticipation Cue elevated the Stimulus-Preceding Negativity (SPN). We also separated reward-outcome ERPs into different components occurring at different time-windows: the Feedback-Related Negativity (FRN), Feedback-P3 (FB-P3) and Late-Positive Potentials (LPP). Lastly, we examined the relationship between reward-anticipation and reward-outcome ERPs. We report that individual differences in specific reward-anticipation ERPs uniquely predicted specific reward-outcome ERPs. In particular, the reward-anticipation Early-RP (1 to 0.8 s before movement) predicted early reward-outcome ERPs (FRN and FB-P3), whereas the reward-anticipation SPN most strongly predicted a later reward-outcome ERP (LPP). Results have important implications for understanding the nature of the relationship between reward-anticipation and reward-outcome neural processes. PMID:26433773
Neural Correlates of Temporal Credit Assignment in the Parietal Lobe
Eisenberg, Ian; Gottlieb, Jacqueline
2014-01-01
Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the “F” step) but ignore changes in this reward at the remaining step (the “I” step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting. PMID:24523935
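The optimal strategy described, learning from the final reward at one step while ignoring it at the other, amounts to step-specific credit assignment rather than uniform temporal discounting. A minimal sketch under that reading, with hypothetical step labels and values:

    def credit_assigned_update(values, trajectory, final_reward, credit, alpha=0.2):
        # Each step's value moves toward the final reward, scaled by a per-step
        # credit term; a credit of zero means reward changes are ignored there.
        for step, action in trajectory:
            delta = final_reward - values[(step, action)]
            values[(step, action)] += alpha * credit[step] * delta

    values = {("F", "left"): 0.0, ("I", "left"): 0.0}
    credit = {"F": 1.0, "I": 0.0}  # learn at the F step, ignore at the I step
    credit_assigned_update(values, [("F", "left"), ("I", "left")], 1.0, credit)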
Does general motivation energize financial reward-seeking behavior? Evidence from an effort task.
Chumbley, Justin; Fehr, Ernst
2014-01-01
We aimed to predict how hard subjects work for financial rewards from their general trait and state reward-motivation. We specifically asked 1) whether individuals high in general trait "reward responsiveness" work harder, and 2) whether task-irrelevant cues can make people work harder by increasing general motivation. Each trial of our task contained a 1-second earning interval in which male subjects earned money for each button press. This was preceded by one of three predictive cues: an erotic picture of a woman, a picture of a man, or a geometric figure. We found that individuals high in trait "reward responsiveness" worked harder and earned more, irrespective of the predictive cue. Because female predictive cues are more rewarding, we expected them to increase general motivation in our male subjects and invigorate work, but found a more complex pattern.
Emotional arousal and discount rate in intertemporal choice are reference dependent.
Lempert, Karolina M; Glimcher, Paul W; Phelps, Elizabeth A
2015-04-01
Many decisions involve weighing immediate gratification against future consequences. In such intertemporal choices, people often choose smaller, immediate rewards over larger delayed rewards. It has been proposed that emotional responses to immediate rewards lead us to choose them at our long-term expense. Here we utilize an objective measure of emotional arousal, pupil dilation, to examine the role of emotion in these decisions. We show that emotional arousal responses, as well as choices, in intertemporal choice tasks are reference-dependent and reflect the decision-maker's recent history of offers. Arousal increases when less predictable rewards are better than expected, whether those rewards are immediate or delayed. Furthermore, when immediate rewards are less predictable than delayed rewards, participants tend to be patient. When delayed rewards are less predictable, immediate rewards are preferred. Our findings suggest that we can encourage people to be more patient by changing the context in which intertemporal choices are made.
VTA neurons coordinate with the hippocampal reactivation of spatial experience
Gomperts, Stephen N; Kloosterman, Fabian; Wilson, Matthew A
2015-01-01
Spatial learning requires the hippocampus, and the replay of spatial sequences during hippocampal sharp wave-ripple (SPW-R) events of quiet wakefulness and sleep is believed to play a crucial role. To test whether the coordination of VTA reward prediction error signals with these replayed spatial sequences could contribute to this process, we recorded from neuronal ensembles of the hippocampus and VTA as rats performed appetitive spatial tasks and subsequently slept. We found that many reward responsive (RR) VTA neurons coordinated with quiet wakefulness-associated hippocampal SPW-R events that replayed recent experience. In contrast, coordination between RR neurons and SPW-R events in subsequent slow wave sleep was diminished. Together, these results indicate distinct contributions of VTA reinforcement activity associated with hippocampal spatial replay to the processing of wake and SWS-associated spatial memory. DOI: http://dx.doi.org/10.7554/eLife.05360.001 PMID:26465113
The role of prediction in social neuroscience
Brown, Elliot C.; Brüne, Martin
2012-01-01
Research has shown that the brain is constantly making predictions about future events. Theories of prediction in perception, action and learning suggest that the brain serves to reduce the discrepancies between expectation and actual experience, i.e., by reducing the prediction error. Forward models of action and perception propose the generation of a predictive internal representation of the expected sensory outcome, which is matched to the actual sensory feedback. Shared neural representations have been found when experiencing one's own and observing others' actions, rewards, errors, and emotions such as fear and pain. These general principles of the “predictive brain” are well established and have already begun to be applied to social aspects of cognition. The application and relevance of these predictive principles to social cognition are discussed in this article. Evidence is presented to argue that simple non-social cognitive processes can be extended to explain complex cognitive processes required for social interaction, with common neural activity seen for both social and non-social cognitions. A number of studies are included which demonstrate that bottom-up sensory input and top-down expectancies can be modulated by social information. The concept of competing social forward models and a partially distinct category of social prediction errors are introduced. The evolutionary implications of a “social predictive brain” are also mentioned, along with the implications for psychopathology. The review presents a number of testable hypotheses and novel comparisons that aim to stimulate further discussion and integration between currently disparate fields of research, with regard to computational models, behavioral and neurophysiological data. This promotes a relatively new platform for inquiry in social neuroscience with implications in social learning, theory of mind, empathy, the evolution of the social brain, and potential strategies for treating social cognitive deficits. PMID:22654749
Interactions between the nucleus accumbens and auditory cortices predict music reward value.
Salimpoor, Valorie N; van den Bosch, Iris; Kovacevic, Natasa; McIntosh, Anthony Randal; Dagher, Alain; Zatorre, Robert J
2013-04-12
We used functional magnetic resonance imaging to investigate neural processes when music gains reward value the first time it is heard. The degree of activity in the mesolimbic striatal regions, especially the nucleus accumbens, during music listening was the best predictor of the amount listeners were willing to spend on previously unheard music in an auction paradigm. Importantly, the auditory cortices, amygdala, and ventromedial prefrontal regions showed increased activity during listening conditions requiring valuation, but did not predict reward value, which was instead predicted by increasing functional connectivity of these regions with the nucleus accumbens as the reward value increased. Thus, aesthetic rewards arise from the interaction between mesolimbic reward circuitry and cortical networks involved in perceptual analysis and valuation.
Bellebaum, C; Jokisch, D; Gizewski, E R; Forsting, M; Daum, I
2012-02-01
Successful adaptation to the environment requires the learning of stimulus-response-outcome associations. Such associations can be learned actively by trial and error or by observing the behaviour and accompanying outcomes in other persons. The present study investigated similarities and differences in the neural mechanisms of active and observational learning from monetary feedback using functional magnetic resonance imaging. Two groups of 15 subjects each - active and observational learners - participated in the experiment. On every trial, active learners chose between two stimuli and received monetary feedback. Each observational learner observed the choices and outcomes of one active learner. Learning performance as assessed via active test trials without feedback was comparable between groups. Different activation patterns were observed for the processing of unexpected vs. expected monetary feedback in active and observational learners, particularly for positive outcomes. Activity for unexpected vs. expected reward was stronger in the right striatum in active learning, while activity in the hippocampus was bilaterally enhanced in observational and reduced in active learning. Modulation of activity by prediction error (PE) magnitude was observed in the right putamen in both types of learning, whereas PE related activations in the right anterior caudate nucleus and in the medial orbitofrontal cortex were stronger for active learning. The striatum and orbitofrontal cortex thus appear to link reward stimuli to own behavioural reactions and are less strongly involved when the behavioural outcome refers to another person's action. Alternative explanations such as differences in reward value between active and observational learning are also discussed.
Morie, Kristen P; De Sanctis, Pierfilippo; Garavan, Hugh; Foxe, John J
2016-03-01
We investigated anticipatory and consummatory reward processing in cocaine addiction. In addition, we set out to assess whether task-monitoring systems were appropriately recalibrated in light of variable reward schedules. We also examined neural measures of task-monitoring and reward processing as a function of hedonic tone, since anhedonia is a vulnerability marker for addiction that is obviously germane in the context of reward processing. High-density event-related potentials were recorded while participants performed a speeded response task that systematically varied anticipated probabilities of reward receipt. The paradigm dissociated feedback regarding task success (or failure) from feedback regarding the value of reward (or loss), so that task-monitoring and reward processing could be examined in partial isolation. Twenty-three active cocaine abusers and 23 age-matched healthy controls participated. Cocaine abusers showed amplified anticipatory responses to reward predictive cues, but crucially, these responses were not as strongly modulated by reward probability as in controls. Cocaine users also showed blunted responses to feedback about task success or failure and did not use this information to update predictions about reward. In turn, they showed clearly blunted responses to reward feedback. In controls and users, measures of anhedonia were associated with reward motivation. In cocaine users, anhedonia was also associated with diminished monitoring and reward feedback responses. Findings imply that reward anticipation and monitoring deficiencies in addiction are associated with increased responsiveness to reward cues but impaired ability to predict reward in light of task contingencies, compounded by deficits in responding to actual reward outcomes.
Decision-making in schizophrenia: A predictive-coding perspective.
Sterzer, Philipp; Voss, Martin; Schlagenhauf, Florian; Heinz, Andreas
2018-05-31
Dysfunctional decision-making has been implicated in the positive and negative symptoms of schizophrenia. Decision-making can be conceptualized within the framework of hierarchical predictive coding as the result of a Bayesian inference process that uses prior beliefs to infer states of the world. According to this idea, prior beliefs encoded at higher levels in the brain are fed back as predictive signals to lower levels. Whenever these predictions are violated by the incoming sensory data, a prediction error is generated and fed forward to update beliefs encoded at higher levels. Well-documented impairments in cognitive decision-making support the view that these neural inference mechanisms are altered in schizophrenia. There is also extensive evidence relating the symptoms of schizophrenia to aberrant signaling of prediction errors, especially in the domain of reward and value-based decision-making. Moreover, the idea of altered predictive coding is supported by evidence for impaired low-level sensory mechanisms and motor processes. We review behavioral and neural findings from these research areas and provide an integrated view suggesting that schizophrenia may be related to a pervasive alteration in predictive coding at multiple hierarchical levels, including cognitive and value-based decision-making processes as well as sensory and motor systems. We relate these findings to decision-making processes and propose that varying degrees of impairment in the implicated brain areas contribute to the variety of psychotic experiences.
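The hierarchical scheme summarized here, predictions fed back and prediction errors fed forward to update beliefs, can be caricatured with a single-level update rule. This is a generic textbook sketch of the predictive-coding idea, not a model from the review.

    def predictive_coding_step(belief, observation, learning_rate=0.1):
        # Top-down prediction meets bottom-up data; the prediction error
        # is fed forward to update the higher-level belief.
        error = observation - belief
        return belief + learning_rate * error

    belief = 0.0
    for observation in [1.0] * 20:
        belief = predictive_coding_step(belief, observation)
    # The belief converges on the data, and the prediction error vanishes.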
The habenula governs the attribution of incentive salience to reward predictive cues
Danna, Carey L.; Shepard, Paul D.; Elmer, Greg I.
2013-01-01
The attribution of incentive salience to reward associated cues is critical for motivation and the pursuit of rewards. Disruptions in the integrity of the neural systems controlling these processes can lead to avolition and anhedonia, symptoms that cross the diagnostic boundaries of many neuropsychiatric illnesses. Here, we consider whether the habenula (Hb), a region recently demonstrated to encode negatively valenced events, also modulates the attribution of incentive salience to a neutral cue predicting a food reward. The Pavlovian autoshaping paradigm was used in the rat as an investigative tool to dissociate Pavlovian learning processes imparting strictly predictive value from learning that attributes incentive motivational value. Electrolytic lesions of the fasciculus retroflexus (fr), the sole pathway through which descending Hb efferents are conveyed, significantly increased incentive salience as measured by conditioned approaches to a cue light predictive of reward. Conversely, generation of a fictive Hb signal via fr stimulation during CS+ presentation significantly decreased the incentive salience of the predictive cue. Neither manipulation altered the reward predictive value of the cue as measured by conditioned approach to the food. Our results provide new evidence supporting a significant role for the Hb in governing the attribution of incentive motivational salience to reward predictive cues and further imply that pathological changes in Hb activity could contribute to the aberrant pursuit of debilitating goals or avolition and depression-like symptoms. PMID:24368898
EEG to Primary Rewards: Predictive Utility and Malleability by Brain Stimulation
Prause, Nicole; Siegle, Greg J.; Deblieck, Choi; Wu, Allan; Iacoboni, Marco
2016-01-01
Theta burst stimulation (TBS) is thought to affect reward processing mechanisms, which may increase and decrease reward sensitivity. To test the ability of TBS to modulate response to strong primary rewards, participants hypersensitive to primary rewards were recruited. Twenty men and women with at least two opposite-sex sexual partners in the last year received two forms of TBS. Stimulations were randomized to avoid order effects and separated by 2 hours to reduce carryover. The two TBS forms have been demonstrated to inhibit (continuous) or excite (intermittent) the left dorsolateral prefrontal cortex using different pulse patterns, which links to brain areas associated with reward conditioning. After each TBS, participants completed tasks assessing their reward responsiveness to monetary and sexual rewards. Electroencephalography (EEG) was recorded. Participants also reported their number of orgasms in the weekend following stimulation. The EEG response to primary rewards was malleable by TBS: excitatory TBS resulted in lower EEG alpha than inhibitory TBS. EEG responses to sexual rewards in the lab (following both forms of TBS) predicted the number of orgasms experienced over the forthcoming weekend. TBS may be useful in modifying hypersensitivity or hyposensitivity to primary rewards that predict sexual behaviors. Since TBS altered the anticipation of a sexual reward, TBS may offer a novel treatment for sexual desire problems. PMID:27902711
Pan, Pedro Mario; Sato, João R; Salum, Giovanni A; Rohde, Luis A; Gadelha, Ary; Zugman, Andre; Mari, Jair; Jackowski, Andrea; Picon, Felipe; Miguel, Eurípedes C; Pine, Daniel S; Leibenluft, Ellen; Bressan, Rodrigo A; Stringaris, Argyris
2017-11-01
Previous studies have implicated aberrant reward processing in the pathogenesis of adolescent depression. However, no study has used functional connectivity within a distributed reward network, assessed using resting-state functional MRI (fMRI), to predict the onset of depression in adolescents. This study used reward network-based functional connectivity at baseline to predict depressive disorder at follow-up in a community sample of adolescents. A total of 637 children 6-12 years old underwent resting-state fMRI. Discovery and replication analyses tested intrinsic functional connectivity (iFC) among nodes of a putative reward network. Logistic regression tested whether striatal node strength, a measure of reward-related iFC, predicted onset of a depressive disorder at 3-year follow-up. Further analyses investigated the specificity of this prediction. Increased left ventral striatum node strength predicted increased risk for future depressive disorder (odds ratio=1.54, 95% CI=1.09-2.18), even after excluding participants who had depressive disorders at baseline (odds ratio=1.52, 95% CI=1.05-2.20). Among 11 reward-network nodes, only the left ventral striatum significantly predicted depression. Striatal node strength did not predict other common adolescent psychopathology, such as anxiety, attention deficit hyperactivity disorder, and substance use. Aberrant ventral striatum functional connectivity specifically predicts future risk for depressive disorder. This finding further emphasizes the need to understand how brain reward networks contribute to youth depression.
Misfortune may be a blessing in disguise: Fairness perception and emotion modulate decision making.
Liu, Hong-Hsiang; Hwang, Yin-Dir; Hsieh, Ming H; Hsu, Yung-Fong; Lai, Wen-Sung
2017-08-01
Fairness perception and equality during social interactions frequently elicit affective arousal and affect decision making. By integrating the dictator game and a probabilistic gambling task, this study aimed to investigate the effects of a negative experience induced by perceived unfairness on decision making using behavioral, model fitting, and electrophysiological approaches. Participants were randomly assigned to the neutral, harsh, or kind groups, which consisted of various asset allocation scenarios to induce different levels of perceived unfairness. The monetary gain was subsequently considered the initial asset in a negatively rewarded, probabilistic gambling task in which the participants were instructed to retain as much of the asset as possible. Our behavioral results indicated that the participants in the harsh group exhibited increased levels of negative emotions but retained greater total game scores than the participants in the other two groups. Parameter estimation of a reinforcement learning model using a Bayesian approach indicated that these participants were more loss averse and consistent in decision making. Data from simultaneous ERP recordings further demonstrated that these participants exhibited larger feedback-related negativity to unexpected outcomes in the gambling task, which suggests enhanced reward sensitivity and signaling of reward prediction error. Collectively, our study suggests that a negative experience may be an advantage in the modulation of reward-based decision making.
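The reinforcement-learning model fit in this study included loss aversion and choice consistency parameters. A minimal sketch of that parameterization follows; the study estimated the parameters with a Bayesian approach, whereas the values and function names here are illustrative assumptions.

    import math, random

    def utility(outcome, loss_aversion=1.5):
        # Losses loom larger than gains by the loss-aversion factor.
        return outcome if outcome >= 0 else loss_aversion * outcome

    def update_value(Q, option, outcome, alpha=0.1, loss_aversion=1.5):
        # Move the option's value toward the subjective utility of the outcome.
        delta = utility(outcome, loss_aversion) - Q[option]
        Q[option] += alpha * delta
        return delta

    def choose(Q, options, consistency=3.0):
        # Higher consistency (inverse temperature) = more deterministic choice.
        weights = [math.exp(consistency * Q[o]) for o in options]
        return random.choices(options, weights=weights, k=1)[0]

    Q = {"gamble": 0.0, "safe": 0.0}
    update_value(Q, "gamble", -1.0)  # a loss is weighted more heavily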
Reward-dependent learning in neuronal networks for planning and decision making.
Dehaene, S; Changeux, J P
2000-01-01
Neuronal network models have been proposed for the organization of evaluation and decision processes in prefrontal circuitry and their putative neuronal and molecular bases. The models all include an implementation and simulation of an elementary reward mechanism. Their central hypothesis is that tentative rules of behavior, which are coded by clusters of active neurons in prefrontal cortex, are selected or rejected based on an evaluation by this reward signal, which may be conveyed, for instance, by the mesencephalic dopaminergic neurons with which the prefrontal cortex is densely interconnected. At the molecular level, the reward signal is postulated to be a neurotransmitter such as dopamine, which exerts a global modulatory action on prefrontal synaptic efficacies, either via volume transmission or via targeted synaptic triads. Negative reinforcement has the effect of destabilizing the currently active rule-coding clusters; subsequently, spontaneous activity varies again from one cluster to another, giving the organism the chance to discover and learn a new rule. Thus, reward signals function as effective selection signals that either maintain or suppress currently active prefrontal representations as a function of their current adequacy. Simulations of this variation-selection have successfully accounted for the main features of several major tasks that depend on prefrontal cortex integrity, such as the delayed-response test, the Wisconsin card sorting test, the Tower of London test and the Stroop test. For the more complex tasks, we have found it necessary to supplement the external reward input with a second mechanism that supplies an internal reward; it consists of an auto-evaluation loop which short-circuits the reward input from the exterior. This allows for an internal evaluation of covert motor intentions without actualizing them as behaviors, by simply testing them covertly by comparison with memorized former experiences. This element of architecture gives access to enhanced rates of learning via an elementary process of internal or covert mental simulation. We have recently applied these ideas to a new model, developed with M. Kerszberg, which hypothesizes that prefrontal cortex and its reward-related connections contribute crucially to conscious effortful tasks. This model distinguishes two main computational spaces within the human brain: a unique global workspace composed of distributed and heavily interconnected neurons with long-range axons, and a set of specialized and modular perceptual, motor, memory, evaluative and attentional processors. We postulate that workspace neurons are mobilized in effortful tasks for which the specialized processors do not suffice; they selectively mobilize or suppress, through descending connections, the contribution of specific processor neurons. In the course of task performance, workspace neurons become spontaneously co-activated, forming discrete though variable spatio-temporal patterns subject to modulation by vigilance signals and to selection by reward signals. A computer simulation of the Stroop task shows workspace activation to increase during acquisition of a novel task, effortful execution, and after errors. This model makes predictions concerning the spatio-temporal activation patterns during brain imaging of cognitive tasks, particularly concerning the conditions of activation of dorsolateral prefrontal cortex and anterior cingulate, their relation to reward mechanisms, and their specific reaction during error processing.
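The variation-selection principle described, rule-coding clusters stabilized by reward and destabilized by negative reinforcement, can be caricatured in a few lines: keep the current rule while it earns reward, and resample a rule after an error. This toy Wisconsin-card-sorting learner illustrates only the selection principle, not the neuronal network model itself; the rule set and switching schedule are hypothetical.

    import random

    RULES = ["color", "shape", "number"]

    def sort_cards(trials, true_rule_at):
        # Positive reinforcement maintains the active rule-coding cluster;
        # negative reinforcement destabilizes it, and activity jumps to
        # another candidate rule.
        current = random.choice(RULES)
        correct = 0
        for t in range(trials):
            if current == true_rule_at(t):
                correct += 1
            else:
                current = random.choice([r for r in RULES if r != current])
        return correct

    # The sorting criterion switches every 30 trials, as in the card sorting test.
    print(sort_cards(120, lambda t: RULES[(t // 30) % len(RULES)]))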
Steiger, Tineke K; Bunzeck, Nico
2017-01-01
Motivation can have invigorating effects on behavior via dopaminergic neuromodulation. While this relationship has mainly been established in theoretical models and studies in younger subjects, the impact of structural decline of the dopaminergic system during healthy aging remains unclear. To investigate this issue, we used electroencephalography (EEG) in healthy young and elderly humans in a reward-learning paradigm. Specifically, scene images were initially encoded by combining them with cues predicting monetary reward (high vs. low reward). Subsequently, recognition memory for the scenes was tested. As a main finding, we show that response times (RTs) during encoding were faster for high-reward-predicting images in the young but not the elderly participants. This pattern was mirrored in power changes in the theta-band (4-7 Hz). Importantly, analyses of structural MRI data revealed that individual reward-related differences in the elderly participants' response times could be predicted by the structural integrity of the dopaminergic substantia nigra (SN; as measured by magnetization transfer (MT)). These findings suggest a close relationship between reward-based invigoration, theta oscillations, and age-dependent changes of the dopaminergic system.
Hippocampal morphology mediates biased memories of chronic pain
Berger, Sara E.; Vachon-Presseau, Étienne; Abdullah, Taha B.; Baria, Alex T.; Schnitzer, Thomas J.; Apkarian, A. Vania
2018-01-01
Experiences and memories are often mismatched. While multiple studies have investigated psychological underpinnings of recall error with respect to emotional events, the neurobiological mechanisms underlying the divergence between experiences and memories remain relatively unexplored in the domain of chronic pain. Here we examined the discrepancy between experienced chronic low back pain (CBP) intensity (twice daily ratings) and remembered pain intensity (n = 48 subjects) relative to psychometric properties, hippocampus morphology, memory capabilities, and personality traits related to reward. 77% of CBP patients exaggerated remembered pain, which depended on their strongest experienced pain and their most recent mood rating. This bias persisted over nearly 1 year and was related to reward memory bias and loss aversion. Shape displacement of a specific region in the left posterior hippocampus mediated personality effects on pain memory bias, predicted pain memory bias in a validation CBP group (n = 21), and accounted for 55% of the variance of pain memory bias. In two independent groups (n = 20/group), morphology of this region was stable over time and unperturbed by the development of chronic pain. These results imply that a localized hippocampal circuit, and personality traits associated with reward processing, largely determine exaggeration of daily pain experiences in chronic pain patients. PMID:29080714
Leathers, Marvin L; Olson, Carl R
2017-04-01
Neurons in the lateral intraparietal (LIP) area of macaque monkey parietal cortex respond to cues predicting rewards and penalties of variable size in a manner that depends on the motivational salience of the predicted outcome (strong for both large reward and large penalty) rather than on its value (positive for large reward and negative for large penalty). This finding suggests that LIP mediates the capture of attention by salient events and does not encode value in the service of value-based decision making. It leaves open the question whether neurons elsewhere in the brain encode value in the identical task. To resolve this issue, we recorded neuronal activity in the amygdala in the context of the task employed in the LIP study. We found that responses to reward-predicting cues were similar between areas, with the majority of reward-sensitive neurons responding more strongly to cues that predicted large reward than to those that predicted small reward. Responses to penalty-predicting cues were, however, markedly different. In the amygdala, unlike LIP, few neurons were sensitive to penalty size, few penalty-sensitive neurons favored large over small penalty, and the dependence of firing rate on penalty size was negatively correlated with its dependence on reward size. These results indicate that amygdala neurons encoded cue value under circumstances in which LIP neurons exhibited sensitivity to motivational salience. However, the representation of negative value, as reflected in sensitivity to penalty size, was weaker than the representation of positive value, as reflected in sensitivity to reward size. NEW & NOTEWORTHY This is the first study to characterize amygdala neuronal responses to cues predicting rewards and penalties of variable size in monkeys making value-based choices. Manipulating reward and penalty size allowed distinguishing activity dependent on motivational salience from activity dependent on value. This approach revealed in a previous study that neurons of the lateral intraparietal (LIP) area encode motivational salience. Here, it reveals that amygdala neurons encode value. The results establish a sharp functional distinction between the two areas.
Fronto-temporal white matter connectivity predicts reversal learning errors
Alm, Kylie H.; Rolheiser, Tyler; Mohamed, Feroze B.; Olson, Ingrid R.
2015-01-01
Each day, we make hundreds of decisions. In some instances, these decisions are guided by our innate needs; in other instances they are guided by memory. Probabilistic reversal learning tasks exemplify the close relationship between decision making and memory, as subjects are exposed to repeated pairings of a stimulus choice with a reward or punishment outcome. After stimulus–outcome associations have been learned, the associated reward contingencies are reversed, and participants are not immediately aware of this reversal. Individual differences in the tendency to choose the previously rewarded stimulus reveal differences in the tendency to make poorly considered, inflexible choices. Lesion studies have strongly linked reversal learning performance to the functioning of the orbitofrontal cortex, the hippocampus, and in some instances, the amygdala. Here, we asked whether individual differences in the microstructure of the uncinate fasciculus, a white matter tract that connects anterior and medial temporal lobe regions to the orbitofrontal cortex, predict reversal learning performance. Diffusion tensor imaging and behavioral paradigms were used to examine this relationship in 33 healthy young adults. The results of tractography revealed a significant negative relationship between reversal learning performance and uncinate axial diffusivity, but no such relationship was demonstrated in a control tract, the inferior longitudinal fasciculus. Our findings suggest that the uncinate might serve to integrate associations stored in the anterior and medial temporal lobes with expectations about expected value based on feedback history, computed in the orbitofrontal cortex. PMID:26150776
Paquet, Maxime; Courcy, François; Lavoie-Tremblay, Mélanie; Gagnon, Serge; Maillet, Stéphanie
2013-05-01
Few studies link organizational variables and outcomes to quality indicators. This approach would expose operant mechanisms by which work environment characteristics and organizational outcomes affect clinical effectiveness, safety, and quality indicators. What are the predominant psychosocial variables in the explanation of organizational outcomes and quality indicators (in this case, medication errors and length of stay)? The primary objective of this study was to link the fields of evidence-based practice to the field of decision making, by providing an effective model of intervention to improve safety and quality. The study involved healthcare workers (n = 243) from 13 different care units of a university affiliated health center in Canada. Data regarding the psychosocial work environment (10 work climate scales, effort/reward imbalance, and social support) was linked to organizational outcomes (absenteeism, turnover, overtime), to the nurse/patient ratio and quality indicators (medication errors and length of stay) using path analyses. The models produced in this study revealed a contribution of some psychosocial factors to quality indicators, through an indirect effect of personnel- or human resources-related variables, more precisely: turnover, absenteeism, overtime, and nurse/patient ratio. Four perceptions of work environment appear to play an important part in the indirect effect on both medication errors and length of stay: apparent social support from supervisors, appreciation of the workload demands, pride in being part of one's work team, and effort/reward balance. This study reveals the importance of employee perceptions of the work environment as an indirect predictor of quality of care. Working to improve these perceptions is a good investment for loyalty and attendance. In general, better personnel conditions lead to fewer medication errors and shorter length of stay. © Sigma Theta Tau International.
Schultz, Wolfram
2004-04-01
Neurons in a small number of brain structures detect rewards and reward-predicting stimuli and are active during the expectation of predictable food and liquid rewards. These neurons code the reward information according to basic terms of various behavioural theories that seek to explain reward-directed learning, approach behaviour and decision-making. The involved brain structures include groups of dopamine neurons, the striatum including the nucleus accumbens, the orbitofrontal cortex and the amygdala. The reward information is fed to brain structures involved in decision-making and organisation of behaviour, such as the dorsolateral prefrontal cortex and possibly the parietal cortex. The neural coding of basic reward terms derived from formal theories puts the neurophysiological investigation of reward mechanisms on firm conceptual grounds and provides neural correlates for the function of rewards in learning, approach behaviour and decision-making.
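To make the prediction-error formalism running through these abstracts concrete, a minimal Python sketch of the shared update rule follows (a Rescorla-Wagner-style rule; the learning rate and reward values are illustrative assumptions, not taken from any study cited here):

```python
# Minimal prediction-error sketch: delta = r - V is positive when reward
# exceeds prediction, zero when fully predicted, and negative when an
# expected reward is omitted. Parameter values are illustrative.

def update_value(value, reward, alpha=0.1):
    """Update a cue's predicted value from one reward outcome."""
    delta = reward - value          # reward prediction error
    return value + alpha * delta, delta

value = 0.0
for trial in range(100):
    value, delta = update_value(value, reward=1.0)

# After repeated pairing, value -> 1.0 and delta -> 0 (fully predicted).
print(round(value, 3), round(delta, 4))
```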
Rational decision-making in inhibitory control.
Shenoy, Pradeep; Yu, Angela J
2011-01-01
An important aspect of cognitive flexibility is inhibitory control, the ability to dynamically modify or cancel planned actions in response to changes in the sensory environment or task demands. We formulate a probabilistic, rational decision-making framework for inhibitory control in the stop signal paradigm. Our model posits that subjects maintain a Bayes-optimal, continually updated representation of sensory inputs, and repeatedly assess the relative value of stopping and going on a fine temporal scale, in order to make an optimal decision on when and whether to go on each trial. We further posit that they implement this continual evaluation with respect to a global objective function capturing the various rewards and penalties associated with different behavioral outcomes, such as speed and accuracy, or the relative costs of stop errors and go errors. We demonstrate that our rational decision-making model naturally gives rise to basic behavioral characteristics consistently observed for this paradigm, as well as more subtle effects due to contextual factors such as reward contingencies or motivational factors. Furthermore, we show that the classical race model can be seen as a computationally simpler, perhaps neurally plausible, approximation to optimal decision-making. This conceptual link allows us to predict how the parameters of the race model, such as the stopping latency, should change with task parameters and individual experiences/ability.
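The classical race model referenced here admits a very compact simulation. A minimal sketch, assuming independent, normally distributed go finishing times and a fixed stop-signal reaction time; all parameter values are invented for illustration, not fitted values from the study:

```python
# Illustrative race model for the stop-signal task: independent "go" and
# "stop" processes race to finish; a response is emitted (a stop error)
# when the go process finishes first.
import random

def trial(ssd, go_mu=450, go_sd=80, ssrt=200):
    go_finish = random.gauss(go_mu, go_sd)   # go finishing time (ms)
    stop_finish = ssd + ssrt                 # stop-signal delay + stop latency
    return go_finish < stop_finish           # True -> response emitted

ssd = 250  # stop-signal delay in ms
p_respond = sum(trial(ssd) for _ in range(10000)) / 10000
print(f"P(respond | SSD={ssd} ms) = {p_respond:.2f}")
```

Sweeping `ssd` traces out the inhibition function; in this framework the stopping latency (SSRT) is the free parameter the authors predict should vary with task context and individual ability.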
Monfardini, Elisabetta; Gaveau, Valérie; Boussaoud, Driss; Hadj-Bouziane, Fadila; Meunier, Martine
2012-01-01
Much theoretical attention is currently devoted to social learning. Yet, empirical studies formally comparing its effectiveness relative to individual learning are rare. Here, we focus on free choice, which is at the heart of individual reward-based learning, but absent in social learning. Choosing among two equally valued options is known to create a preference for the selected option in both humans and monkeys. We thus surmised that social learning should be more helpful when choice-induced preferences retard individual learning than when they optimize it. To test this prediction, the same task, which required finding which of two items concealed a reward, was applied to rhesus macaques and humans. The initial trial was individual or social, rewarded or unrewarded. Learning was assessed on the second trial. Choice-induced preference strongly affected individual learning. Monkeys and humans performed much more poorly after an initial negative choice than after an initial positive choice. Comparison with social learning verified our prediction. For negative outcome, social learning surpassed or at least equaled individual learning in all subjects. For positive outcome, the predicted superiority of individual learning did occur in a majority of subjects (5/6 monkeys and 6/12 humans). A minority kept learning better socially though, perhaps due to a more dominant/aggressive attitude toward peers. Poor learning from errors due to over-valuation of personal choices is among the decision-making biases shared by humans and animals. The present study suggests that choice-immune social learning may help curb this potentially harmful tendency. Learning from successes is an easier path. The present data suggest that whether one tends to walk it alone or with a peer’s help might depend on the social dynamics within the actor/observer dyad. PMID:22969703
White, Stuart F; Geraci, Marilla; Lewis, Elizabeth; Leshin, Joseph; Teng, Cindy; Averbeck, Bruno; Meffert, Harma; Ernst, Monique; Blair, James R; Grillon, Christian; Blair, Karina S
2017-02-01
Deficits in reinforcement-based decision making have been reported in generalized anxiety disorder. However, the pathophysiology of these deficits is largely unknown; published studies have mainly examined adolescents, and the integrity of core functional processes underpinning decision making remains undetermined. In particular, it is unclear whether the representation of reinforcement prediction error (PE) (the difference between received and expected reinforcement) is disrupted in generalized anxiety disorder. This study addresses these issues in adults with the disorder. Forty-six unmedicated individuals with generalized anxiety disorder and 32 healthy comparison subjects group-matched on IQ, gender, and age performed a passive avoidance task while undergoing functional MRI. Data analyses were performed using a computational modeling approach. Behaviorally, individuals with generalized anxiety disorder showed impaired reinforcement-based decision making. Imaging results revealed that during feedback, individuals with generalized anxiety disorder relative to healthy subjects showed a reduced correlation between PE and activity within the ventromedial prefrontal cortex, ventral striatum, and other structures implicated in decision making. In addition, individuals with generalized anxiety disorder relative to healthy participants showed a reduced correlation between punishment PEs, but not reward PEs, and activity within the left and right lentiform nucleus/putamen. This is the first study to identify computational impairments during decision making in generalized anxiety disorder. PE signaling is significantly disrupted in individuals with the disorder and may lead to their decision-making deficits and excessive worry about everyday problems by disrupting the online updating ("reality check") of the current relationship between the expected values of current response options and the actual received rewards and punishments.
ERIC Educational Resources Information Center
Hershberg, Theodore; Robertson-Kraft, Claire
2010-01-01
Pay-for-performance systems in public schools have long been burdened with controversy. Critics of performance pay systems contend that because teachers' impact cannot be measured without error, it is impossible to create fair and accurate systems for evaluating and rewarding performance. By this standard, however, current practice fails on both…
Sethi, Arjun; Voon, Valerie; Critchley, Hugo D; Cercignani, Mara; Harrison, Neil A
2018-05-01
Computational models of reinforcement learning have helped dissect discrete components of reward-related function and characterize neurocognitive deficits in psychiatric illnesses. Stimulus novelty biases decision-making, even when unrelated to choice outcome, acting as if it possessed intrinsic reward value that guides decisions toward uncertain options. Heightened novelty seeking is characteristic of attention deficit hyperactivity disorder, yet how this influence on reward-related decision-making is computationally encoded, or how it is altered by stimulant medication, is currently uncertain. Here we used an established reinforcement-learning task to model effects of novelty on reward-related behaviour during functional MRI in 30 adults with attention deficit hyperactivity disorder and 30 age-, sex- and IQ-matched control subjects. Each participant was tested on two separate occasions, once ON and once OFF stimulant medication. OFF medication, patients with attention deficit hyperactivity disorder showed significantly impaired task performance (P = 0.027), and greater selection of novel options (P = 0.004). Moreover, persistence in selecting novel options predicted impaired task performance (P = 0.025). These behavioural deficits were accompanied by a significantly lower learning rate (P = 0.011) and heightened novelty signalling within the substantia nigra/ventral tegmental area (family-wise error corrected P < 0.05). Compared to effects in controls, stimulant medication improved attention deficit hyperactivity disorder participants' overall task performance (P = 0.011), increased reward-learning rates (P = 0.046) and enhanced their ability to differentiate optimal from non-optimal novel choices (P = 0.032). It also reduced substantia nigra/ventral tegmental area responses to novelty. Preliminary cross-sectional evidence additionally suggested an association between long-term stimulant treatment and a reduction in the rewarding value of novelty. These data suggest that aberrant substantia nigra/ventral tegmental area novelty processing plays an important role in the suboptimal reward-related decision-making characteristic of attention deficit hyperactivity disorder. Compared to effects in controls, abnormalities in novelty processing and reward-related learning were improved by stimulant medication, suggesting that they may be disorder-specific targets for the pharmacological management of attention deficit hyperactivity disorder symptoms.
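One common way to encode a novelty bias of this kind computationally is an additive novelty bonus on action values that decays with sampling. The sketch below is a generic illustration under that assumption, not the study's fitted model; option count, reward probabilities, and all parameters are invented:

```python
# Q-learning with an additive novelty bonus: rarely sampled options look
# transiently more valuable, drawing choices toward uncertain options.
import math
import random

n_options = 3
q = [0.0] * n_options               # learned action values
counts = [0] * n_options            # how often each option was sampled
alpha, beta, bonus = 0.2, 3.0, 0.5  # learning rate, inverse temp., novelty weight
true_p = [0.8, 0.5, 0.2]            # latent reward probabilities (illustrative)

def softmax_choice(values):
    """Sample an option with probability proportional to exp(beta * value)."""
    exps = [math.exp(beta * v) for v in values]
    r = random.random() * sum(exps)
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(values) - 1

for t in range(200):
    # novelty-inflated decision values
    dec = [q[i] + bonus / (1 + counts[i]) for i in range(n_options)]
    a = softmax_choice(dec)
    r = 1.0 if random.random() < true_p[a] else 0.0
    q[a] += alpha * (r - q[a])      # standard prediction-error update
    counts[a] += 1

print([round(v, 2) for v in q])
```

In a model of this family, a larger `bonus` or a lower `alpha` would reproduce the reported pattern of more novel-option choices and poorer overall performance.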
Abnormal Striatal BOLD Responses to Reward Anticipation and Reward Delivery in ADHD
Furukawa, Emi; Bado, Patricia; Tripp, Gail; Mattos, Paulo; Wickens, Jeff R.; Bramati, Ivanei E.; Alsop, Brent; Ferreira, Fernanda Meireles; Lima, Debora; Tovar-Moll, Fernanda; Sergeant, Joseph A.; Moll, Jorge
2014-01-01
Altered reward processing has been proposed to contribute to the symptoms of attention deficit hyperactivity disorder (ADHD). The neurobiological mechanism underlying this alteration remains unclear. We hypothesize that the transfer of dopamine release from reward to reward-predicting cues, as normally observed in animal studies, may be deficient in ADHD. Functional magnetic resonance imaging (fMRI) was used to investigate striatal responses to reward-predicting cues and reward delivery in a classical conditioning paradigm. Data from 14 high-functioning and stimulant-naïve young adults with elevated lifetime symptoms of ADHD (8 males, 6 females) and 15 well-matched controls (8 males, 7 females) were included in the analyses. During reward anticipation, increased blood-oxygen-level-dependent (BOLD) responses in the right ventral and left dorsal striatum were observed in controls, but not in the ADHD group. The opposite pattern was observed in response to reward delivery; the ADHD group demonstrated significantly greater BOLD responses in the ventral striatum bilaterally and the left dorsal striatum relative to controls. In the ADHD group, the number of current hyperactivity/impulsivity symptoms was inversely related to ventral striatal responses during reward anticipation and positively associated with responses to reward. The BOLD response patterns observed in the striatum are consistent with impaired predictive dopamine signaling in ADHD, which may explain altered reward-contingent behaviors and symptoms of ADHD. PMID:24586543
Büchel, Christian; Peters, Jan; Banaschewski, Tobias; Bokde, Arun L. W.; Bromberg, Uli; Conrod, Patricia J.; Flor, Herta; Papadopoulos, Dimitri; Garavan, Hugh; Gowland, Penny; Heinz, Andreas; Walter, Henrik; Ittermann, Bernd; Mann, Karl; Martinot, Jean-Luc; Paillère-Martinot, Marie-Laure; Nees, Frauke; Paus, Tomas; Pausova, Zdenka; Poustka, Luise; Rietschel, Marcella; Robbins, Trevor W.; Smolka, Michael N.; Gallinat, Juergen; Schumann, Gunter; Knutson, Brian; Arroyo, Mercedes; Artiges, Eric; Aydin, Semiha; Bach, Christine; Barbot, Alexis; Barker, Gareth; Bruehl, Ruediger; Cattrell, Anna; Constant, Patrick; Crombag, Hans; Czech, Katharina; Dalley, Jeffrey; Decideur, Benjamin; Desrivieres, Sylvane; Fadai, Tahmine; Fauth-Buhler, Mira; Feng, Jianfeng; Filippi, Irinia; Frouin, Vincent; Fuchs, Birgit; Gemmeke, Isabel; Genauck, Alexander; Hanratty, Eanna; Heinrichs, Bert; Heym, Nadja; Hubner, Thomas; Ihlenfeld, Albrecht; Ing, Alex; Ireland, James; Jia, Tianye; Jones, Jennifer; Jurk, Sarah; Kaviani, Mehri; Klaassen, Arno; Kruschwitz, Johann; Lalanne, Christophe; Lanzerath, Dirk; Lathrop, Mark; Lawrence, Claire; Lemaitre, Hervé; Macare, Christine; Mallik, Catherine; Mar, Adam; Martinez-Medina, Lourdes; Mennigen, Eva; de Carvahlo, Fabiana Mesquita; Mignon, Xavier; Millenet, Sabina; Miranda, Ruben; Müller, Kathrin; Nymberg, Charlotte; Parchetka, Caroline; Pena-Oliver, Yolanda; Pentilla, Jani; Poline, Jean-Baptiste; Quinlan, Erin Burke; Rapp, Michael; Ripke, Stephan; Ripley, Tamzin; Robert, Gabriel; Rogers, John; Romanowski, Alexander; Ruggeri, Barbara; Schmäl, Christine; Schmidt, Dirk; Schneider, Sophia; Schubert, Florian; Schwartz, Yannick; Sommer, Wolfgang; Spanagel, Rainer; Speiser, Claudia; Spranger, Tade; Stedman, Alicia; Stephens, Dai; Strache, Nicole; Ströhle, Andreas; Struve, Maren; Subramaniam, Naresh; Theobald, David; Vetter, Nora; Vulser, Helene; Weiss, Katharina; Whelan, Robert; Williams, Steve; Xu, Bing; Yacubian, Juliana; Yu, Tao; Ziesch, Veronika
2017-01-01
Novelty-seeking tendencies in adolescents may promote innovation as well as problematic impulsive behaviour, including drug abuse. Previous research has not clarified whether neural hyper- or hypo-responsiveness to anticipated rewards promotes vulnerability in these individuals. Here we use a longitudinal design to track 144 novelty-seeking adolescents at age 14 and 16 to determine whether neural activity in response to anticipated rewards predicts problematic drug use. We find that diminished BOLD activity in mesolimbic (ventral striatal and midbrain) and prefrontal cortical (dorsolateral prefrontal cortex) regions during reward anticipation at age 14 predicts problematic drug use at age 16. Lower psychometric conscientiousness and steeper discounting of future rewards at age 14 also predicts problematic drug use at age 16, but the neural responses independently predict more variance than psychometric measures. Together, these findings suggest that diminished neural responses to anticipated rewards in novelty-seeking adolescents may increase vulnerability to future problematic drug use. PMID:28221370
Kim, Kyung Man; Baratta, Michael V; Yang, Aimei; Lee, Doheon; Boyden, Edward S; Fiorillo, Christopher D
2012-01-01
Activation of dopamine receptors in forebrain regions, for minutes or longer, is known to be sufficient for positive reinforcement of stimuli and actions. However, the firing rate of dopamine neurons is increased for only about 200 milliseconds following natural reward events that are better than expected, a response which has been described as a "reward prediction error" (RPE). Although RPE drives reinforcement learning (RL) in computational models, it has not been possible to directly test whether the transient dopamine signal actually drives RL. Here we have performed optical stimulation of genetically targeted ventral tegmental area (VTA) dopamine neurons expressing Channelrhodopsin-2 (ChR2) in mice. We mimicked the transient activation of dopamine neurons that occurs in response to natural reward by applying a light pulse of 200 ms in VTA. When a single light pulse followed each self-initiated nose poke, it was sufficient in itself to cause operant reinforcement. Furthermore, when optical stimulation was delivered in separate sessions according to a predetermined pattern, it increased locomotion and contralateral rotations, behaviors that are known to result from activation of dopamine neurons. All three of the optically induced operant and locomotor behaviors were tightly correlated with the number of VTA dopamine neurons that expressed ChR2, providing additional evidence that the behavioral responses were caused by activation of dopamine neurons. These results provide strong evidence that the transient activation of dopamine neurons provides a functional reward signal that drives learning, in support of RL theories of dopamine function.
James, Alex S; Pennington, Zachary T; Tran, Phu; Jentsch, James David
2015-01-01
Two theories regarding the role for dopamine neurons in learning include the concepts that their activity serves as a (1) mechanism that confers incentive salience onto rewards and associated cues and/or (2) contingency teaching signal reflecting reward prediction error. While both theories are provocative, the causal role for dopamine cell activity in either mechanism remains controversial. In this study, mice that either fully or partially lacked NMDARs in dopamine neurons exclusively, as well as appropriate controls, were evaluated for reward-related learning; this experimental design allowed for a test of the premise that NMDA/glutamate receptor (NMDAR)-mediated mechanisms in dopamine neurons, including NMDA-dependent regulation of phasic discharge activity of these cells, modulate either the instrumental learning processes or the likelihood of Pavlovian cues to become highly motivating incentive stimuli that directly attract behavior. Loss of NMDARs in dopamine neurons did not significantly affect baseline dopamine utilization in the striatum, novelty evoked locomotor behavior, or consumption of a freely available, palatable food solution. On the other hand, animals lacking NMDARs in dopamine cells exhibited a selective reduction in reinforced lever responses that emerged over the course of instrumental learning. Loss of receptor expression did not, however, influence the likelihood of an animal acquiring a Pavlovian conditional response associated with attribution of incentive salience to reward-paired cues (sign tracking). These data support the view that reductions in NMDAR signaling in dopamine neurons affect instrumental reward-related learning but do not lend support to hypotheses that suggest that the behavioral significance of this signaling includes incentive salience attribution.
Potential effects of reward and loss avoidance in overweight adolescents.
Reyes, Sussanne; Peirano, Patricio; Luna, Beatriz; Lozoff, Betsy; Algarín, Cecilia
2015-08-01
Reward system and inhibitory control are brain functions that exert an influence on eating behavior regulation. We studied the differences in inhibitory control and sensitivity to reward and loss avoidance between overweight/obese and normal-weight adolescents. We assessed 51 overweight/obese and 52 normal-weight 15-y-old Chilean adolescents. The groups were similar regarding sex and intelligence quotient. Using Antisaccade and Incentive tasks, we evaluated inhibitory control and the effect of incentive trials (neutral, loss avoidance, and reward) on generating correct and incorrect responses (latency and error rate). Compared to normal-weight group participants, overweight/obese adolescents showed shorter latency for incorrect antisaccade responses (186.0 (95% CI: 176.8-195.2) vs. 201.3 ms (95% CI: 191.2-211.5), P < 0.05) and better performance reflected by lower error rate in incentive trials (43.6 (95% CI: 37.8-49.4) vs. 53.4% (95% CI: 46.8-60.0), P < 0.05). Overweight/obese adolescents were more accurate on loss avoidance (40.9 (95% CI: 33.5-47.7) vs. 49.8% (95% CI: 43.0-55.1), P < 0.05) and reward (41.0 (95% CI: 34.5-47.5) vs. 49.8% (95% CI: 43.0-55.1), P < 0.05) compared to neutral trials. Overweight/obese adolescents showed shorter latency for incorrect responses and greater accuracy in reward and loss avoidance trials. These findings could suggest that an imbalance of inhibition and reward systems influence their eating behavior.
Pan, Wei-Xing; Schmidt, Robert; Wickens, Jeffery R; Hyland, Brian I
2005-06-29
Behavioral conditioning of cue-reward pairing results in a shift of midbrain dopamine (DA) cell activity from responding to the reward to responding to the predictive cue. However, the precise time course and mechanism underlying this shift remain unclear. Here, we report a combined single-unit recording and temporal difference (TD) modeling approach to this question. The data from recordings in conscious rats showed that DA cells retain responses to predicted reward after responses to conditioned cues have developed, at least early in training. This contrasts with previous TD models that predict a gradual stepwise shift in latency with responses to rewards lost before responses develop to the conditioned cue. By exploring the TD parameter space, we demonstrate that the persistent reward responses of DA cells during conditioning are only accurately replicated by a TD model with long-lasting eligibility traces (nonzero values for the parameter lambda) and low learning rate (alpha). These physiological constraints for TD parameters suggest that eligibility traces and low per-trial rates of plastic modification may be essential features of neural circuits for reward learning in the brain. Such properties enable rapid but stable initiation of learning when the number of stimulus-reward pairings is limited, conferring significant adaptive advantages in real-world environments.
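For readers unfamiliar with the model class being constrained here, a minimal tabular TD(lambda) sketch with eligibility traces follows. The trial structure, learning rate, discount factor, and trace-decay values are illustrative only; the abstract's claim is that only nonzero lambda and low alpha reproduce the persistent reward responses:

```python
# Tabular TD(lambda) over a cue -> delay -> reward trial. Eligibility
# traces let a single end-of-trial prediction error update all recently
# visited time steps, so cue values grow without losing the reward
# response in a single step.
import numpy as np

n_steps, alpha, gamma, lam = 10, 0.05, 0.98, 0.9
V = np.zeros(n_steps + 1)            # value of each time step (terminal = 0)

for trial in range(500):
    e = np.zeros(n_steps + 1)        # eligibility traces
    for t in range(n_steps):
        r = 1.0 if t == n_steps - 1 else 0.0   # reward at end of trial
        delta = r + gamma * V[t + 1] - V[t]    # TD prediction error
        e[t] += 1.0                            # mark current state eligible
        V += alpha * delta * e                 # all traced states share the update
        e *= gamma * lam                       # traces decay over time

print(np.round(V[:n_steps], 2))      # value propagates back toward the cue at t=0
```

Setting `lam = 0` in this sketch recovers the stepwise backward shift of earlier TD models, which the recording data contradict.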
Gold, James M.; Waltz, James A.; Matveeva, Tatyana M.; Kasanova, Zuzana; Strauss, Gregory P.; Herbener, Ellen S.; Collins, Anne G.E.; Frank, Michael J.
2015-01-01
Context: Negative symptoms are a core feature of schizophrenia, but their pathophysiology remains unclear. Objective: Negative symptoms are defined by the absence of normal function. However, there must be a productive mechanism that leads to this absence. Here, we test a reinforcement learning account suggesting that negative symptoms result from a failure to represent the expected value of rewards coupled with preserved loss avoidance learning. Design: Subjects performed a probabilistic reinforcement learning paradigm involving stimulus pairs in which choices resulted in either reward or avoidance of loss. Following training, subjects indicated their valuation of the stimuli in a transfer task. Computational modeling was used to distinguish between alternative accounts of the data. Setting: A tertiary care research outpatient clinic. Patients: A total of 47 clinically stable patients with a diagnosis of schizophrenia or schizoaffective disorder and 28 healthy volunteers participated. Patients were divided into high and low negative symptom groups. Main Outcome Measures: 1) The number of choices leading to reward or loss avoidance and 2) performance in the transfer phase. Quantitative fits from three different models were examined. Results: High negative symptom patients demonstrated impaired learning from rewards but intact loss avoidance learning, and failed to distinguish rewarding stimuli from loss-avoiding stimuli in the transfer phase. Model fits revealed that high negative symptom patients were better characterized by an “actor-critic” model, learning stimulus-response associations, whereas controls and low negative symptom patients incorporated expected value of their actions (“Q-learning”) into the selection process. Conclusions: Negative symptoms are associated with a specific reinforcement learning abnormality: High negative symptom patients do not represent the expected value of rewards when making decisions but learn to avoid punishments through the use of prediction errors. This computational framework offers the potential to understand negative symptoms at a mechanistic level. PMID:22310503
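The two model classes compared here differ in one update rule: Q-learning tracks the expected value of each action directly, while an actor-critic learns response propensities from a critic's prediction error without representing expected value. A schematic sketch of that contrast (not the authors' fitted models; the task, parameters, and reward contingencies are invented for illustration):

```python
# Q-learning vs. actor-critic on a two-option bandit.
import random

alpha = 0.1

# Q-learning: action values converge to each action's expected reward.
q = {"A": 0.0, "B": 0.0}
def q_update(action, reward):
    q[action] += alpha * (reward - q[action])

# Actor-critic: the critic's state value v generates delta; the actor's
# weights w accumulate delta and only rank actions, without carrying
# expected reward magnitude.
v = 0.0                      # critic: state value
w = {"A": 0.0, "B": 0.0}     # actor: response propensities
def ac_update(action, reward):
    global v
    delta = reward - v       # critic prediction error
    v += alpha * delta
    w[action] += alpha * delta

for _ in range(500):
    a = random.choice(["A", "B"])
    r = 1.0 if (a == "A" and random.random() < 0.8) else 0.0
    q_update(a, r)
    ac_update(a, r)

# q approximates each action's expected reward; w only separates them.
print({k: round(x, 2) for k, x in q.items()},
      {k: round(x, 2) for k, x in w.items()})
```

The diagnostic difference exploited in the transfer phase is visible here: the actor's weights do not distinguish a frequently rewarded stimulus from a frequently loss-avoiding one the way explicit Q-values do.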
Ridderinkhof, K. Richard; van Wouwe, Nelleke C.; Band, Guido P. H.; Wylie, Scott A.; Van der Stigchel, Stefan; van Hees, Pieter; Buitenweg, Jessika; van de Vijver, Irene; van den Wildenberg, Wery P. M.
2012-01-01
Reward-based decision-learning refers to the process of learning to select those actions that lead to rewards while avoiding actions that lead to punishments. This process, known to rely on dopaminergic activity in striatal brain regions, is compromised in Parkinson’s disease (PD). We hypothesized that such decision-learning deficits are alleviated by induced positive affect, which is thought to incur transient boosts in midbrain and striatal dopaminergic activity. Computational measures of probabilistic reward-based decision-learning were determined for 51 patients diagnosed with PD. Previous work has shown these measures to rely on the nucleus caudatus (outcome evaluation during the early phases of learning) and the putamen (reward prediction during later phases of learning). We observed that induced positive affect facilitated learning, through its effects on reward prediction rather than outcome evaluation. Viewing a few minutes of comedy clips served to remedy dopamine-related problems associated with frontostriatal circuitry and, consequently, learning to predict which actions will yield reward. PMID:22707944
Schevernels, Hanne; Krebs, Ruth M.; Santens, Patrick; Woldorff, Marty G.; Boehler, C. Nico
2013-01-01
Recently, attempts have been made to disentangle the neural underpinnings of preparatory processes related to reward and attention. Functional magnetic resonance imaging (fMRI) research showed that neural activity related to the anticipation of reward and to attentional demands invokes neural activity patterns featuring large-scale overlap, along with some differences and interactions. Due to the limited temporal resolution of fMRI, however, the temporal dynamics of these processes remain unclear. Here, we report an event-related potentials (ERP) study in which cued attentional demands and reward prospect were combined in a factorial design. Results showed that reward prediction dominated early cue processing, as well as the early and later parts of the contingent negative variation (CNV) slow-wave ERP component that has been associated with task-preparation processes. Moreover these reward-related electrophysiological effects correlated across participants with response-time speeding on reward-prospect trials. In contrast, cued attentional demands affected only the later part of the CNV, with the highest amplitudes following cues predicting high-difficulty potential-reward targets, thus suggesting maximal task preparation when the task requires it and entails reward prospect. Consequently, we suggest that task-preparation processes triggered by reward can arise earlier, and potentially more directly, than strategic top-down aspects of preparation based on attentional demands. PMID:24064071
Deepened Extinction following Compound Stimulus Presentation: Noradrenergic Modulation
ERIC Educational Resources Information Center
Janak, Patricia H.; Corbit, Laura H.
2011-01-01
Behavioral extinction is an active form of new learning involving the prediction of nonreward where reward has previously been present. The expression of extinction learning can be disrupted by the presentation of reward itself or reward-predictive stimuli (reinstatement) as well as the passage of time (spontaneous recovery) or contextual changes…
Demiral, Şükrü Barış; Golosheykin, Simon; Anokhin, Andrey P
2017-05-01
Detection and evaluation of the mismatch between the intended and actually obtained result of an action (reward prediction error) is an integral component of adaptive self-regulation of behavior. Extensive human and animal research has shown that evaluation of action outcome is supported by a distributed network of brain regions in which the anterior cingulate cortex (ACC) plays a central role, and the integration of distant brain regions into a unified feedback-processing network is enabled by long-range phase synchronization of cortical oscillations in the theta band. Neural correlates of feedback processing are associated with individual differences in normal and abnormal behavior; however, little is known about the role of genetic factors in the cerebral mechanisms of feedback processing. Here we examined genetic influences on functional cortical connectivity related to prediction error in young adult twins (age 18, n=399) using event-related EEG phase coherence analysis in a monetary gambling task. To identify a prediction error-specific connectivity pattern, we compared responses to loss and gain feedback. Monetary loss produced a significant increase of theta-band synchronization between the frontal midline region and widespread areas of the scalp, particularly parietal areas, whereas gain resulted in increased synchrony primarily within the posterior regions. Genetic analyses showed significant heritability of frontoparietal theta phase synchronization (24 to 46%), suggesting that individual differences in large-scale network dynamics are under substantial genetic control. We conclude that theta-band synchronization of brain oscillations related to negative feedback reflects genetically transmitted differences in the neural mechanisms of feedback processing. To our knowledge, this is the first evidence for genetic influences on task-related functional brain connectivity assessed using direct real-time measures of neuronal synchronization. Copyright © 2016 Elsevier B.V. All rights reserved.
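Phase synchronization between channel pairs is often quantified as a phase-locking value; the sketch below illustrates one generic way to compute such a measure on synthetic signals, and is only a plausible stand-in for the study's actual phase-coherence pipeline. It assumes SciPy is available, and the band edges and filter order are illustrative choices:

```python
# Theta-band phase-locking value (PLV) between two synthetic channels
# that share a 6 Hz component plus independent noise.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250                                   # sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
theta = np.sin(2 * np.pi * 6 * t)          # shared theta-band component
x = theta + 0.5 * np.random.randn(t.size)  # "frontal midline" channel
y = theta + 0.5 * np.random.randn(t.size)  # "parietal" channel

b, a = butter(4, [4, 8], btype="bandpass", fs=fs)   # theta band, 4-8 Hz
phase_x = np.angle(hilbert(filtfilt(b, a, x)))
phase_y = np.angle(hilbert(filtfilt(b, a, y)))

# PLV: magnitude of the mean phase-difference vector (1 = perfect locking).
plv = np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))
print(f"theta-band PLV = {plv:.2f}")
```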
Gregory, Sarah; Blair, R James; Ffytche, Dominic; Simmons, Andrew; Kumari, Veena; Hodgins, Sheilagh; Blackwood, Nigel
2015-02-01
Men with antisocial personality disorder show lifelong abnormalities in adaptive decision making guided by the weighing up of reward and punishment information. Among men with antisocial personality disorder, modification of the behaviour of those with additional diagnoses of psychopathy seems particularly resistant to punishment. We did a case-control functional MRI (fMRI) study in 50 men, of whom 12 were violent offenders with antisocial personality disorder and psychopathy, 20 were violent offenders with antisocial personality disorder but not psychopathy, and 18 were healthy non-offenders. We used fMRI to measure brain activation associated with the representation of punishment or reward information during an event-related probabilistic response-reversal task, assessed with standard general linear-model-based analysis. Offenders with antisocial personality disorder and psychopathy displayed discrete regions of increased activation in the posterior cingulate cortex and anterior insula in response to punished errors during the task reversal phase, and decreased activation to all correct rewarded responses in the superior temporal cortex. This finding was in contrast to results for offenders without psychopathy and healthy non-offenders. Punishment prediction error signalling in offenders with antisocial personality disorder and psychopathy was highly atypical. This finding challenges the widely held view that such men are simply characterised by diminished neural sensitivity to punishment. Instead, this finding indicates altered organisation of the information-processing system responsible for reinforcement learning and appropriate decision making. This difference between violent offenders with antisocial personality disorder with and without psychopathy has implications for the causes of these disorders and for treatment approaches. National Forensic Mental Health Research and Development Programme, UK Ministry of Justice, Psychiatry Research Trust, NIHR Biomedical Research Centre. Copyright © 2015 Elsevier Ltd. All rights reserved.
Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments.
Leong, Yuan Chang; Radulescu, Angela; Daniel, Reka; DeWoskin, Vivian; Niv, Yael
2017-01-18
Little is known about the relationship between attention and learning during decision making. Using eye tracking and multivariate pattern analysis of fMRI data, we measured participants' dimensional attention as they performed a trial-and-error learning task in which only one of three stimulus dimensions was relevant for reward at any given time. Analysis of participants' choices revealed that attention biased both value computation during choice and value update during learning. Value signals in the ventromedial prefrontal cortex and prediction errors in the striatum were similarly biased by attention. In turn, participants' focus of attention was dynamically modulated by ongoing learning. Attentional switches across dimensions correlated with activity in a frontoparietal attention network, which showed enhanced connectivity with the ventromedial prefrontal cortex between switches. Our results suggest a bidirectional interaction between attention and learning: attention constrains learning to relevant dimensions of the environment, while we learn what to attend to via trial and error. Copyright © 2017 Elsevier Inc. All rights reserved.
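A generic sketch of attention-weighted feature reinforcement learning in the spirit of this task (three stimulus dimensions, one relevant) follows. The attention-update heuristic and all parameters are assumptions for illustration, not the fitted model from the study:

```python
# Feature RL where attention weights phi bias both the value of a compound
# stimulus (choice) and the credit assigned during learning (update), while
# learning in turn redirects attention toward the most predictive dimension.
import numpy as np

n_dims, n_feats = 3, 3
V = np.zeros((n_dims, n_feats))       # value of each feature
phi = np.ones(n_dims) / n_dims        # attention weights over dimensions
alpha, eta = 0.3, 0.2                 # value and attention learning rates
rng = np.random.default_rng(0)

def stim_value(features):
    """Attention-weighted sum of feature values for one stimulus."""
    return sum(phi[d] * V[d, f] for d, f in enumerate(features))

def learn(features, reward):
    global phi
    delta = reward - stim_value(features)          # prediction error
    for d, f in enumerate(features):
        V[d, f] += alpha * phi[d] * delta          # attention-gated update
    # heuristic: attend to dimensions whose feature values discriminate most
    spread = np.ptp(V, axis=1) + 1e-9
    phi = (1 - eta) * phi + eta * spread / spread.sum()

for _ in range(300):
    feats = rng.integers(0, n_feats, size=n_dims)  # random compound stimulus
    r = 1.0 if feats[0] == 0 else 0.0              # only dim 0, feature 0 pays
    learn(feats, r)

print(np.round(phi, 2))   # attention concentrates on the relevant dimension
```

This captures the bidirectionality the abstract describes: attention constrains which features receive the prediction-error update, and the evolving values pull attention back toward the relevant dimension.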
Enhanced Neural Responses to Imagined Primary Rewards Predict Reduced Monetary Temporal Discounting.
Hakimi, Shabnam; Hare, Todd A
2015-09-23
The pervasive tendency to discount the value of future rewards varies considerably across individuals and has important implications for health and well-being. Here, we used fMRI with human participants to examine whether an individual's neural representation of an imagined primary reward predicts the degree to which the value of delayed monetary payments is discounted. Because future rewards can never be experienced at the time of choice, imagining or simulating the benefits of a future reward may play a critical role in decisions between alternatives with either immediate or delayed benefits. We found that enhanced ventromedial prefrontal cortex response during imagined primary reward receipt was correlated with reduced discounting in a separate monetary intertemporal choice task. Furthermore, ventromedial prefrontal cortex activity during reward imagination predicted temporal discounting behavior both between- and within-individual decision makers with 62% and 73% mean balanced accuracy, respectively. These results suggest that the quality of reward imagination may impact the degree to which future outcomes are discounted. Significance statement: We report a novel test of the hypothesis that an important factor influencing the discount rate for future rewards is the quality with which they are imagined or estimated in the present. Previous work has shown that temporal discounting is linked to individual characteristics ranging from general intelligence to the propensity for addiction. We demonstrate that individual differences in a neurobiological measure of primary reward imagination are significantly correlated with discounting rates for future monetary payments. Moreover, our neurobiological measure of imagination can be used to accurately predict choice behavior both between and within individuals. These results suggest that improving reward imagination may be a useful therapeutic target for individuals whose high discount rates promote detrimental behaviors. Copyright © 2015 the authors 0270-6474/15/3513103-07$15.00/0.
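Discount rates in monetary intertemporal choice tasks of this kind are commonly summarized with a hyperbolic model, SV = A / (1 + kD). A minimal sketch with an invented discount rate, offered as a generic illustration rather than the study's fitting procedure:

```python
# Hyperbolic temporal discounting: subjective value of amount A at delay D
# is A / (1 + k * D); larger k means steeper discounting.

def subjective_value(amount, delay, k):
    return amount / (1 + k * delay)

k = 0.02                   # per-day discount rate (illustrative)
immediate, delayed, delay_days = 20.0, 50.0, 60
sv_delayed = subjective_value(delayed, delay_days, k)
choice = "delayed" if sv_delayed > immediate else "immediate"
print(f"SV(delayed) = {sv_delayed:.1f} -> choose {choice}")
# Here SV = 50 / (1 + 0.02 * 60) = 22.7 > 20, so the delayed reward wins;
# a steeper discounter (larger k) would take the immediate payment.
```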
Forster, Sarah E; Zirnheld, Patrick; Shekhar, Anantha; Steinhauer, Stuart R; O'Donnell, Brian F; Hetrick, William P
2017-09-01
Signals carried by the mesencephalic dopamine system and conveyed to anterior cingulate cortex are critically implicated in probabilistic reward learning and performance monitoring. A common evaluative mechanism purportedly subserves both functions, giving rise to homologous medial frontal negativities in feedback- and response-locked event-related brain potentials (the feedback-related negativity (FRN) and the error-related negativity (ERN), respectively), reflecting dopamine-dependent prediction error signals to unexpectedly negative events. Consistent with this model, the dopamine receptor antagonist, haloperidol, attenuates the ERN, but effects on FRN have not yet been evaluated. ERN and FRN were recorded during a temporal interval learning task (TILT) following randomized, double-blind administration of haloperidol (3 mg; n = 18), diphenhydramine (an active control for haloperidol; 25 mg; n = 20), or placebo (n = 21) to healthy controls. Centroparietal positivities, the Pe and feedback-locked P300, were also measured and correlations between ERP measures and behavioral indices of learning, overall accuracy, and post-error compensatory behavior were evaluated. We hypothesized that haloperidol would reduce ERN and FRN, but that ERN would uniquely track automatic, error-related performance adjustments, while FRN would be associated with learning and overall accuracy. As predicted, ERN was reduced by haloperidol and in those exhibiting less adaptive post-error performance; however, these effects were limited to ERNs following fast timing errors. In contrast, the FRN was not affected by drug condition, although increased FRN amplitude was associated with improved accuracy. Significant drug effects on centroparietal positivities were also absent. Our results support a functional and neurobiological dissociation between the ERN and FRN.
Hickey, Clayton; Peelen, Marius V
2017-08-02
Theories of reinforcement learning and approach behavior suggest that reward can increase the perceptual salience of environmental stimuli, ensuring that potential predictors of outcome are noticed in the future. However, outcome commonly follows visual processing of the environment, occurring even when potential reward cues have long disappeared. How can reward feedback retroactively cause now-absent stimuli to become attention-drawing in the future? One possibility is that reward and attention interact to prime lingering visual representations of attended stimuli that sustain through the interval separating stimulus and outcome. Here, we test this idea using multivariate pattern analysis of fMRI data collected from male and female humans. While in the scanner, participants searched for examples of target categories in briefly presented pictures of cityscapes and landscapes. Correct task performance was followed by reward feedback that could randomly have either high or low magnitude. Analysis showed that high-magnitude reward feedback boosted the lingering representation of target categories while reducing the representation of nontarget categories. The magnitude of this effect in each participant predicted the behavioral impact of reward on search performance in subsequent trials. Other analyses show that sensitivity to reward, as expressed in a personality questionnaire and in reactivity to reward feedback in the dopaminergic midbrain, predicted reward-elicited variance in lingering target and nontarget representations. Credit for rewarding outcome thus appears to be assigned to the target representation, causing the visual system to become sensitized for similar objects in the future. SIGNIFICANCE STATEMENT How do reward-predictive visual stimuli become salient and attention-drawing? In the real world, reward cues precede outcome and reward is commonly received long after potential predictors have disappeared. How can the representation of environmental stimuli be affected by outcome that occurs later in time? Here, we show that reward acts on lingering representations of environmental stimuli that sustain through the interval between stimulus and outcome. Using naturalistic scene stimuli and multivariate pattern analysis of fMRI data, we show that reward boosts the representation of attended objects and reduces the representation of unattended objects. This interaction of attention and reward processing acts to prime vision for stimuli that may serve to predict outcome. Copyright © 2017 the authors 0270-6474/17/377297-08$15.00/0.
Dopamine neurons learn relative chosen value from probabilistic rewards
Lak, Armin; Stauffer, William R; Schultz, Wolfram
2016-01-01
Economic theories posit reward probability as one of the factors defining reward value. Individuals learn the value of cues that predict probabilistic rewards from experienced reward frequencies. Building on the notion that responses of dopamine neurons increase with reward probability and expected value, we asked how dopamine neurons in monkeys acquire this value signal that may represent an economic decision variable. We found in a Pavlovian learning task that reward probability-dependent value signals arose from experienced reward frequencies. We then assessed neuronal response acquisition during choices among probabilistic rewards. Here, dopamine responses became sensitive to the value of both chosen and unchosen options. Both experiments also showed novelty responses of dopamine neurons that decreased as learning advanced. These results show that dopamine neurons acquire predictive value signals from the frequency of experienced rewards. This flexible and fast signal reflects a specific decision variable and could update neuronal decision mechanisms. DOI: http://dx.doi.org/10.7554/eLife.18044.001 PMID:27787196
Chung, Yu Sun; Barch, Deanna M
2016-04-01
Schizophrenia is characterized by deficits of context processing, thought to be related to dorsolateral prefrontal cortex (DLPFC) impairment. Despite emerging evidence suggesting a crucial role of the DLPFC in integrating reward and goal information, we do not know whether individuals with schizophrenia can represent and integrate reward-related context information to modulate cognitive control. To address this question, 36 individuals with schizophrenia (n = 29) or schizoaffective disorder (n = 7) and 27 healthy controls performed a variant of a response conflict task (Padmala & Pessoa, 2011) during fMRI scanning, in both baseline and reward conditions, with monetary incentives on some reward trials. We used a mixed state-item design that allowed us to examine both sustained and transient reward effects on cognitive control. Different from predictions about impaired DLPFC function in schizophrenia, we found an intact pattern of increased sustained DLPFC activity during reward versus baseline blocks in individuals with schizophrenia at a group level but blunted sustained activations in the putamen. Contrary to our predictions, individuals with schizophrenia showed blunted cue-related activations in several regions of the basal ganglia responding to reward-predicting cues. Importantly, as predicted, individual differences in anhedonia/amotivation symptom severity were significantly associated with reduced sustained DLPFC activation in the same region that showed overall increased activity as a function of reward. These results suggest that individual differences in motivational impairments in schizophrenia may be related to dysfunction of the DLPFC and striatum in motivationally salient situations. (c) 2016 APA, all rights reserved.
Vaidya, Avinash R; Fellows, Lesley K
2015-09-16
Adaptively interacting with our environment requires extracting information that will allow us to successfully predict reward. This can be a challenge, particularly when there are many candidate cues, and when rewards are probabilistic. Recent work has demonstrated that visual attention is allocated to stimulus features that have been associated with reward on previous trials. The ventromedial frontal lobe (VMF) has been implicated in learning in dynamic environments of this kind, but the mechanism by which this region influences this process is not clear. Here, we hypothesized that the VMF plays a critical role in guiding attention to reward-predictive stimulus features based on feedback. We tested the effects of VMF damage in human subjects on a visual search task in which subjects were primed to attend to task-irrelevant colors associated with different levels of reward, incidental to the search task. Consistent with previous work, we found that distractors had a greater influence on reaction time when they appeared in colors associated with high reward in the previous trial compared with colors associated with low reward in healthy control subjects and patients with prefrontal damage sparing the VMF. However, this reward modulation of attentional priming was absent in patients with VMF damage. Thus, an intact VMF is necessary for directing attention based on experience with cue-reward associations. We suggest that this region plays a role in selecting reward-predictive cues to facilitate future learning. There has been a swell of interest recently in the ventromedial frontal cortex (VMF), a brain region critical to associative learning. However, the underlying mechanism by which this region guides learning is not well understood. Here, we tested the effects of damage to this region in humans on a task in which rewards were linked incidentally to visual features, resulting in trial-by-trial attentional priming. Controls and subjects with prefrontal damage sparing the VMF showed normal reward priming, but VMF-damaged patients did not. This work sheds light on a potential mechanism through which this region influences behavior. We suggest that the VMF is necessary for directing attention to reward-predictive visual features based on feedback, facilitating future learning and decision-making. Copyright © 2015 the authors 0270-6474/15/3512813-11$15.00/0.
Age-related influence of contingencies on a saccade task
Jazbec, Sandra; Hardin, Michael G.; Schroth, Elizabeth; McClure, Erin; Pine, Daniel S.; Ernst, Monique
2009-01-01
Adolescence is characterized by increased risk-taking and sensation-seeking, presumably brought about by developmental changes within reward-mediating brain circuits. A better understanding of the neural mechanisms underlying reward-seeking during adolescence can have critical implications for the development of strategies to enhance adolescent performance in potentially dangerous situations. Yet little research has investigated the influence of age on the modulation of behavior by incentives with neuroscience-based methods. A monetary reward antisaccade task (the RST) was used with 23 healthy adolescents and 30 healthy adults. Performance accuracy, latency and peak velocity of saccade responses (prosaccades and antisaccades) were analyzed. Performance accuracy across all groups was improved by incentives (obtain reward, avoid punishment) for both prosaccades and antisaccades. However, modulation of antisaccade errors (direction errors) by incentives differed between groups: adolescents modulated saccade latency and peak velocity depending on contingencies, with incentives aligning their performance to that of adults; adults did not show a modulation by incentives. These findings suggest that incentives modulate a global measure of performance (percent direction errors) in adults and adolescents, and exert a more powerful influence on the control of incorrect motor responses in adolescents than in adults. These findings suggest that this task can be used in neuroimaging studies as a probe of the influence of incentives on cognitive control from a developmental perspective as well as in health and disease. PMID:16733706
ERIC Educational Resources Information Center
Tucker, Jalie A.; Vuchinich, Rudy E.; Black, Bethany C.; Rippens, Paula D.
2006-01-01
This study investigated whether a behavioral economic index of the value of rewards available over different time horizons improved prediction of drinking outcomes beyond established biopsychosocial predictors. Preferences for immediate drinking versus more delayed rewards made possible by saving money were determined from expenditures prior to…
Developmental Effects of Incentives on Response Inhibition
Geier, Charles F.; Luna, Beatriz
2012-01-01
Inhibitory control and incentive processes underlie decision-making, yet few studies have explicitly examined their interaction across development. Here, the effects of potential rewards and losses on inhibitory control in sixty-four adolescents (13-17-year-olds) and forty-two young adults (18-29-year-olds) were examined using an incentivized antisaccade task. Notably, measures were implemented to minimize age-related differences in reward valuation and potentially confounding motivation effects. Incentives affected antisaccade metrics differently across the age groups. Younger adolescents generated more errors than adults on reward trials, but all groups performed well on loss trials. Adolescent saccade latencies also differed from adults across the range of reward trials. Overall, results suggest persistent immaturities in the integration of reward and inhibitory control processes across adolescence. PMID:22540668
Flagel, Shelly B; Akil, Huda; Robinson, Terry E
2009-01-01
Drugs of abuse acquire different degrees of control over thoughts and actions based not only on the effects of drugs themselves, but also on predispositions of the individual. Those individuals who become addicted are unable to shift their thoughts and actions away from drugs and drug-associated stimuli. Thus in addicts, exposure to places or things (cues) that have been previously associated with drug-taking often instigates renewed drug-taking. We and others have postulated that drug-associated cues acquire the ability to maintain and instigate drug-taking behavior in part because they acquire incentive motivational properties through Pavlovian (stimulus-stimulus) learning. In the case of compulsive behavioral disorders, including addiction, such cues may be attributed with pathological incentive value ("incentive salience"). For this reason, we have recently begun to explore individual differences in the tendency to attribute incentive salience to cues that predict rewards. When discrete cues are associated with the non-contingent delivery of food or drug rewards, some animals come to quickly approach and engage the cue even if it is located at a distance from where the reward will be delivered. In these animals the reward-predictive cue itself becomes attractive, eliciting approach towards it, presumably because it is attributed with incentive salience. Animals that develop this type of conditional response are called "sign-trackers". Other animals, "goal-trackers", do not approach the reward-predictive cue, but upon cue presentation they immediately go to the location where food will be delivered (the "goal"). For goal-trackers the reward-predictive cue is not attractive, presumably because it is not attributed with incentive salience. We review here preliminary data suggesting that these individual differences in the tendency to attribute incentive salience to cues predictive of reward may confer vulnerability or resistance to compulsive behavioral disorders, including addiction. It will be important, therefore, to study how environmental, neurobiological and genetic interactions determine the extent to which individuals attribute incentive value to reward-predictive stimuli.
Amygdala mu-opioid receptors mediate the motivating influence of cue-triggered reward expectations.
Lichtenberg, Nina T; Wassum, Kate M
2017-02-01
Environmental reward-predictive stimuli can retrieve from memory a specific reward expectation that allows them to motivate action and guide choice. This process requires the basolateral amygdala (BLA), but little is known about the signaling systems necessary within this structure. Here we examined the role of the neuromodulatory opioid receptor system in the BLA in such cue-directed action using the outcome-specific Pavlovian-to-instrumental transfer (PIT) test in rats. Inactivation of BLA mu-, but not delta-opioid receptors was found to dose-dependently attenuate the ability of a reward-predictive cue to selectively invigorate the performance of actions directed at the same unique predicted reward (i.e. to express outcome-specific PIT). BLA mu-opioid receptor inactivation did not affect the ability of a reward itself to similarly motivate action (outcome-specific reinstatement), suggesting a more selective role for the BLA mu-opioid receptor in the motivating influence of currently unobservable rewarding events. These data reveal a new role for BLA mu-opioid receptor activation in the cued recall of precise reward memories and the use of this information to motivate specific action plans. © 2016 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Reiter, Andrea M F; Heinze, Hans-Jochen; Schlagenhauf, Florian; Deserno, Lorenz
2017-01-01
Despite its clinical relevance and the recent recognition as a diagnostic category in the DSM-5, binge eating disorder (BED) has rarely been investigated from a cognitive neuroscientific perspective targeting a more precise neurocognitive profiling of the disorder. BED patients suffer from a lack of behavioral control during recurrent binge eating episodes and thus fail to adapt their behavior in the face of negative consequences, e.g., high risk for obesity. To examine impairments in flexible reward-based decision-making, we exposed BED patients (n=22) and matched healthy individuals (n=22) to a reward-guided decision-making task during functional magnetic resonance imaging (fMRI). Performing fMRI analysis informed via computational modeling of choice behavior, we were able to identify specific signatures of altered decision-making in BED. On the behavioral level, we observed impaired behavioral adaptation in BED, which was due to enhanced switching behavior, a putative deficit in appropriately balancing exploration and exploitation. This was accompanied by diminished activation related to exploratory decisions in the anterior insula/ventro-lateral prefrontal cortex. Moreover, although so-called model-free reward prediction errors remained intact, representation of ventro-medial prefrontal learning signatures, incorporating inference on unchosen options, was reduced in BED, which was associated with successful decision-making in the task. On the basis of a computational psychiatry account, the presented findings contribute to defining a neurocognitive phenotype of BED. PMID:27301429
A plastic corticostriatal circuit model of adaptation in perceptual decision making
Hsiao, Pao-Yueh; Lo, Chung-Chuan
2013-01-01
The ability to optimize decisions and adapt them to changing environments is a crucial brain function that increases survivability. Although much has been learned about the neuronal activity in various brain regions that are associated with decision making, and about how the nervous system may learn to achieve optimization, the underlying neuronal mechanisms of how the nervous system optimizes decision strategies with preference given to speed or accuracy, and how it adapts to changes in the environment, remain unclear. Based on extensive empirical observations, we addressed the question by extending a previously described cortico-basal ganglia circuit model of perceptual decisions with the inclusion of a dynamic dopamine (DA) system that modulates spike-timing dependent plasticity (STDP). We found that, once an optimal model setting that maximized the reward rate was selected, the same setting automatically optimized decisions across different task environments through dynamic balancing between the facilitating and depressing components of the DA dynamics. Interestingly, other model parameters were also optimal if we considered the reward rate that was weighted by the subject's preferences for speed or accuracy. Specifically, the circuit model favored speed if we increased the phasic DA response to the reward prediction error, whereas the model favored accuracy if we reduced the tonic DA activity or the phasic DA responses to the estimated reward probability. The proposed model provides insight into the roles of different components of DA responses in decision adaptation and optimization in a changing environment. PMID:24339814
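The speed-accuracy trade-off that such a circuit is tuned to can be sketched with textbook drift-diffusion formulas. A minimal Python sketch (all parameter values are illustrative and bear no relation to the authors' spiking implementation) scans decision boundaries for the one that maximizes reward rate:

```python
import numpy as np

def ddm_reward_rate(z, drift=0.15, sigma=0.1, iti=2.0, reward=1.0):
    """Reward rate for a drift-diffusion decider with boundaries at +/- z.

    Uses the closed-form accuracy and mean decision time of the pure
    drift-diffusion model (Bogacz et al., 2006); parameters illustrative.
    """
    k = drift * z / sigma**2
    accuracy = 1.0 / (1.0 + np.exp(-2.0 * k))
    mean_dt = (z / drift) * np.tanh(k)        # mean decision time (s)
    return accuracy * reward / (mean_dt + iti)

thresholds = np.linspace(0.01, 1.0, 200)
rates = [ddm_reward_rate(z) for z in thresholds]
print(f"reward-rate-maximizing boundary ~ {thresholds[int(np.argmax(rates))]:.2f}")
```

Raising the boundary trades slower decisions for higher accuracy, so reward rate peaks at an intermediate setting, which is the quantity the dopamine dynamics in the model are described as balancing.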
Overcoming Indecision by Changing the Decision Boundary
2017-01-01
The dominant theoretical framework for decision making asserts that people make decisions by integrating noisy evidence to a threshold. It has recently been shown that in many ecologically realistic situations, decreasing the decision boundary maximizes the reward available from decisions. However, empirical support for decreasing boundaries in humans is scant. To investigate this problem, we used an ideal observer model to identify the conditions under which participants should change their decision boundaries with time to maximize reward rate. We conducted 6 expanded-judgment experiments that precisely matched the assumptions of this theoretical model. In this paradigm, participants could sample noisy, binary evidence presented sequentially. Blocks of trials were fixed in duration, and each trial was an independent reward opportunity. Participants therefore had to trade off speed (getting as many rewards as possible) against accuracy (sampling more evidence). Having access to the actual evidence samples experienced by participants enabled us to infer the slope of the decision boundary. We found that participants indeed modulated the slope of the decision boundary in the direction predicted by the ideal observer model, although we also observed systematic deviations from optimality. Participants using suboptimal boundaries do so in a robust manner, so that any error in their boundary setting is relatively inexpensive. The use of a normative model provides insight into what variable(s) human decision makers are trying to optimize. Furthermore, this normative model allowed us to choose diagnostic experiments and in doing so we present clear evidence for time-varying boundaries. PMID:28406682
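For intuition, a minimal simulation of such an expanded-judgment block is easy to write. The sketch below (all parameters hypothetical) compares rewards harvested within a fixed-duration block under a fixed versus a linearly collapsing boundary:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_block(slope, b0=4.0, p=0.65, block_time=200.0, sample_dt=1.0, iti=2.0):
    """Expanded-judgment block with a linearly collapsing bound.

    Binary evidence samples favour the correct option with probability p;
    a decision is made when the evidence count difference reaches the
    time-decaying boundary (floored at 1). Parameters are illustrative.
    """
    t_total, rewards = 0.0, 0
    while t_total < block_time:
        x, t = 0, 0.0
        while abs(x) < max(b0 - slope * t, 1.0):
            x += 1 if rng.random() < p else -1
            t += sample_dt
        rewards += int(x > 0)            # correct if the favoured side wins
        t_total += t + iti
    return rewards

for slope in (0.0, 0.1, 0.3):            # 0.0 = fixed boundary
    print(slope, np.mean([run_block(slope) for _ in range(200)]))
```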
Reiter, Andrea M F; Koch, Stefan P; Schröger, Erich; Hinrichs, Hermann; Heinze, Hans-Jochen; Deserno, Lorenz; Schlagenhauf, Florian
2016-08-01
Behavioral control is influenced not only by learning from the choices made and the rewards obtained but also by "what might have happened," that is, inference about unchosen options and their fictive outcomes. Substantial progress has been made in understanding the neural signatures of direct learning from choices that are actually made and their associated rewards via reward prediction errors (RPEs). However, electrophysiological correlates of abstract inference in decision-making are less clear. One seminal theory suggests that the so-called feedback-related negativity (FRN), an ERP peaking 200-300 msec after a feedback stimulus at frontocentral sites of the scalp, codes RPEs. Hitherto, the FRN has been predominantly related to a so-called "model-free" RPE: The difference between the observed outcome and what had been expected. Here, by means of computational modeling of choice behavior, we show that individuals employ abstract, "double-update" inference on the task structure by concurrently tracking values of chosen stimuli (associated with observed outcomes) and unchosen stimuli (linked to fictive outcomes). In a parametric analysis, model-free RPEs as well as their modification because of abstract inference were regressed against single-trial FRN amplitudes. We demonstrate that components related to abstract inference uniquely explain variance in the FRN beyond model-free RPEs. These findings advance our understanding of the FRN and its role in behavioral adaptation. This might further the investigation of disturbed abstract inference, as proposed, for example, for psychiatric disorders, and its underlying neural correlates.
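A toy version of the "double-update" learner makes the distinction concrete. The sketch below (assuming anticorrelated outcomes, as in typical reversal designs) concurrently updates chosen and unchosen values and returns the model-free RPEs that such an analysis would regress against single-trial FRN amplitudes:

```python
import numpy as np

rng = np.random.default_rng(0)
choices = rng.integers(0, 2, size=100)            # demo choices
outcomes = (rng.random(100) < 0.7).astype(float)  # demo outcomes

q, alpha, rpes = np.zeros(2), 0.2, []
for c, r in zip(choices, outcomes):
    rpe = r - q[c]                    # model-free RPE (classical FRN regressor)
    q[c] += alpha * rpe
    # fictive "double" update of the unchosen option, assuming its outcome
    # is the complement of the observed one
    q[1 - c] += alpha * ((1.0 - r) - q[1 - c])
    rpes.append(rpe)
print(np.mean(rpes))
```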
Neural Response to Reward as a Predictor of Rise in Depressive Symptoms in Adolescence
Morgan, Judith K.; Olino, Thomas M.; McMakin, Dana L.; Ryan, Neal D.; Forbes, Erika E.
2012-01-01
Adolescence is a developmental period characterized by significant increases in the onset of depression, but also by increases in depressive symptoms, even among psychiatrically healthy youth. Disrupted reward function has been postulated as a critical factor in the development of depression, but it is still unclear which adolescents are particularly at risk for rising depressive symptoms. We provide a conceptual stance on gender, pubertal development, and reward type as potential moderators of the association between neural response to reward and rises in depressive symptoms. In addition, we describe preliminary findings that support claims of this conceptual stance. We propose that (1) status-related rewards may be particularly salient for eliciting neural response relevant to depressive symptoms in boys, whereas social rewards may be more salient for eliciting neural response relevant to depressive symptoms in girls and (2) the pattern of reduced striatal response and enhanced medial prefrontal response to reward may be particularly predictive of depressive symptoms in pubertal adolescents. We found that greater vmPFC activation when winning rewards predicted greater increases in depressive symptoms over two years, for boys only, and less striatal activation when anticipating rewards predicted greater increases in depressive symptoms over two years, for adolescents in mid to late pubertal stages but not those in pre to early puberty. We also propose directions for future studies, including the investigation of social vs. monetary reward directly and the longitudinal assessment of parallel changes in pubertal development, neural response to reward, and depressive symptoms. PMID:22521464
Emotional arousal predicts intertemporal choice
Lempert, Karolina M.; Johnson, Eli; Phelps, Elizabeth A.
2016-01-01
People generally prefer immediate rewards to rewards received after a delay, often even when the delayed reward is larger. This phenomenon is known as temporal discounting. It has been suggested that preferences for immediate rewards may be due to their being more concrete than delayed rewards. This concreteness may evoke an enhanced emotional response. Indeed, manipulating the representation of a future reward to make it more concrete has been shown to heighten the reward’s subjective emotional intensity, making people more likely to choose it. Here we use an objective measure of arousal – pupil dilation – to investigate if emotional arousal mediates the influence of delayed reward concreteness on choice. We recorded pupil dilation responses while participants made choices between immediate and delayed rewards. We manipulated concreteness through time interval framing: delayed rewards were presented either with the date on which they would be received (e.g., “$30, May 3”; DATE condition, more concrete) or in terms of delay to receipt (e.g., “$30, 7 days”; DAYS condition, less concrete). Contrary to prior work, participants were not overall more patient in the DATE condition. However, there was individual variability in response to time framing, and this variability was predicted by differences in pupil dilation between conditions. Emotional arousal increased as the subjective value of delayed rewards increased, and predicted choice of the delayed reward on each trial. This study advances our understanding of the role of emotion in temporal discounting. PMID:26882337
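The standard quantitative backbone of such studies is hyperbolic discounting. A minimal sketch (the discount rate k and softmax temperature are illustrative, not fitted values from this study):

```python
import numpy as np

def discounted_value(amount, delay_days, k=0.01):
    """Hyperbolic subjective value, SV = A / (1 + k * D)."""
    return amount / (1.0 + k * delay_days)

def p_choose_delayed(sv_delayed, immediate=20.0, temperature=1.0):
    """Softmax probability of choosing the delayed option."""
    return 1.0 / (1.0 + np.exp(-(sv_delayed - immediate) / temperature))

sv = discounted_value(30.0, 7)            # e.g. "$30, 7 days"
print(sv, p_choose_delayed(sv))
```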
Decision-making, sensitivity to reward, and attrition in weight-management
Koritzky, Gilly; Dieterle, Camille; Rice, Chantelle; Jordan, Katie; Bechara, Antoine
2014-01-01
Objective Attrition is a common problem in weight-management. Understanding the risk factors for attrition should enhance professionals’ ability to increase completion rates and improve health outcomes for more individuals. We propose a model that draws upon neuropsychological knowledge on reward-sensitivity in obesity and overeating to predict attrition. Design & Methods 52 participants in a weight-management program completed a complex decision-making task. Decision-making characteristics – including sensitivity to reward – were further estimated using a quantitative model. Impulsivity and risk-taking measures were also administered. Results Consistent with the hypothesis that sensitivity to reward predicted attrition, program dropouts had higher sensitivity to reward than completers (p < 0.03). No differences were observed between completers and dropouts in initial BMI, age, employment status, or the number of prior weight-loss attempts (p ≥ 0.07). Completers had a slightly higher education level than dropouts, but its inclusion in the model did not increase predictive power. Impulsivity, delay of gratification, and risk-taking did not predict attrition, either. Conclusions Findings link attrition in weight-management to the neural mechanisms associated with reward-seeking and related influences on decision-making. Individual differences in the magnitude of response elicited by rewards may account for the relative difficulty experienced by dieters in adhering to treatment. PMID:24771588
Immaturities in Reward Processing and Its Influence on Inhibitory Control in Adolescence
Terwilliger, R.; Teslovich, T.; Velanova, K.; Luna, B.
2010-01-01
The nature of immature reward processing and the influence of rewards on basic elements of cognitive control during adolescence are currently not well understood. Here, during functional magnetic resonance imaging, healthy adolescents and adults performed a modified antisaccade task in which trial-by-trial reward contingencies were manipulated. The use of a novel fast, event-related design enabled developmental differences in brain function underlying temporally distinct stages of reward processing and response inhibition to be assessed. Reward trials compared with neutral trials resulted in faster correct inhibitory responses across ages and in fewer inhibitory errors in adolescents. During reward trials, the blood oxygen level–dependent signal was attenuated in the ventral striatum in adolescents during cue assessment, then overactive during response preparation, suggesting limitations during adolescence in reward assessment and heightened reactivity in anticipation of reward compared with adults. Importantly, heightened activity in the frontal cortex along the precentral sulcus was also observed in adolescents during reward-trial response preparation, suggesting reward modulation of oculomotor control regions supporting correct inhibitory responding. Collectively, this work characterizes specific immaturities in adolescent brain systems that support reward processing and describes the influence of reward on inhibitory control. In sum, our findings suggest mechanisms that may underlie adolescents’ vulnerability to poor decision-making and risk-taking behavior. PMID:19875675
Instant transformation of learned repulsion into motivational "wanting".
Robinson, Mike J F; Berridge, Kent C
2013-02-18
Learned cues for pleasant reward often elicit desire, which, in addicts, may become compulsive. According to the dominant view in addiction neuroscience and reinforcement modeling, such desires are the simple products of learning, coming from a past association with reward outcome. We demonstrate that cravings are more than merely the products of accumulated pleasure memories; even a repulsive learned cue for unpleasantness can become suddenly desired via the activation of mesocorticolimbic circuitry. Rats learned repulsion toward a Pavlovian cue (a briefly-inserted metal lever) that always predicted an unpleasant Dead Sea saltiness sensation. Yet, upon first reencounter in a novel sodium-depletion state to promote mesocorticolimbic reactivity (reflected by elevated Fos activation in ventral tegmentum, nucleus accumbens, ventral pallidum, and the orbitofrontal prefrontal cortex), the learned cue was instantly transformed into an attractive and powerful motivational magnet. Rats jumped and gnawed on the suddenly attractive Pavlovian lever cue, despite never having tasted intense saltiness as anything other than disgusting. Instant desire transformation of a learned cue contradicts views that Pavlovian desires are essentially based on previously learned values (e.g., prediction error or temporal difference models). Instead desire is recomputed at reencounter by integrating Pavlovian information with the current brain/physiological state. This powerful brain transformation reverses strong learned revulsion into avid attraction. When applied to addiction, related mesocorticolimbic transformations (e.g., drugs or neural sensitization) of cues for already-pleasant drug experiences could create even more intense cravings. This cue/state transformation helps define what it means to say that addiction hijacks brain limbic circuits of natural reward. Copyright © 2013 Elsevier Ltd. All rights reserved.
Trait Anticipatory Pleasure Predicts Effort Expenditure for Reward
Geaney, Joachim T.; Treadway, Michael T.; Smillie, Luke D.
2015-01-01
Research in motivation and emotion has been increasingly influenced by the perspective that processes underpinning the motivated approach of rewarding goals are distinct from those underpinning enjoyment during reward consummation. This distinction recently inspired the construction of the Temporal Experience of Pleasure Scale (TEPS), a self-report measure that distinguishes trait anticipatory pleasure (pre-reward feelings of desire) from consummatory pleasure (feelings of enjoyment and gratification upon reward attainment). In a university community sample (N = 97), we examined the TEPS subscales as predictors of (1) the willingness to expend effort for monetary rewards, and (2) affective responses to a pleasant mood induction procedure. Results showed that both anticipatory pleasure and a well-known trait measure of reward motivation predicted effort-expenditure for rewards when the probability of being rewarded was relatively low. Against expectations, consummatory pleasure was unrelated to induced pleasant affect. Taken together, our findings provide support for the validity of the TEPS anticipatory pleasure scale, but not the consummatory pleasure scale. PMID:26115223
Claes, Nathalie; Vlaeyen, Johan W S; Crombez, Geert
2016-09-01
Previous research shows that goal-directed behavior might be modulated by cues that predict (dis)similar outcomes. However, the literature investigating this modulation with pain outcomes is scarce. Therefore, this experiment investigated whether environmental cues predicting pain or reward modulate defensive pain responding. Forty-eight healthy participants completed a joystick movement task with two different movement orientations. Performing one movement was associated with a painful stimulus, whereas performance of another movement was associated with reward, i.e., lottery tickets. In a subsequent task, participants learned to associate three different cues with pain, reward, or neither of the two. Next, these cues were integrated in the movement task. This study demonstrates that in general, aversive cues enhance and appetitive cues reduce pain-related fear. Furthermore, we found that incongruence between the outcomes predicted by the movement and the cue results in more oscillatory behavior, i.e., participants were more willing to perform a painful movement when a cue predicting reward was simultaneously presented, and vice versa. Similarly, when given a choice, participants preferred to perform the reward movement, unless there was an incongruence between the outcomes predicted by the movements and cues. Taken together, these results provide experimental evidence that environmental cues are capable of modulating pain-related fear and avoidance behavior. Copyright © 2016 Elsevier Ltd. All rights reserved.
Gowin, Joshua L; Ball, Tali M; Wittmann, Marc; Tapert, Susan F; Paulus, Martin P
2015-07-01
Nearly half of individuals with substance use disorders relapse in the year after treatment. A diagnostic tool to help clinicians make decisions regarding treatment does not exist for psychiatric conditions. Identifying individuals with high risk for relapse to substance use following abstinence has profound clinical consequences. This study aimed to develop neuroimaging as a robust tool to predict relapse. 68 methamphetamine-dependent adults (15 female) were recruited from 28-day inpatient treatment. During treatment, participants completed a functional MRI scan that examined brain activation during reward processing. Patients were followed 1 year later to assess abstinence. We examined brain activation during reward processing between relapsing and abstaining individuals and employed three random forest prediction models (clinical and personality measures, neuroimaging measures, a combined model) to generate predictions for each participant regarding their relapse likelihood. 18 individuals relapsed. There were significant group by reward-size interactions for neural activation in the left insula and right striatum for rewards. Abstaining individuals showed increased activation for large, risky relative to small, safe rewards, whereas relapsing individuals failed to show differential activation between reward types. All three random forest models yielded good test characteristics such that a positive test for relapse yielded a likelihood ratio of 2.63, whereas a negative test had a likelihood ratio of 0.48. These findings suggest that neuroimaging can be developed in combination with other measures as an instrument to predict relapse, advancing tools providers can use to make decisions about individualized treatment of substance use disorders. Published by Elsevier Ireland Ltd.
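The reported likelihood ratios follow directly from a test's sensitivity and specificity. In the sketch below, the two input values are back-solved from the published ratios (the abstract does not state them), so they are illustrative reconstructions, not the paper's figures:

```python
def likelihood_ratios(sensitivity, specificity):
    """Diagnostic likelihood ratios: LR+ = sens / (1 - spec),
    LR- = (1 - sens) / spec."""
    return (sensitivity / (1.0 - specificity),
            (1.0 - sensitivity) / specificity)

# sens ~ 0.636 and spec ~ 0.758 reproduce LR+ = 2.63 and LR- = 0.48.
lr_pos, lr_neg = likelihood_ratios(0.636, 0.758)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```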
James, Alex S.; Pennington, Zachary T.; Tran, Phu
2015-01-01
Two theories regarding the role for dopamine neurons in learning include the concepts that their activity serves as a (1) mechanism that confers incentive salience onto rewards and associated cues and/or (2) contingency teaching signal reflecting reward prediction error. While both theories are provocative, the causal role for dopamine cell activity in either mechanism remains controversial. In this study mice that either fully or partially lacked NMDARs in dopamine neurons exclusively, as well as appropriate controls, were evaluated for reward-related learning; this experimental design allowed for a test of the premise that NMDA/glutamate receptor (NMDAR)-mediated mechanisms in dopamine neurons, including NMDA-dependent regulation of phasic discharge activity of these cells, modulate either the instrumental learning processes or the likelihood of pavlovian cues to become highly motivating incentive stimuli that directly attract behavior. Loss of NMDARs in dopamine neurons did not significantly affect baseline dopamine utilization in the striatum, novelty evoked locomotor behavior, or consumption of a freely available, palatable food solution. On the other hand, animals lacking NMDARs in dopamine cells exhibited a selective reduction in reinforced lever responses that emerged over the course of instrumental learning. Loss of receptor expression did not, however, influence the likelihood of an animal acquiring a pavlovian conditional response associated with attribution of incentive salience to reward-paired cues (sign tracking). These data support the view that reductions in NMDAR signaling in dopamine neurons affect instrumental reward-related learning but do not lend support to hypotheses that suggest that the behavioral significance of this signaling includes incentive salience attribution. PMID:26464985
Sensorimotor Learning Biases Choice Behavior: A Learning Neural Field Model for Decision Making
Schöner, Gregor; Gail, Alexander
2012-01-01
According to a prominent view of sensorimotor processing in primates, selection and specification of possible actions are not sequential operations. Rather, a decision for an action emerges from competition between different movement plans, which are specified and selected in parallel. For action choices which are based on ambiguous sensory input, the frontoparietal sensorimotor areas are considered part of the common underlying neural substrate for selection and specification of action. These areas have been shown capable of encoding alternative spatial motor goals in parallel during movement planning, and show signatures of competitive value-based selection among these goals. Since the same network is also involved in learning sensorimotor associations, competitive action selection (decision making) should not only be driven by the sensory evidence and expected reward in favor of either action, but also by the subject's learning history of different sensorimotor associations. Previous computational models of competitive neural decision making used predefined associations between sensory input and corresponding motor output. Such hard-wiring does not allow modeling of how decisions are influenced by sensorimotor learning or by changing reward contingencies. We present a dynamic neural field model which learns arbitrary sensorimotor associations with a reward-driven Hebbian learning algorithm. We show that the model accurately simulates the dynamics of action selection with different reward contingencies, as observed in monkey cortical recordings, and that it correctly predicted the pattern of choice errors in a control experiment. With our adaptive model we demonstrate how network plasticity, which is required for association learning and adaptation to new reward contingencies, can influence choice behavior. The field model provides an integrated and dynamic account for the operations of sensorimotor integration, working memory and action selection required for decision making in ambiguous choice situations. PMID:23166483
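The reward-driven Hebbian rule at the core of such a model can be sketched in a few lines. The toy update below (learning rate, decay, and the stimulus/goal coding are all illustrative, not the authors' field equations) gates an outer-product Hebbian change by the reward signal:

```python
import numpy as np

def reward_hebbian_step(W, pre, post, reward, lr=0.05, decay=0.01):
    """One reward-gated Hebbian update, dW = lr * reward * post (x) pre,
    plus passive decay -- a minimal stand-in for the learning rule that
    adapts stimulus-to-motor-goal mappings in the field model."""
    return W + lr * reward * np.outer(post, pre) - decay * W

W = np.zeros((4, 3))                      # 3 stimuli -> 4 motor goals
pre = np.array([1.0, 0.0, 0.0])           # stimulus 0 active
post = np.array([0.0, 1.0, 0.0, 0.0])     # motor goal 1 selected
W = reward_hebbian_step(W, pre, post, reward=1.0)
print(W)
```

Repeated rewarded pairings strengthen one stimulus-goal association and thereby bias subsequent competitive selection, which is the mechanism the model uses to explain history-dependent choice errors.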
Effects of Direct Social Experience on Trust Decisions and Neural Reward Circuitry
Fareri, Dominic S.; Chang, Luke J.; Delgado, Mauricio R.
2012-01-01
The human striatum is integral for reward-processing and supports learning by linking experienced outcomes with prior expectations. Recent endeavors implicate the striatum in processing outcomes of social interactions, such as social approval/rejection, as well as in learning reputations of others. Interestingly, social impressions often influence our behavior with others during interactions. Information about an interaction partner’s moral character acquired from biographical information hinders updating of expectations after interactions via top down modulation of reward circuitry. An outstanding question is whether initial impressions formed through experience similarly modulate the ability to update social impressions at the behavioral and neural level. We investigated the role of experienced social information on trust behavior and reward-related BOLD activity. Participants played a computerized ball-tossing game with three fictional partners manipulated to be perceived as good, bad, or neutral. Participants then played an iterated trust game as investors with these same partners while undergoing fMRI. Unbeknownst to participants, partner behavior in the trust game was random and unrelated to their ball-tossing behavior. Participants’ trust decisions were influenced by their prior experience in the ball-tossing game, investing less often with the bad partner compared to the good and neutral. Reinforcement learning models revealed that participants were more sensitive to updating their beliefs about good and bad partners when experiencing outcomes consistent with initial experience. Increased striatal and anterior cingulate BOLD activity for positive versus negative trust game outcomes emerged, which further correlated with model-derived prediction error learning signals. These results suggest that initial impressions formed from direct social experience can be continually shaped by consistent information through reward learning mechanisms. PMID:23087604
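A minimal Rescorla-Wagner sketch captures the reported asymmetry, with the caveat that the learning rates and the consistency rule below are illustrative stand-ins rather than the authors' fitted model:

```python
def update_partner_value(v, outcome, prior_good, alpha_consistent=0.3,
                         alpha_inconsistent=0.1):
    """Rescorla-Wagner update with impression-dependent learning rates.

    Outcomes consistent with the initial impression (good partner
    reciprocates, bad partner defects) are weighted more heavily,
    mirroring the asymmetry reported above; alphas are illustrative.
    """
    consistent = (outcome == 1) == prior_good
    alpha = alpha_consistent if consistent else alpha_inconsistent
    return v + alpha * (outcome - v)

# A 'good' partner who defects once: the disconfirming outcome moves the
# value estimate less than a confirming outcome would.
v = update_partner_value(0.8, outcome=0, prior_good=True)
print(v)   # 0.8 + 0.1 * (0 - 0.8) = 0.72
```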
The amygdala and basal forebrain as a pathway for motivationally guided attention.
Peck, Christopher J; Salzman, C Daniel
2014-10-08
Visual stimuli associated with rewards attract spatial attention. Neurophysiological mechanisms that mediate this process must register both the motivational significance and location of visual stimuli. Recent neurophysiological evidence indicates that the amygdala encodes information about both of these parameters. Furthermore, the firing rate of amygdala neurons predicts the allocation of spatial attention. One neural pathway through which the amygdala might influence attention involves the intimate and bidirectional connections between the amygdala and basal forebrain (BF), a brain area long implicated in attention. Neurons in the rhesus monkey amygdala and BF were therefore recorded simultaneously while subjects performed a detection task in which the stimulus-reward associations of visual stimuli modulated spatial attention. Neurons in BF were spatially selective for reward-predictive stimuli, much like the amygdala. The onset of reward-predictive signals in each brain area suggested different routes of processing for reward-predictive stimuli appearing in the ipsilateral and contralateral fields. Moreover, neurons in the amygdala, but not BF, tracked trial-to-trial fluctuations in spatial attention. These results suggest that the amygdala and BF could play distinct yet inter-related roles in influencing attention elicited by reward-predictive stimuli. Copyright © 2014 the authors.
Lambert, Kelly G.; Hyer, Molly M.; Rzucidlo, Amanda A.; Bergeron, Timothy; Landis, Timothy; Bardi, Massimo
2014-01-01
Emotional resilience enhances an animal's ability to maintain physiological allostasis and adaptive responses in the midst of challenges ranging from cognitive uncertainty to chronic stress. In the current study, neurobiological factors related to strategic responses to uncertainty produced by prediction errors were investigated by initially profiling male rats as passive, active or flexible copers (n = 12 each group) and assigning to either a contingency-trained or non-contingency trained group. Animals were subsequently trained in a spatial learning task so that problem solving strategies in the final probe task, as well as various biomarkers of brain activation and plasticity in brain areas associated with cognition and emotional regulation, could be assessed. Additionally, fecal samples were collected to further determine markers of stress responsivity and emotional resilience. Results indicated that contingency-trained rats exhibited more adaptive responses in the probe trial (e.g., fewer interrupted grooming sequences and more targeted search strategies) than the noncontingent-trained rats; additionally, increased DHEA/CORT ratios were observed in the contingent-trained animals. Diminished activation of the habenula (i.e., fos-immunoreactivity) was correlated with resilience factors such as increased levels of DHEA metabolites during cognitive training. Of the three coping profiles, flexible copers exhibited enhanced neuroplasticity (i.e., increased dentate gyrus doublecortin-immunoreactivity) compared to the more consistently responding active and passive copers. Thus, in the current study, contingency training via effort-based reward (EBR) training, enhanced by a flexible coping style, provided neurobiological resilience and adaptive responses to prediction errors in the final probe trial. These findings have implications for psychiatric illnesses that are influenced by altered stress responses and decision-making abilities (e.g., depression). PMID:24808837
van Meel, Catharina S; Oosterlaan, Jaap; Heslenfeld, Dirk J; Sergeant, Joseph A
2005-01-01
Neuroimaging studies on ADHD suggest abnormalities in brain regions associated with decision-making and reward processing such as the anterior cingulate cortex (ACC) and orbitofrontal cortex. Recently, event-related potential (ERP) studies demonstrated that the ACC is involved in processing feedback signals during guessing and gambling. The resulting negative deflection, the 'feedback-related negativity' (FRN), has been interpreted as reflecting an error in reward prediction. In the present study, ERPs elicited by positive and negative feedback were recorded in children with ADHD and normal controls during guessing. 'Correct' and 'incorrect' guesses resulted in monetary gains and losses, respectively. The FRN amplitude to losses was more pronounced in the ADHD group than in normal controls. Positive and negative feedback differentially affected long latency components in the ERP waveforms of normal controls, but not ADHD children. These later deflections might be related to further emotional or strategic processing. The present findings suggest an enhanced sensitivity to unfavourable outcomes in children with ADHD, probably due to abnormalities in mesolimbic reward circuits. In addition, further processing, such as affective evaluation and the assessment of future consequences of the feedback signal, seems to be altered in ADHD. These results may help further our understanding of the neural basis of decision-making deficits in ADHD.
Hippocampal morphology mediates biased memories of chronic pain.
Berger, Sara E; Vachon-Presseau, Étienne; Abdullah, Taha B; Baria, Alex T; Schnitzer, Thomas J; Apkarian, A Vania
2018-02-01
Experiences and memories are often mismatched. While multiple studies have investigated psychological underpinnings of recall error with respect to emotional events, the neurobiological mechanisms underlying the divergence between experiences and memories remain relatively unexplored in the domain of chronic pain. Here we examined the discrepancy between experienced chronic low back pain (CBP) intensity (twice daily ratings) and remembered pain intensity (n = 48 subjects) relative to psychometric properties, hippocampus morphology, memory capabilities, and personality traits related to reward. 77% of CBP patients exaggerated remembered pain, which depended on their strongest experienced pain and their most recent mood rating. This bias persisted over nearly 1 year and was related to reward memory bias and loss aversion. Shape displacement of a specific region in the left posterior hippocampus mediated personality effects on pain memory bias, predicted pain memory bias in a validation CBP group (n = 21), and accounted for 55% of the variance of pain memory bias. In two independent groups (n = 20/group), morphology of this region was stable over time and unperturbed by the development of chronic pain. These results imply that a localized hippocampal circuit, and personality traits associated with reward processing, largely determine exaggeration of daily pain experiences in chronic pain patients. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Ott, Derek V M; Ullsperger, Markus; Jocham, Gerhard; Neumann, Jane; Klein, Tilmann A
2011-07-15
The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of the dorsolateral prefrontal cortex (DLPFC) in reinforcement learning, we applied continuous theta-burst stimulation (cTBS) either to the left or right DLPFC, or to the vertex as a control region, respectively, prior to the performance of a probabilistic learning task in an fMRI environment. While there was no influence of cTBS on learning performance per se, we observed a stimulation-dependent modulation of reward vs. punishment sensitivity: Left-hemispherical DLPFC stimulation led to a more reward-guided performance, while right-hemispherical cTBS induced a more avoidance-guided behavior. FMRI results showed enhanced prediction error coding in the ventral striatum in subjects stimulated over the left as compared to the right DLPFC. Both behavioral and imaging results are in line with recent findings that left, but not right-hemispherical stimulation can trigger a release of dopamine in the ventral striatum, which has been suggested to increase the relative impact of rewards rather than punishment on behavior. Copyright © 2011 Elsevier Inc. All rights reserved.
Treviño, Mario
2014-01-01
Animal choices depend on direct sensory information, but also on the dynamic changes in the magnitude of reward. In visual discrimination tasks, the emergence of lateral biases in the choice record from animals is often described as a behavioral artifact, because these are highly correlated with error rates affecting psychophysical measurements. Here, we hypothesized that biased choices could constitute a robust behavioral strategy to solve discrimination tasks of graded difficulty. We trained mice to swim in a two-alternative visual discrimination task with escape from water as the reward. Their prevalence of making lateral choices increased with stimulus similarity and was present in conditions of high discriminability. While lateralization occurred at the individual level, it was absent, on average, at the population level. Biased choice sequences obeyed the generalized matching law and increased task efficiency when stimulus similarity was high. A mathematical analysis revealed that strongly-biased mice used information from past rewards but not past choices to make their current choices. We also found that the amount of lateralized choices made during the first day of training predicted individual differences in the average learning behavior. This framework provides useful analysis tools to study individualized visual-learning trajectories in mice. PMID:25524257
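The generalized matching law referred to here is log(B1/B2) = s·log(R1/R2) + log(b), where s is sensitivity and b is side bias. A short sketch of how the two parameters are recovered by linear regression (the choice and reward counts below are synthetic, not the paper's data):

```python
import numpy as np

B1, B2 = np.array([60, 45, 30]), np.array([20, 35, 50])   # choice counts
R1, R2 = np.array([50, 30, 15]), np.array([15, 30, 45])   # reward counts

x = np.log(R1 / R2)
y = np.log(B1 / B2)
s, log_b = np.polyfit(x, y, 1)     # slope = sensitivity, intercept = log bias
print(f"sensitivity s = {s:.2f}, bias b = {np.exp(log_b):.2f}")
```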
Lesaint, Florian; Sigaud, Olivier; Flagel, Shelly B.; Robinson, Terry E.; Khamassi, Mehdi
2014-01-01
Reinforcement Learning has greatly influenced models of conditioning, providing powerful explanations of acquired behaviour and underlying physiological observations. However, in recent autoshaping experiments in rats, variation in the form of Pavlovian conditioned responses (CRs) and associated dopamine activity has questioned the classical hypothesis that phasic dopamine activity corresponds to a reward prediction error-like signal arising from a classical Model-Free system, necessary for Pavlovian conditioning. Over the course of Pavlovian conditioning using food as the unconditioned stimulus (US), some rats (sign-trackers) come to approach and engage the conditioned stimulus (CS) itself – a lever – more and more avidly, whereas other rats (goal-trackers) learn to approach the location of food delivery upon CS presentation. Importantly, although both sign-trackers and goal-trackers learn the CS-US association equally well, only in sign-trackers does phasic dopamine activity show classical reward prediction error-like bursts. Furthermore, neither the acquisition nor the expression of a goal-tracking CR is dopamine-dependent. Here we present a computational model that can account for such individual variations. We show that a combination of a Model-Based system and a revised Model-Free system can account for the development of distinct CRs in rats. Moreover, we show that revising a classical Model-Free system to individually process stimuli by using factored representations can explain why classical dopaminergic patterns may be observed for some rats and not for others depending on the CR they develop. In addition, the model can account for other behavioural and pharmacological results obtained using the same, or similar, autoshaping procedures. Finally, the model makes it possible to draw a set of experimental predictions that may be verified in a modified experimental protocol. We suggest that further investigation of factored representations in computational neuroscience studies may be useful. PMID:24550719
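The key revision, attaching Model-Free values (and hence dopamine-like RPEs) to individual stimulus features rather than to abstract states, can be caricatured in a few lines. This is only a sketch of the factored idea, not the authors' full model with its Model-Based component and individual weighting:

```python
def fmf_update(v, feature, reward, alpha=0.1):
    """Factored Model-Free update: the value, and therefore the RPE, is
    tied to the specific stimulus engaged (lever vs. magazine) rather
    than to an abstract task state. Learning rate is illustrative."""
    rpe = reward - v[feature]
    v[feature] += alpha * rpe
    return rpe

v = {"lever": 0.0, "magazine": 0.0}
print(fmf_update(v, "lever", 1.0), v)   # RPE attaches to the CS feature itself
```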
Derefinko, Karen J.; Eisenlohr-Moul, Tory A.; Peters, Jessica R.; Roberts, Walter; Walsh, Erin C.; Milich, Richard; Lynam, Donald R.
2017-01-01
Background Physiological responses to reward and extinction are believed to represent the Behavioral Activation System (BAS) and Behavioral Inhibition System (BIS) constructs of Reinforcement Sensitivity Theory and underlie externalizing behaviors, including substance use. However, little research has examined these relations directly. Methods We assessed individuals’ cardiac pre-ejection periods (PEP) and electrodermal responses (EDR) during reward and extinction trials through the “Number Elimination Game” paradigm. Responses represented BAS and BIS, respectively. We then examined whether these responses provided incremental utility in the prediction of future alcohol, marijuana, and cigarette use. Results Zero-inflated Poisson (ZIP) regression models were used to examine the predictive utility of physiological BAS and BIS responses above and beyond previous substance use. Physiological responses accounted for incremental variance over previous use. Low BAS responses during reward predicted frequency of alcohol use at year 3. Low BAS responses during reward and extinction and high BIS responses during extinction predicted frequency of marijuana use at year 3. For cigarette use, low BAS response during extinction predicted use at year 3. Conclusions These findings suggest that the constructs of Reinforcement Sensitivity Theory, as assessed through physiology, contribute to the longitudinal maintenance of substance use. PMID:27306728
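A zero-inflated Poisson model of this kind can be fit with statsmodels. The sketch below uses synthetic stand-ins for the physiological predictors and substance-use counts; nothing here reflects the study's actual data or model specification:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(2)
n = 200
prior_use = rng.poisson(2, size=n).astype(float)  # placeholder previous use
bas = rng.normal(size=n)                          # placeholder BAS reward response
X = sm.add_constant(np.column_stack([prior_use, bas]))

# Synthetic zero-inflated counts: structural zeros plus a Poisson component.
counts = np.where(rng.random(n) < 0.4, 0,
                  rng.poisson(np.exp(0.2 + 0.2 * prior_use - 0.3 * bas)))

# With exog_infl left at its default, the inflation part is a constant.
fit = ZeroInflatedPoisson(counts, X).fit(disp=False)
print(fit.params)
```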
The Impact of Financial Reward Contingencies on Cognitive Function Profiles in Adult ADHD
Marx, Ivo; Höpcke, Cornelia; Berger, Christoph; Wandschneider, Roland; Herpertz, Sabine C.
2013-01-01
Objectives Although it is well established that cognitive performance in children with attention-deficit/hyperactivity disorder (ADHD) is affected by reward and that key deficits associated with the disorder may thereby be attenuated or even compensated, this phenomenon in adults with ADHD has thus far not been addressed. Therefore, the aim of the present study was to examine the motivating effect of financial reward on task performance in adults with ADHD by focusing on the domains of executive functioning, attention, time perception, and delay aversion. Methods We examined male and female adults aged 18–40 years with ADHD (n = 38) along with a matched control group (n = 40) using six well-established experimental paradigms. Results Impaired performance in the ADHD group was observed for stop-signal omission errors, n-back accuracy, reaction time variability in the continuous performance task, and time reproduction accuracy; reward normalized time reproduction accuracy. Furthermore, when rewarded, subjects with ADHD exhibited longer reaction times and fewer false positives in the continuous performance task, which suggests the use of strategies to prevent impulsivity errors. Conclusions Taken together, our results support the existence of both cognitive and motivational mechanisms for the disorder, which is in line with current models of ADHD. Furthermore, our data suggest cognitive strategies of “stopping and thinking” as a possible underlying mechanism for task improvement that seems to be mediated by reward, which highlights the importance of the interaction between motivation and cognition in adult ADHD. PMID:23840573
Music-related reward responses predict episodic memory performance.
Ferreri, Laura; Rodriguez-Fornells, Antoni
2017-12-01
Music represents a special type of reward involving the recruitment of the mesolimbic dopaminergic system. According to recent theories on episodic memory formation, as dopamine strengthens the synaptic potentiation produced by learning, stimuli triggering dopamine release could result in long-term memory improvements. Here, we behaviourally test whether music-related reward responses could modulate episodic memory performance. Thirty participants rated (in terms of arousal, familiarity, emotional valence, and reward) and encoded unfamiliar classical music excerpts. Twenty-four hours later, their episodic memory was tested (old/new recognition and remember/know paradigm). Results revealed an influence of music-related reward responses on memory: excerpts rated as more rewarding were significantly better recognized and remembered. Furthermore, inter-individual differences in the ability to experience musical reward, measured through the Barcelona Music Reward Questionnaire, positively predicted memory performance. Taken together, these findings shed new light on the relationship between music, reward and memory, showing for the first time that music-driven reward responses are directly implicated in higher cognitive functions and can account for individual differences in memory performance.
Previous Cocaine Exposure Makes Rats Hypersensitive to Both Delay and Reward Magnitude
Roesch, Matthew R.; Takahashi, Yuji; Gugsa, Nishan; Bissonette, Gregory B.; Schoenbaum, Geoffrey
2008-01-01
Animals prefer an immediate over a delayed reward, just as they prefer a large over a small reward. Exposure to psychostimulants causes long-lasting changes in structures critical for this behavior and might disrupt normal time-discounting performance. To test this hypothesis, we exposed rats to cocaine daily for 2 weeks (30 mg/kg, i.p.). Approximately 6 weeks later, we tested them on a variant of a time-discounting task, in which the rats responded to one of two locations to obtain reward while we independently manipulated the delay to reward and reward magnitude. Performance did not differ between cocaine-treated and saline-treated (control) rats when delay lengths and reward magnitudes were equal at the two locations. However, cocaine-treated rats were significantly more likely to shift their responding when we increased the delay or reward size asymmetrically. Furthermore, they were slower to respond and made more errors when forced to the side associated with the lower value. We conclude that previous exposure to cocaine makes choice behavior hypersensitive to differences in the time to and size of available rewards, consistent with a general effect of cocaine exposure on reward valuation mechanisms. PMID:17202492
On the distinction between open and closed economies.
Timberlake, W; Peden, B F
1987-01-01
Open and closed economies have been assumed to produce opposite relations between responding and the programmed density of reward (the amount of reward divided by its cost). Experimental procedures that are treated as open economies typically dissociate responding and total reward by providing supplemental income outside the experimental session; procedures construed as closed economies do not. In an open economy responding is assumed to be directly related to reward density, whereas in a closed economy responding is assumed to be inversely related to reward density. In contrast to this predicted correlation between response-reward relations and type of economy, behavior regulation theory predicts both direct and inverse relations in both open and closed economies. Specifically, responding should be a bitonic function of reward density regardless of the type of economy and is dependent only on the ratio of the schedule terms rather than on their absolute size. These predictions were tested by four experiments in which pigeons' key pecking produced food on fixed-ratio and variable-interval schedules over a range of reward magnitudes and under several open- and closed-economy procedures. The results better supported the behavior regulation view by showing a general bitonic function between key pecking and food density in all conditions. In most cases, the absolute size of the schedule requirement and the magnitude of reward had no effect; equal ratios of these terms produced approximately equal responding. PMID:3625103
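The behavior-regulation prediction of a bitonic response function can be illustrated with a minimum-distance sketch: the animal keeps total responding as close as possible to its free-behavior bliss point, subject to the schedule constraint. The bliss-point values below are invented, and this is a generic behavior-regulation caricature rather than the authors' specific formulation:

```python
import numpy as np

def predicted_responding(price, bliss_resp=500.0, bliss_eat=2000.0):
    """Minimum-distance sketch: choose total responding R to minimize
    distance from the free-behavior bliss point (responses, eats),
    given the schedule constraint eats = R / price."""
    R = np.linspace(1.0, 20000.0, 20000)
    cost = (R - bliss_resp) ** 2 + (R / price - bliss_eat) ** 2
    return R[int(np.argmin(cost))]

# Responding is bitonic in the schedule 'price' (responses per unit of
# reward), rising and then falling as reward density decreases:
for price in (0.2, 0.5, 1.0, 5.0, 25.0):
    print(price, round(float(predicted_responding(price))))
```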
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning
Cashaback, Joshua G. A.; McGregor, Heather R.; Mohatarem, Ayman; Gribble, Paul L.
2017-01-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634
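The mean/mode dissociation at the heart of this design is easy to reproduce numerically. The sketch below (skew, target half-width, and sample size are all illustrative) shows that a squared-error loss is minimized by aiming at the mean of the shifts, while a hit/miss loss is maximized near the mode:

```python
import numpy as np
from scipy import stats

# Skewed lateral-shift distribution whose mean and mode are separated.
shifts = stats.skewnorm.rvs(a=6, scale=2.0, size=100000, random_state=0)

aims = np.linspace(-2.0, 6.0, 401)
# Error-based (squared-error) loss: minimized by aiming at the mean shift.
sq_loss = [np.mean((shifts - a) ** 2) for a in aims]
# Reinforcement (hit/miss) loss: maximized near the mode; the target
# half-width of 0.5 is illustrative.
hit_rate = [np.mean(np.abs(shifts - a) < 0.5) for a in aims]

print("error-based aim (~ mean):", round(aims[int(np.argmin(sq_loss))], 2))
print("reinforcement aim (~ mode):", round(aims[int(np.argmax(hit_rate))], 2))
```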
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.
Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L
2017-07-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
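The mean/mode dissociation at the heart of this design can be reproduced numerically: a quadratic (error-based) loss is minimized by offsetting the mean of the shift distribution, while a hit/miss (reinforcement-based) loss favors offsetting the mode. A sketch with a hypothetical skewed distribution, not the authors' actual shift values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed lateral-shift distribution (arbitrary units): gamma(2, 1)
# shifted left, giving mode ~= -0.5 and mean ~= +0.5, so mean and mode are
# dissociated as in the task design.
shifts = rng.gamma(shape=2.0, scale=1.0, size=50_000) - 1.5

aims = np.linspace(-3, 3, 301)      # candidate aim-point offsets
halfwidth = 0.25                    # hit if the cursor lands within this of target

# Error-based (quadratic) loss: expected squared cursor error for each aim.
quad_loss = [np.mean((aim + shifts) ** 2) for aim in aims]
# Reinforcement-based (0/1) loss: probability of missing the target.
miss_rate = [np.mean(np.abs(aim + shifts) > halfwidth) for aim in aims]

print("aim minimizing squared error:", aims[int(np.argmin(quad_loss))])  # ~ -mean
print("aim maximizing hit rate:", aims[int(np.argmin(miss_rate))])       # ~ -mode
print("-mean of shifts:", -shifts.mean())
```

The two optima land at opposite offsets, which is exactly what lets behavior reveal which loss function the sensorimotor system is using.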
Medial prefrontal cortex as an action-outcome predictor
Alexander, William H.; Brown, Joshua W.
2011-01-01
The medial prefrontal cortex (mPFC) and especially anterior cingulate cortex (ACC) is central to higher cognitive function and numerous clinical disorders, yet its basic function remains in dispute. Various competing theories of mPFC have treated effects of errors, conflict, error likelihood, volatility, and reward, based on findings from neuroimaging and neurophysiology in humans and monkeys. To date, no single theory has been able to reconcile and account for the variety of findings. Here we show that a simple model based on standard learning rules can simulate and unify an unprecedented range of known effects in mPFC. The model reinterprets many known effects and suggests a new view of mPFC, as a region concerned with learning and predicting the likely outcomes of actions, whether good or bad. Cognitive control at the neural level is then seen as a result of evaluating the probable and actual outcomes of one's actions. PMID:21926982
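The core computation the model attributes to mPFC, learning to predict the likely outcomes of actions via standard learning rules, can be reduced to a delta-rule update over action-outcome probabilities. This is a simplified sketch, not the published model, and the variable names are ours:

```python
import numpy as np

# Delta-rule learner over action-outcome contingencies: V[a, o] approximates
# P(outcome o | action a). Unpredicted outcomes (surprise) drive learning,
# which can reproduce error and error-likelihood effects without a dedicated
# conflict detector.
n_actions, n_outcomes = 2, 2          # e.g., outcome 0 = error, 1 = correct
V = np.full((n_actions, n_outcomes), 0.5)
alpha = 0.1                           # learning rate

def update(action, outcome):
    observed = np.zeros(n_outcomes)
    observed[outcome] = 1.0
    delta = observed - V[action]      # vector prediction error over outcomes
    V[action] += alpha * delta
    return delta

# Action 0 produces errors 80% of the time; its predictions converge toward
# [0.8, 0.2], so the surprise evoked by each error shrinks as error
# likelihood is learned.
rng = np.random.default_rng(1)
for _ in range(500):
    update(0, int(rng.random() > 0.8))   # outcome 1 (correct) with prob 0.2
print(V[0])                              # -> approximately [0.8, 0.2]
```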
Activation of VTA GABA neurons disrupts reward consumption
van Zessen, Ruud; Phillips, Jana L.; Budygin, Evgeny A.; Stuber, Garret D.
2012-01-01
The activity of Ventral Tegmental Area (VTA) dopamine (DA) neurons promotes behavioral responses to rewards and environmental stimuli that predict them. VTA GABA inputs synapse directly onto DA neurons and may regulate DA neuronal activity to alter reward-related behaviors; however, the functional consequences of selectively activating VTA GABA neurons remain unknown. Here, we show that in vivo optogenetic activation of VTA GABA neurons disrupts reward consummatory behavior, but not conditioned anticipatory behavior in response to reward-predictive cues. In addition, direct activation of VTA GABA projections to the nucleus accumbens (NAc) resulted in detectable GABA release, but did not alter reward consumption. Furthermore, optogenetic stimulation of VTA GABA neurons directly suppressed the activity and excitability of neighboring DA neurons, as well as the release of DA in the NAc, suggesting that the dynamic interplay between VTA DA and GABA neurons can control the initiation and termination of reward-related behaviors. PMID:22445345
Morie, Kristen P.; De Sanctis, Pierfilippo; Foxe, John J.
2014-01-01
Task execution almost always occurs in the context of reward-seeking or punishment-avoiding behavior. As such, ongoing task monitoring systems are influenced by reward anticipation systems. In turn, when a task has been executed either successfully or unsuccessfully, future iterations of that task will be re-titrated on the basis of the task outcome. Here, we examined the neural underpinnings of the task-monitoring and reward-evaluation systems to better understand how they govern reward seeking behavior. Twenty-three healthy adult participants performed a task where they accrued points that equated to real world value (gift cards) by responding as rapidly as possible within an allotted timeframe, while success rate was titrated online by changing the duration of the timeframe dependent on participant performance. Informative cues initiated each trial, indicating the probability of potential reward or loss (four levels from very low to very high). We manipulated feedback by first informing participants of task success/failure, after which a second feedback signal indicated actual magnitude of reward/loss. High-density EEG recordings allowed for examination of event-related potentials (ERPs) to the informative cues and in turn, to both feedback signals. Distinct ERP components associated with reward cues, task preparatory and task monitoring processes, and reward feedback processes were identified. Unsurprisingly, participants displayed increased ERP amplitudes associated with task preparatory processes following cues that predicted higher chances of reward. They also rapidly updated reward and loss prediction information dependent on task performance after the first feedback signal. Finally, upon reward receipt, initial reward probability was no longer taken into account. Rather, ERP measures suggested that only the magnitude of actual reward or loss was now processed. Reward and task monitoring processes are clearly dissociable, but interact across very fast timescales to update reward predictions as information about task success or failure is accrued. Careful delineation of these processes will be useful in future investigations in clinical groups where such processes are suspected of having gone awry. PMID:24836852
Badre, David
2012-01-01
Growing evidence suggests that the prefrontal cortex (PFC) is organized hierarchically, with more anterior regions having increasingly abstract representations. How does this organization support hierarchical cognitive control and the rapid discovery of abstract action rules? We present computational models at different levels of description. A neural circuit model simulates interacting corticostriatal circuits organized hierarchically. In each circuit, the basal ganglia gate frontal actions, with some striatal units gating the inputs to PFC and others gating the outputs to influence response selection. Learning at all of these levels is accomplished via dopaminergic reward prediction error signals in each corticostriatal circuit. This functionality allows the system to exhibit conditional if–then hypothesis testing and to learn rapidly in environments with hierarchical structure. We also develop a hybrid Bayesian-reinforcement learning mixture of experts (MoE) model, which can estimate the most likely hypothesis state of individual participants based on their observed sequence of choices and rewards. We validate this approach by manipulating attentional states in the generative neural model and recovering them with the MoE model, demonstrating that it yields accurate probabilistic estimates about which hypotheses are attended. This 2-pronged modeling approach leads to multiple quantitative predictions that are tested with functional magnetic resonance imaging in the companion paper. PMID:21693490
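The MoE inference step amounts to a Bayesian posterior update over candidate hypotheses, reweighting each expert by how well it predicted the observed choice. A minimal sketch under assumed per-expert choice likelihoods; the names and numbers are illustrative, not the paper's code:

```python
import numpy as np

# Mixture-of-experts hypothesis inference: each "expert" is a candidate rule
# a participant might be testing. After each trial, experts are reweighted by
# how well they predicted the observed choice (Bayes rule over experts).

def update_posterior(prior, choice, choice_probs_per_expert):
    """prior: (n_experts,) weights; choice_probs_per_expert: (n_experts,
    n_choices), each expert's predicted probability of each choice."""
    likelihood = choice_probs_per_expert[:, choice]
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Two hypothetical experts: expert 0 strongly predicts choice 0; expert 1
# is agnostic between the two choices.
prior = np.array([0.5, 0.5])
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
for _ in range(5):                       # participant repeatedly picks choice 0
    prior = update_posterior(prior, 0, probs)
print(prior)  # posterior mass concentrates on expert 0 (~0.95 after 5 trials)
```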
Hintsa, Taina; Hintsanen, Mirka; Jokela, Markus; Pulkki-Råback, Laura; Keltikangas-Järvinen, Liisa
2013-06-01
Personality dispositions may influence perceptions of work stress. The paper examines the relationship between temperament, in terms of Strelau's Regulative Theory of Temperament, and the effort-reward imbalance and its components. There were 890 participants (360 men) aged 37.9 years on average. Temperament traits of briskness and perseveration (temporal characteristics of behavior) and sensory sensitivity, emotional reactivity, endurance and activity (energetic characteristics of behavior) were measured by Strelau & Zawadzki's Formal Characteristics of Behavior-Temperament Inventory (FCB-TI) in 1997 and 2001. Effort and reward at work were assessed with the original effort-reward imbalance (ERI) questionnaire in 2007. Higher ERI at work was predicted by higher emotional reactivity, higher perseveration, lower briskness, and lower endurance. Higher effort and lower rewards at work were predicted by higher perseveration and lower endurance. The FCB-TI temperament characteristics accounted for 5.2%, 4.8% and 6.5% of the variance in the ERI, effort and reward, respectively. Lower emotional reactivity, lower perseveration, higher briskness and higher endurance predicted higher esteem at work, job promotion and job security. Individual differences in arousability, reflected in temporal and energetic characteristics of behavior, may predispose to, or protect from, an effort-reward imbalance at work. Individual differences should be acknowledged in work stress prevention and in developing interventions.
Opponent appetitive-aversive neural processes underlie predictive learning of pain relief.
Seymour, Ben; O'Doherty, John P; Koltzenburg, Martin; Wiech, Katja; Frackowiak, Richard; Friston, Karl; Dolan, Raymond
2005-09-01
Termination of a painful or unpleasant event can be rewarding. However, whether the brain treats relief in a similar way as it treats natural reward is unclear, and the neural processes that underlie its representation as a motivational goal remain poorly understood. We used fMRI (functional magnetic resonance imaging) to investigate how humans learn to generate expectations of pain relief. Using a pavlovian conditioning procedure, we show that subjects experiencing prolonged experimentally induced pain can be conditioned to predict pain relief. This proceeds in a manner consistent with contemporary reward-learning theory (average reward/loss reinforcement learning), reflected by neural activity in the amygdala and midbrain. Furthermore, these reward-like learning signals are mirrored by opposite aversion-like signals in lateral orbitofrontal cortex and anterior cingulate cortex. This dual coding has parallels to 'opponent process' theories in psychology and promotes a formal account of prediction and expectation during pain.
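The learning scheme invoked here, average reward/loss reinforcement learning, computes prediction errors against a running average reward, which is what lets pain offset register as a positive, reward-like error. A schematic sketch of that algorithm, not the model fitted to the fMRI data:

```python
import numpy as np

# Average-reward temporal-difference learning: the prediction error is taken
# relative to the running average reward (rho), so relief (pain offset)
# yields a positive error against an aversive baseline.
n_states = 3
V = np.zeros(n_states)
rho, alpha, eta = 0.0, 0.1, 0.01

def td_step(s, r, s_next):
    global rho
    delta = r - rho + V[s_next] - V[s]   # average-reward TD error
    V[s] += alpha * delta
    rho += eta * delta                   # track the average reward/loss
    return delta

# Sustained pain (r = -1) drives the baseline rho toward -1; pain offset
# (r = 0) then produces a positive, reward-like prediction error.
for _ in range(500):
    td_step(0, -1.0, 0)
print(td_step(0, 0.0, 0) > 0)   # True: relief codes like a reward
```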
Post-learning hippocampal dynamics promote preferential retention of rewarding events
Gruber, Matthias J.; Ritchey, Maureen; Wang, Shao-Fang; Doss, Manoj K.; Ranganath, Charan
2016-01-01
Reward motivation is known to modulate memory encoding, and this effect depends on interactions between the substantia nigra/ventral tegmental area complex (SN/VTA) and the hippocampus. It is unknown, however, whether these interactions influence offline neural activity in the human brain that is thought to promote memory consolidation. Here, we used functional magnetic resonance imaging (fMRI) to test the effect of reward motivation on post-learning neural dynamics and subsequent memory for objects that were learned in high- or low-reward motivation contexts. We found that post-learning increases in resting-state functional connectivity between the SN/VTA and hippocampus predicted preferential retention of objects that were learned in high-reward contexts. In addition, multivariate pattern classification revealed that hippocampal representations of high-reward contexts were preferentially reactivated during post-learning rest, and the number of hippocampal reactivations was predictive of preferential retention of items learned in high-reward contexts. These findings indicate that reward motivation alters offline post-learning dynamics between the SN/VTA and hippocampus, providing novel evidence for a potential mechanism by which reward could influence memory consolidation. PMID:26875624
Remembering forward: Neural correlates of memory and prediction in human motor adaptation
Scheidt, Robert A; Zimbelman, Janice L; Salowitz, Nicole M G; Suminski, Aaron J; Mosier, Kristine M; Houk, James; Simo, Lucia
2011-01-01
We used functional MR imaging (FMRI), a robotic manipulandum and systems identification techniques to examine neural correlates of predictive compensation for spring-like loads during goal-directed wrist movements in neurologically-intact humans. Although load changed unpredictably from one trial to the next, subjects nevertheless used sensorimotor memories from recent movements to predict and compensate upcoming loads. Prediction enabled subjects to adapt performance so that the task was accomplished with minimum effort. Population analyses of functional images revealed a distributed, bilateral network of cortical and subcortical activity supporting predictive load compensation during visual target capture. Cortical regions - including prefrontal, parietal and hippocampal cortices - exhibited trial-by-trial fluctuations in BOLD signal consistent with the storage and recall of sensorimotor memories or “states” important for spatial working memory. Bilateral activations in associative regions of the striatum demonstrated temporal correlation with the magnitude of kinematic performance error (a signal that could drive reward-optimizing reinforcement learning and the prospective scaling of previously learned motor programs). BOLD signal correlations with load prediction were observed in the cerebellar cortex and red nuclei (consistent with the idea that these structures generate adaptive fusimotor signals facilitating cancellation of expected proprioceptive feedback, as required for conditional feedback adjustments to ongoing motor commands and feedback error learning). Analysis of single subject images revealed that predictive activity was at least as likely to be observed in more than one of these neural systems as in just one. We conclude therefore that motor adaptation is mediated by predictive compensations supported by multiple, distributed, cortical and subcortical structures. PMID:21840405
Gaudio, Jennifer L; Snowdon, Charles T
2008-11-01
Animals living in stable home ranges have many potential cues to locate food. Spatial and color cues are important for wild Callitrichids (marmosets and tamarins). Field studies have assigned the highest priority to distal spatial cues for determining the location of food resources with color cues serving as a secondary cue to assess relative ripeness, once a food source is located. We tested two hypotheses with captive cotton-top tamarins: (a) Tamarins will demonstrate higher rates of initial learning when rewarded for attending to spatial cues versus color cues. (b) Tamarins will show higher rates of correct responses when transferred from color cues to spatial cues than from spatial cues to color cues. The results supported both hypotheses. Tamarins rewarded based on spatial location made significantly more correct choices and fewer errors than tamarins rewarded based on color cues during initial learning. Furthermore, tamarins trained on color cues showed significantly increased correct responses and decreased errors when cues were reversed to reward spatial cues. Subsequent reversal to color cues induced a regression in performance. For tamarins spatial cues appear more salient than color cues in a foraging task.
Roos, Corey R; Mann, Karl; Witkiewitz, Katie
2017-11-01
Researchers have sought to distinguish between individuals whose alcohol use disorder (AUD) is maintained by drinking to relieve negative affect ('relief drinkers') and those whose AUD is maintained by the rewarding effects of alcohol ('reward drinkers'). As an opioid receptor antagonist, naltrexone may be particularly effective for reward drinkers. Acamprosate, which has been shown to down-regulate the glutamatergic system, may be particularly effective for relief drinkers. This study sought to replicate and extend prior work (PREDICT study; Glöckner-Rist et al.) by examining dimensions of reward and relief temptation to drink and subtypes of individuals with distinct patterns of reward/relief temptation. We utilized data from two randomized clinical trials for AUD (Project MATCH, n = 1726 and COMBINE study, n = 1383). We also tested whether classes of reward/relief temptation would predict differential response to naltrexone and acamprosate in COMBINE. Results replicated prior work by identifying reward and relief temptation factors, which had excellent reliability and construct validity. Using factor mixture modeling, we identified five distinct classes of reward/relief temptation that replicated across studies. In COMBINE, we found a significant class-by-acamprosate interaction effect. Among those most likely classified in the high relief/moderate reward temptation class, individuals had better drinking outcomes if assigned to acamprosate versus placebo. We did not find a significant class-by-naltrexone interaction effect. Our study questions the orthogonal classification of drinkers into only two types (reward or relief drinkers) and adds to the body of research on moderators of acamprosate, which may inform clinical decision making in the treatment of AUD.
Nelson, Brady D; Perlman, Greg; Klein, Daniel N; Kotov, Roman; Hajcak, Greg
2016-12-01
A blunted neural response to rewards has recently emerged as a potential mechanistic biomarker of adolescent depression. The reward positivity, an event-related potential elicited by feedback indicating monetary gain relative to loss, has been associated with risk for depression. The authors examined whether the reward positivity prospectively predicted the development of depression 18 months later in a large community sample of adolescent girls. The sample included 444 girls 13.5-15.5 years old with no lifetime history of a depressive disorder, along with a biological parent for each girl. At baseline, the adolescents' reward positivity was measured using a monetary guessing task, their current depressive symptoms were assessed using a self-report questionnaire, and the adolescents' and parents' lifetime psychiatric histories were evaluated with diagnostic interviews. The same interview and questionnaire were administered to the adolescents again approximately 18 months later. A blunted reward positivity at baseline predicted first-onset depressive disorder and greater depressive symptom scores 18 months later. The reward positivity was also a significant predictor independent of other prominent risk factors, including baseline depressive symptoms and adolescent and parental lifetime psychiatric history. The combination of a blunted reward positivity and greater depressive symptom scores at baseline provided the greatest positive predictive value for first-onset depressive disorder. This study provides strong converging evidence that a blunted neural response to rewards precedes adolescent-onset depression and symptom emergence. Blunted neural response may therefore constitute an important target for screening and prevention.
The alcoholic brain: neural bases of impaired reward-based decision-making in alcohol use disorders.
Galandra, Caterina; Basso, Gianpaolo; Cappa, Stefano; Canessa, Nicola
2018-03-01
Neuroeconomics is providing insights into the neural bases of decision-making in normal and pathological conditions. In the neuropsychiatric domain, this discipline investigates how abnormal functioning of neural systems associated with reward processing and cognitive control promotes different disorders, and whether such evidence may inform treatments. This endeavor is crucial when studying different types of addiction, which share a core promoting mechanism in the imbalance between impulsive subcortical neural signals associated with immediate pleasurable outcomes and inhibitory signals mediated by a prefrontal reflective system. The resulting impairment in behavioral control represents a hallmark of alcohol use disorders (AUDs), a chronic relapsing disorder characterized by excessive alcohol consumption despite devastating consequences. This review aims to summarize available magnetic resonance imaging (MRI) evidence on reward-related decision-making alterations in AUDs, and to envision possible future research directions. We review functional MRI (fMRI) studies using tasks involving monetary rewards, as well as MRI studies relating decision-making parameters to neurostructural gray- or white-matter metrics. The available data suggest that excessive alcohol exposure affects neural signaling within brain networks underlying adaptive behavioral learning via the implementation of prediction errors. Namely, weaker ventromedial prefrontal cortex activity and altered connectivity between ventral striatum and dorsolateral prefrontal cortex likely underpin a shift from goal-directed to habitual actions which, in turn, might underpin compulsive alcohol consumption and relapsing episodes despite adverse consequences. Overall, these data highlight abnormal fronto-striatal connectivity as a candidate neurobiological marker of impaired choice in AUDs. Further studies are needed, however, to unveil its implications in the multiple facets of decision-making.
Anticipation of Monetary Reward Can Attenuate the Vigilance Decrement
Grosso, Mallory; Liu, Guanyu; Mitko, Alex; Morris, Rachael; DeGutis, Joseph
2016-01-01
Motivation and reward can have differential effects on separate aspects of sustained attention. We previously demonstrated that continuous reward/punishment throughout a sustained attention task improves overall performance, but not vigilance decrements. One interpretation of these findings is that vigilance decrements are due to resource depletion, which is not overcome by increasing overall motivation. However, an alternative explanation is that as one performs a continuously rewarded task there are fewer potential gains/losses as the task progresses, which could decrease motivation over time, producing a vigilance decrement. This would predict that keeping future gains/losses consistent throughout the task would reduce the vigilance decrement. In the current study, we examined this possibility by comparing two versions (continuous-small-loss vs. anticipate-large-loss) of a 10-minute gradual onset continuous performance task (gradCPT), a challenging go/no-go sustained attention task. Participants began each task with the potential to keep $18. In the continuous-small-loss version, small monetary losses were accrued continuously throughout the task for each error. However, in the anticipate-large-loss version, participants lost all $18 if they erroneously responded to one target that always appeared toward the end of the vigil. Typical vigilance decrements were observed in the continuous-small-loss condition. In the anticipate-large-loss condition, vigilance decrements were reduced, particularly when the anticipate-large-loss condition was completed second. This suggests that the looming possibility of a large loss can attenuate the vigilance decrement and that this attenuation may occur most consistently after sufficient task experience. We discuss these results in the context of current theories of sustained attention. PMID:27472785
Impairments in action-outcome learning in schizophrenia.
Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W
2018-03-03
Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.
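The contingency degradation manipulation turns on the distinction between reinforcement and causal contingency, often formalized as Delta-P: the probability of the outcome given the action minus its probability in the action's absence. A minimal illustration with made-up probabilities, not the study's schedule:

```python
# Causal contingency as Delta-P: P(outcome | action) - P(outcome | no action).
# Noncontingent "free" outcomes raise the second term, degrading the action's
# causal status even though action-outcome pairings are unchanged.

def delta_p(p_outcome_given_action, p_outcome_given_no_action):
    return p_outcome_given_action - p_outcome_given_no_action

print(delta_p(0.5, 0.0))   # intact contingency: action is causal (0.5)
print(delta_p(0.5, 0.5))   # degraded by free outcomes: no causal relation (0.0)
```

An agent tracking only reinforcement would treat both schedules alike; sensitivity to the degraded contingency requires encoding the action-outcome relation itself, which is the capacity reported as impaired here.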
Berthet, Pierre; Hellgren-Kotaleski, Jeanette; Lansner, Anders
2012-01-01
Several studies have shown a strong involvement of the basal ganglia (BG) in action selection and dopamine dependent learning. The dopaminergic signal to striatum, the input stage of the BG, has been commonly described as coding a reward prediction error (RPE), i.e., the difference between the predicted and actual reward. The RPE has been hypothesized to be critical in the modulation of the synaptic plasticity in cortico-striatal synapses in the direct and indirect pathway. We developed an abstract computational model of the BG, with a dual pathway structure functionally corresponding to the direct and indirect pathways, and compared its behavior to biological data as well as other reinforcement learning models. The computations in our model are inspired by Bayesian inference, and the synaptic plasticity changes depend on a three factor Hebbian–Bayesian learning rule based on co-activation of pre- and post-synaptic units and on the value of the RPE. The model builds on a modified Actor-Critic architecture and implements the direct (Go) and the indirect (NoGo) pathway, as well as the reward prediction (RP) system, acting in a complementary fashion. We investigated the performance of the model system when different configurations of the Go, NoGo, and RP system were utilized, e.g., using only the Go, NoGo, or RP system, or combinations of those. Learning performance was investigated in several types of learning paradigms, such as learning-relearning, successive learning, stochastic learning, reversal learning and a two-choice task. The RPE and the activity of the model during learning were similar to monkey electrophysiological and behavioral data. Our results, however, show that there is not a unique best way to configure this BG model to handle well all the learning paradigms tested. We thus suggest that an agent might dynamically configure its action selection mode, possibly depending on task characteristics and also on how much time is available. PMID:23060764
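The dual-pathway logic can be sketched as an actor-critic in which a positive RPE strengthens Go and weakens NoGo weights for the chosen action, and a negative RPE does the reverse. This is a schematic reduction, not the Hebbian-Bayesian rule of the paper, and all parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n_actions = 2
go   = np.zeros(n_actions)    # direct-pathway (Go) weights
nogo = np.zeros(n_actions)    # indirect-pathway (NoGo) weights
value = 0.0                   # critic's reward prediction (RP system)
alpha, beta = 0.1, 3.0        # learning rate, softmax inverse temperature

def softmax(x):
    e = np.exp(beta * (x - x.max()))
    return e / e.sum()

p_reward = np.array([0.8, 0.2])   # hypothetical task contingencies
for _ in range(1000):
    net = go - nogo                    # net action propensity
    a = rng.choice(n_actions, p=softmax(net))
    r = float(rng.random() < p_reward[a])
    rpe = r - value                    # reward prediction error
    value += alpha * rpe               # critic (RP) update
    go[a]   += alpha * rpe             # positive RPE potentiates Go,
    nogo[a] -= alpha * rpe             # depresses NoGo (and vice versa)
print(softmax(go - nogo))  # policy concentrates on the richer action
```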
Anticipatory Pleasure Predicts Motivation for Reward in Major Depression
Sherdell, Lindsey; Waugh, Christian E.; Gotlib, Ian H.
2012-01-01
Anhedonia, the lack of interest or pleasure in response to hedonic stimuli or experiences, is a cardinal symptom of depression. This deficit in hedonic processing has been posited to influence depressed individuals’ motivation to engage in potentially rewarding experiences. Accumulating evidence indicates that hedonic processing is not a unitary construct but rather consists of an anticipatory and a consummatory phase. We examined how these components of hedonic processing influence motivation to obtain reward in participants diagnosed with major depression and in never-disordered controls. Thirty-eight currently depressed and 30 never-disordered control participants rated their liking of humorous and nonhumorous cartoons and then made a series of choices between viewing a cartoon from either group. Each choice was associated with a specified amount of effort participants would have to exert before viewing the chosen cartoon. Although depressed and control participants did not differ in their consummatory liking of the rewards, levels of reward liking predicted motivation to expend effort for the rewards only in the control participants; in the depressed participants, liking and motivation were dissociated. In the depressed group, levels of anticipatory anhedonia predicted motivation to exert effort for the rewards. These findings support the formulation that anhedonia is not a unitary construct and suggest that, for depressed individuals, deficits in motivation for reward are driven primarily by low anticipatory pleasure and not by decreased consummatory liking. PMID:21842963
Rudebeck, Peter H; Ripple, Joshua A; Mitz, Andrew R; Averbeck, Bruno B; Murray, Elisabeth A
2017-02-22
Orbitofrontal cortex (OFC), medial frontal cortex (MFC), and amygdala mediate stimulus-reward learning, but the mechanisms through which they interact are unclear. Here, we investigated how neurons in macaque OFC and MFC signaled rewards and the stimuli that predicted them during learning with and without amygdala input. Macaques performed a task that required them to evaluate two stimuli and then choose one to receive the reward associated with that option. Four main findings emerged. First, amygdala lesions slowed the acquisition and use of stimulus-reward associations. Further analyses indicated that this impairment was due, at least in part, to ineffective use of negative feedback to guide subsequent decisions. Second, the activity of neurons in OFC and MFC rapidly evolved to encode the amount of reward associated with each stimulus. Third, amygdalectomy reduced encoding of stimulus-reward associations during the evaluation of different stimuli. Reward encoding of anticipated and received reward after choices were made was not altered. Fourth, amygdala lesions led to an increase in the proportion of neurons in MFC, but not OFC, that encoded the instrumental response that monkeys made on each trial. These correlated changes in behavior and neural activity after amygdala lesions strongly suggest that the amygdala contributes to the ability to learn stimulus-reward associations rapidly by shaping encoding within OFC and MFC. SIGNIFICANCE STATEMENT Altered functional interactions among orbital frontal cortex (OFC), medial frontal cortex (MFC), and amygdala are thought to underlie several psychiatric conditions, many related to reward learning. Here, we investigated the causal contribution of the amygdala to the development of neuronal activity in macaque OFC and MFC related to rewards and the stimuli that predict them during learning. Without amygdala inputs, neurons in both OFC and MFC showed decreased encoding of stimulus-reward associations. MFC also showed increased encoding of the instrumental responses that monkeys made on each trial. Behaviorally, changes in neural activity were accompanied by slower stimulus-reward learning. The findings suggest that interactions among amygdala, OFC, and MFC contribute to learning about stimuli that predict rewards.
A measurement-based performability model for a multiprocessor system
NASA Technical Reports Server (NTRS)
Hsueh, M. C.; Iyer, Ravi K.; Trivedi, K. S.
1987-01-01
A measurement-based performability model based on real error data collected on a multiprocessor system is described. Model development from the raw error data to the estimation of cumulative reward is described. Both normal and failure behavior of the system are characterized. The measured data show that the holding times in key operational and failure states are not simply exponential and that a semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different failure types and recovery procedures.
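The modeling approach, a semi-Markov process with a per-state reward rate accumulated over a mission time, can be sketched by simulation. The states, transition probabilities, holding-time distributions, and rates below are illustrative placeholders, not the measured data from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Semi-Markov reward model: non-exponential holding times (here Weibull),
# a state-transition matrix, and a per-state reward rate reflecting service
# delivered in that state. States: 0 = normal, 1 = degraded, 2 = failed.
reward_rate = [1.0, 0.5, 0.0]                       # work accomplished per hour
P = np.array([[0.0, 0.9, 0.1],                      # transition probabilities
              [0.7, 0.0, 0.3],
              [1.0, 0.0, 0.0]])                     # repair returns to normal
shape, scale = [1.5, 0.8, 1.0], [100.0, 10.0, 2.0]  # Weibull sojourn parameters

def cumulative_reward(mission_hours):
    t, s, reward = 0.0, 0, 0.0
    while t < mission_hours:
        hold = scale[s] * rng.weibull(shape[s])     # non-exponential sojourn
        hold = min(hold, mission_hours - t)
        reward += reward_rate[s] * hold
        t += hold
        s = rng.choice(3, p=P[s])
    return reward

runs = [cumulative_reward(1000.0) for _ in range(200)]
print("mean cumulative reward:", np.mean(runs))     # performability estimate
```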
Hauser, Tobias U; Iannaccone, Reto; Ball, Juliane; Mathys, Christoph; Brandeis, Daniel; Walitza, Susanne; Brem, Silvia
2014-10-01
Attention-deficit/hyperactivity disorder (ADHD) has been associated with deficient decision making and learning. Models of ADHD have suggested that these deficits could be caused by impaired reward prediction errors (RPEs). Reward prediction errors are signals that indicate violations of expectations and are known to be encoded by the dopaminergic system. However, the precise learning and decision-making deficits and their neurobiological correlates in ADHD are not well known. Our objective was to determine the impaired decision-making and learning mechanisms in juvenile ADHD using advanced computational models, as well as the related neural RPE processes using multimodal neuroimaging. Twenty adolescents with ADHD and 20 healthy adolescents serving as controls (aged 12-16 years) were examined using a probabilistic reversal learning task while simultaneous functional magnetic resonance imaging and electroencephalography were recorded. Learning and decision making were investigated by contrasting a hierarchical Bayesian model with an advanced reinforcement learning model and by comparing the model parameters. The neural correlates of RPEs were studied in the functional magnetic resonance imaging and electroencephalography data. Adolescents with ADHD showed more simplistic learning, as reflected by the reinforcement learning model (exceedance probability, Px = .92), and had increased exploratory behavior compared with healthy controls (mean [SD] decision steepness parameter β: ADHD, 4.83 [2.97]; controls, 6.04 [2.53]; P = .02). The functional magnetic resonance imaging analysis revealed impaired RPE processing in the medial prefrontal cortex during cue as well as during outcome presentation (P < .05, family-wise error correction). The outcome-related impairment in the medial prefrontal cortex could be attributed to deficient processing at 200 to 400 milliseconds after feedback presentation, as reflected by reduced feedback-related negativity (ADHD, 0.61 [3.90] μV; controls, -1.68 [2.52] μV; P = .04). The combination of computational modeling of behavior and multimodal neuroimaging revealed that impaired decision making and learning mechanisms in adolescents with ADHD are driven by impaired RPE processing in the medial prefrontal cortex. This novel, combined approach furthers the understanding of the pathomechanisms in ADHD and may advance treatment strategies.
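The two quoted β values index the softmax decision steepness in the reinforcement learning model: lower β spreads choice probability more evenly, i.e., more exploration. A minimal sketch of that model class (Rescorla-Wagner values plus a softmax); only the β values come from the paper, everything else is illustrative:

```python
import numpy as np

def softmax_choice_prob(values, beta):
    """P(choice) under a softmax with inverse temperature (steepness) beta."""
    e = np.exp(beta * (values - values.max()))
    return e / e.sum()

def rw_update(values, choice, reward, alpha=0.3):
    """Rescorla-Wagner: move the chosen value toward the outcome by the RPE."""
    rpe = reward - values[choice]
    values[choice] += alpha * rpe
    return values, rpe

v = np.array([0.6, 0.4])
print(softmax_choice_prob(v, beta=6.04))  # control-like: ~0.77 for option 0
print(softmax_choice_prob(v, beta=4.83))  # ADHD-like: ~0.72, more exploratory
```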
Evidence for a shared representation of sequential cues that engage sign-tracking.
Smedley, Elizabeth B; Smith, Kyle S
2018-06-19
Sign-tracking is a phenomenon whereby cues that predict rewards come to acquire their own motivational value (incentive salience) and attract appetitive behavior. Typically, sign-tracking paradigms have used single auditory, visual, or lever cues presented prior to a reward delivery. Yet, real world examples of events often can be predicted by a sequence of cues. We have shown that animals will sign-track to multiple cues presented in temporal sequence, and with time develop a bias in responding toward a reward distal cue over a reward proximal cue. Further, extinction of responding to the reward proximal cue directly decreases responding to the reward distal cue. One possible explanation of this result is that serial cues become representationally linked with one another. Here we provide further support of this by showing that extinction of responding to a reward distal cue directly reduces responding to a reward proximal cue. We suggest that the incentive salience of one cue can influence the incentive salience of the other cue.
Reward processing in neurodegenerative disease
Perry, David C.; Kramer, Joel H.
2015-01-01
Representation of reward value involves a distributed network including cortical and subcortical structures. Because neurodegenerative illnesses target specific anatomic networks that partially overlap with the reward circuit, they would be predicted to have distinct impairments in reward processing. This review presents the existing evidence of reward processing changes in neurodegenerative diseases including mild cognitive impairment, Alzheimer's disease, frontotemporal dementia, amyotrophic lateral sclerosis, Parkinson's disease, and Huntington's disease, as well as in healthy aging. Carefully distinguishing the different aspects of reward processing (primary rewards, secondary rewards, reward-based learning, and reward-based decision-making) and using tasks that differentiate the stages of processing reward will lead to improved understanding of this fundamental process and clarify a contributing cause of behavioral change in these illnesses. PMID:24417286
Hahn, Amanda C.; DeBruine, Lisa M.; Jones, Benedict C.
2015-01-01
The factors that contribute to individual differences in the reward value of cute infant facial characteristics are poorly understood. Here we show that the effect of cuteness on a behavioural measure of the reward value of infant faces is greater among women reporting strong maternal tendencies. By contrast, maternal tendencies did not predict women's subjective ratings of the cuteness of these infant faces. These results show, for the first time, that the reward value of infant facial cuteness is greater among women who report being more interested in interacting with infants, implicating maternal tendencies in individual differences in the reward value of infant cuteness. Moreover, our results indicate that the relationship between maternal tendencies and the reward value of infant facial cuteness is not due to individual differences in women's ability to detect infant cuteness. This latter result suggests that individual differences in the reward value of infant cuteness are not simply a by-product of low-cost, functionless biases in the visual system. PMID:25740842
Lopatina, Nina; McDannald, Michael A.; Styer, Clay V.; Peterson, Jacob F.; Sadacca, Brian F.; Cheer, Joseph F.
2016-01-01
The orbitofrontal cortex (OFC) has been broadly implicated in the ability to use the current value of expected outcomes to guide behavior. Although value correlates have been prominently reported in lateral OFC, they are more often associated with more medial areas. Further, recent studies in primates have suggested a dissociation in which the lateral OFC is involved in credit assignment and representation of reward identity and more medial areas are critical to representing value. Previously, we used unblocking to test more specifically what information about outcomes is represented by OFC neurons in rats; consistent with the proposed dichotomy between the lateral and medial OFC, we found relatively little linear value coding in the lateral OFC (Lopatina et al., 2015). Here we have repeated this experiment, recording in the medial OFC, to test whether such value signals might be found there. Neurons were recorded in an unblocking task as rats learned about cues that signaled either more, less, or the same amount of reward. We found that medial OFC neurons acquired responses to these cues; however, these responses did not signal different reward values across cues. Surprisingly, we found that cells developed responses to cues predicting a change, particularly a decrease, in reward value. This is consistent with a special role for medial OFC in representing current value to support devaluation/revaluation sensitive changes in behavior. SIGNIFICANCE STATEMENT This study uniquely examines encoding in rodent mOFC at the single-unit level in response to cues that predict more, less, or no change in reward in rats during training in a Pavlovian unblocking task, finding more cells responding to change-predictive cues and stronger activity in response to cues predictive of less reward. PMID:27511013
Cerebellar granule cells encode the expectation of reward
Wagner, Mark J; Kim, Tony Hyun; Savall, Joan; Schnitzer, Mark J; Luo, Liqun
2017-01-01
The human brain contains ~60 billion cerebellar granule cells, which outnumber all other neurons combined. Classical theories posit that a large, diverse population of granule cells allows for highly detailed representations of sensorimotor context, enabling downstream Purkinje cells to sense fine contextual changes. Although evidence suggests a role for cerebellum in cognition, granule cells are known to encode only sensory and motor context. Using two-photon calcium imaging in behaving mice, here we show that granule cells convey information about the expectation of reward. Mice initiated voluntary forelimb movements for delayed water reward. Some granule cells responded preferentially to reward or reward omission, whereas others selectively encoded reward anticipation. Reward responses were not restricted to forelimb movement, as a Pavlovian task evoked similar responses. Compared to predictable rewards, unexpected rewards elicited markedly different granule cell activity despite identical stimuli and licking responses. In both tasks, reward signals were widespread throughout multiple cerebellar lobules. Tracking the same granule cells over several days of learning revealed that cells with reward-anticipating responses emerged from those that responded at the start of learning to reward delivery, whereas reward omission responses grew stronger as learning progressed. The discovery of predictive, non-sensorimotor encoding in granule cells is a major departure from current understanding of these neurons and dramatically enriches contextual information available to postsynaptic Purkinje cells, with important implications for cognitive processing in the cerebellum. PMID:28321129
The Nucleus Accumbens and Pavlovian Reward Learning
Day, Jeremy J.
2011-01-01
The ability to form associations between predictive environmental events and rewarding outcomes is a fundamental aspect of learned behavior. This apparently simple ability likely requires complex neural processing evolved to identify, seek, and utilize natural rewards and redirect these activities based on updated sensory information. Emerging evidence from both animal and human research suggests that this type of processing is mediated in part by the nucleus accumbens (NAc) and a closely associated network of brain structures. The nucleus accumbens is required for a number of reward-related behaviors, and processes specific information about reward availability, value, and context. Additionally, this structure is critical for the acquisition and expression of most Pavlovian stimulus-reward relationships, and cues that predict rewards produce robust changes in neural activity in the nucleus accumbens. While processing within the nucleus accumbens may enable or promote Pavlovian reward learning in natural situations, it has also been implicated in aspects of human drug addiction, including the ability of drug-paired cues to control behavior. This article will provide a critical review of the existing animal and human literature concerning the role of the NAc in Pavlovian learning with non-drug rewards and consider some clinical implications of these findings. PMID:17404375
Positive mood effects on delay discounting.
Hirsh, Jacob B; Guindon, Alex; Morisano, Dominique; Peterson, Jordan B
2010-10-01
Delay discounting is the process by which the value of an expected reward decreases as the delay to obtaining that reward increases. Individuals with higher discounting rates tend to prefer smaller immediate rewards over larger delayed rewards. Previous research has indicated that personality can influence an individual's discounting rates, with higher levels of Extraversion predicting a preference for immediate gratification. The current study examined how this relationship would be influenced by situational mood inductions. While main effects were observed for both Extraversion and cognitive ability in the prediction of discounting rates, a significant interaction was also observed between Extraversion and positive affect. Extraverted individuals were more likely to prefer an immediate reward when first put in a positive mood. Extraverts thus appear particularly sensitive to impulsive, incentive-reward-driven behavior by temperament and by situational factors heightening positive affect.
Simen, Patrick; Contreras, David; Buck, Cara; Hu, Peter; Holmes, Philip; Cohen, Jonathan D
2009-12-01
The drift-diffusion model (DDM) implements an optimal decision procedure for stationary, 2-alternative forced-choice tasks. The height of a decision threshold applied to accumulating information on each trial determines a speed-accuracy tradeoff (SAT) for the DDM, thereby accounting for a ubiquitous feature of human performance in speeded response tasks. However, little is known about how participants settle on particular tradeoffs. One possibility is that they select SATs that maximize a subjective rate of reward earned for performance. For the DDM, there exist unique, reward-rate-maximizing values for its threshold and starting point parameters in free-response tasks that reward correct responses (R. Bogacz, E. Brown, J. Moehlis, P. Holmes, & J. D. Cohen, 2006). These optimal values vary as a function of response-stimulus interval, prior stimulus probability, and relative reward magnitude for correct responses. We tested the resulting quantitative predictions regarding response time, accuracy, and response bias under these task manipulations and found that grouped data conformed well to the predictions of an optimally parameterized DDM.
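Using the standard closed-form DDM expressions for error rate and mean decision time (unbiased starting point, drift a, thresholds at plus/minus z, noise c), the reward-rate objective and its optimal threshold can be computed directly. Parameter values below are illustrative, not those fitted in the study:

```python
import numpy as np

# Closed-form DDM solutions for an unbiased starting point:
#   ER = 1 / (1 + exp(2 a z / c^2)),  DT = (z / a) * tanh(a z / c^2).
# Reward rate for a free-response task that rewards correct responses:
#   RR = (1 - ER) / (DT + t0 + RSI).

def ddm_er_dt(a, z, c):
    k = a * z / c**2
    er = 1.0 / (1.0 + np.exp(2.0 * k))
    dt = (z / a) * np.tanh(k)
    return er, dt

def reward_rate(z, a=0.2, c=0.3, t0=0.3, rsi=1.0):
    er, dt = ddm_er_dt(a, z, c)
    return (1.0 - er) / (dt + t0 + rsi)   # rewards per second

zs = np.linspace(0.01, 1.0, 500)
print("optimal z, RSI = 1 s:",
      round(float(zs[int(np.argmax([reward_rate(z) for z in zs]))]), 3))
# A longer response-stimulus interval shifts the optimum toward a higher,
# more accurate threshold: one of the signatures tested in the paper.
print("optimal z, RSI = 3 s:",
      round(float(zs[int(np.argmax([reward_rate(z, rsi=3.0) for z in zs]))]), 3))
```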
Hasler, Brant P.; Casement, Melynda D.; Sitnick, Stephanie L.; Shaw, Daniel S.; Forbes, Erika E.
2017-01-01
Eveningness, a preference for later sleep-wake timing, is linked to altered reward function, which may explain a consistent association with substance abuse. Notably, the extant literature rests largely on cross-sectional data, yet both eveningness and reward function show developmental changes. We examined whether circadian preference during late adolescence predicted the neural response to reward two years later. A sample of 93 males reported circadian preference and completed a monetary reward fMRI paradigm at ages 20 and 22. Primary analyses examined longitudinal paths from circadian preference to medial prefrontal cortex (mPFC) and ventral striatal (VS) reward responses. We also explored whether reward responses mediated longitudinal associations between circadian preference and alcohol dependence, frequency of alcohol use, and/or frequency of cannabis use. Age 20 eveningness was positively associated with age 22 mPFC and VS responses to win, but not associated with age 22 reactivity to reward anticipation. Age 20 eveningness was indirectly related to age 22 alcohol dependence via age 22 mPFC response to win. Our findings provide novel evidence that altered reward-related brain function could underlie associations between eveningness and alcohol use problems. Eveningness may be an under-recognized but modifiable risk factor for reward-related problems such as mood and substance use disorders. PMID:28254633
Abraham, Antony D; Neve, Kim A; Lattal, K Matthew
2016-07-01
Dopamine is critical for many processes that drive learning and memory, including motivation, prediction error, incentive salience, memory consolidation, and response output. Theories of dopamine's function in these processes have, for the most part, been developed from behavioral approaches that examine learning mechanisms in appetitive tasks. A parallel and growing literature indicates that dopamine signaling is involved in consolidation of memories into stable representations in aversive tasks such as fear conditioning. Relatively little is known about how dopamine may modulate memories that form during extinction, when organisms learn that the relation between previously associated events is severed. We investigated whether fear and reward extinction share common mechanisms that could be enhanced with dopamine D1/5 receptor activation. Pharmacological activation of dopamine D1/5 receptors (with SKF 81297) enhanced extinction of both cued and contextual fear. These effects also occurred in the extinction of cocaine-induced conditioned place preference, suggesting that the observed effects on extinction were not specific to a particular type of procedure (aversive or appetitive). A cAMP/PKA biased D1 agonist (SKF 83959) did not affect fear extinction, whereas a broadly efficacious D1 agonist (SKF 83822) promoted fear extinction. Together, these findings show that dopamine D1/5 receptor activation is a target for the enhancement of fear or reward extinction.
Bio-robots automatic navigation with electrical reward stimulation.
Sun, Chao; Zhang, Xinlu; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang
2012-01-01
Bio-robots controlled by external stimulation through a brain-computer interface (BCI) depend on real-time guidance by human operators. Current automatic navigation methods for bio-robots focus on control rules that force animals to obey man-made commands while ignoring the animals' own intelligence. This paper proposes a new method for automatic bio-robot navigation that uses electrical micro-stimulation as a real-time reward. Owing to the reward-seeking instinct and trial-and-error capability of the animal, a bio-robot can be steered to keep walking along the correct route by rewards and will spontaneously correct its direction when rewards are withheld. In navigation experiments, rat-robots learned the control scheme in a short time. The results show that our method simplifies the control logic and successfully achieves automatic navigation for rat-robots. Our work may have significant implications for the further development of bio-robots with hybrid intelligence.
Appelhans, Bradley M.; Woolf, Kathleen; Pagoto, Sherry L.; Schneider, Kristin L.; Whited, Matthew C.; Liebman, Rebecca
2012-01-01
Overeating is believed to result when the appetitive motivation to consume palatable food exceeds an individual’s capacity for inhibitory control of eating. This hypothesis was supported in recent studies involving predominantly normal weight women, but has not been tested in obese populations. The current study tested the interaction between food reward sensitivity and inhibitory control in predicting palatable food intake among energy-replete overweight and obese women (N=62). Sensitivity to palatable food reward was measured with the Power of Food Scale. Inhibitory control was assessed with a computerized choice task that captures the tendency to discount large delayed rewards relative to smaller immediate rewards. Participants completed an eating in the absence of hunger protocol in which homeostatic energy needs were eliminated with a bland preload of plain oatmeal, followed by a bogus laboratory taste test of palatable and bland snacks. The interaction between food reward sensitivity and inhibitory control was a significant predictor of palatable food intake in regression analyses controlling for body mass index and the amount of preload consumed. Probing this interaction indicated that higher food reward sensitivity predicted greater palatable food intake at low levels of inhibitory control, but was not associated with intake at high levels of inhibitory control. As expected, no associations were found in a similar regression analysis predicting intake of bland foods. Findings support a neurobehavioral model of eating behavior in which sensitivity to palatable food reward drives overeating only when accompanied by insufficient inhibitory control. Strengthening inhibitory control could enhance weight management programs. PMID:21475139
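The inhibitory-control task described above indexes delay discounting. A common way to quantify that tendency is the hyperbolic model, V = A/(1 + kD); the sketch below illustrates the model generically and is not the authors' scoring procedure (the amounts and k values are made up):

```python
def hyperbolic_value(amount, delay_days, k):
    """Subjective value of a delayed reward under hyperbolic discounting."""
    return amount / (1.0 + k * delay_days)

# A steeper discounter (larger k) prefers the smaller immediate reward.
for k in (0.01, 0.10):
    later = hyperbolic_value(100.0, 30.0, k)   # e.g., $100 in 30 days
    now = hyperbolic_value(40.0, 0.0, k)       # e.g., $40 immediately
    choice = "immediate" if now > later else "delayed"
    print(f"k={k:.2f}: V(delayed)={later:.1f}, V(now)={now:.1f} -> {choice}")
```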
Reward Sensitivity and Waiting Impulsivity: Shift towards Reward Valuation away from Action Control
Mechelmans, Daisy J; Strelchuk, Daniela; Doñamayor, Nuria; Banca, Paula; Robbins, Trevor W; Baek, Kwangyeol
2017-01-01
Background: Impulsivity and reward expectancy are commonly interrelated. Waiting impulsivity, measured using the rodent 5-Choice Serial Reaction Time task, predicts compulsive cocaine seeking and sign (or cue) tracking. Here, we assess human waiting impulsivity using a novel translational task, the 4-Choice Serial Reaction Time task (4-CSRT), and its relationship with reward cues. Methods: Healthy volunteers (n=29) performed the monetary incentive delay task in a functional MRI study in which subjects observe a cue predicting reward and wait to respond for high (£5), low (£1), or no reward. Waiting impulsivity was tested with the 4-CSRT. Results: For high reward prospects (£5 vs no reward), greater waiting impulsivity on the 4-CSRT correlated with greater medial orbitofrontal cortex and lower supplementary motor area activity to cues. In response to high reward cues, greater waiting impulsivity was associated with greater subthalamic nucleus connectivity with orbitofrontal cortex and greater subgenual cingulate connectivity with anterior insula, but decreased connectivity with regions implicated in action selection and preparation. Conclusion: These findings highlight a shift towards regions implicated in reward valuation and compulsivity, and away from higher-level motor preparation, action selection, and response control. We highlight the role of reward sensitivity and impulsivity as mechanisms potentially linking human waiting impulsivity with incentive approach and compulsivity, theories highly relevant to disorders of addiction. PMID:29020291
Dopaminergic Modulation of Decision Making and Subjective Well-Being.
Rutledge, Robb B; Skandali, Nikolina; Dayan, Peter; Dolan, Raymond J
2015-07-08
The neuromodulator dopamine has a well-established role in reporting appetitive prediction errors that are widely considered in terms of learning. However, across a wide variety of contexts, both phasic and tonic aspects of dopamine are likely to exert more immediate effects that have been less well characterized. Of particular interest is dopamine's influence on economic risk taking and on subjective well-being, a quantity known to be substantially affected by prediction errors resulting from the outcomes of risky choices. By boosting dopamine levels using levodopa (L-DOPA) as human subjects made economic decisions and repeatedly reported their momentary happiness, we show here an effect on both choices and happiness. Boosting dopamine levels increased the number of risky options chosen in trials involving potential gains but not trials involving potential losses. This effect could be better captured as increased Pavlovian approach in an approach-avoidance decision model than as a change in risk preferences within an established prospect theory model. Boosting dopamine also increased happiness resulting from some rewards. Our findings thus identify specific novel influences of dopamine on decision making and emotion that are distinct from its established role in learning.
A balance of activity in brain control and reward systems predicts self-regulatory outcomes.
Lopez, Richard B; Chen, Pin-Hao A; Huckins, Jeremy F; Hofmann, Wilhelm; Kelley, William M; Heatherton, Todd F
2017-05-01
Previous neuroimaging work has shown that increased reward-related activity following exposure to food cues is predictive of self-control failure. The balance model suggests that self-regulation failures result from an imbalance in reward and executive control mechanisms. However, an open question is whether the relative balance of activity in brain systems associated with executive control (vs reward) supports self-regulatory outcomes when people encounter tempting cues in daily life. Sixty-nine chronic dieters, a population known for frequent lapses in self-control, completed a food cue-reactivity task during an fMRI scanning session, followed by a weeklong sampling of daily eating behaviors via ecological momentary assessment. We related participants' food cue activity in brain systems associated with executive control and reward to real-world eating patterns. Specifically, a balance score representing the amount of activity in brain regions associated with self-regulatory control, relative to automatic reward-related activity, predicted dieters' control over their eating behavior during the following week. This balance measure may reflect individual self-control capacity and be useful for examining self-regulation success in other domains and populations. PMID:28158874
Graziane, Nicholas M; Neumann, Peter A; Dong, Yan
2018-01-01
The lateral habenula (LHb) regulates reward learning and controls the updating of reward-related information. Drugs of abuse have the capacity to hijack the cellular and neurocircuit mechanisms mediating reward learning, forming non-adaptable, compulsive behaviors geared toward obtaining illicit substances. Here, we discuss current findings demonstrating how drugs of abuse alter intrinsic and synaptic LHb neuronal function. Additionally, we discuss evidence for how drug-induced LHb alterations may affect the ability to predict reward, potentially facilitating an addiction-like state. Altogether, we combine ex vivo and in vivo results for an overview of how drugs of abuse alter LHb function and how these functional alterations affect the ability to learn and update behavioral responses to hedonic external stimuli.
Sandra, Dasha A; Otto, A Ross
2018-03-01
While psychological, economic, and neuroscientific accounts of behavior broadly maintain that people minimize the expenditure of cognitive effort, empirical work reveals how reward incentives can mobilize increased cognitive effort expenditure. Recent theories posit that the decision to expend effort is governed, in part, by a cost-benefit tradeoff whereby the potential benefits of mental effort can offset the perceived costs of effort exertion. Taking an individual-differences approach, the present study examined whether one's executive function capacity, as measured by Stroop interference, predicts the extent to which reward incentives reduce switch costs in a task-switching paradigm, which indexes additional expenditure of cognitive effort. In accordance with the predictions of a cost-benefit account of effort, we found that low executive function capacity (and, relatedly, low intrinsic motivation to expend effort, as measured by Need for Cognition) predicted a larger increase in cognitive effort expenditure in response to monetary reward incentives, while individuals with greater executive function capacity, and greater intrinsic motivation to expend effort, were less responsive to reward incentives. These findings suggest that an individual's cost-benefit tradeoff is constrained by the perceived costs of exerting cognitive effort.
Anticipatory pleasure predicts motivation for reward in major depression.
Sherdell, Lindsey; Waugh, Christian E; Gotlib, Ian H
2012-02-01
Anhedonia, the lack of interest or pleasure in response to hedonic stimuli or experiences, is a cardinal symptom of depression. This deficit in hedonic processing has been posited to influence depressed individuals' motivation to engage in potentially rewarding experiences. Accumulating evidence indicates that hedonic processing is not a unitary construct but rather consists of an anticipatory and a consummatory phase. We examined how these components of hedonic processing influence motivation to obtain reward in participants diagnosed with major depression and in never-disordered controls. Thirty-eight currently depressed and 30 never-disordered control participants rated their liking of humorous and nonhumorous cartoons and then made a series of choices between viewing a cartoon from either category. Each choice was associated with a specified amount of effort participants would have to exert before viewing the chosen cartoon. Although depressed and control participants did not differ in their consummatory liking of the rewards, levels of reward liking predicted motivation to expend effort for the rewards only in the control participants; in the depressed participants, liking and motivation were dissociated. In the depressed group, levels of anticipatory anhedonia predicted motivation to exert effort for the rewards. These findings support the formulation that anhedonia is not a unitary construct and suggest that, for depressed individuals, deficits in motivation for reward are driven primarily by low anticipatory pleasure rather than by decreased consummatory liking.
View Estimation Based on Value System
NASA Astrophysics Data System (ADS)
Takahashi, Yasutake; Shimada, Kouki; Asada, Minoru
Estimating the caregiver's view is one of the most important capabilities a child needs in order to understand behavior demonstrated by the caregiver, that is, to infer the intention behind the behavior and/or to learn the observed behavior efficiently. We hypothesize that the child develops this ability in the same way as behavior learning motivated by an intrinsic reward: while imitating behavior observed from the caregiver, the child updates its model of the caregiver's estimated view so as to minimize the estimation error of the reward during the behavior. From this view, this paper presents a method for acquiring such a capability based on a value system from which values can be obtained by reinforcement learning. The parameters of the view estimation are updated based on the temporal difference error (hereafter TD error: the estimation error of the state value), analogous to the way the parameters of the state value of the behavior are updated based on the TD error. Experiments with simple humanoid robots show the validity of the method, and the developmental process, parallel to young children's estimation of their own view during imitation of observed caregiver behavior, is discussed.
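The update rule invoked here is the standard temporal-difference mechanism: the TD error drives parameter updates for the state value and, in the authors' proposal, for the view-estimation model as well. A minimal tabular TD(0) sketch follows; the toy state chain and constants are illustrative, not the robot implementation:

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: compute the TD error and nudge V[s] toward the target."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

V = np.zeros(5)                              # state values for a toy 5-state chain
delta = td0_update(V, s=2, r=1.0, s_next=3)  # one rewarded transition
print(f"TD error = {delta:.2f}, V = {V}")
```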
Dysfunctional insular connectivity during reward prediction in patients with first-episode psychosis
Schmidt, André; Palaniyappan, Lena; Smieskova, Renata; Simon, Andor; Riecher-Rössler, Anita; Lang, Undine E.; Fusar-Poli, Paolo; McGuire, Philip; Borgwardt, Stefan J.
2016-01-01
Background: Increasing evidence indicates that psychosis is associated with abnormal reward processing. Imaging studies in patients with first-episode psychosis (FEP) have revealed reduced activity in diverse brain regions, including the ventral striatum, insula and anterior cingulate cortex (ACC), during reward prediction. However, whether these reductions in local brain activity are due to altered connectivity has rarely been explored. Methods: We applied dynamic causal modelling and Bayesian model selection to fMRI data during the Salience Attribution Task to investigate whether patients with FEP showed abnormal modulation of connectivity between the ventral striatum, insula and ACC induced by rewarding cues and whether these changes were related to positive psychotic symptoms and atypical antipsychotic medication. Results: The model including reward-induced modulation of insula–ACC connectivity was the best-fitting model in each group. Compared with healthy controls (n = 19), patients with FEP (n = 29) revealed reduced right insula–ACC connectivity. After subdividing patients according to current antipsychotic medication, we found that the reduced insula–ACC connectivity relative to healthy controls was observed only in untreated patients (n = 17), not in patients treated with antipsychotics (n = 12), and that it correlated negatively with unusual thought content in untreated patients with FEP. Limitations: The modest sample size of untreated patients with FEP was a limitation of our study. Conclusion: This study indicates that insula–ACC connectivity during reward prediction is reduced in untreated patients with FEP and related to the formation of positive psychotic symptoms. Our study further suggests that atypical antipsychotics may reverse connectivity between the insula and the ACC during reward prediction. PMID:26854756
Reward-Related Attentional Bias and Adolescent Substance Use: A Prognostic Relationship?
van Hemel-Ruiter, Madelon E.; de Jong, Peter J.; Ostafin, Brian D.; Oldehinkel, Albertine J.
2015-01-01
Current cognitive-motivational addiction theories propose that prioritizing appetitive, reward-related information (attentional bias) plays a vital role in substance abuse behavior. Previous cross-sectional research has shown that adolescent substance use is related to reward-related attentional biases. The present study was designed to extend these findings by testing whether these reward biases have predictive value for adolescent substance use at three-year follow-up. Participants (N = 657, mean age = 16.2 years at baseline) were a sub-sample of the Tracking Adolescents’ Individual Lives Survey (TRAILS), a large longitudinal community cohort study. We used a spatial orienting task as a behavioral index of appetitive-related attentional processes at baseline and a substance use questionnaire at both baseline and three-year follow-up. Bivariate correlational analyses showed that enhanced attentional engagement with cues that predicted potential reward and nonpunishment was positively associated with substance use (alcohol, tobacco, and cannabis) three years later. However, reward bias was not predictive of changes in substance use. A post-hoc analysis in a selection of adolescents who started using illicit drugs (other than cannabis) in the follow-up period demonstrated that stronger baseline attentional engagement with cues of nonpunishment was related to a higher level of illicit drug use three years later. The finding that reward bias did not predict the increase in substance use among adolescents who had already started using substances at baseline, but did show prognostic value in adolescents who initiated drug use between baseline and follow-up, suggests that appetitive bias might be especially important in the initiation stages of adolescent substance use. PMID:25816295
The free-energy principle: a unified brain theory?
Friston, Karl
2010-02-01
A free-energy principle has been proposed recently that accounts for action, perception and learning. This Review looks at some key brain theories in the biological (for example, neural Darwinism) and physical (for example, information theory and optimal control theory) sciences from the free-energy perspective. Crucially, one key theme runs through each of these theories: optimization. Furthermore, if we look closely at what is optimized, the same quantity keeps emerging, namely value (expected reward, expected utility) or its complement, surprise (prediction error, expected cost). This is the quantity that is optimized under the free-energy principle, which suggests that several global brain theories might be unified within a free-energy framework.
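The "surprise" in this framework is surprisal, the negative log-probability of an observation, which free energy upper-bounds. A one-line illustration of the quantity itself (not of the variational bound):

```python
import math

def surprisal(p_observation):
    """Self-information of an outcome: improbable observations are more surprising."""
    return -math.log(p_observation)

print(surprisal(0.9))   # expected outcome -> low surprise (~0.11 nats)
print(surprisal(0.05))  # rare outcome     -> high surprise (~3.0 nats)
```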
Pupil dilation signals uncertainty and surprise in a learning gambling task
Lavín, Claudio; San Martín, René; Rosales Jubal, Eduardo
2014-01-01
Pupil dilation under constant illumination is a physiological marker whose modulation is related to several cognitive functions involved in daily decision making. There is evidence for a role of pupil dilation change during decision-making tasks associated with uncertainty, reward-prediction errors and surprise. However, while some work suggests that pupil dilation is mainly modulated by reward predictions, other work points to uncertainty signaling and surprise. Supporting the latter hypothesis, the neural substrate of this marker is related to noradrenaline (NA) activity, which has also been related to uncertainty signaling. In this work we aimed to test whether pupil dilation is a marker of uncertainty and surprise in a learning task. We recorded pupil dilation responses in 10 participants performing the Iowa Gambling Task (IGT), a decision-making task that requires learning and constant monitoring of outcome feedback, both important variables in the traditional study of human decision making. Results showed that pupil dilation changes were modulated by learned uncertainty and surprise regardless of feedback magnitude. Interestingly, greater pupil dilation changes were found during positive feedback (PF) presentation when there was lower uncertainty about a future negative feedback (NF), and during NF presentation as a function of surprise. These results support the hypothesis that pupil dilation is a marker of learned uncertainty and may be used as a marker of NA activity in humans facing unfamiliar situations. PMID:24427126
Temporal structure of motor variability is dynamically regulated and predicts motor learning ability
Wu, Howard G; Miyamoto, Yohsuke R; Castro, Luis Nicolas Gonzalez; Ölveczky, Bence P; Smith, Maurice A
2015-01-01
Individual differences in motor learning ability are widely acknowledged, yet little is known about the factors that underlie them. Here we explore whether movement-to-movement variability in motor output, a ubiquitous if often unwanted characteristic of motor performance, predicts motor learning ability. Surprisingly, we found that higher levels of task-relevant motor variability predicted faster learning both across individuals and across tasks in two different paradigms, one relying on reward-based learning to shape specific arm movement trajectories and the other relying on error-based learning to adapt movements in novel physical environments. We proceeded to show that training can reshape the temporal structure of motor variability, aligning it with the trained task to improve learning. These results provide experimental support for the importance of action exploration, a key idea from reinforcement learning theory, showing that motor variability facilitates motor learning in humans and that our nervous systems actively regulate it to improve learning. PMID:24413700
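The reward-based paradigm above implies a simple mechanism: motor noise doubles as exploration, so more task-relevant variability can mean faster learning. The toy reward-based policy search below illustrates that intuition; the task, constants, and acceptance rule are assumptions for the demo, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def learn(noise_sd, target=1.0, trials=200, lr=0.5):
    """Perturb a scalar motor command with noise; reinforce variations that
    land closer to the target. Returns the final distance to the target."""
    x = 0.0
    for _ in range(trials):
        attempt = x + rng.normal(0.0, noise_sd)      # motor variability = exploration
        if abs(attempt - target) < abs(x - target):  # better outcome -> reinforce it
            x += lr * (attempt - x)
    return abs(x - target)

for sd in (0.01, 0.1, 0.5):
    print(f"noise_sd={sd}: final error = {learn(sd):.3f}")
```

With these settings, the low-variability learner barely moves toward the target in the allotted trials, while the higher-variability learners converge, mirroring the paper's across-individual result.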
Sweitzer, Maggie M; Geier, Charles F; Denlinger, Rachel; Forbes, Erika E; Raiff, Bethany R; Dallery, Jesse; McClernon, F J; Donny, Eric C
2016-03-01
Tobacco smoking is associated with dysregulated reward processing within the striatum, characterized by hypersensitivity to smoking rewards and hyposensitivity to non-smoking rewards. This bias toward smoking reward at the expense of alternative rewards is further exacerbated by deprivation from smoking, which may contribute to difficulty maintaining abstinence during a quit attempt. We examined whether abstinence-induced changes in striatal processing of rewards predicted lapse likelihood during a quit attempt supported by contingency management (CM), in which abstinence from smoking was reinforced with money. Thirty-six non-treatment-seeking smokers participated in two functional MRI (fMRI) sessions, one following 24-h abstinence and one following smoking as usual. During each scan, participants completed a rewarded guessing task designed to elicit striatal activation in which they could earn smoking and monetary rewards delivered after the scan. Participants then engaged in a 3-week CM-supported quit attempt. As previously reported, 24-h abstinence was associated with increased striatal activation in anticipation of smoking reward and decreased activation in anticipation of monetary reward. Individuals exhibiting greater decrements in right striatal activation to monetary reward during abstinence (controlling for activation during non-abstinence) were more likely to lapse during CM (p < 0.025), even when controlling for other predictors of lapse outcome (e.g., craving); no association was seen for smoking reward. These results are consistent with a growing number of studies indicating the specific importance of disrupted striatal processing of non-drug reward in nicotine dependence and highlight the importance of individual differences in abstinence-induced deficits in striatal function for smoking cessation. PMID:26660448
Murty, Vishnu P; Adcock, R Alison
2014-08-01
Learning how to obtain rewards requires learning about their contexts and likely causes. How do long-term memory mechanisms balance the need to represent potential determinants of reward outcomes with the computational burden of an over-inclusive memory? One solution would be to enhance memory for salient events that occur during reward anticipation, because all such events are potential determinants of reward. We tested whether reward motivation enhances encoding of salient events like expectancy violations. During functional magnetic resonance imaging, participants performed a reaction-time task in which goal-irrelevant expectancy violations were encountered during states of high or low reward motivation. Motivation amplified hippocampal activation to, and declarative memory for, expectancy violations. Connectivity of the ventral tegmental area (VTA) with medial prefrontal, ventrolateral prefrontal, and visual cortices preceded and predicted this increase in hippocampal sensitivity. These findings elucidate a novel mechanism whereby reward motivation can enhance hippocampus-dependent memory: anticipatory VTA-cortical-hippocampal interactions. Further, the findings integrate literatures on dopaminergic neuromodulation of prefrontal function and hippocampus-dependent memory. We conclude that during reward motivation, VTA modulation induces distributed neural changes that amplify hippocampal signals and records of expectancy violations to improve predictions, a potentially unique contribution of the hippocampus to reward learning.
Shaping Attention with Reward: Effects of Reward on Space- and Object-Based Selection
Shomstein, Sarah; Johnson, Jacoba
2014-01-01
The contribution of rewarded actions to automatic attentional selection remains obscure. We hypothesized that some forms of automatic orienting, such as object-based selection, can be completely abandoned in favor of a reward-maximizing strategy. While presenting identical visual stimuli to the observer, in a set of two experiments we manipulated what was rewarded (different object targets or random object locations) and the type of reward received (money or points). We observed that reward alone guided attentional selection, entirely predicting behavior. These results suggest that the guidance of selective attention, while automatic, is flexible and can be adjusted in accordance with external, non-sensory, reward-based factors. PMID:24121412
Development of a Self-Report Measure of Reward Sensitivity: A Test in Current and Former Smokers.
Hughes, John R; Callas, Peter W; Priest, Jeff S; Etter, Jean-Francois; Budney, Alan J; Sigmon, Stacey C
2017-06-01
Tobacco use or abstinence may increase or decrease reward sensitivity. Most existing measures of reward sensitivity were developed decades ago, and few have undergone extensive psychometric testing. We developed a 58-item survey of the anticipated enjoyment from, wanting for, and frequency of common rewards (the Rewarding Events Inventory; REI). The current analysis focuses on ratings of anticipated enjoyment. The first validation study recruited current and former smokers from Internet sites. The second study recruited smokers who wished to quit and were monetarily reinforced for staying abstinent in a laboratory study, plus a comparison group of former smokers. In both studies, participants completed the inventory on two occasions, 3-7 days apart. They also completed four anhedonia scales and a behavioral test of reduced reward sensitivity. Half of the enjoyment ratings loaded on four factors: socializing, active hobbies, passive hobbies, and sex/drug use. Cronbach's alpha coefficients were all ≥0.73 for the overall mean and factor scores. Test-retest correlations were all ≥0.83. Correlations of the overall and factor scores with the frequency of rewards and with anhedonia scales were 0.19-0.53, except for the sex/drugs factor. The scores did not correlate with behavioral tests of reward and did not differ between current and former smokers. A lower overall mean enjoyment score predicted a shorter time to relapse. Internal reliability and test-retest reliability of the enjoyment outcomes of the REI are excellent, and construct and predictive validity are modest but promising. The REI is comprehensive and up-to-date, yet short enough to use on repeated occasions. Replication tests, especially predictive validity tests, are needed. Implications: Both use of and abstinence from nicotine appear to increase or decrease how rewarding nondrug rewards are; however, self-report scales to test this have limitations. Our inventory of enjoyment from 58 rewards appears to be reliable and valid as well as comprehensive and up-to-date, yet short enough to use on repeated occasions. Replication tests, especially of the predictive validity of our scale, are needed.
Do motivational incentives reduce the inhibition deficit in ADHD?
Shanahan, Michelle A; Pennington, Bruce F; Willcutt, Erik W
2008-01-01
The primary goal of this study was to test three competing theories of ADHD: the inhibition theory, the motivational theory, and a dual deficit theory. Previous studies have produced conflicting findings about the effects of incentives on executive processes in ADHD. In the present study of 25 children with ADHD and 30 typically developing controls, motivation was manipulated within the Stop Task. Stop signal reaction time was examined, as well as reaction time, its variability, and the number of errors in the primary choice reaction time task. Overall, the pattern of results supported the inhibition theory over the motivational or dual deficit hypotheses, as main effects of group were found for most key variables (ADHD group was worse), whereas the group by reward interaction predicted by the motivational and dual deficit accounts was not found. Hence, as predicted by the inhibition theory, children with ADHD performed worse than controls irrespective of incentives.
Reward Modulates Adaptations to Conflict
ERIC Educational Resources Information Center
Braem, Senne; Verguts, Tom; Roggeman, Chantal; Notebaert, Wim
2012-01-01
Both cognitive conflict (e.g. Verguts & Notebaert, 2009) and reward signals (e.g. Waszak & Pholulamdeth, 2009) have been proposed to enhance task-relevant associations. Bringing these two notions together, we predicted that reward modulates conflict-based sequential adaptations in cognitive control. This was tested combining either a single…
Chester, David S; DeWall, C Nathan; Derefinko, Karen J; Estus, Steven; Lynam, Donald R; Peters, Jessica R; Jiang, Yang
2016-10-01
Individuals with genotypes that code for reduced dopaminergic brain activity often exhibit a predisposition toward aggression. However, it remains largely unknown how dopaminergic genotypes may increase aggression. Lower-functioning dopamine systems motivate individuals to seek reward from external sources such as illicit drugs and other risky experiences. Based on emerging evidence that aggression is a rewarding experience, we predicted that the effect of lower-functioning dopaminergic genotypes on aggression would be mediated by tendencies to seek rewards from the environment. Caucasian female and male undergraduates (N = 277) were genotyped for five polymorphisms of the dopamine D2 receptor (DRD2) gene; they reported their previous history of aggression and their dispositional reward-seeking. Lower-functioning DRD2 profiles were associated with greater sensation-seeking, which in turn predicted greater aggression. Our findings suggest that lower-functioning dopaminergic activity puts individuals at risk for violence because it motivates them to experience aggression's hedonically rewarding qualities.
Learning shapes the aversion and reward responses of lateral habenula neurons
Wang, Daqing; Li, Yi; Feng, Qiru; Guo, Qingchun; Zhou, Jingfeng; Luo, Minmin
2017-01-01
The lateral habenula (LHb) is believed to encode negative motivational values. It remains unknown how LHb neurons respond to various stressors and how learning shapes their responses. Here, we used fiber photometry and electrophysiology to track LHb neuronal activity in freely behaving mice. Bitterness, pain, and social attack by aggressors intensively excite LHb neurons. Aversive Pavlovian conditioning induced activation by the aversion-predicting cue within a few trials. The experience of social defeat also conditioned excitatory responses to previously neutral social stimuli. In contrast, fiber photometry and single-unit recordings revealed that sucrose reward inhibited LHb neurons and often produced an excitatory rebound. Prolonged conditioning and high reward probability were required to induce inhibition by reward-predicting cues. Therefore, LHb neurons can bidirectionally process a diverse array of aversive and reward signals. Importantly, their responses are dynamically shaped by learning, suggesting that the LHb participates in experience-dependent selection of behavioral responses to stressors and rewards. DOI: http://dx.doi.org/10.7554/eLife.23045.001 PMID:28561735
Cross-national prevalence and cultural correlates of bipolar I disorder.
Johnson, Kaja R; Johnson, Sheri L
2014-07-01
Bipolar disorder has been consistently related to heightened sensitivity to reward. Greater reward sensitivity predicts the onset of disorder, a more severe course, and conversion from milder to severe forms. No studies consider whether cultural factors related to reward sensitivity influence the course of bipolar disorder. This study examines the relationship of reward-relevant cultural values to global prevalence rates of bipolar I disorder. Lifetime prevalence of bipolar I disorder for 17 countries was drawn from epidemiological studies that used structured diagnostic interviews of large community samples. Bivariate correlations were used to assess the relationship of bipolar disorder prevalence with national scores on four reward-relevant cultural dimensions (Power Distance, Individualism, Long-Term Orientation, and Performance Orientation). The prevalence of bipolar I disorder was correlated in the predicted manner with Power Distance and Individualism, and with Long-Term Orientation and Performance Orientation after outliers were removed. Findings provide evidence for a cultural model of reward sensitivity in bipolar disorder.
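The analysis here is a set of bivariate Pearson correlations between country-level prevalence and cultural dimension scores. A minimal sketch with made-up numbers (not the study's data):

```python
import numpy as np

# Hypothetical country-level values: lifetime prevalence (%) and an Individualism score.
prevalence = np.array([0.3, 0.6, 1.0, 0.4, 1.1, 0.9])
individualism = np.array([20, 45, 80, 30, 90, 70])

r = np.corrcoef(prevalence, individualism)[0, 1]  # bivariate Pearson correlation
print(f"r = {r:.2f}")
```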
Telzer, Eva H; Fuligni, Andrew J; Lieberman, Matthew D; Galván, Adriana
2013-01-01
Adolescence is a period of intensified emotions and an increase in motivated behaviors and passions. Evidence from developmental neuroscience suggests that this heightened emotionality occurs, in part, due to a peak in functional reactivity to rewarding stimuli, which renders adolescents more oriented toward reward-seeking behaviors. Most prior work has focused on how reward sensitivity may create vulnerabilities, leading to increases in risk taking. Here, we test whether heightened reward sensitivity may instead be an asset for adolescents when they engage in prosocial activities. Thirty-two adolescents were followed over a one-year period to examine whether ventral striatum activation to prosocial rewards predicts declines in risk taking. Results show that heightened ventral striatum activation to prosocial stimuli relates to longitudinal declines in risk taking. Therefore, the very same neural region that has conferred vulnerability for adolescent risk taking may also be protective against it.
Invigoration of reward-seeking by cue and proximity encoding in the nucleus accumbens
McGinty, Vincent B.; Lardeux, Sylvie; Taha, Sharif A.; Kim, James J.; Nicola, Saleem M.
2014-01-01
A key function of the nucleus accumbens is to promote vigorous reward-seeking, but the corresponding neural mechanism has not been identified despite many years of research. Here we study cued flexible approach behavior, a form of reward-seeking that strongly depends on the accumbens, and we describe a robust, single-cell neural correlate of behavioral vigor in the excitatory response of accumbens neurons to reward-predictive cues. Well before locomotion begins, this cue-evoked excitation predicts both the movement initiation latency and speed of subsequent flexible approach responses, but not of stereotyped, inflexible responses. Moreover, the excitation simultaneously signals the subject’s proximity to the approach target, a signal that appears to mediate greater response vigor on trials that begin with the subject closer to the target. These results demonstrate a neural mechanism for response invigoration whereby accumbens neuronal encoding of reward availability and target proximity together drive the onset and speed of reward-seeking locomotion. PMID:23764290
ERIC Educational Resources Information Center
Schilling, Deanna E.
The overjustification hypothesis predicts decreased intrinsic motivation when persons are paid to perform an interesting task. The factors of reward experience, socioeconomic status (SES), and sex are examined while testing conflicting predictions of the hypothesis and reinforcement theory. Children from grade 1 at two public elementary schools…
Venturella, Irene; Finocchiaro, Roberta
2017-01-01
The present research explored reward bias and attentional deficits in Internet addiction (IA), based on the IAT (Internet Addiction Test) construct, during an attentional inhibitory task (Go/NoGo task). Event-related potential (ERP) effects (feedback-related negativity (FRN) and P300) were monitored in concomitance with Behavioral Activation System (BAS) modulation. High-IAT young participants showed specific responses to IA-related cues (videos representing online gambling and videogames) in terms of cognitive performance (decreased response times, RTs, and error rates, ERs) and ERP modulation (decreased FRN and increased P300). Consistent reward and attentional biases were adduced to explain the cognitive “gain” effect and the anomalous responses in terms of both feedback (FRN) and attentional (P300) mechanisms in high-IAT participants. In addition, BAS and BAS-Reward subscale measures were correlated with both IAT scores and ERP variations. Therefore, high IAT scores may be considered a marker of dysfunctional reward processing (reduced monitoring) and cognitive control (higher attentional values) for specific IA-related cues. More generally, a direct relationship among reward-related behavior, Internet addiction and BAS attitude is suggested. PMID:28704978
Kasties, Nils; Starosta, Sarah; Güntürkün, Onur; Stüttgen, Maik C.
2016-01-01
Animals exploit visual information to identify objects, form stimulus-reward associations, and prepare appropriate behavioral responses. The nidopallium caudolaterale (NCL), an associative region of the avian endbrain, contains neurons exhibiting prominent response modulation during the presentation of reward-predicting visual stimuli, but it is unclear whether neural activity represents valuation signals, stimulus properties, or sensorimotor contingencies. To test the hypothesis that NCL neurons represent stimulus value, we subjected pigeons to a Pavlovian sign-tracking paradigm in which visual cues predicted rewards differing in magnitude (large vs. small) and delay to presentation (short vs. long). Subjects’ strength of conditioned responding to the visual cues reliably differentiated between predicted reward types and thus indexed valuation. The majority of NCL neurons discriminated between visual cues, with discriminability peaking shortly after stimulus onset and being maintained at lower levels throughout the stimulus presentation period. However, while some cells’ firing rates correlated with reward value, such neurons were not more frequent than expected by chance. Instead, neurons formed discernible clusters that differed in their preferred visual cue. We propose that this activity pattern constitutes a prerequisite for using visual information in more complex situations, e.g., those requiring value-based choices. PMID:27762287
Intrinsic Work Value-Reward Dissonance and Work Satisfaction during Young Adulthood
Porfeli, Erik J.; Mortimer, Jeylan T.
2010-01-01
Previous research suggests that discrepancies between work values and rewards are indicators of dissonance that induce change in both to reduce such dissonance over time. The present study elaborates this model to suggest parallels with the first phase of the extension-and-strain curve. Small discrepancies or small increases in extension are presumed to be almost unnoticeable, while increasingly large discrepancies are thought to yield exponentially increasing strain. Work satisfaction is a principal outcome of dissonance; hence, work value-reward discrepancies are predicted to diminish work satisfaction in an exponential fashion. Findings from the work and family literature, however, lead to the prediction that this curvilinear association will be moderated by gender and family roles. Using longitudinal data spanning the third decade of life, the results suggest that intrinsic work value-reward discrepancies, as predicted, are increasingly associated, in a negative curvilinear fashion, with work satisfaction. This pattern, however, differs as a function of gender and family roles. Females who established family roles exhibited the expected pattern while other gender by family status groups did not. The results suggest that gender and family roles moderate the association between intrinsic work value-reward dissonance and satisfaction. In addition, women who remained unmarried and childless exhibited the strongest associations between occupational rewards and satisfaction. PMID:20526434
May, Paul J.; McHaffie, John G.; Stanford, Terrence R.; Jiang, Huai; Costello, M. Gabriela; Coizet, Veronique; Hayes, Lauren M.; Haber, Suzanne N.; Redgrave, Peter
2010-01-01
Much of the evidence linking the short-latency phasic signaling of midbrain dopaminergic neurons with the reward-prediction errors used in learning and habit formation comes from recording the visual responses of monkey dopaminergic neurons. However, the information encoded by dopaminergic neuron activity is constrained by the qualities of the afferent visual signals made available to these cells. Recent evidence from rats and cats indicates that the primary source of this visual input originates subcortically, via a direct tectonigral projection. The present anatomical study sought to establish whether a direct tectonigral projection is a significant feature of the primate brain. Injections of anterograde tracers into the superior colliculus of macaque monkeys labelled terminal arbors throughout the substantia nigra, with the densest terminations in the dorsal tier. Labelled boutons were found in close association (possibly indicative of synaptic contact) with ventral midbrain neurons staining positively for the dopaminergic marker tyrosine hydroxylase. Injections of retrograde tracer confined to the macaque substantia nigra retrogradely labelled small- to medium-sized neurons in the intermediate and deep layers of the superior colliculus. Together, these data indicate that a direct tectonigral projection is also a feature of the monkey brain, and is therefore likely to have been conserved throughout mammalian evolution. Insofar as the superior colliculus is configured to detect unpredicted, biologically salient sensory events, it may be safer to regard the phasic responses of midbrain dopaminergic neurons as ‘sensory prediction errors’ rather than ‘reward prediction errors’, in which case dopamine-based theories of reinforcement learning will require revision. PMID:19175405
Inferior frontal cortex activity is modulated by reward sensitivity and performance variability.
Fuentes-Claramonte, Paola; Ávila, César; Rodríguez-Pujadas, Aina; Costumero, Víctor; Ventura-Campos, Noelia; Bustamante, Juan Carlos; Rosell-Negre, Patricia; Barrós-Loscertales, Alfonso
2016-02-01
High reward sensitivity has been linked with motivational and cognitive disorders related to prefrontal and striatal brain function during inhibitory control. However, few studies have analyzed the interaction among reward sensitivity, task performance and neural activity. Participants (N=57) underwent fMRI while performing a Go/No-go task with Frequent-go (77.5%), Infrequent-go (11.25%) and No-go (11.25%) stimuli. Task-associated activity was found in inhibition-related brain regions, with different activity patterns for the right and left inferior frontal gyri (IFG): the right IFG responded more strongly to No-go stimuli, while the left IFG responded similarly to all infrequent stimuli. Reward sensitivity correlated with omission errors in Go trials and with reaction time (RT) variability, and with increased activity in the right and left IFG for No-go and Infrequent-go stimuli compared with Frequent-go stimuli. Bilateral IFG activity was associated with RT variability, with reward sensitivity mediating this association. These results suggest that reward sensitivity modulates behavior and brain function during executive control. Copyright © 2016 Elsevier B.V. All rights reserved.
Deci, E L; Koestner, R; Ryan, R M
1999-11-01
A meta-analysis of 128 studies examined the effects of extrinsic rewards on intrinsic motivation. As predicted, engagement-contingent, completion-contingent, and performance-contingent rewards significantly undermined free-choice intrinsic motivation (d = -0.40, -0.36, and -0.28, respectively), as did all rewards, all tangible rewards, and all expected rewards. Engagement-contingent and completion-contingent rewards also significantly undermined self-reported interest (d = -0.15 and -0.17, respectively), as did all tangible rewards and all expected rewards. Positive feedback enhanced both free-choice behavior (d = 0.33) and self-reported interest (d = 0.31). Tangible rewards tended to be more detrimental for children than for college students, and verbal rewards tended to be less enhancing for children than for college students. The authors review 4 previous meta-analyses of this literature and detail how this study's methods, analyses, and results differed from the previous ones.
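The effect sizes above are standardized mean differences (Cohen's d). A minimal sketch of how such a d is computed from two groups' free-choice scores; the means, SDs, and ns below are hypothetical, chosen only to reproduce a value near the reported d = -0.40:

```python
import math

def cohens_d(mean_reward, mean_control, sd_reward, sd_control, n_reward, n_control):
    """Standardized mean difference using the pooled SD. Negative values
    indicate lower intrinsic motivation in the rewarded group, matching
    the sign convention of the meta-analysis."""
    pooled_sd = math.sqrt(((n_reward - 1) * sd_reward**2 +
                           (n_control - 1) * sd_control**2) /
                          (n_reward + n_control - 2))
    return (mean_reward - mean_control) / pooled_sd

# Hypothetical free-choice times (minutes) for rewarded vs. unrewarded groups:
print(round(cohens_d(4.2, 5.6, 3.5, 3.5, 40, 40), 2))  # -0.4
```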
DISRUPTION OF CONDITIONED REWARD ASSOCIATION BY TYPICAL AND ATYPICAL ANTIPSYCHOTICS
Danna, C.L.; Elmer, G.I.
2013-01-01
Antipsychotic drugs are broadly classified into typical and atypical compounds; although they vary in their pharmacological profiles, a common component is antagonism at D2 dopamine receptors (DRD2). Unfortunately, diminished DRD2 activation is generally thought to be associated with the severity of neuroleptic-induced anhedonia. The purpose of this study was to determine the effects of the atypical antipsychotic olanzapine and the typical antipsychotic haloperidol in a paradigm that reflects the learned transfer of incentive motivational properties to previously neutral stimuli, namely autoshaping. In order to provide a dosing comparison to a therapeutically relevant endpoint, both drugs were also tested against amphetamine-induced disruption of prepulse inhibition. In the autoshaping task, rats were exposed to repeated pairings of stimuli that were differentially predictive of reward delivery. Conditioned approach to the reward-predictive cue (sign-tracking) and to the reward (goal-tracking) increased during repeated pairings in vehicle-treated rats. Haloperidol and olanzapine completely abolished this behavior at relatively low doses (100 μg/kg). This same dose was the threshold dose for each drug to antagonize the sensorimotor gating deficits produced by amphetamine. At lower doses (3–30 μg/kg) both drugs produced a dose-dependent decrease in conditioned approach to the reward-predictive cue. There was no difference between the drugs in this dose range, which indicates that olanzapine disrupts autoshaping at a significantly lower proposed DRD2 receptor occupancy. Interestingly, neither drug disrupted conditioned approach to the reward at the same dose range that disrupted conditioned approach to the reward-predictive cue. Thus, haloperidol and olanzapine, at doses well below what is considered therapeutically relevant, disrupt the attribution of incentive motivational value to previously neutral cues. Drug effects on this dimension of reward processing are an important consideration in the development of future pharmacological treatments for schizophrenia. PMID:20416333
Chung, Tammy; Paulsen, David J.; Geier, Charles F.; Luna, Beatriz; Clark, Duncan B.
2015-01-01
This preliminary study examined the extent to which regional brain activation during a reward cue antisaccade (AS) task was associated with 6-month treatment outcome in adolescent substance users. Antisaccade performance provides a sensitive measure of executive function and cognitive control, and generally improves with reward cues. We hypothesized that when preparing to execute an AS, greater activation in regions associated with cognitive and oculomotor control supporting AS, particularly during reward cue trials, would be associated with lower substance use severity at 6-month follow-up. Adolescents (n=14, ages 14-18) recruited from community-based outpatient treatment completed an fMRI reward cue AS task (reward and neutral conditions), and provided follow-up data. Results indicated that AS errors decreased in reward, compared to neutral, trials. AS behavioral performance, however, was not associated with treatment outcome. As hypothesized, activation in regions of interest (ROIs) associated with cognitive (e.g., ventrolateral prefrontal cortex) and oculomotor control (e.g., supplementary eye field) during reward trials was inversely correlated with marijuana problem severity at 6 months. ROI activation during neutral trials was not associated with outcomes. Results support the role of motivational (reward cue) factors in enhancing cognitive control processes, and suggest a potential brain-based correlate of youth treatment outcome. PMID:26026506
Hydration level is an internal variable for computing motivation to obtain water rewards in monkeys.
Minamimoto, Takafumi; Yamada, Hiroshi; Hori, Yukiko; Suhara, Tetsuya
2012-05-01
In the process of motivation to engage in a behavior, valuation of the expected outcome comprises not only external variables (i.e., incentives) but also internal variables (i.e., drive). However, the exact neural mechanism that integrates these variables for the computation of motivational value remains unclear. Moreover, the signal of physiological need, which serves as the primary internal variable for this computation, remains to be identified. Concerning fluid rewards, the osmolality level, one of the physiological indices of the level of thirst, may be an internal variable for valuation, since an increase in the osmolality level induces drinking behavior. Here, to examine the relationship between osmolality and the motivational value of a water reward, we repeatedly measured the blood osmolality level while 2 monkeys continuously performed an instrumental task until they spontaneously stopped. We found that, as the total amount of water earned increased, the osmolality level progressively decreased (i.e., the hydration level increased) in an individual-dependent manner. There was a significant negative correlation between the error rate of the task (the proportion of trials with low motivation) and the osmolality level. We also found that the increase in the error rate with reward accumulation can be well explained by a formula describing the changes in the osmolality level. These results provide a biologically supported computational formula for the motivational value of a water reward that depends on the hydration level, enabling us to identify the neural mechanism that integrates internal and external variables.
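A minimal numerical sketch of the kind of relation the abstract describes; the linear osmolality decline, the logistic error-rate curve, and every constant below are illustrative assumptions, not the authors' fitted formula:

```python
import numpy as np

osm0 = 300.0     # assumed initial blood osmolality (mOsm/kg)
slope = 0.05     # assumed osmolality drop per mL of water earned
osm_sat = 285.0  # assumed set point at which motivation is largely lost
k = 0.5          # assumed steepness of the motivation curve

water_earned = np.linspace(0, 300, 7)       # cumulative water reward (mL)
osmolality = osm0 - slope * water_earned    # hydration rises as osmolality falls
error_rate = 1.0 / (1.0 + np.exp(k * (osmolality - osm_sat)))  # errors rise with satiety

for w, o, e in zip(water_earned, osmolality, error_rate):
    print(f"{w:5.0f} mL  osmolality={o:6.1f}  P(error)={e:.2f}")
```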
Shulman, Elizabeth P; Monahan, Kathryn C; Steinberg, Laurence
2017-01-01
This report compares the effects (concurrent and lagged) of the anticipated rewards and costs of violent crime on engagement in severe violence in a sample of male juvenile offenders (N = 1,170; 42.1% black, 34.0% Hispanic, 19.2% white, and 4.6% other; ages 14-18 at baseline). Anticipated rewards (social approval, thrill) are more predictive of concurrent severe violence than are anticipated costs (social disapproval, risk of punishment). The analysis finds no evidence that perceptions of the rewards and costs of violent crime influence engagement in severe violence 6 months later. The results support the view that adolescence is a time of heightened reward salience but raise doubt about the longitudinal predictive validity of perceptions about crime during this time of life. © 2017 The Authors. Child Development © 2017 Society for Research in Child Development, Inc.
Aerobic exercise modulates anticipatory reward processing via the μ-opioid receptor system.
Saanijoki, Tiina; Nummenmaa, Lauri; Tuulari, Jetro J; Tuominen, Lauri; Arponen, Eveliina; Kalliokoski, Kari K; Hirvonen, Jussi
2018-06-08
Physical exercise modulates food reward and helps control body weight. The endogenous µ-opioid receptor (MOR) system is involved in rewarding aspects of both food and physical exercise, yet the interaction between endogenous opioid release following exercise and anticipatory food reward remains unresolved. Here we tested whether exercise-induced opioid release correlates with increased anticipatory reward processing in humans. We scanned 24 healthy lean men after rest and after a 1 h session of aerobic exercise with positron emission tomography (PET) using the MOR-selective radioligand [11C]carfentanil. After both PET scans, the subjects underwent a functional magnetic resonance imaging (fMRI) experiment where they viewed pictures of palatable versus nonpalatable foods to trigger anticipatory food reward responses. Exercise-induced changes in MOR binding in key regions of the reward circuit (amygdala, thalamus, ventral and dorsal striatum, and orbitofrontal and cingulate cortices) were used to predict the changes in anticipatory reward responses in fMRI. Exercise-induced changes in MOR binding correlated negatively with the exercise-induced changes in neural anticipatory food reward responses in orbitofrontal and cingulate cortices, insula, ventral striatum, amygdala, and thalamus: higher exercise-induced opioid release predicted higher brain responses to palatable versus nonpalatable foods. We conclude that MOR activation following exercise may contribute to the considerable interindividual variation in food craving and consumption after exercise, which might promote compensatory eating and compromise weight control. © 2018 Wiley Periodicals, Inc.
Elliott, R; Agnew, Z; Deakin, J F W
2008-05-01
Functional imaging studies in recent years have confirmed the involvement of orbitofrontal cortex (OFC) in human reward processing and have suggested that OFC responses are context-dependent. A seminal electrophysiological experiment in primates taught animals to associate abstract visual stimuli with differently valuable food rewards. Subsequently, pairs of these learned abstract stimuli were presented and firing of OFC neurons to the medium-value stimulus was measured. OFC firing was shown to depend on the relative value context. In this study, we developed a human analogue of this paradigm and scanned subjects using functional magnetic resonance imaging. The analysis compared neuronal responses to two superficially identical events, which differed only in terms of the preceding context. Medial OFC response to the same perceptual stimulus was greater when the stimulus predicted the more valuable of two rewards than when it predicted the less valuable. Additional responses were observed in other components of reward circuitry, the amygdala and ventral striatum. The central finding is consistent with the primate results and suggests that OFC neurons code relative rather than absolute reward value. Amygdala and striatal involvement in coding reward value is also consistent with recent functional imaging data. By using a simpler and less confounded paradigm than many functional imaging studies, we are able to demonstrate that relative financial reward value per se is coded in distinct subregions of an extended reward and decision-making network.
Distinct Reward Properties are Encoded via Corticostriatal Interactions
Smith, David V.; Rigney, Anastasia E.; Delgado, Mauricio R.
2016-01-01
The striatum serves as a critical brain region for reward processing. Yet, understanding the link between striatum and reward presents a challenge because rewards are composed of multiple properties. Notably, affective properties modulate emotion while informative properties help obtain future rewards. We approached this problem by emphasizing affective and informative reward properties within two independent guessing games. We found that both reward properties evoked activation within the nucleus accumbens, a subregion of the striatum. Striatal responses to informative, but not affective, reward properties predicted subsequent utilization of information for obtaining monetary reward. We hypothesized that activation of the striatum may be necessary but not sufficient to encode distinct reward properties. To investigate this possibility, we examined whether affective and informative reward properties were differentially encoded in corticostriatal interactions. Strikingly, we found that the striatum exhibited dissociable connectivity patterns with the ventrolateral prefrontal cortex, with increasing connectivity for affective reward properties and decreasing connectivity for informative reward properties. Our results demonstrate that affective and informative reward properties are encoded via corticostriatal interactions. These findings highlight how corticostriatal systems contribute to reward processing, potentially advancing models linking striatal activation to behavior. PMID:26831208
Scalar utility theory and proportional processing: what does it actually imply?
Rosenström, Tom; Wiesner, Karoline; Houston, Alasdair I
2017-01-01
Scalar Utility Theory (SUT) is a model used to predict animal and human choice behaviour in the context of reward amount, delay to reward, and variability in these quantities (risk preferences). This article reviews and extends SUT, deriving novel predictions. We show that, contrary to what has been implied in the literature, (1) SUT can predict both risk averse and risk prone behaviour for both reward amounts and delays to reward depending on experimental parameters, (2) SUT implies violations of several concepts of rational behaviour (e.g. it violates strong stochastic transitivity and its equivalents, and leads to probability matching) and (3) SUT can predict, but does not always predict, a linear relationship between risk sensitivity in choices and coefficient of variation in the decision-making experiment. SUT derives from Scalar Expectancy Theory which models uncertainty in behavioural timing using a normal distribution. We show that the above conclusions also hold for other distributions, such as the inverse Gaussian distribution derived from drift-diffusion models. A straightforward way to test the key assumptions of SUT is suggested and possible extensions, future prospects and mechanistic underpinnings are discussed. PMID:27288541
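A minimal simulation of the SUT read-out sketched above, assuming the remembered value of each option is normally distributed with SD proportional to its mean (constant coefficient of variation) and that the larger of two samples wins; the cv value and the sample-and-compare rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_choose_a(mean_a, mean_b, cv=0.3, n=100_000):
    """Probability of choosing option A when each option's value is a
    draw from N(mean, (cv*mean)^2) and the larger draw is chosen."""
    a = rng.normal(mean_a, cv * mean_a, n)
    b = rng.normal(mean_b, cv * mean_b, n)
    return np.mean(a > b)

print(round(p_choose_a(10, 10), 2))  # ~0.50: indifference between equal rewards
print(round(p_choose_a(12, 10), 2))  # ~0.66: graded preference, i.e. probability matching
print(round(p_choose_a(20, 10), 2))  # ~0.93: near-certain choice of the larger reward
```

Because preference is graded in the overlap of the two scalar distributions rather than all-or-none, this read-out exhibits the probability matching noted in point (2).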
Reward sensitivity predicts ice cream-related attentional bias assessed by inattentional blindness.
Li, Xiaoming; Tao, Qian; Fang, Ya; Cheng, Chen; Hao, Yangyang; Qi, Jianjun; Li, Yu; Zhang, Wei; Wang, Ying; Zhang, Xiaochu
2015-06-01
The cognitive mechanism underlying the association between individual differences in reward sensitivity and food craving is unknown. The present study explored this mechanism by examining the role of reward sensitivity in attentional bias toward ice cream cues. Forty-nine college students who displayed a high level of ice cream craving (HICs) and 46 who displayed a low level of ice cream craving (LICs) performed an inattentional blindness (IB) task which was used to assess attentional bias for ice cream. In addition, reward sensitivity and coping style were assessed by the Behavior Inhibition System/Behavior Activation System Scales and the Simplified Coping Style Questionnaire. Results showed a significantly higher identification rate of the critical stimulus in the HICs than in the LICs, suggesting greater attentional bias for ice cream in the HICs. This indicates that attentional bias for food cues persisted even under inattentional conditions. Furthermore, a significant correlation was found between the attentional bias and reward sensitivity after controlling for coping style, and reward sensitivity predicted attentional bias for food cues. The mediation analyses showed that attentional bias mediated the relationship between reward sensitivity and food craving. These findings suggest that the association between individual differences in reward sensitivity and food craving may be attributed to attentional bias for food-related cues. Copyright © 2015 Elsevier Ltd. All rights reserved.
Vernetti, Angélina; Smith, Tim J; Senju, Atsushi
2017-03-15
While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.
Factors that Predict Full-Time Community College Faculty Engagement in Online Instruction
ERIC Educational Resources Information Center
Akroyd, Duane; Patton, Bess; Bracken, Susan
2013-01-01
This study is a secondary quantitative analysis of the 2004 National Study of Postsecondary Faculty (NSOPF) data. It examines the ability of human capital, intrinsic rewards, extrinsic rewards, and gender/race demographics to predict full-time community college faculty teaching on-line courses. Findings indicate that those faculty with higher…
Learning-dependent plasticity in human auditory cortex during appetitive operant conditioning.
Puschmann, Sebastian; Brechmann, André; Thiel, Christiane M
2013-11-01
Animal experiments provide evidence that learning to associate an auditory stimulus with a reward causes representational changes in auditory cortex. However, most studies did not investigate the temporal formation of learning-dependent plasticity during the task but rather compared auditory cortex receptive fields before and after conditioning. We here present a functional magnetic resonance imaging study on learning-related plasticity in the human auditory cortex during operant appetitive conditioning. Participants had to learn to associate a specific category of frequency-modulated tones with a reward. Only participants who learned this association developed learning-dependent plasticity in left auditory cortex over the course of the experiment. No differential responses to reward predicting and nonreward predicting tones were found in auditory cortex in nonlearners. In addition, learners showed similar learning-induced differential responses to reward-predicting and nonreward-predicting tones in the ventral tegmental area and the nucleus accumbens, two core regions of the dopaminergic neurotransmitter system. This may indicate a dopaminergic influence on the formation of learning-dependent plasticity in auditory cortex, as it has been suggested by previous animal studies. Copyright © 2012 Wiley Periodicals, Inc.
Executive function and decision-making in women with fibromyalgia.
Verdejo-García, Antonio; López-Torrecillas, Francisca; Calandre, Elena Pita; Delgado-Rodríguez, Antonia; Bechara, Antoine
2009-02-01
Patients with fibromyalgia (FM) typically report cognitive problems, and they state that these deficits are disturbing in everyday life. Despite these substantial subjective complaints by FM patients, very few studies have objectively addressed the effect of such aversive states on neuropsychological performance. In this study we aimed to examine possible impairment of executive function and decision-making in a sample of 36 women diagnosed with FM and 36 healthy women matched in age, education, and socio-economic status. We contrasted the performance of both groups on two measures of executive functioning: the Wisconsin Card Sorting Test (WCST), which assesses cognitive flexibility skills, and the Iowa Gambling Task (IGT; original and variant versions), which assesses emotion-based decision-making. We also examined the relationship between executive function performance and pain experience, and between executive function and the personality traits of novelty-seeking, harm avoidance, reward dependence, and persistence (measured by the Temperament and Character Inventory-Revised). Results showed that on the WCST, FM women performed more poorly than healthy comparison women on the number of categories and on non-perseverative errors, but not on perseverative errors. FM patients also showed an altered learning curve in the original IGT (where reward is immediate and punishment is delayed), suggesting compromised emotion-based decision-making, but not in the variant IGT (where punishment is immediate but reward is delayed), suggesting hypersensitivity to reward. Personality variables were only mildly associated with cognitive performance in FM women.
Habit formation coincides with shifts in reinforcement representations in the sensorimotor striatum.
Smith, Kyle S; Graybiel, Ann M
2016-03-01
Evaluating outcomes of behavior is a central function of the striatum. In circuits engaging the dorsomedial striatum, sensitivity to goal value is accentuated during learning, whereas outcome sensitivity is thought to be minimal in the dorsolateral striatum and its habit-related corticostriatal circuits. However, a distinct population of projection neurons in the dorsolateral striatum exhibits selective sensitivity to rewards. Here, we evaluated the outcome-related signaling in such neurons as rats performed an instructional T-maze task for two rewards. As the rats formed maze-running habits and then changed behavior after reward devaluation, we detected outcome-related spike activity in 116 units out of 1,479 recorded units. During initial training, nearly equal numbers of these units fired preferentially either after rewarded runs or after unrewarded runs, and the majority were responsive at only one of two reward locations. With overtraining, as habits formed, firing in nonrewarded trials almost disappeared, and reward-specific firing declined. Thus error-related signaling was lost, and reward signaling became generalized. Following reward devaluation, in an extinction test, postgoal activity was nearly undetectable, despite accurate running. Strikingly, when rewards were then returned, postgoal activity reappeared and recapitulated the original early response pattern, with nearly equal numbers responding to rewarded and unrewarded runs and to single rewards. These findings demonstrate that outcome evaluation in the dorsolateral striatum is highly plastic and tracks stages of behavioral exploration and exploitation. These signals could be a new target for understanding compulsive behaviors that involve changes to dorsal striatum function. Copyright © 2016 the American Physiological Society.
Delaying gratification depends on social trust
Michaelson, Laura; de la Vega, Alejandro; Chatham, Christopher H.; Munakata, Yuko
2013-01-01
Delaying gratification is hard, yet predictive of important life outcomes, such as academic achievement and physical health. Prominent theories focus on the role of self-control, hypersensitivity to immediate rewards, and the cost of time spent waiting. However, delaying gratification may also require trust in the people delivering future rewards as promised. To test the role of social trust, participants were presented with character vignettes and faces that varied in trustworthiness, and then chose between hypothetical smaller immediate or larger delayed rewards from those characters. Across two experiments, participants were less willing to wait for delayed rewards from less trustworthy characters, and perceived trustworthiness predicted willingness to delay gratification. These findings provide the first demonstration of a causal role for social trust in willingness to delay gratification, independent of other relevant factors, such as self-control or reward history. Thus, delaying gratification requires choosing not only a later reward, but a reward that is potentially less likely to be delivered, when there is doubt about the person promising it. Implications of this work include the need to revise prominent theories of delay of gratification, and new directions for interventions with populations characterized by impulsivity. PMID:23801977
Leon, M I; Gallistel, C R
1998-07-01
For rats that bar-pressed for intracranial electrical stimulation in a 2-lever matching paradigm with concurrent variable interval schedules of reward, the authors found that the time allocation ratio is based on a multiplicative combination of the ratio of subjective reward magnitudes and the ratio of the rates of reward. Multiplicative combining was observed over a range covering approximately 2 orders of magnitude in the ratio of the rates of reward (from about 1:10 to 10:1) and an order of magnitude change in the size of rewards. After determining the relation between the pulse frequency of stimulation and subjective reward magnitude, the authors were able to predict the subject's time allocation ratio, over a range in which it varied by more than 3 orders of magnitude, from knowledge of the subjective magnitudes of the rewards and the obtained relative rates of reward.
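A sketch of the multiplicative combination reported above, treating the subjective magnitudes and obtained rates as given numbers:

```python
def predicted_time_ratio(mag_a, mag_b, rate_a, rate_b):
    """Time allocated to lever A over lever B as the product of the
    subjective magnitude ratio and the obtained reward rate ratio."""
    return (mag_a / mag_b) * (rate_a / rate_b)

# Equal magnitudes with a 4:1 rate ratio -> ~4x the time on lever A.
print(predicted_time_ratio(1.0, 1.0, 4.0, 1.0))  # 4.0
# A 1:2 magnitude ratio offsets a 2:1 rate ratio -> indifference.
print(predicted_time_ratio(1.0, 2.0, 2.0, 1.0))  # 1.0
```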
fMRI of alterations in reward selection, anticipation, and feedback in major depressive disorder.
Smoski, Moria J; Felder, Jennifer; Bizzell, Joshua; Green, Steven R; Ernst, Monique; Lynch, Thomas R; Dichter, Gabriel S
2009-11-01
The purpose of the present investigation was to evaluate reward processing in unipolar major depressive disorder (MDD). Specifically, we investigated whether adults with MDD demonstrated hyporesponsivity in striatal brain regions and/or hyperresponsivity in cortical brain regions involved in conflict monitoring using a Wheel of Fortune task designed to probe responses during reward selection, reward anticipation, and reward feedback. Functional magnetic resonance imaging (fMRI) data indicated that the MDD group was characterized by reduced activation of striatal reward regions during reward selection, reward anticipation, and reward feedback, supporting previous data indicating hyporesponsivity of reward systems in MDD. Support was not found for hyperresponsivity of cognitive control regions during reward selection or reward anticipation. Instead, MDD participants showed hyperresponsivity in orbitofrontal cortex, a region associated with assessment of risk and reward, during reward selection, as well as decreased activation of the middle frontal gyrus and the rostral cingulate gyrus during reward selection and anticipation. Finally, depression severity was predicted by activation in bilateral midfrontal gyrus during reward selection. Results indicate that MDD is characterized by striatal hyporesponsivity, and that future studies of MDD treatments that seek to improve responses to rewarding stimuli should assess striatal functioning.
Fiorenza, Amanda M; Shnitko, Tatiana A; Sullivan, Kaitlin M; Vemuru, Sudheer R; Gomez-A, Alexander; Esaki, Julie Y; Boettiger, Charlotte A; Da Cunha, Claudio; Robinson, Donita L
2018-06-01
Conditioned stimuli (CS) that predict reward delivery acquire the ability to induce phasic dopamine release in the nucleus accumbens (NAc). This dopamine release may facilitate conditioned approach behavior, which often manifests as approach to the site of reward delivery (called "goal-tracking") or to the CS itself (called "sign-tracking"). Previous research has linked sign-tracking in particular to impulsivity and drug self-administration, and addictive drugs may promote the expression of sign-tracking. Ethanol (EtOH) acutely promotes phasic release of dopamine in the accumbens, but it is unknown whether an alcoholic reward alters dopamine release to a CS. We hypothesized that Pavlovian conditioning with an alcoholic reward would increase dopamine release triggered by the CS and subsequent sign-tracking behavior. Moreover, we predicted that chronic intermittent EtOH (CIE) exposure would promote sign-tracking while acute administration of naltrexone (NTX) would reduce it. Rats received 14 doses of EtOH (3 to 5 g/kg, intragastric) or water followed by 6 days of Pavlovian conditioning training. Rewards were a chocolate solution with or without 10% (w/v) alcohol. We used fast-scan cyclic voltammetry to measure phasic dopamine release in the NAc core in response to the CS and the rewards. We also determined the effect of NTX (1 mg/kg, subcutaneous) on conditioned approach. Both CIE and alcoholic reward, individually but not together, were associated with greater dopamine release to the CS than control conditions. However, this increase in dopamine release was not linked to greater sign-tracking, as both CIE and alcoholic reward shifted conditioned approach from sign-tracking behavior to goal-tracking behavior. Both also increased sensitivity to NTX, which reduced goal-tracking behavior. Thus, while a history of EtOH exposure or an alcoholic reward enhanced dopamine release to a CS, neither promoted sign-tracking under the current conditions. These findings are consistent with the interpretation that EtOH can stimulate conditioned approach, but indicate that the conditioned response may manifest as goal-tracking. Copyright © 2018 by the Research Society on Alcoholism.
Predictive models of glucose control: roles for glucose-sensing neurones.
Kosse, C; Gonzalez, A; Burdakov, D
2015-01-01
The brain can be viewed as a sophisticated control module for stabilizing blood glucose. A review of classical behavioural evidence indicates that central circuits add predictive (feedforward/anticipatory) control to the reactive (feedback/compensatory) control by peripheral organs. The brain/cephalic control is constructed and engaged, via associative learning, by sensory cues predicting energy intake or expenditure (e.g. sight, smell, taste, sound). This allows rapidly measurable sensory information (rather than slowly generated internal feedback signals, e.g. digested nutrients) to control food selection, glucose supply for fight-or-flight responses or preparedness for digestion/absorption. Predictive control is therefore useful for preventing large glucose fluctuations. We review emerging roles in predictive control of two classes of widely projecting hypothalamic neurones, orexin/hypocretin (ORX) and melanin-concentrating hormone (MCH) cells. Evidence is cited that ORX neurones (i) are activated by sensory cues (e.g. taste, sound), (ii) drive hepatic production, and muscle uptake, of glucose, via sympathetic nerves, (iii) stimulate wakefulness and exploration via global brain projections and (iv) are glucose-inhibited. MCH neurones are (i) glucose-excited, (ii) innervate learning and reward centres to promote synaptic plasticity, learning and memory and (iii) are critical for learning associations useful for predictive control (e.g. using taste to predict nutrient value of food). This evidence is unified into a model for predictive glucose control. During associative learning, inputs from some glucose-excited neurones may promote connections between the 'fast' senses and reward circuits, constructing neural shortcuts for efficient action selection. In turn, glucose-inhibited neurones may engage locomotion/exploration and coordinate the required fuel supply. Feedback inhibition of the latter neurones by glucose would ensure that glucose fluxes they stimulate (from liver, into muscle) are balanced. Estimating nutrient challenges from indirect sensory cues may become more difficult when the cues become complex and variable (e.g. like human foods today). Consequent errors of predictive glucose control may contribute to obesity and diabetes. © 2014 The Authors. Acta Physiologica published by John Wiley & Sons Ltd on behalf of Scandinavian Physiological Society.
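A toy controller illustrating the review's feedforward-plus-feedback framing; the linear form, gains, and set point are all hypothetical:

```python
set_point = 5.0      # assumed glucose target (mmol/L)
k_feedback = 0.4     # assumed reactive (compensatory) gain
k_feedforward = 0.8  # assumed learned cue (anticipatory) gain

def regulatory_drive(glucose_now, cue_predicted_load):
    """Combine a reactive correction of the measured glucose deviation
    with a predictive response to a sensory cue signalling a coming load."""
    feedback = k_feedback * (glucose_now - set_point)  # after the disturbance
    feedforward = k_feedforward * cue_predicted_load   # before the disturbance
    return feedback + feedforward

# A food cue acts while glucose is still normal, pre-empting the excursion...
print(regulatory_drive(glucose_now=5.0, cue_predicted_load=2.0))  # 1.6
# ...whereas pure feedback can only respond once glucose has already risen.
print(regulatory_drive(glucose_now=7.0, cue_predicted_load=0.0))  # 0.8
```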
State-based versus reward-based motivation in younger and older adults.
Worthy, Darrell A; Cooper, Jessica A; Byrne, Kaileigh A; Gorlick, Marissa A; Maddox, W Todd
2014-12-01
Recent decision-making work has focused on a distinction between a habitual, model-free neural system that is motivated toward actions that lead directly to reward and a more computationally demanding goal-directed, model-based system that is motivated toward actions that improve one's future state. In this article, we examine how aging affects motivation toward reward-based versus state-based decision making. Participants performed tasks in which one type of option provided larger immediate rewards but the alternative type of option led to larger rewards on future trials, or improvements in state. We predicted that older adults would show a reduced preference for choices that led to improvements in state and a greater preference for choices that maximized immediate reward. We also predicted that fits from a hybrid reinforcement-learning model would indicate greater model-based strategy use in younger than in older adults. In line with these predictions, older adults selected the options that maximized reward more often than did younger adults in three of the four tasks, and modeling results suggested reduced model-based strategy use. In the task where older adults showed similar behavior to younger adults, our model-fitting results suggested that this was due to the utilization of a win-stay-lose-shift heuristic rather than a more complex model-based strategy. Additionally, within older adults, we found that model-based strategy use was positively correlated with memory measures from our neuropsychological test battery. We suggest that this shift from state-based to reward-based motivation may be due to age-related declines in the neural structures needed for more computationally demanding model-based decision making.
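A minimal sketch of a hybrid learner of the kind fit in such studies: a model-free value tracks immediate reward, a model-based value tracks improvements in future state, and a weight w mixes them. The Q = w*Q_MB + (1-w)*Q_MF convention is common in this literature, but the parameter values and the two-option task abstraction below are illustrative, not the authors' exact model:

```python
import numpy as np

alpha = 0.1  # assumed learning rate
w = 0.7      # model-based weight; reduced model-based use corresponds to lower w

q_mf = np.zeros(2)  # model-free values of the two option types
q_mb = np.zeros(2)  # model-based values (credit for future-state improvement)

def choose(beta=3.0):
    """Softmax choice over the weighted hybrid values."""
    q = w * q_mb + (1 - w) * q_mf
    p = np.exp(beta * q) / np.sum(np.exp(beta * q))
    return np.random.choice(2, p=p)

def update(option, immediate_reward, future_state_gain):
    q_mf[option] += alpha * (immediate_reward - q_mf[option])
    q_mb[option] += alpha * (future_state_gain - q_mb[option])

update(0, immediate_reward=1.0, future_state_gain=0.2)  # reward-maximizing option
update(1, immediate_reward=0.3, future_state_gain=1.0)  # state-improving option
print(choose())  # high w tilts choice toward the state-improving option
```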
Scheres, Anouk; Dijkstra, Marianne; Ainslie, Eleanor; Balkan, Jaclyn; Reynolds, Brady; Sonuga-Barke, Edmund; Castellanos, F Xavier
2006-01-01
This study investigated whether age and ADHD symptoms affected choice preferences in children and adolescents when they chose between (1) small immediate rewards and larger delayed rewards and (2) small certain rewards and larger probabilistic uncertain rewards. A temporal discounting (TD) task and a probabilistic discounting (PD) task were used to measure the degree to which the subjective value of a large reward decreased as one had to wait longer for it (TD), and as the probability of obtaining it decreased (PD). Rewards used were small amounts of money. In the TD task, the large reward (10 cents) was delayed by between 0 and 30s, and the immediate reward varied in magnitude (0-10 cents). In the PD task, receipt of the large reward (10 cents) varied in likelihood, with probabilities of 0, 0.25, 0.5, 0.75, and 1.0 used, and the certain reward varied in magnitude (0-10 cents). Age and diagnostic group did not affect the degree of PD of rewards: All participants made choices so that total gains were maximized. As predicted, young children, aged 6-11 years (n = 25) demonstrated steeper TD of rewards than adolescents, aged 12-17 years (n = 21). This effect remained significant even when choosing the immediate reward did not shorten overall task duration. This, together with the lack of interaction between TD task version and age, suggests that steeper discounting in young children is driven by reward immediacy and not by delay aversion. Contrary to our predictions, participants with ADHD (n = 22) did not demonstrate steeper TD of rewards than controls (n = 24). These results raise the possibility that strong preferences for small immediate rewards in ADHD, as found in previous research, depend on factors such as total maximum gain and the use of fixed versus varied delay durations. The decrease in TD as observed in adolescents compared to children may be related to developmental changes in the (dorsolateral) prefrontal cortex. Future research needs to investigate these possibilities.
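The single-parameter forms commonly used to quantify these two kinds of discounting are sketched below; the abstract does not state which functional forms were fit, so Mazur's hyperbolic delay form and Rachlin's odds-against probability form are assumptions used for illustration:

```python
def delay_discounted_value(amount, delay_s, k):
    """Hyperbolic temporal discounting: V = A / (1 + k*D).
    Steeper TD (as in the younger children) corresponds to a larger k."""
    return amount / (1 + k * delay_s)

def probability_discounted_value(amount, p, h):
    """Probability discounting over the odds against winning:
    V = A / (1 + h * (1 - p) / p); h = 1 reproduces expected value."""
    return amount / (1 + h * (1 - p) / p)

# 10 cents delayed by 30 s, under a steeper vs. shallower hypothetical k:
print(delay_discounted_value(10, 30, k=0.20))  # ~1.4 cents
print(delay_discounted_value(10, 30, k=0.05))  # 4.0 cents
# 10 cents at p = 0.25 with h = 1 -> 2.5 cents, the gain-maximizing value,
# consistent with participants' choices maximizing total gains.
print(probability_discounted_value(10, 0.25, h=1.0))  # 2.5
```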
Does prediction error drive one-shot declarative learning?
Greve, Andrea; Cooper, Elisa; Kaula, Alexander; Anderson, Michael C; Henson, Richard
2017-06-01
The role of prediction error (PE) in driving learning is well-established in fields such as classical and instrumental conditioning, reward learning and procedural memory; however, its role in human one-shot declarative encoding is less clear. According to one recent hypothesis, PE reflects the divergence between two probability distributions: one reflecting the prior probability (from previous experiences) and the other reflecting the sensory evidence (from the current experience). Assuming unimodal probability distributions, PE can be manipulated in three ways: (1) the distance between the mode of the prior and evidence, (2) the precision of the prior, and (3) the precision of the evidence. We tested these three manipulations across five experiments, in terms of people's ability to encode a single presentation of a scene-item pairing as a function of previous exposures to that scene and/or item. Memory was probed by presenting the scene together with three choices for the previously paired item, in which the two foil items were from other pairings within the same condition as the target item. In Experiment 1, we manipulated the evidence to be either consistent or inconsistent with prior expectations, predicting PE to be larger, and hence memory better, when the new pairing was inconsistent. In Experiments 2a-c, we manipulated the precision of the priors, predicting better memory for a new pairing when the (inconsistent) priors were more precise. In Experiment 3, we manipulated both visual noise and prior exposure for unfamiliar faces, before pairing them with scenes, predicting better memory when the sensory evidence was more precise. In all experiments, the PE hypotheses were supported. We discuss alternative explanations of individual experiments, and conclude that the Predictive Interactive Multiple Memory Signals (PIMMS) framework provides the most parsimonious account of the full pattern of results.
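One concrete way to cash out "divergence between two probability distributions" is the KL divergence between two Gaussians; whether PIMMS uses KL specifically is an assumption here, but the sketch reproduces the three manipulations, each of which increases the divergence relative to a baseline:

```python
import math

def kl_gaussians(mu_p, sd_p, mu_q, sd_q):
    """KL(prior || evidence) for 1-D Gaussians."""
    return (math.log(sd_q / sd_p)
            + (sd_p**2 + (mu_p - mu_q)**2) / (2 * sd_q**2) - 0.5)

baseline = kl_gaussians(0.0, 1.0, 1.0, 1.0)          # 0.50
larger_distance = kl_gaussians(0.0, 1.0, 3.0, 1.0)   # (1) prior-evidence distance
precise_prior = kl_gaussians(0.0, 0.5, 1.0, 1.0)     # (2) more precise prior
precise_evidence = kl_gaussians(0.0, 1.0, 1.0, 0.5)  # (3) more precise evidence
print(baseline, larger_distance, precise_prior, precise_evidence)
# All three manipulated values exceed the baseline -> larger PE, better memory.
```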
Carl, Hannah; Walsh, Erin; Eisenlohr-Moul, Tory; Minkel, Jared; Crowther, Andrew; Moore, Tyler; Gibbs, Devin; Petty, Chris; Bizzell, Josh; Dichter, Gabriel S; Smoski, Moria J
2016-10-01
The purpose of the present investigation was to evaluate whether pre-treatment neural activation in response to rewards is a predictor of clinical response to Behavioral Activation Therapy for Depression (BATD), an empirically validated psychotherapy that decreases depressive symptoms by increasing engagement with rewarding stimuli and reducing avoidance behaviors. Participants were 33 outpatients with major depressive disorder (MDD) and 20 matched controls. We examined group differences in activation, and the capacity to sustain activation, across task runs using functional magnetic resonance imaging (fMRI) and the monetary incentive delay (MID) task. Hierarchical linear modeling was used to investigate whether pre-treatment neural responses predicted change in depressive symptoms over the course of BATD treatment. The MDD and control groups differed in sustained activation during reward outcomes in the right nucleus accumbens, such that the MDD group showed a significant decrease in activation in this region from the first to the second task run relative to controls. Pretreatment anhedonia severity and pretreatment task-related reaction times were predictive of response to treatment. Furthermore, sustained activation in the anterior cingulate cortex during reward outcomes predicted response to psychotherapy; patients with greater sustained activation in this region were more responsive to BATD treatment. The current study included only a single treatment condition, so it is unknown whether these predictors of treatment response are specific to BATD or to psychotherapy in general. Findings add to the growing body of literature suggesting that the capacity to sustain neural responses to rewards may be a critical endophenotype of MDD. Copyright © 2016 Elsevier B.V. All rights reserved.
Nucleus Accumbens Acetylcholine Receptors Modulate Dopamine and Motivation.
Collins, Anne L; Aitken, Tara J; Greenfield, Venuz Y; Ostlund, Sean B; Wassum, Kate M
2016-11-01
Environmental reward-predictive cues can motivate reward-seeking behaviors. Although this influence is normally adaptive, it can become maladaptive in disordered states, such as addiction. Dopamine release in the nucleus accumbens core (NAc) is known to mediate the motivational impact of reward-predictive cues, but little is known about how other neuromodulatory systems contribute to cue-motivated behavior. Here, we examined the role of the NAc cholinergic receptor system in cue-motivated behavior using a Pavlovian-to-instrumental transfer task designed to assess the motivating influence of a reward-predictive cue over an independently-trained instrumental action. Disruption of NAc muscarinic acetylcholine receptor activity attenuated, whereas blockade of nicotinic receptors augmented cue-induced invigoration of reward seeking. We next examined a potential dopaminergic mechanism for this behavioral effect by combining fast-scan cyclic voltammetry with local pharmacological acetylcholine receptor manipulation. The data show evidence of opposing modulation of cue-evoked dopamine release, with muscarinic and nicotinic receptor antagonists causing suppression and augmentation, respectively, consistent with the behavioral effects of these manipulations. In addition to demonstrating cholinergic modulation of naturally-evoked and behaviorally-relevant dopamine signaling, these data suggest that NAc cholinergic receptors may gate the expression of cue-motivated behavior through modulation of phasic dopamine release.
Smillie, Luke D; Dalgleish, Len I; Jackson, Chris J
2007-04-01
According to Gray's (1973) Reinforcement Sensitivity Theory (RST), a Behavioral Inhibition System (BIS) and a Behavioral Activation System (BAS) mediate effects of goal conflict and reward on behavior. BIS functioning has been linked with individual differences in trait anxiety and BAS functioning with individual differences in trait impulsivity. In this article, it is argued that behavioral outputs of the BIS and BAS can be distinguished in terms of learning and motivation processes and that these can be operationalized using the Signal Detection Theory measures of response-sensitivity and response-bias. In Experiment 1, two measures of BIS-reactivity predicted increased response-sensitivity under goal conflict, whereas one measure of BAS-reactivity predicted increased response-sensitivity under reward. In Experiment 2, two measures of BIS-reactivity predicted response-bias under goal conflict, whereas a measure of BAS-reactivity predicted motivation response-bias under reward. In both experiments, impulsivity measures did not predict criteria for BAS-reactivity as traditionally predicted by RST.
Reward sensitivity, decisional bias, and metacognitive deficits in cocaine drug addiction.
Balconi, Michela; Finocchiaro, Roberta; Campanella, Salvatore
2014-01-01
The present research explored the effect of reward sensitivity bias and metacognitive deficits on substance use disorder (SUD) in the decision-making process. The behavioral activation system (BAS) was used as a predictive marker of dysfunctional behavior during the Iowa gambling task (IGT). We also tried to relate this motivational system bias to self-reported metacognitive measures (self-knowledge, strategic planning, flexibility, and efficacy) in the decision processes. Thirty-four SUD participants (cocaine dependent) and 39 participants in the control group underwent the IGT. The SUD group was associated with a poorer performance on the IGT and a dysfunctional metacognition ability (unrealistic representation). An increase in the reward sensitivity (higher BAS, BAS reward responsiveness, and BAS reward) was observed in the SUD group compared with the control group and explained (through a regression analysis) the main behavioral deficits. More generally, an increase in the BAS reward responsiveness may be considered a predictive measure of risk-taking and dysfunctional behavior, not only in pathological (SUD) individuals, but also in subclinical individuals (controls). We discuss the likely cognitive, brain, and neurotransmitter contributions to this phenomenon.
Dopamine and extinction: A convergence of theory with fear and reward circuitry
Abraham, Antony D.; Neve, Kim A.; Lattal, K. Matthew
2014-01-01
Research on dopamine lies at the intersection of sophisticated theoretical and neurobiological approaches to learning and memory. Dopamine has been shown to be critical for many processes that drive learning and memory, including motivation, prediction error, incentive salience, memory consolidation, and response output. Theories of dopamine’s function in these processes have, for the most part, been developed from behavioral approaches that examine learning mechanisms in reward-related tasks. A parallel and growing literature indicates that dopamine is involved in fear conditioning and extinction. These studies are consistent with long-standing ideas about appetitive-aversive interactions in learning theory and they speak to the general nature of cellular and molecular processes that underlie behavior. We review the behavioral and neurobiological literature showing a role for dopamine in fear conditioning and extinction. At a cellular level, we review dopamine signaling and receptor pharmacology, cellular and molecular events that follow dopamine receptor activation, and brain systems in which dopamine functions. At a behavioral level, we describe theories of learning and dopamine function that could describe the fundamental rules underlying how dopamine modulates different aspects of learning and memory processes. PMID:24269353
It's about time: Earlier rewards increase intrinsic motivation.
Woolley, Kaitlin; Fishbach, Ayelet
2018-06-01
Can immediate (vs. delayed) rewards increase intrinsic motivation? Prior research compared the presence versus absence of rewards. By contrast, this research compared immediate versus delayed rewards, predicting that more immediate rewards increase intrinsic motivation by creating a perceptual fusion between the activity and its goal (i.e., the reward). In support of the hypothesis, framing a reward from watching a news program as more immediate (vs. delayed) increased intrinsic motivation to watch the program (Study 1), and receiving more immediate bonus (vs. delayed, Study 2; and vs. delayed and no bonus, Study 3) increased intrinsic motivation in an experimental task. The effect of reward timing was mediated by the strength of the association between an activity and a reward, and was specific to intrinsic (vs. extrinsic) motivation-immediacy influenced the positive experience of an activity, but not perceived outcome importance (Study 4). In addition, the effect of the timing of rewards was independent of the effect of the magnitude of the rewards (Study 5). (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Saez, Rebecca A; Saez, Alexandre; Paton, Joseph J; Lau, Brian; Salzman, C Daniel
2017-07-05
The same reward can possess different motivational meaning depending upon its magnitude relative to other rewards. To study the neurophysiological mechanisms mediating assignment of motivational meaning, we recorded the activity of neurons in the amygdala and orbitofrontal cortex (OFC) of monkeys during a Pavlovian task in which the relative amount of liquid reward associated with one conditioned stimulus (CS) was manipulated by changing the reward amount associated with a second CS. Anticipatory licking tracked relative reward magnitude, implying that monkeys integrated information about recent rewards to adjust the motivational meaning of a CS. Upon changes in relative reward magnitude, neural responses to reward-predictive cues updated more rapidly in OFC than amygdala, and activity in OFC but not the amygdala was modulated by recent reward history. These results highlight a distinction between the amygdala and OFC in assessing reward history to support the flexible assignment of motivational meaning to sensory cues. Copyright © 2017 Elsevier Inc. All rights reserved.
San Martín, René; Appelbaum, Lawrence G.; Huettel, Scott A.; Woldorff, Marty G.
2016-01-01
Adaptive choice behavior depends critically on identifying and learning from outcome-predicting cues. We hypothesized that attention may be preferentially directed toward certain outcome-predicting cues. We studied this possibility by analyzing event-related potential (ERP) responses in humans during a probabilistic decision-making task. Participants viewed pairs of outcome-predicting visual cues and then chose to wager either a small (i.e., loss-minimizing) or large (i.e., gain-maximizing) amount of money. The cues were bilaterally presented, which allowed us to extract the relative neural responses to each cue by using a contralateral-versus-ipsilateral ERP contrast. We found an early lateralized ERP response, whose features matched the attention-shift-related N2pc component and whose amplitude scaled with the learned reward-predicting value of the cues as predicted by an attention-for-reward model. Consistently, we found a double dissociation involving the N2pc. Across participants, gain-maximization positively correlated with the N2pc amplitude to the most reliable gain-predicting cue, suggesting an attentional bias toward such cues. Conversely, loss-minimization was negatively correlated with the N2pc amplitude to the most reliable loss-predicting cue, suggesting an attentional avoidance toward such stimuli. These results indicate that learned stimulus–reward associations can influence rapid attention allocation, and that differences in this process are associated with individual differences in economic decision-making performance. PMID:25139941
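A sketch of the contralateral-minus-ipsilateral contrast used above to isolate the lateralized N2pc; the array shapes, the random placeholder data, and the 200-300 ms measurement window are illustrative:

```python
import numpy as np

time = np.arange(0, 500, 2)                   # ms, 2-ms samples
# erp[hemisphere_site, cue_side, time]: trial-averaged voltages at left/right
# posterior sites when the reward-predicting cue appeared left/right of fixation.
erp = np.random.randn(2, 2, time.size) * 0.1  # placeholder data
LEFT, RIGHT = 0, 1

# Contralateral: right-hemisphere sites for left cues, and vice versa.
contra = (erp[RIGHT, LEFT] + erp[LEFT, RIGHT]) / 2
ipsi = (erp[LEFT, LEFT] + erp[RIGHT, RIGHT]) / 2
n2pc_wave = contra - ipsi                     # lateralized attention signal

window = (time >= 200) & (time < 300)         # typical N2pc latency range
print(n2pc_wave[window].mean())               # amplitude scaled with cue value in the study
```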
Neural correlates of appetitive-aversive interactions in Pavlovian fear conditioning.
Nasser, Helen M; McNally, Gavan P
2013-03-19
We used Pavlovian counterconditioning in rats to identify the neural mechanisms for appetitive-aversive motivational interactions. In Stage I, rats were trained on conditioned stimulus (CS)-food (unconditioned stimulus [US]) pairings. In Stage II, this appetitive CS was transformed into a fear CS via pairings with footshock. The development of fear responses was retarded in rats that had received Stage I appetitive training. This counterconditioning was associated with increased levels of phosphorylated mitogen activated protein kinase immunoreactivity (pMAPK-IR) in several brain regions, including midline thalamus, rostral agranular insular cortex (RAIC), lateral amygdala, and nucleus accumbens core and shell, but decreased expression in the ventrolateral quadrant of the midbrain periaqueductal gray. These brain regions showing differential pMAPK-IR have previously been identified as part of the fear prediction error circuit. We then examined the causal role of RAIC MAPK in fear learning and showed that Stage II fear learning was prevented by RAIC infusions of the MEK inhibitor PD098059 (0.5 µg/hemisphere). Taken together, these results show that there are opponent interactions between the appetitive and aversive motivational systems during fear learning and that the transformation of a reward CS into a fear CS is linked to heightened activity in the fear prediction error circuit.
Lau, Brian; Monteiro, Tiago; Paton, Joseph J
2017-10-01
Computational models of reinforcement learning (RL) strive to produce behavior that maximises reward, and thus allow software or robots to behave adaptively [1]. At the core of RL models is a learned mapping between 'states' (situations or contexts that an agent might encounter in the world) and actions. A wealth of physiological and anatomical data suggests that the basal ganglia (BG) is important for learning these mappings [2,3]. However, the computations performed by specific circuits are unclear. In this brief review, we highlight recent work concerning the anatomy and physiology of BG circuits that suggests refinements in our understanding of computations performed by the basal ganglia. We focus on one important component of basal ganglia circuitry, midbrain dopamine neurons, drawing attention to data that have been cast as supporting or departing from the RL framework that has inspired experiments in basal ganglia research over the past two decades. We suggest that the parallel circuit architecture of the BG might be expected to produce variability in the response properties of different dopamine neurons, and that variability in response profile may not reflect variable functions, but rather different arguments that serve as inputs to a common function: the computation of prediction error.
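In the RL framework the review refers to, the common-function view centers on the temporal-difference (TD) prediction error. The sketch below is a minimal tabular illustration of that computation, with parameter values chosen only for the example; it is not drawn from any specific model discussed in the review.

```python
# Minimal temporal-difference (TD) prediction error: delta = r + gamma*V(s') - V(s).
# Positive when outcomes beat prediction, ~zero when fully predicted,
# negative when a predicted reward is omitted. Parameters are illustrative.

def td_error(reward, value_current, value_next, gamma=0.95):
    return reward + gamma * value_next - value_current

def update_value(value, delta, alpha=0.1):
    return value + alpha * delta

v = 0.0  # initial value estimate for the reward-predicting state
for trial in range(50):
    delta = td_error(reward=1.0, value_current=v, value_next=0.0)
    v = update_value(v, delta)
print(round(v, 2))  # approaches 1.0; delta shrinks toward 0 as reward becomes predicted
```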
Overcommitment as a predictor of effort-reward imbalance: evidence from an 8-year follow-up study.
Feldt, Taru; Hyvönen, Katriina; Mäkikangas, Anne; Rantanen, Johanna; Huhtala, Mari; Kinnunen, Ulla
2016-07-01
The effort-reward imbalance (ERI) model includes the personal characteristic of overcommitment (OC) and the job-related characteristics of effort, reward, and ERI, all of which are assumed to play a role in an employee's health and well-being at work. The aim of the present longitudinal study was to shed more light on the dynamics of the ERI model by investigating the basic hypotheses related to the role of OC in the model, i.e., to establish whether an employee's OC could be a risk factor for an increased experience of high effort, low reward, and high ERI at work. The study was based on 5-wave, 8-year follow-up data collected among Finnish professionals in 2006 (T1, N=747), 2008 (T2, N=422), 2010 (T3, N=368), 2012 (T4, N=325), and 2014 (T5, N=273). The participants were mostly male (85% at T1) and the majority of them worked in technical fields. OC, effort, reward, and ERI were measured at each time point with the 23-item ERI scale. Three cross-lagged structural equation models (SEM) were estimated and compared using the full information maximum likelihood method: (i) OC predicted later experiences of effort, reward, and ERI (normal causation model), (ii) effort, reward, and ERI predicted later OC (reversed causation model), and (iii) associations in normal causal and reversed causal models were simultaneously valid (reciprocal causation model). The results supported the normal causation model: strong OC predicted later experiences of high effort, low reward and high ERI. High OC is a risk factor for an increased experience of job strain factors, that is, high effort, low reward, and high ERI. Thus, OC poses a risk not only to an employee's well-being and health but also for perceiving adverse job strain factors in the working environment.
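The supported normal-causation path can be illustrated with a stripped-down cross-lagged regression: OC at one wave predicting ERI at the next, controlling for baseline ERI. The sketch below simulates data and uses OLS as a simplified stand-in for the paper's full SEM; all column names and effect sizes are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-wave data with a built-in cross-lagged effect of OC on ERI.
rng = np.random.default_rng(0)
n = 300
oc_t1 = rng.normal(size=n)
eri_t1 = 0.3 * oc_t1 + rng.normal(size=n)
eri_t2 = 0.5 * eri_t1 + 0.2 * oc_t1 + rng.normal(size=n)
df = pd.DataFrame({"oc_t1": oc_t1, "eri_t1": eri_t1, "eri_t2": eri_t2})

# A positive oc_t1 coefficient is the analogue of the study's supported
# normal-causation path (OC -> later ERI), net of ERI stability.
model = smf.ols("eri_t2 ~ eri_t1 + oc_t1", data=df).fit()
print(model.params)
```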
Kim, Ji-Eun; Son, Jung-Woo; Choi, Won-Hee; Kim, Yeoung-Rang; Oh, Jong-Hyun; Lee, Seungbok; Kim, Jang-Kyu
2014-06-01
This study aimed to examine differences in brain activation for various types of reward and feedback in adolescent Internet addicts (AIA) and normal adolescents (NA) using functional magnetic resonance imaging (fMRI). AIA (n = 15) and NA (n = 15) underwent fMRI while performing easy tasks for which performance feedback (PF), social reward (SR) (such as compliments), or monetary reward (MR) was given. Using the no reward (NR) condition, three types of contrasts (PF-NR, SR-NR, and MR-NR) were analyzed. In NA, we observed activation in the reward-related subcortical system, self-related brain region, and other brain areas for the three contrasts, but these brain areas showed almost no activation in AIA. Instead, AIA showed significant activation in the dorsolateral prefrontal cortex for the PF-NR contrast, and a negative correlation was found between the level of activation in the left superior temporal gyrus (BA 22) and the duration of Internet game use per day in AIA. These findings suggest that AIA show reduced levels of self-related brain activation and decreased reward sensitivity irrespective of the type of reward and feedback. AIA may be sensitive only to error monitoring, without accompanying positive feelings such as a sense of satisfaction or achievement.
Chung, Tammy; Geier, Charles; Luna, Beatriz; Pajtek, Stefan; Terwilliger, Robert; Thatcher, Dawn; Clark, Duncan
2010-01-01
Effective response inhibition is a key component of recovery from addiction. Some research suggests that response inhibition can be enhanced through reward contingencies. We examined the effect of monetary incentive on response inhibition among adolescents with and without substance use disorder (SUD) using a fast event-related fMRI antisaccade reward task. The fMRI task permits investigation of how reward (monetary incentive) might modulate inhibitory control during three task phases: cue presentation (reward or neutral trial), response preparation, and response execution. Adolescents with lifetime SUD (n=12; 100% marijuana use disorder) were gender- and age-matched to healthy controls (n=12). Monetary incentive facilitated inhibitory control for SUD adolescents; for healthy controls, the difference in error rate for neutral and reward trials was not significant. There were no significant differences in behavioral performance between groups across reward and neutral trials; however, group differences in regional brain activation were identified. During the response preparation phase of reward trials, SUD adolescents, compared to controls, showed increased activation of prefrontal and oculomotor control (e.g., frontal eye field) areas, brain regions that have been associated with effective response inhibition. Results indicate differences in brain activation between SUD and control youth when preparing to inhibit a prepotent response in the context of reward, and support a possible role for incentives in enhancing response inhibition among youth with SUD. PMID:21115229
Sensitivity of Locus Ceruleus Neurons to Reward Value for Goal-Directed Actions
Richmond, Barry J.
2015-01-01
The noradrenergic nucleus locus ceruleus (LC) is associated classically with arousal and attention. Recent data suggest that it might also play a role in motivation. To study how LC neuronal responses are related to motivational intensity, we recorded 121 single neurons from two monkeys while reward size (one, two, or four drops) and the manner of obtaining reward (passive vs active) were both manipulated. The monkeys received reward under three conditions: (1) releasing a bar when a visual target changed color; (2) passively holding a bar; or (3) touching and releasing a bar. In the first two conditions, a visual cue indicated the size of the upcoming reward, and, in the third, the reward was constant through each block of 25 trials. Performance levels and lipping intensity (an appetitive behavior) both showed that the monkeys' motivation in the task was related to the predicted reward size. In conditions 1 and 2, LC neurons were activated phasically in relation to cue onset, and this activation strengthened with increasing expected reward size. In conditions 1 and 3, LC neurons were activated before the bar-release action, and the activation weakened with increasing expected reward size but only in condition 1. These effects evolved as monkeys progressed through behavioral sessions, because increasing fatigue and satiety presumably progressively decreased the value of the upcoming reward. These data indicate that LC neurons integrate motivationally relevant information: both external cues and internal drives. The LC might provide the impetus to act when the predicted outcome value is low. PMID:25740528
Social Reward Questionnaire—Adolescent Version and its association with callous–unemotional traits
Neumann, Craig S.; Roberts, Ruth; McCrory, Eamon; Viding, Essi
2017-01-01
During adolescence, social interactions are a potent source of reward. However, no measure of social reward value exists for this age group. In this study, we adapted the adult Social Reward Questionnaire, which we had previously developed and validated, for use with adolescents. Participants aged 11–16 (n = 568; 50% male) completed the Social Reward Questionnaire—Adolescent Version (SRQ-A), alongside measures of personality traits—five-factor model (FFM) and callous–unemotional (CU) traits—for construct validity purposes. A confirmatory factor analysis of the SRQ-A supported a five-factor structure (Comparative Fit Index = 0.90; Root Mean Square Error of Approximation = 0.07), equating to five questionnaire subscales: enjoyment of Admiration, Negative Social Potency, Passivity, Prosocial Interactions and Sociability. Associations with FFM and CU traits were in line with what is seen for adult samples, providing support for the meaning of SRQ-A subscales in adolescents. In particular, adolescents with high levels of CU traits showed an ‘inverted’ pattern of social reward, in which being cruel is enjoyable and being kind is not. Gender invariance was also assessed and was partially supported. The SRQ-A is a valid, reliable measure of individual differences in social reward in adolescents. PMID:28484617
Distinct medial temporal networks encode surprise during motivation by reward versus punishment
Murty, Vishnu P.; LaBar, Kevin S.; Adcock, R. Alison
2016-01-01
Adaptive motivated behavior requires predictive internal representations of the environment, and surprising events are indications for encoding new representations of the environment. The medial temporal lobe memory system, including the hippocampus and surrounding cortex, encodes surprising events and is influenced by motivational state. Because behavior reflects the goals of an individual, we investigated whether motivational valence (i.e., pursuing rewards versus avoiding punishments) also impacts neural and mnemonic encoding of surprising events. During functional magnetic resonance imaging (fMRI), participants encountered perceptually unexpected events either during the pursuit of rewards or avoidance of punishments. Despite similar levels of motivation across groups, reward and punishment facilitated the processing of surprising events in different medial temporal lobe regions. Whereas during reward motivation, perceptual surprises enhanced activation in the hippocampus, during punishment motivation surprises instead enhanced activation in parahippocampal cortex. Further, we found that reward motivation facilitated hippocampal coupling with ventromedial PFC, whereas punishment motivation facilitated parahippocampal cortical coupling with orbitofrontal cortex. Behaviorally, post-scan testing revealed that reward, but not punishment, motivation resulted in greater memory selectivity for surprising events encountered during goal pursuit. Together these findings demonstrate that neuromodulatory systems engaged by anticipation of reward and punishment target separate components of the medial temporal lobe, modulating medial temporal lobe sensitivity and connectivity. Thus, reward and punishment motivation yield distinct neural contexts for learning, with distinct consequences for how surprises are incorporated into predictive mnemonic models of the environment. PMID:26854903
Transformational and transactional leadership: a meta-analytic test of their relative validity.
Judge, Timothy A; Piccolo, Ronald F
2004-10-01
This study provided a comprehensive examination of the full range of transformational, transactional, and laissez-faire leadership. Results (based on 626 correlations from 87 sources) revealed an overall validity of .44 for transformational leadership, and this validity generalized over longitudinal and multisource designs. Contingent reward (.39) and laissez-faire (-.37) leadership had the next highest overall relations; management by exception (active and passive) was inconsistently related to the criteria. Surprisingly, there were several criteria for which contingent reward leadership had stronger relations than did transformational leadership. Furthermore, transformational leadership was strongly correlated with contingent reward (.80) and laissez-faire (-.65) leadership. Transformational and contingent reward leadership generally predicted criteria controlling for the other leadership dimensions, although transformational leadership failed to predict leader job performance.
Elevated Striatal Reactivity Across Monetary and Social Rewards in Bipolar I Disorder
Dutra, Sunny J.; Cunningham, William A.; Kober, Hedy; Gruber, June
2016-01-01
Bipolar disorder (BD) is associated with increased reactivity to rewards and heightened positive affectivity. It is less clear to what extent this heightened reward sensitivity is evident across contexts and what the associated neural mechanisms might be. The present investigation employed both a monetary and a social incentive delay task among adults with remitted BD type I (N=24) and a healthy non-psychiatric control group (HC; N=25) using fMRI. Both whole-brain and region-of-interest analyses revealed elevated ventral and dorsal striatal reactivity across monetary and social reward receipt, but not anticipation, in the BD group. Post-hoc analyses further suggested that greater striatal reactivity to reward receipt across monetary and social reward tasks predicted decreased self-reported positive affect when anticipating subsequent rewards in the HC, but not BD, group. Results point toward elevated striatal reactivity to reward receipt as a potential neural mechanism of reward reactivity. PMID:26390194
Richter, Michael
2010-05-01
Two experiments assessed the moderating impact of task context on the relationship between reward and cardiovascular response. Randomly assigned to the cells of a 2 (task context: reward vs. demand) x 2 (reward value: low vs. high) between-persons design, participants performed either a memory task with an unclear performance standard (Experiment 1) or a visual scanning task with an unfixed performance standard (Experiment 2). Before performing the task, in which they could earn either a low or a high reward, participants responded to questions about either task reward or task demand. In accordance with the theoretical predictions derived from Wright's (1996) integrative model, pre-ejection period reactivity increased with reward value if participants had rated aspects of task reward before performing the task. If they had rated task demand, pre-ejection period did not differ as a function of reward.
RM-SORN: a reward-modulated self-organizing recurrent neural network.
Aswolinskiy, Witali; Pipa, Gordon
2015-01-01
Neural plasticity plays an important role in learning and memory. Reward modulation of plasticity offers an explanation for the ability of the brain to adapt its neural activity to achieve a rewarded goal. Here, we define a neural network model that learns through the interaction of Intrinsic Plasticity (IP) and reward-modulated Spike-Timing-Dependent Plasticity (STDP). IP enables the network to explore possible output sequences, and STDP, modulated by reward, reinforces the creation of the rewarded output sequences. The model is tested on tasks for prediction, recall, non-linear computation, pattern recognition, and sequence generation. It achieves performance comparable to networks trained with supervised learning, while using simple, biologically motivated plasticity rules and rewarding strategies. The results confirm the importance of investigating the interaction of several plasticity rules in the context of reward-modulated learning, and of asking whether reward-modulated self-organization can explain the amazing capabilities of the brain.
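The interaction described, STDP gated by reward, is often written as a three-factor rule: a Hebbian eligibility term multiplied by a reward signal. The sketch below is a generic illustration of that rule under simplifying assumptions (rate-like traces, a constant reward baseline); it is not the RM-SORN implementation itself.

```python
import numpy as np

# Three-factor (reward-modulated) STDP sketch: a pre/post coincidence term
# is gated by reward relative to a baseline, so only rewarded activity
# patterns strengthen their synapses. All values are illustrative.
def rm_stdp_step(w, pre_trace, post_spikes, reward, baseline, eta=0.01):
    eligibility = np.outer(post_spikes, pre_trace)   # candidate Hebbian change
    w = w + eta * (reward - baseline) * eligibility  # reward gates plasticity
    return np.clip(w, 0.0, 1.0)

w = np.full((4, 4), 0.5)                      # post x pre weight matrix
pre_trace = np.array([0.8, 0.1, 0.0, 0.3])    # recent presynaptic activity
post_spikes = np.array([1.0, 0.0, 1.0, 0.0])  # which units fired
w = rm_stdp_step(w, pre_trace, post_spikes, reward=1.0, baseline=0.2)
print(w.round(3))
```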
Providing Extrinsic Reward for Test Performance Undermines Long-Term Memory Acquisition.
Kuhbandner, Christof; Aslan, Alp; Emmerdinger, Kathrin; Murayama, Kou
2016-01-01
Based on numerous studies showing that testing studied material can improve long-term retention more than restudying the same material, it is often suggested that the number of tests in education should be increased to enhance knowledge acquisition. However, testing in real-life educational settings often entails a high degree of extrinsic motivation of learners due to the common practice of placing important consequences on the outcome of a test. Such an effect on the motivation of learners may undermine the beneficial effects of testing on long-term memory because it has been shown that extrinsic motivation can reduce the quality of learning. To examine this issue, participants learned foreign language vocabulary words, followed by an immediate test in which one-third of the words were tested and one-third restudied. To manipulate extrinsic motivation during immediate testing, participants received either monetary reward contingent on test performance or no reward. After 1 week, memory for all words was tested. In the immediate test, reward reduced correct recall and increased commission errors, indicating that reward reduced the number of items that can benefit from successful retrieval. The results in the delayed test revealed that reward additionally reduced the gain received from successful retrieval because memory for initially successfully retrieved words was lower in the reward condition. However, testing was still more effective than restudying under reward conditions because reward undermined long-term memory for concurrently restudied material as well. These findings indicate that providing performance-contingent reward in a test can undermine long-term knowledge acquisition.
The neuroscience of investing: fMRI of the reward system.
Peterson, Richard L
2005-11-15
Functional magnetic resonance imaging (fMRI) has proven a useful tool for observing neural BOLD signal changes during complex cognitive and emotional tasks. Yet the meaning and applicability of the fMRI data being gathered is still largely unknown. The brain's reward system underlies the fundamental neural processes of goal evaluation, preference formation, positive motivation, and choice behavior. fMRI technology allows researchers to dynamically visualize reward system processes. Experimenters can then correlate reward system BOLD activations with experimental behavior from carefully controlled experiments. In the SPAN lab at Stanford University, directed by Brian Knutson Ph.D., researchers have been using financial tasks during fMRI scanning to correlate emotion, behavior, and cognition with the reward system's fundamental neural activations. One goal of the SPAN lab is the development of predictive models of behavior. In this paper we extrapolate our fMRI results toward understanding and predicting individual behavior in the uncertain and high-risk environment of the financial markets. The financial market price anomalies of "value versus glamour" and "momentum" may be real-world examples of reward system activation biasing collective behavior. On the individual level, the investor's bias of overconfidence may similarly be related to reward system activation. We attempt to understand selected "irrational" investor behaviors and anomalous financial market price patterns through correlations with findings from fMRI research of the reward system.
Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Linares, R.; Furfaro, R.
2016-09-01
This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (pdf). The critic evaluates the policy by calculating the estimated total reward or the value function for the problem. The parameters of the policy action pdf are optimized using gradients with respect to the reward function. Both the critic and the actor are modeled using deep neural networks (multi-layer neural networks). The policy neural network takes the current state as input and outputs probabilities for each possible action. This policy is random, and can be evaluated by sampling random actions using the probabilities determined by the policy neural network's outputs. The critic approximates the total reward using a neural network. The estimated total reward is used to approximate the gradient of the policy network with respect to the network parameters. This approach is used to find the non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty for the overall catalog to below a user specified uncertainty threshold. This work uses a 30 km total position error for the uncertainty threshold. This work provides the RL method with a negative reward as long as any SO has a total position error above the uncertainty threshold. This penalizes policies that take longer to achieve the desired accuracy. A positive reward is provided when all SOs are below the catalog uncertainty threshold. An optimal policy is sought that takes actions to achieve the desired catalog uncertainty in minimum time. This work trains the policy in simulation by letting it task a single sensor to "learn" from its performance. The proposed approach for the SM problem is tested in simulation and good performance is found using the actor-critic policy gradient method.
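The actor-critic update described above can be written compactly. The sketch below substitutes linear function approximators for the paper's deep networks and uses illustrative dimensions and learning rates; only the sign convention of the reward (negative until the accuracy threshold is met) follows the description.

```python
import numpy as np

# One-step actor-critic with a softmax (stochastic) policy over discrete
# sensor-tasking actions. Linear approximators stand in for the deep
# networks described above; all sizes and rates are illustrative.
rng = np.random.default_rng(1)
n_features, n_actions = 8, 4
theta = rng.normal(scale=0.1, size=(n_actions, n_features))  # actor weights
w = np.zeros(n_features)                                     # critic weights

def policy(state):
    logits = theta @ state
    p = np.exp(logits - logits.max())        # numerically stable softmax
    return p / p.sum()

def actor_critic_update(state, action, reward, next_state, gamma=0.99,
                        alpha_actor=0.01, alpha_critic=0.05):
    global theta, w
    delta = reward + gamma * (w @ next_state) - (w @ state)  # critic's TD error
    w += alpha_critic * delta * state                        # critic update
    p = policy(state)
    grad_log = -np.outer(p, state)           # d log pi(a|s) / d theta (softmax)
    grad_log[action] += state
    theta += alpha_actor * delta * grad_log  # policy-gradient step

state = rng.normal(size=n_features)
action = rng.choice(n_actions, p=policy(state))
# Negative reward while any object's position error exceeds the threshold.
actor_critic_update(state, action, reward=-1.0,
                    next_state=rng.normal(size=n_features))
```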
Murty, Vishnu P.; Tompary, Alexa; Adcock, R. Alison
2017-01-01
Reward motivation has been demonstrated to enhance declarative memory by facilitating systems-level consolidation. Although high-reward information is often intermixed with lower-reward information during an experience, memory for high-value information is prioritized. How is this selectivity achieved? One possibility is that postencoding consolidation processes bias memory strengthening to those representations associated with higher reward. To test this hypothesis, we investigated the influence of differential reward motivation on the selectivity of postencoding markers of systems-level memory consolidation. Human participants encoded intermixed, trial-unique memoranda that were associated with either high or low value during fMRI acquisition. Encoding was interleaved with periods of rest, allowing us to investigate experience-dependent changes in connectivity as they related to later memory. Behaviorally, we found that reward motivation enhanced 24 h associative memory. Analysis of patterns of postencoding connectivity showed that, even though learning trials were intermixed, there was significantly greater connectivity with regions of high-level, category-selective visual cortex associated with high-reward trials. Specifically, increased connectivity of category-selective visual cortex with both the VTA and the anterior hippocampus predicted associative memory for high- but not low-reward memories. Critically, these results were independent of encoding-related connectivity and univariate activity measures. Thus, these findings support a model by which the selective stabilization of memories for salient events is supported by postencoding interactions with sensory cortex associated with reward. SIGNIFICANCE STATEMENT Reward motivation is thought to promote memory by supporting memory consolidation. Yet, little is known about how the brain selects relevant information for subsequent consolidation based on reward. We show that experience-dependent changes in connectivity of both the anterior hippocampus and the VTA with high-level visual cortex selectively predict memory for high-reward memoranda at a 24 h delay. These findings provide evidence for a novel mechanism guiding the consolidation of memories for valuable events, namely, postencoding interactions between neural systems supporting mesolimbic dopamine activation, episodic memory, and perception. PMID:28100737
ERIC Educational Resources Information Center
Schweimer, Judith; Hauber, Wolfgang
2005-01-01
The anterior cingulate cortex (ACC) plays a critical role in stimulus-reinforcement learning and reward-guided selection of actions. Here we conducted a series of experiments to further elucidate the role of the ACC in instrumental behavior involving effort-based decision-making and instrumental learning guided by reward-predictive stimuli. In…
Impaired Feedback Processing for Symbolic Reward in Individuals with Internet Game Overuse
Kim, Jinhee; Kim, Hackjin; Kang, Eunjoo
2017-01-01
Reward processing, which plays a critical role in adaptive behavior, is impaired in addiction disorders, which are accompanied by functional abnormalities in brain reward circuits. Internet gaming disorder, like substance addiction, is thought to be associated with impaired reward processing, but little is known about how it affects learning, especially when feedback is conveyed by less-salient motivational events. Here, using both monetary (±500 KRW) and symbolic (Chinese characters “right” or “wrong”) rewards and penalties, we investigated whether behavioral performance and feedback-related neural responses are altered in the Internet game overuse (IGO) group. Using functional MRI, brain responses for these two types of reward/penalty feedback were compared between young males with problems of IGO (IGOs, n = 18, mean age = 22.2 ± 2.0 years) and age-matched control subjects (Controls, n = 20, mean age = 21.2 ± 2.1) during a visuomotor association task where associations were learned between English letters and one of four responses. No group difference was found in adjustment of error responses following the penalty or in brain responses to penalty, for either monetary or symbolic penalties. The IGO individuals, however, were more likely to fail to choose the response previously reinforced by symbolic (but not monetary) reward. A whole-brain two-way ANOVA for reward revealed reduced activations in the IGO group in the rostral anterior cingulate cortex/ventromedial prefrontal cortex (vmPFC) in response to both reward types, suggesting impaired reward processing. However, the responses to reward in the inferior parietal region and medial orbitofrontal cortex/vmPFC were affected by the types of reward in the IGO group. Unlike the control group, in the IGO group the reward response was reduced only for symbolic reward, suggesting lower attentional and value processing specific to symbolic reward. Furthermore, the more severe the Internet gaming overuse symptoms in the IGO group, the greater the activations of the ventral striatum for monetary relative to symbolic reward. These findings suggest that IGO is associated with a bias toward motivationally salient reward, which would lead to poor goal-directed behavior in everyday life. PMID:29051739
Robinson, Mike J. F.; Anselme, Patrick; Fischer, Adam M.; Berridge, Kent C.
2014-01-01
Uncertainty is a component of many gambling games and may play a role in incentive motivation and cue attraction. Uncertainty can increase the attractiveness for predictors of reward in the Pavlovian procedure of autoshaping, visible as enhanced sign-tracking (or approach and nibbles) by rats of a metal lever whose sudden appearance acts as a conditioned stimulus (CS+) to predict sucrose pellets as an unconditioned stimulus (UCS). Here we examined how reward uncertainty might enhance incentive salience as sign-tracking both in intensity and by broadening the range of attractive CS+s. We also examined whether initially-induced uncertainty enhancements of CS+ attraction can endure beyond uncertainty itself, and persist even when Pavlovian prediction becomes 100% certain. Our results show that uncertainty can broaden incentive salience attribution to make CS cues attractive that would otherwise not be (either because they are too distal from reward or too risky to normally attract sign-tracking). In addition, uncertainty enhancement of CS+ incentive salience, once induced by initial exposure, persisted even when Pavlovian CS-UCS correlations later rose toward 100% certainty in prediction. Persistence suggests an enduring incentive motivation enhancement potentially relevant to gambling, which in some ways resembles incentive-sensitization. Higher motivation to uncertain CS+s leads to more potent attraction to these cues when they predict the delivery of uncertain rewards. In humans, those cues might possibly include the sights and sounds associated with gambling, which contribute a major component of the play immersion experienced by problematic gamblers. PMID:24631397
Vaidya, Avinash R; Fellows, Lesley K
2016-09-21
Real-world decisions are typically made between options that vary along multiple dimensions, requiring prioritization of the important dimensions to support optimal choice. Learning in this setting depends on attributing decision outcomes to the dimensions with predictive relevance rather than to dimensions that are irrelevant and nonpredictive. This attribution problem is computationally challenging, and likely requires an interplay between selective attention and reward learning. Both these processes have been separately linked to the prefrontal cortex, but little is known about how they combine to support learning the reward value of multidimensional stimuli. Here, we examined the necessary contributions of frontal lobe subregions in attributing feedback to relevant and irrelevant dimensions on a trial-by-trial basis in humans. Patients with focal frontal lobe damage completed a demanding reward learning task where options varied on three dimensions, only one of which predicted reward. Participants with left lateral frontal lobe damage attributed rewards to irrelevant dimensions, rather than the relevant dimension. Damage to the ventromedial frontal lobe also impaired learning about the relevant dimension, but did not increase reward attribution to irrelevant dimensions. The results argue for distinct roles for these two regions in learning the value of multidimensional decision options under dynamic conditions, with the lateral frontal lobe required for selecting the relevant dimension to associate with reward, and the ventromedial frontal lobe required to learn the reward association itself. The real world is complex and multidimensional; how do we attribute rewards to predictive features when surrounded by competing cues? Here, we tested the critical involvement of human frontal lobe subregions in a probabilistic, multidimensional learning environment, asking whether focal lesions affected trial-by-trial attribution of feedback to relevant and irrelevant dimensions. The left lateral frontal lobe was required for filtering option dimensions to allow appropriate feedback attribution, while the ventromedial frontal lobe was necessary for learning the value of features in the relevant dimension. These findings argue that selective attention and associative learning processes mediated by anatomically distinct frontal lobe subregions are both critical for adaptive choice in more complex, ecologically valid settings.
Stress in nurses: stress-related affect and its determinants examined over the nursing day.
Johnston, Derek W; Jones, Martyn C; Charles, Kathryn; McCann, Sharon K; McKee, Lorna
2013-06-01
Nurses are a stressed group and this may affect their health and work performance. The determinants of occupational stress in nurses and other occupational groups have almost invariably been examined in between-subjects studies. This study aimed to determine if the main determinants of occupational stress, i.e., demand, control, effort and reward, operate within nurses. A real-time study used personal digital-assistant-based ecological momentary assessment to measure affect and its hypothesised determinants every 90 min in 254 nurses over three nursing shifts. The measures were negative affect, positive affect, demand/effort, control and reward. While the effects varied in magnitude between people, in general increased negative affect was predicted by high demand/effort, low control and low reward. Control and reward moderated the effects of demand/effort. High positive affect was predicted by high demand/effort, control and reward. The same factors are associated with variations in stress-related affect within nurses as between nurses.
Yang, Xin-Hua; Huang, Jia; Zhu, Cui-Ying; Wang, Ye-Fei; Cheung, Eric F C; Chan, Raymond C K; Xie, Guang-Rong
2014-12-30
Anhedonia is a hallmark symptom of major depressive disorder (MDD). Preliminary findings suggest that anhedonia is characterized by reduced reward anticipation and reduced motivation to obtain reward. However, relatively little is known about reward-based decision-making in depression. We tested the hypothesis that anhedonia in MDD may reflect specific impairments in motivation in reward-based decision-making, and that these deficits might be associated with depressive symptom severity. In study 1, individuals with and without depressive symptoms performed a modified version of the Effort Expenditure for Rewards Task (EEfRT), a behavioral measure of cost/benefit decision-making. In study 2, MDD patients, remitted MDD patients and healthy controls were recruited for the same procedures. We found evidence for decreased willingness to make effort for rewards among individuals with subsyndromal depression; the effect was amplified in MDD patients, but dissipated in patients with remitted depression. We also found that reduced anticipatory and consummatory pleasure predicted decreased willingness to expend efforts to obtain rewards in MDD patients. For individuals with subsyndromal depression, the impairments were correlated with anticipatory anhedonia but not consummatory anhedonia. These data offer novel evidence that motivational deficits in MDD are correlated with depression severity and predicted by self-reported anhedonia.
Reward uncertainty enhances incentive salience attribution as sign-tracking
Anselme, Patrick; Robinson, Mike J. F.; Berridge, Kent C.
2014-01-01
Conditioned stimuli (CSs) come to act as motivational magnets following repeated association with unconditioned stimuli (UCSs) such as sucrose rewards. By traditional views, the more reliably predictive a Pavlovian CS-UCS association, the more attractive the CS becomes. However, in some cases, less predictability might equal more motivation. Here we examined the effect of introducing uncertainty in the CS-UCS association on CS strength as an attractive motivation magnet. In the present study, Experiment 1 assessed the effects of Pavlovian predictability versus uncertainty about reward probability and/or reward magnitude on the acquisition and expression of sign-tracking (ST) and goal-tracking (GT) responses in an autoshaping procedure. Results suggested that uncertainty produced the strongest incentive salience, expressed as sign-tracking. Experiment 2 examined whether a within-individual temporal shift from certainty to uncertainty conditions could produce a stronger CS motivational magnet when uncertainty began, and found that sign-tracking still increased after the shift. Overall, our results support earlier reports that ST responses become more pronounced in the presence of uncertainty regarding CS-UCS associations, especially when uncertainty combines both probability and magnitude. These results suggest that Pavlovian uncertainty, although diluting predictability, is still able to enhance the incentive motivational power of particular CSs. PMID:23078951
Walsh, Erin; Carl, Hannah; Eisenlohr-Moul, Tory; Minkel, Jared; Crowther, Andrew; Moore, Tyler; Gibbs, Devin; Petty, Chris; Bizzell, Josh; Smoski, Moria J; Dichter, Gabriel S
2017-03-01
There are few reliable predictors of response to antidepressant treatments. In the present investigation, we examined pretreatment functional brain connectivity during reward processing as a potential predictor of response to Behavioral Activation Treatment for Depression (BATD), a validated psychotherapy that promotes engagement with rewarding stimuli and reduces avoidance behaviors. Thirty-three outpatients with major depressive disorder (MDD) and 20 matched controls completed two runs of the monetary incentive delay task during functional magnetic resonance imaging after which participants with MDD received up to 15 sessions of BATD. Seed-based generalized psychophysiological interaction analyses focused on task-based connectivity across task runs, as well as the attenuation of connectivity from the first to the second run of the task. The average change in Beck Depression Inventory-II scores due to treatment was 10.54 points, a clinically meaningful response. Groups differed in seed-based functional connectivity among multiple frontostriatal regions. Hierarchical linear modeling revealed that improved treatment response to BATD was predicted by greater connectivity between the left putamen and paracingulate gyrus during reward anticipation. In addition, MDD participants with greater attenuation of connectivity between several frontostriatal seeds, and midline subcallosal cortex and left paracingulate gyrus demonstrated improved response to BATD. These findings indicate that pretreatment frontostriatal functional connectivity during reward processing is predictive of response to a psychotherapy modality that promotes improving approach-related behaviors in MDD. Furthermore, connectivity attenuation among reward-processing regions may be a particularly powerful endophenotypic predictor of response to BATD in MDD.
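The generalized psychophysiological interaction (gPPI) analyses reported here hinge on an interaction regressor formed from the seed time course and the task regressor. The sketch below shows that construction on toy data in simplified form; a real gPPI analysis additionally deconvolves the seed signal to the neural level before forming the product, and builds one interaction term per task condition.

```python
import numpy as np

# Simplified PPI design: [task, seed, task x seed]. The interaction beta
# indexes task-dependent (e.g., reward-anticipation) connectivity changes.
def ppi_regressors(seed_ts, task_regressor):
    seed = seed_ts - seed_ts.mean()          # mean-center the seed time course
    interaction = seed * task_regressor
    return np.column_stack([task_regressor, seed, interaction])

t = np.arange(200)
task = (np.sin(t / 10) > 0).astype(float)    # toy block task regressor
seed = 0.5 * task + np.random.default_rng(2).normal(size=t.size)
X = ppi_regressors(seed, task)
print(X.shape)  # (200, 3): regress each target voxel's time course on X
```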
Koenig, Stephan; Kadel, Hanna; Uengoer, Metin; Schubö, Anna; Lachnit, Harald
2017-01-01
Stimuli in our sensory environment differ in physical salience and, moreover, may acquire motivational salience by association with reward. If we repeatedly observe that reward is available in the context of a particular cue but absent in the context of another, the former typically attracts more attention than the latter. However, we also may encounter cues uncorrelated with reward. A cue with 50% reward contingency induces an average reward expectancy but, at the same time, high reward uncertainty. In the current experiment we examined how both values, reward expectancy and uncertainty, affected overt attention. Two different colors were established as predictive cues for low reward and high reward, respectively. A third color was followed by high reward on 50% of the trials and thus induced uncertainty. The colors were then introduced as distractors during search for a shape target, and we examined the relative potential of the color distractors to capture and hold the first fixation. We observed that capture frequency corresponded to reward expectancy while capture duration corresponded to uncertainty. The results may suggest that, within a trial, reward expectancy is represented in an earlier time window than uncertainty. PMID:28744206
The human orbitofrontal cortex monitors outcomes even when no reward is at stake.
Schnider, Armin; Treyer, Valerie; Buck, Alfred
2005-01-01
The orbitofrontal cortex (OFC) processes the occurrence or omission of anticipated rewards, but clinical evidence suggests that it might serve as a generic outcome monitoring system, independent of tangible reward. In this positron emission tomography (PET) study, normal human subjects performed a series of tasks in which they simply had to predict behind which one of two colored rectangles a drawing of an object was hidden. While all tasks involved anticipation in that they had an expectation phase between the subject's prediction and the presentation of the outcome, they varied with regard to the uncertainty of outcome. No comment on the correctness of the prediction, no record of ongoing performance, and no reward, not even a score, was provided. Nonetheless, we found strong activation of the OFC: in comparison with a baseline task, the left anterior medial OFC showed activation in all conditions, indicating a basic role in anticipation; the left posterior OFC was activated in all tasks with some uncertainty of outcome, suggesting a role in the monitoring of outcomes; the right medial OFC showed activation exclusively during guessing. The data indicate a generic role of the human OFC, with some topical specificity, in the generation of hypotheses and processing of outcomes, independent of the presence of explicit reward.
Murayama, Kou; Kitagami, Shinji
2014-02-01
Recent research suggests that extrinsic rewards promote memory consolidation through dopaminergic modulation processes. However, no conclusive behavioral evidence exists, given that the influences of extrinsic reward on attention and motivation during encoding and consolidation are inherently confounded. The present study provides behavioral evidence that extrinsic rewards (i.e., monetary incentives) enhance human memory consolidation independently of attention and motivation. Participants saw neutral pictures, followed by a reward or control cue in an unrelated context. Our results (and a direct replication study) demonstrated that the reward cue predicted a retrograde enhancement of memory for the preceding neutral pictures. This retrograde effect was observed only after a delay, not immediately upon testing. An additional experiment showed that emotional arousal or unconscious resource mobilization cannot explain the retrograde enhancement effect. These results provide support for the notion that the dopaminergic memory consolidation effect can result from extrinsic reward.
Strategic attention deployment for delay of gratification in working and waiting situations.
Peake, Philip K; Mischel, Walter; Hebl, Michelle
2002-03-01
Two studies examined whether the detrimental effect of attention to rewards on delay of gratification in waiting situations holds, or reverses, in working situations. In Study 1, preschoolers waited or worked for desired delayed rewards. Delay times increased when children worked in the presence of rewards but, as predicted, this increase was due to the distraction provided by the work itself, not because attention to rewards motivated children to sustain work. Analysis of spontaneous attention deployment showed that attending to rewards reduces delay time regardless of the working or waiting nature of the task. Fixing attention on rewards was a particularly detrimental strategy regardless of the type of task. Study 2 showed that when the work is not engaging, however, attention to rewards can motivate instrumental work and facilitate delay of gratification as long as attention deployment does not become fixed on the rewards.
ERIC Educational Resources Information Center
Kinman, Gail
2016-01-01
This study utilises the effort-reward imbalance (ERI) model of job stress to predict several indices of well-being in academics in the UK: mental ill health, job satisfaction and leaving intentions. This model posits that (a) employees who believe that their efforts are not counterbalanced by sufficient rewards will experience impaired well-being…
Niedhammer, I; Siegrist, J
1998-11-01
The effect of psychosocial factors at work on health, especially cardiovascular health, has given rise to growing concern in occupational epidemiology over the last few years. Two theoretical models, Karasek's model and the Effort-Reward Imbalance model, have been developed to evaluate psychosocial factors at work within specific conceptual frameworks in an attempt to take into account the serious methodological difficulties inherent in the evaluation of such factors. Karasek's model, the most widely used, measures three factors: psychological demands, decision latitude and social support at work. Many studies have shown the predictive effects of these factors on cardiovascular diseases independently of well-known cardiovascular risk factors. More recently, the Effort-Reward Imbalance model has taken into account the role of individual coping characteristics, which was neglected in Karasek's model. The Effort-Reward Imbalance model focuses on the reciprocity of exchange in occupational life, where high-cost/low-gain conditions are considered particularly stressful. Three dimensions of rewards are distinguished: money, esteem and gratifications in terms of promotion prospects and job security. Some studies already indicate that high-effort/low-reward conditions are predictive of cardiovascular diseases.
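In the ERI literature, the high-cost/low-gain condition is commonly quantified as an effort/reward ratio with a correction factor for unequal numbers of effort and reward items; values above 1 indicate imbalance. A minimal sketch, with scores and item counts shown only as hypothetical values:

```python
# Effort-reward imbalance ratio: ERI = effort / (reward * c), where
# c = (number of effort items) / (number of reward items) corrects for
# unequal item counts. Values > 1 indicate a high-cost/low-gain condition.
def eri_ratio(effort_sum, reward_sum, n_effort_items=6, n_reward_items=11):
    c = n_effort_items / n_reward_items
    return effort_sum / (reward_sum * c)

print(round(eri_ratio(effort_sum=20, reward_sum=30), 2))  # 1.22 -> imbalance
```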
Segarra, Nuria; Metastasio, Antonio; Ziauddeen, Hisham; Spencer, Jennifer; Reinders, Niels R; Dudas, Robert B; Arrondo, Gonzalo; Robbins, Trevor W; Clark, Luke; Fletcher, Paul C; Murray, Graham K
2016-07-01
Alterations in reward processes may underlie motivational and anhedonic symptoms in depression and schizophrenia. However, it remains unclear whether these alterations are disorder-specific or shared, and whether they clearly relate to symptom generation or not. We studied brain responses to unexpected rewards during a simulated slot-machine game in 24 patients with depression, 21 patients with schizophrenia, and 21 healthy controls using functional magnetic resonance imaging. We investigated relationships between brain activation, task-related motivation, and questionnaire-rated anhedonia. There was reduced activation in the orbitofrontal cortex, ventral striatum, inferior temporal gyrus, and occipital cortex in both depression and schizophrenia in comparison with healthy participants during receipt of unexpected reward. In the medial prefrontal cortex, both patient groups showed reduced activation, with activation significantly more abnormal in schizophrenia than depression. Anterior cingulate and medial frontal cortical activation predicted task-related motivation, which in turn predicted anhedonia severity in schizophrenia. Our findings provide evidence for overlapping hypofunction in ventral striatal and orbitofrontal regions in depression and schizophrenia during unexpected reward receipt, and for a relationship between unexpected reward processing in the medial prefrontal cortex and the generation of motivational states.
Mason, Ashley E.; Laraia, Barbara; Daubenmier, Jennifer; Hecht, Frederick M.; Lustig, Robert H.; Puterman, Eli; Adler, Nancy; Dallman, Mary; Kiernan, Michaela; Gearhardt, Ashley N.; Epel, Elissa S.
2015-01-01
Purpose: Obese individuals vary in their experience of food cravings and tendency to engage in reward-driven eating, both of which can be modulated by the neural reward system rather than physiological hunger. We examined two predictions in a sample of obese women: (1) whether opioidergic blockade reduced food-craving intensity, and (2) whether opioidergic blockade reduced an association between food-craving intensity and reward-driven eating, a trait-like index of three factors (lack of control over eating, lack of satiation, preoccupation with food). Methods: Forty-four obese, pre-menopausal women completed the Reward-based Eating Drive (RED) scale at study start and rated daily food-craving intensity on 5 days, on each of which they ingested either a placebo pill (2 days), a 25 mg naltrexone dose (1 day), or a standard 50 mg naltrexone dose (2 days). Results: Craving intensity was similar under naltrexone and placebo doses. The association between food-craving intensity and reward-driven eating differed significantly between the placebo and 50 mg naltrexone doses. Reward-driven eating and craving intensity were significantly positively associated under both placebo doses. As predicted, opioidergic blockade (at both the 25 mg and 50 mg naltrexone doses) reduced this positive association between reward-driven eating and craving intensity to non-significance. Conclusions: Opioidergic blockade did not reduce craving intensity; however, blockade reduced an association between trait-like reward-driven eating and daily food-craving intensity, and may help identify an important endophenotype within obesity. PMID:26164674
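The core result is a moderation effect: the trait-craving association present under placebo flattens under naltrexone. As a rough illustration of how such a dose-by-trait interaction can be tested, here is a minimal ordinary-least-squares sketch; the column names, toy data, and simple specification are assumptions for illustration, not the authors' actual analysis (which would need to handle repeated daily measures).

```python
# Minimal sketch of a moderation (interaction) test: does naltrexone
# weaken the association between trait reward-driven eating (RED)
# and daily craving intensity? Toy data; a real analysis of repeated
# daily ratings would use a mixed-effects model instead of plain OLS.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "craving": [4.2, 3.9, 5.1, 2.0, 2.2, 4.8, 4.5, 5.6, 2.1, 2.4],
    "red":     [10, 9, 14, 13, 12, 11, 10, 15, 14, 12],  # RED scale score
    "dose":    ["placebo", "placebo", "placebo", "naltrexone", "naltrexone",
                "placebo", "placebo", "placebo", "naltrexone", "naltrexone"],
})

# The red:dose interaction coefficient captures the moderation:
# a flatter RED-craving slope under naltrexone than under placebo.
fit = smf.ols("craving ~ red * C(dose, Treatment('placebo'))", data=df).fit()
print(fit.params)
```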
Smith, Aaron P; Hofford, Rebecca S; Zentall, Thomas R; Beckmann, Joshua S
2018-05-01
Laboratory experiments often model risk through a choice between a large, uncertain (LU) reward and a small, certain (SC) reward as an index of an individual's risk tolerance. An important factor generally lacking from these procedures is reward-associated cues, which may modulate risk preferences. We tested whether the addition of cues signaling 'jackpot' wins to LU choices would modulate risk preferences, and whether these cue effects were mediated by dopaminergic signaling. Three groups of rats chose between LU and SC rewards, with the LU probability of reward decreasing across blocks. The unsignaled group received a stimulus that was non-informative about trial outcome. The signaled group received a jackpot signal prior to reward delivery and a blackout on losses. The signaled-light group received a similar jackpot signal for wins, but a salient loss signal distinct from the win signal. Presenting win signals decreased the discounting of LU value for both signaled groups regardless of loss signal, while the unsignaled group showed discounting similar to previous research without cues. Pharmacological challenges with D1/D2 agonists and antagonists revealed that D1 antagonism increased and decreased sensitivities to the relative probability of reward for the unsignaled and signaled groups, respectively, while D2 agonists decreased sensitivities to the relative magnitude of reward. The results highlight how signals predictive of wins can promote maladaptive risk taking in individuals, while loss signals have a reduced effect. Additionally, the presence of reward-predictive cues may change the underlying neurobehavioral mechanisms mediating decision-making under risk.
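"Discounting of LU value" in designs like this is often quantified with a hyperbolic discounting model in which subjective value falls with the odds against winning, combined with a softmax choice rule. The sketch below illustrates that standard framework; the parameter values are made up, not fitted to this study.

```python
# One common model of probabilistic discounting: the subjective value
# of the large-uncertain (LU) option is discounted hyperbolically by
# the odds against reward, V = A / (1 + h * theta), theta = (1-p)/p,
# and LU-vs-SC choice follows a softmax. Parameters are illustrative.
import math

def lu_value(amount, p, h):
    theta = (1.0 - p) / p               # odds against winning
    return amount / (1.0 + h * theta)

def p_choose_lu(v_lu, v_sc, beta=1.0):
    return 1.0 / (1.0 + math.exp(-beta * (v_lu - v_sc)))

sc_value = 1.0                          # small, certain reward
for p in (0.75, 0.5, 0.25, 0.125):      # LU probability falls across blocks
    v = lu_value(amount=3.0, p=p, h=1.5)
    print(f"p(win)={p:<6} V(LU)={v:.2f}  P(choose LU)={p_choose_lu(v, sc_value):.2f}")
```

Within this framework, a cue effect such as the jackpot signal could be expressed as a smaller discounting parameter h, which shifts choices toward the LU option at every probability.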
Brain substrates of reward processing and the μ-opioid receptor: a pathway into pain?
Nees, Frauke; Becker, Susanne; Millenet, Sabina; Banaschewski, Tobias; Poustka, Luise; Bokde, Arun; Bromberg, Uli; Büchel, Christian; Conrod, Patricia J; Desrivières, Sylvane; Frouin, Vincent; Gallinat, Jürgen; Garavan, Hugh; Heinz, Andreas; Ittermann, Bernd; Martinot, Jean-Luc; Papadopoulos Orfanos, Dimitri; Paus, Tomáš; Smolka, Michael N; Walter, Henrik; Whelan, Rob; Schumann, Gunter; Flor, Herta
2017-02-01
Reward processing and reinforcement learning seem to be important determinants of pain chronicity. However, reward processing is already altered early in life, and it is not known whether this is related to the development of pain symptoms later on. The aim of this study was, first, to examine whether behavioural and brain-related indicators of reward processing at the age of 14 to 15 years are significant predictors of pain complaints 2 years later, at 16 to 17 years. Second, we investigated the contribution to this prediction of genetic variations in the opioidergic system, which is linked to the processing of both reward and pain. We used the monetary incentive delay task to assess reward processing, the Children's Somatization Inventory as a measure of pain complaints, and tested the effects of 2 single nucleotide polymorphisms (rs1799971/rs563649) of the human μ-opioid receptor gene. We found a significant prediction of pain complaints by responses in the dorsal striatum during reward feedback, independent of genetic predisposition. The relationship between pain complaints and activation in the periaqueductal gray and ventral striatum depended on the T-allele of rs563649. Carriers of this allele also reported more pain complaints than CC-allele carriers. Therefore, brain responses to reward outcomes and higher sensitivity to pain might be related already early in life and may thus set the course for pain complaints later in life, partly depending on a specific opioidergic genetic predisposition.
Elevated striatal reactivity across monetary and social rewards in bipolar I disorder.
Dutra, Sunny J; Cunningham, William A; Kober, Hedy; Gruber, June
2015-11-01
Bipolar disorder (BD) is associated with increased reactivity to rewards and heightened positive affectivity. It is less clear to what extent this heightened reward sensitivity is evident across contexts and what the associated neural mechanisms might be. The present investigation used both monetary and social incentive delay tasks in adults with remitted BD Type I (n = 24) and a healthy nonpsychiatric control group (HC; n = 25) using fMRI. Both whole-brain and region-of-interest analyses revealed elevated reactivity to reward receipt in the striatum, a region implicated in incentive sensitivity, in the BD group. Post hoc analyses revealed that greater striatal reactivity to reward receipt, across the monetary and social reward tasks, predicted decreased self-reported positive affect when anticipating subsequent rewards in the HC but not the BD group. Results point toward elevated striatal reactivity to reward receipt as a potential neural mechanism of persistent reward pursuit in BD. © 2015 APA, all rights reserved.
Salama, Aallaa; Gründer, Gerhard; Spreckelmeyer, Katja N.
2014-01-01
Recent studies have reported inconsistent results regarding the loss of reward sensitivity in the aging brain. Although such an age effect might be due to a decline of physiological processes, it may also be a consequence of age-related changes in motivational preference for different rewards. Here, we examined whether the age effects on neural correlates of reward anticipation are modulated by the type of expected reward. Functional magnetic resonance images were acquired in 24 older (60–78 years) and 24 young participants (20–28 years) while they performed an incentive delay task offering monetary or social rewards. Anticipation of either reward type recruited brain structures associated with reward, including the nucleus accumbens (NAcc). Region of interest analysis revealed an interaction effect of reward type and age group in the right NAcc: enhanced activation to cues of social reward was detected in the older subsample while enhanced activation to cues of monetary reward was detected in the younger subsample. Our results suggest that neural sensitivity to reward-predicting cues does not generally decrease with age. Rather, neural responses in the NAcc appear to be modulated by the type of reward, presumably reflecting age-related changes in motivational value attributed to different types of reward. PMID:23547243
Della Libera, Chiara; Calletti, Riccardo; Eštočinová, Jana; Chelazzi, Leonardo; Santandrea, Elisa
2017-04-01
Recent evidence indicates that the attentional priority of objects and locations is altered by the controlled delivery of reward, reflecting reward-based attentional learning. Here, we take an approach hinging on intersubject variability to probe the neurobiological bases of the reward-driven plasticity of spatial priority maps. Specifically, we ask whether an individual's susceptibility to the reward-based treatment can be accounted for by specific predictors, notably personality traits linked to reward processing (along with more general personality traits), but also gender. Using a visual search protocol, we show that when different target locations are associated with unequal reward probability, greater priority is acquired by the more rewarded relative to the less rewarded locations. However, while males exhibit the expected pattern of results, with greater priority for locations associated with higher reward, females show an opposite trend. Critically, both the extent and the direction of reward-based adjustments are further predicted by personality traits indexing reward sensitivity, indicating not only that male and female brains are differentially sensitive to reward, but also that specific personality traits further contribute to shaping learning-dependent attentional plasticity. These results contribute to a better understanding of the neurobiology underlying reward-dependent attentional learning and cross-subject variability in this domain.
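The learning mechanism implied here, location-specific priorities drifting toward experienced reward rates, can be captured with a simple delta rule. The sketch below is a toy illustration under that assumption; the locations, probabilities, and learning rate are hypothetical, and the study's key point is that the size and even the sign of the resulting adjustment differ across individuals.

```python
# Illustrative delta-rule sketch of how unequal reward probabilities
# across target locations could reshape a spatial priority map: each
# location's priority drifts toward its experienced reward rate.
# Reward probabilities and the learning rate are hypothetical.
import random

random.seed(1)
reward_prob = {"left": 0.8, "right": 0.2}   # unequal reward probability
priority = {"left": 0.5, "right": 0.5}      # initially flat priority map
alpha = 0.05                                 # learning rate

for _ in range(500):
    loc = random.choice(list(priority))              # target location
    r = 1.0 if random.random() < reward_prob[loc] else 0.0
    priority[loc] += alpha * (r - priority[loc])     # delta rule

print({k: round(v, 2) for k, v in priority.items()})  # diverged priorities
```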
Tosun, Tuğçe; Gür, Ezgi; Balcı, Fuat
2016-01-01
Animals can shape their timed behaviors based on experienced probabilistic relations in a nearly optimal fashion. On the other hand, it is not clear if they adopt these timed decisions by making computations based on previously learnt task parameters (time intervals, locations, and probabilities) or if they gradually develop their decisions based on trial and error. To address this question, we tested mice in the timed-switching task, which required them to anticipate when (after a short or long delay) and at which of the two delay locations a reward would be presented. The probability of short trials differed between test groups in two experiments. Critically, we first trained mice on relevant task parameters by signaling the active trial with a discriminative stimulus and delivered the corresponding reward after the associated delay without any response requirement (without inducing switching behavior). During the test phase, both options were presented simultaneously to characterize the emergence and temporal characteristics of the switching behavior. Mice exhibited timed-switching behavior starting from the first few test trials, and their performance remained stable throughout testing in the majority of the conditions. Furthermore, as the probability of the short trial increased, mice waited longer before switching from the short to long location (experiment 1). These behavioral adjustments were in directions predicted by reward maximization. These results suggest that rather than gradually adjusting their time-dependent choice behavior, mice abruptly adopted temporal decision strategies by directly integrating their previous knowledge of task parameters into their timed behavior, supporting the model-based representational account of temporal risk assessment. PMID:26733674
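The reward-maximization account referenced here has a compact normative form: an animal that plans to switch from the short to the long location at target time T, executed with scalar (Weber-like) timing noise, earns the short reward when it is still at the short location at t_short and the long reward when it has arrived by t_long, so the optimal T shifts later as short trials become more probable. A minimal numeric sketch under assumed delays and noise levels:

```python
# Minimal sketch of the reward-maximization account of timed switching:
# the animal waits at the short location and switches at target time T,
# executed with scalar timing noise (sd = w * T). It earns the short
# reward if it is still at the short location at t_short, and the long
# reward if it has switched by t_long. Delays, Weber fraction, and
# trial probabilities are illustrative, not the study's parameters.
import numpy as np
from scipy.stats import norm

t_short, t_long, w = 3.0, 9.0, 0.2   # delays (s) and Weber fraction

def expected_reward(T, p_short):
    sd = w * T
    stay_past_short = 1.0 - norm.cdf(t_short, loc=T, scale=sd)
    arrive_by_long = norm.cdf(t_long, loc=T, scale=sd)
    return p_short * stay_past_short + (1 - p_short) * arrive_by_long

Ts = np.linspace(t_short, t_long, 601)
for p_short in (0.1, 0.5, 0.9):
    best_T = Ts[np.argmax([expected_reward(T, p_short) for T in Ts])]
    print(f"p(short)={p_short}: optimal switch time ~ {best_T:.2f} s")
```

Under these illustrative parameters the optimal switch time moves later as p(short) rises, which is the direction of adjustment the mice showed.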
Finger, Elizabeth C; Marsh, Abigail A; Blair, Karina S; Reid, Marguerite E; Sims, Courtney; Ng, Pamela; Pine, Daniel S; Blair, R James R
2011-02-01
Dysfunction in the amygdala and orbitofrontal cortex has been reported in youths and adults with psychopathic traits. The specific nature of the functional irregularities within these structures remains poorly understood. The authors used a passive avoidance task to examine the responsiveness of these systems to early stimulus-reinforcement exposure, when prediction errors are greatest and learning maximized, and to reward in youths with psychopathic traits and comparison youths. While performing the passive avoidance learning task, 15 youths with conduct disorder or oppositional defiant disorder plus a high level of psychopathic traits and 15 healthy subjects completed a 3.0-T fMRI scan. Relative to the comparison youths, the youths with a disruptive behavior disorder plus psychopathic traits showed less orbitofrontal responsiveness both to early stimulus-reinforcement exposure and to rewards, as well as less caudate response to early stimulus-reinforcement exposure. There were no group differences in amygdala responsiveness to these two task measures, but amygdala responsiveness throughout the task was lower in the youths with psychopathic traits. Compromised sensitivity to early reinforcement information in the orbitofrontal cortex and caudate and to reward outcome information in the orbitofrontal cortex of youths with conduct disorder or oppositional defiant disorder plus psychopathic traits suggests that the integrated functioning of the amygdala, caudate, and orbitofrontal cortex may be disrupted. This provides a functional neural basis for why such youths are more likely to repeat disadvantageous decisions. New treatment possibilities are raised, as pharmacologic modulations of serotonin and dopamine can affect this form of learning.
Neurocultural evidence that ideal affect match promotes giving
Park, BoKyung; Blevins, Elizabeth; Knutson, Brian
2017-01-01
Why do people give to strangers? We propose that people trust and give more to those whose emotional expressions match how they ideally want to feel (“ideal affect match”). European Americans and Koreans played multiple trials of the Dictator Game with recipients who varied in emotional expression (excited, calm), race (White, Asian) and sex (male, female). Consistent with their culture’s valued affect, European Americans trusted and gave more to excited than calm recipients, whereas Koreans trusted and gave more to calm than excited recipients. These findings held regardless of recipient race and sex. We then used fMRI to probe potential affective and mentalizing mechanisms. Increased activity in the nucleus accumbens (associated with reward anticipation) predicted giving, as did decreased activity in the right temporo-parietal junction (rTPJ; associated with reduced belief prediction error). Ideal affect match decreased rTPJ activity, suggesting that people may trust and give more to strangers whom they perceive to share their affective values. PMID:28379542
Context-sensitivity of the feedback-related negativity for zero-value feedback outcomes.
Pfabigan, Daniela M; Seidel, Eva-Maria; Paul, Katharina; Grahl, Arvina; Sailer, Uta; Lanzenberger, Rupert; Windischberger, Christian; Lamm, Claus
2015-01-01
The present study investigated whether the same visual stimulus indicating zero-value feedback (€0) elicits feedback-related negativity (FRN) variation depending on whether the outcome corresponds with expectations or not. Thirty-one volunteers performed a monetary incentive delay (MID) task while EEG was recorded. FRN amplitudes were comparable, and more negative when the zero-value outcome deviated from expectations than with an expected gain or loss, supporting theories emphasising the impact of unexpectedness and salience on FRN amplitudes. Surprisingly, expected zero-value outcomes elicited the most negative FRNs. However, source localisation showed that such outcomes evoked less activation in cingulate areas than unexpected zero-value outcomes. Our study illustrates the context dependency of identical zero-value feedback stimuli. Moreover, the results indicate that the incentive cues in the MID task evoke different reward prediction error signals. These prediction signals differ in FRN amplitude and neuronal sources, and have to be considered in the design and interpretation of future studies. Copyright © 2014 Elsevier B.V. All rights reserved.
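The concluding point, that identical €0 feedback can carry different reward prediction errors depending on the preceding incentive cue, follows directly from the textbook definition of the error as the outcome minus the cue-based prediction. A minimal sketch, with made-up cue values:

```python
# Minimal sketch of why identical zero-value (0 EUR) outcomes carry
# different reward prediction errors in a monetary incentive delay
# task: the error is delta = r - V(cue), so the same outcome is a
# negative surprise after a gain cue, a positive surprise after a
# loss cue, and no surprise after a neutral cue. Cue values are
# illustrative expected outcomes, not the study's stimulus values.
cue_value = {"gain_cue": +1.0, "neutral_cue": 0.0, "loss_cue": -1.0}

outcome = 0.0  # identical zero-value feedback in every condition
for cue, v in cue_value.items():
    delta = outcome - v
    print(f"{cue:>12}: prediction error = {delta:+.1f}")
```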
Neurocultural evidence that ideal affect match promotes giving.
Park, BoKyung; Blevins, Elizabeth; Knutson, Brian; Tsai, Jeanne L
2017-07-01
Why do people give to strangers? We propose that people trust and give more to those whose emotional expressions match how they ideally want to feel ("ideal affect match"). European Americans and Koreans played multiple trials of the Dictator Game with recipients who varied in emotional expression (excited, calm), race (White, Asian) and sex (male, female). Consistent with their culture's valued affect, European Americans trusted and gave more to excited than calm recipients, whereas Koreans trusted and gave more to calm than excited recipients. These findings held regardless of recipient race and sex. We then used fMRI to probe potential affective and mentalizing mechanisms. Increased activity in the nucleus accumbens (associated with reward anticipation) predicted giving, as did decreased activity in the right temporo-parietal junction (rTPJ; associated with reduced belief prediction error). Ideal affect match decreased rTPJ activity, suggesting that people may trust and give more to strangers whom they perceive to share their affective values. © The Author (2017). Published by Oxford University Press.