Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa
2017-01-01
The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcement. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when negative reinforcement was used. Negative reinforcement, considered a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to relate to certain dimensions of personality, depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e., a stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. In contrast, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives for understanding the relationships between personality and training abilities. PMID:28475581
Reinforcement learning in computer vision
NASA Astrophysics Data System (ADS)
Bernstein, A. V.; Burnaev, E. V.
2018-04-01
Nowadays, machine learning has become one of the basic technologies used to solve various computer vision tasks such as feature detection, image segmentation, object recognition, and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. Solutions to these tasks are then used to make decisions about possible future actions. It is therefore not surprising that, when solving computer vision tasks, we should take into account the specifics of their subsequent use in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for applied tasks such as the processing and analysis of visual information, and for specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning and its use for solving computer vision problems.
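To make concrete the idea of learning through interaction with the environment that this entry refers to, here is a minimal tabular Q-learning loop; the environment interface (reset, step, n_actions) and all hyperparameters are illustrative assumptions, not details from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal tabular Q-learning. `env` is assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done), and an integer n_actions."""
    Q = defaultdict(lambda: [0.0] * env.n_actions)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: explore occasionally, otherwise act greedily
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # temporal-difference update toward the bootstrapped target
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```

In a vision setting, `state` would typically be replaced by features extracted from images and the Q-table by a function approximator, but the interaction loop stays the same.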
Apprenticeship Learning: Learning to Schedule from Human Experts
2016-06-09
approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning (Busoniu, Babuska, and De...via inverse reinforcement learning. In ICML. Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete...of tasks with temporal constraints. In Proc. AAAI, 2110–2116. Odom, P., and Natarajan, S. 2015. Active advice seeking for inverse reinforcement
2014-09-29
Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance W. Bradley Knox...positive a trainer’s reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task...learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent’s performance on the task the trainer
Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J
2016-01-01
Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
Therrien, Amanda S.; Wolpert, Daniel M.
2016-01-01
See Miall and Galea (doi:10.1093/awv343) for a scientific commentary on this article. Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368
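As a reading aid for the closed-loop reinforcement schedule described in the two entries above (task difficulty controlled by recent performance), here is a minimal sketch of one possible adaptive-criterion scheme; the window size, target success rate, and step size are assumptions rather than the authors' parameters.

```python
from collections import deque

class ClosedLoopSchedule:
    """Adapt a binary-reward criterion so the recent success rate stays near a target."""
    def __init__(self, window=10, target_rate=0.5, tolerance=5.0, step=0.5):
        self.recent = deque(maxlen=window)   # recent trial outcomes (1 = rewarded)
        self.target_rate = target_rate
        self.tolerance = tolerance           # reward window half-width (e.g., degrees of error)
        self.step = step

    def score_trial(self, error):
        rewarded = abs(error) <= self.tolerance
        self.recent.append(1 if rewarded else 0)
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            # tighten the criterion when the learner is doing too well, relax it otherwise
            if rate > self.target_rate:
                self.tolerance = max(self.step, self.tolerance - self.step)
            elif rate < self.target_rate:
                self.tolerance += self.step
        return rewarded
```

Tightening the reward window as success exceeds the target keeps the reinforcement rate roughly constant while the learner improves.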
Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter
2013-11-01
The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier Ltd. All rights reserved.
Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter
2013-01-01
The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273
When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition
ERIC Educational Resources Information Center
Janssen, Christian P.; Gray, Wayne D.
2012-01-01
Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…
Negative reinforcement learning is affected in substance dependence.
Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody
2012-06-01
Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Effect of reinforcement learning on coordination of multiagent systems
NASA Astrophysics Data System (ADS)
Bukkapatnam, Satish T. S.; Gao, Greg
2000-12-01
For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.
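The abstract does not spell out the negotiation mechanism, so the following is only a hedged sketch of how an agent might reinforce price offers that lead to quick agreement; the PriceNegotiationAgent class, its reward shaping, and the normalized price scale are all hypothetical.

```python
import random

class PriceNegotiationAgent:
    """Hypothetical buyer agent: learns by reinforcement which discrete offer
    (price normalized to [0, 1]) tends to close a deal quickly and cheaply."""
    def __init__(self, prices, alpha=0.1, epsilon=0.1):
        self.prices = list(prices)
        self.values = {p: 0.0 for p in self.prices}   # estimated return per offer
        self.alpha, self.epsilon = alpha, epsilon

    def choose_offer(self):
        if random.random() < self.epsilon:
            return random.choice(self.prices)          # explore
        return max(self.prices, key=self.values.get)   # exploit best-known offer

    def update(self, price, accepted, rounds_taken):
        # illustrative reward: cheaper and faster agreements are better,
        # failed negotiations are penalized
        reward = (1.0 - price) - 0.05 * rounds_taken if accepted else -1.0
        self.values[price] += self.alpha * (reward - self.values[price])
```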
11.2 YIP Human In the Loop Statistical Relational Learners
2017-10-23
learning formalisms including inverse reinforcement learning [4] and statistical relational learning [7, 5, 8]. We have also applied our algorithms in...one introduced for label preferences. Figure 2: Active Advice Seeking for Inverse Reinforcement Learning. Active advice seeking is in selecting the...learning tasks. 1.2.1 Sequential Decision-Making Our previous work on advice for inverse reinforcement learning (IRL) defined advice as action
Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders
Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.
2017-01-01
Background: Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning; however, there have been few experimental studies taking this perspective. Methods: We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state–space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results: Both groups learned the task after training. However, there were group differences in early learning in the first task block, where individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state–space learning curves; and outperformed typically developing individuals on the near-chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions: Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243
Preliminary Work for Examining the Scalability of Reinforcement Learning
NASA Technical Reports Server (NTRS)
Clouse, Jeff
1998-01-01
Researchers began studying automated agents that learn to perform multiple-step tasks early in the history of artificial intelligence (Samuel, 1963; Samuel, 1967; Waterman, 1970; Fikes, Hart & Nilsson, 1972). Multiple-step tasks are tasks that can only be solved via a sequence of decisions, such as control problems, robotics problems, classic problem-solving, and game-playing. The objective of agents attempting to learn such tasks is to use the resources they have available in order to become more proficient at the tasks. In particular, each agent attempts to develop a good policy, a mapping from states to actions, that allows it to select actions that optimize a measure of its performance on the task; for example, reducing the number of steps necessary to complete the task successfully. Our study focuses on reinforcement learning, a set of learning techniques where the learner performs trial-and-error experiments in the task and adapts its policy based on the outcome of those experiments. Much of the work in reinforcement learning has focused on a particular, simple representation, where every problem state is represented explicitly in a table, and associated with each state are the actions that can be chosen in that state. A major advantage of this table lookup representation is that one can prove that certain reinforcement learning techniques will develop an optimal policy for the current task. The drawback is that the representation limits the application of reinforcement learning to multiple-step tasks with relatively small state-spaces. There has been a little theoretical work that proves that convergence to optimal solutions can be obtained when using generalization structures, but the structures are quite simple. The theory says little about complex structures, such as multi-layer, feedforward artificial neural networks (Rumelhart & McClelland, 1986), but empirical results indicate that the use of reinforcement learning with such structures is promising. These empirical results make no theoretical claims, nor compare the policies produced to optimal policies. A goal of our work is to be able to make the comparison between an optimal policy and one stored in an artificial neural network. A difficulty of performing such a study is finding a multiple-step task that is small enough that one can find an optimal policy using table lookup, yet large enough that, for practical purposes, an artificial neural network is really required. We have identified a limited form of the game OTHELLO as satisfying these requirements. The work we report here is in the very preliminary stages of research, but this paper provides background for the problem being studied and a description of our initial approach to examining the problem. In the remainder of this paper, we first describe reinforcement learning in more detail. Next, we present the game OTHELLO. Finally, we argue that a restricted form of the game meets the requirements of our study, and describe our preliminary approach to finding an optimal solution to the problem.
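To illustrate the table lookup representation discussed above, here is a minimal sketch of an explicit Q-table keyed by enumerated states, with a hashable encoding for a reduced Othello-like board; the encoding and class names are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict

class TabularQ:
    """Explicit table lookup: one row of action values per enumerated state.
    Convergence guarantees apply, but memory grows with the state space."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.table = defaultdict(lambda: [0.0] * n_actions)

    def best_action(self, state):
        return max(range(self.n_actions), key=lambda a: self.table[state][a])

    def update(self, state, action, target, alpha=0.1):
        self.table[state][action] += alpha * (target - self.table[state][action])

def board_key(board):
    """Hashable key for a reduced Othello-like position, e.g. a 6x6 board given
    as a list of rows with 0 = empty, 1 = black, 2 = white (hypothetical encoding)."""
    return tuple(cell for row in board for cell in row)
```

Replacing TabularQ with a function approximator such as a feedforward network gives up the exactness (and provable optimality) of the table in exchange for generalization over much larger state spaces, which is the trade-off the entry above sets out to study.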
General functioning predicts reward and punishment learning in schizophrenia.
Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A
2011-04-01
Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performances. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.
Context transfer in reinforcement learning using action-value functions.
Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid
2014-01-01
This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task.
Context Transfer in Reinforcement Learning Using Action-Value Functions
Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid
2014-01-01
This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task. PMID:25610457
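A hedged sketch of the transferred initialization idea described in the two entries above: Q-values learned on source tasks are carried through partial state-action mappings and merged to initialize the target task. Simple averaging stands in for the paper's interval-based merging, and the mapping format is an assumption.

```python
def transfer_initialization(source_Qs, mappings):
    """source_Qs: list of dicts {(state, action): value} learned on source tasks.
    mappings:   one partial mapping per source task, of the form
                {(target_state, target_action): (source_state, source_action)};
                the paper derives such mappings from a shared feature space.
    Returns initial Q-values for the target task; unmapped pairs keep their default."""
    transferred = {}
    for Q_src, mapping in zip(source_Qs, mappings):
        for target_sa, source_sa in mapping.items():
            if source_sa in Q_src:
                transferred.setdefault(target_sa, []).append(Q_src[source_sa])
    # merge estimates from different source tasks; a simple average stands in
    # for the paper's interval-based merging of knowledge
    return {sa: sum(vs) / len(vs) for sa, vs in transferred.items()}
```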
Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia
2018-01-01
Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822
Reinforcement learning in multidimensional environments relies on attention mechanisms.
Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C
2015-05-27
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors 0270-6474/15/358145-13$15.00/0.
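One way to picture the representation learning described above is a model that learns values for individual stimulus features and weights them by attention when choosing among multidimensional stimuli; the sketch below is loosely modeled on this idea, and its weighting scheme and parameters are assumptions.

```python
import math, random

class FeatureRL:
    """Learns a value per feature (e.g., a particular color, shape, or texture) and
    scores each multidimensional stimulus as an attention-weighted sum of its
    feature values. The specific weighting scheme is an illustrative assumption."""
    def __init__(self, alpha=0.3, beta=5.0):
        self.values = {}            # feature -> learned value
        self.alpha, self.beta = alpha, beta

    def stimulus_value(self, features, attention):
        # features: dict dimension -> feature, attention: dict dimension -> weight
        return sum(attention.get(dim, 1.0) * self.values.get(f, 0.0)
                   for dim, f in features.items())

    def choose(self, stimuli, attention):
        # softmax over attention-weighted stimulus values
        scores = [math.exp(self.beta * self.stimulus_value(s, attention)) for s in stimuli]
        r, cumulative = random.uniform(0.0, sum(scores)), 0.0
        for stim, score in zip(stimuli, scores):
            cumulative += score
            if r <= cumulative:
                return stim
        return stimuli[-1]

    def update(self, chosen, reward):
        # credit the prediction error to every feature of the chosen stimulus
        for f in chosen.values():
            v = self.values.get(f, 0.0)
            self.values[f] = v + self.alpha * (reward - v)
```

Shrinking the attention weights of irrelevant dimensions toward zero is one concrete way the "curse of dimensionality" mentioned above can be mitigated.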
Changes in corticostriatal connectivity during reinforcement learning in humans.
Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S
2015-02-01
Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.
Autonomous Inter-Task Transfer in Reinforcement Learning Domains
2008-08-01
Twentieth International Joint Conference on Artificial Intelligence, 2007. Fumihide Tanaka and Masayuki Yamamura. Multitask reinforcement learning... Functions... Artificial Neural Networks... Instance-based...tures [Laird et al., 1986, Choi et al., 2007]. However, TL for RL tasks has only recently been gaining attention in the artificial intelligence
Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms
Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.
2015-01-01
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331
Functional Contour-following via Haptic Perception and Reinforcement Learning.
Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J
2018-01-01
Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
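For readers unfamiliar with contextual multi-armed bandits, the sketch below shows a minimal epsilon-greedy C-MAB that keeps a running mean reward per context-action pair; the context and action encodings in the usage comment are assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

class ContextualBandit:
    """Epsilon-greedy contextual multi-armed bandit: tracks a running mean reward
    per (context, action) pair and explores with probability epsilon."""
    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.mean = defaultdict(float)    # (context, action) -> mean reward
        self.count = defaultdict(int)

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.mean[(context, a)])

    def update(self, context, action, reward):
        key = (context, action)
        self.count[key] += 1
        # incremental running mean of observed rewards
        self.mean[key] += (reward - self.mean[key]) / self.count[key]

# Illustrative usage (assumed encodings): context = a discretized zipper-state
# estimate from the tactile classifier, actions = candidate fingertip motion
# primitives, reward = progress made along the contour on that step.
```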
Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P
2003-01-01
The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement respective reward system reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning dependent on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydrophenylacetic acid concentrations and a higher dihydrophenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydrophenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory promoting and suppressive effects via distinct and independent mechanisms and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.
Dere, E; Frisch, C; De Souza Silva, M A; Gödecke, A; Schrader, J; Huston, J P
2001-01-01
Proceeding from previous findings of a beneficial effect of endothelial nitric oxide synthase (eNOS) gene inactivation on negatively reinforced water maze performance, we asked whether this improvement in place learning capacities also holds for a positively reinforced radial maze task. Unlike its beneficial effects on the water maze task, eNOS gene inactivation did not facilitate radial maze performance. The acquisition performance over the days of place learning did not differ between eNOS knockout (eNOS-/-) and wild-type mice (eNOS+/+). eNOS-/- mice displayed a slight and eNOS+/+ mice a more severe working memory deficit in the place learning version of the radial maze compared to the genetic background C57BL/6 strain. Possible differential effects of eNOS inactivation, related to differences in reinforcement contingencies between the Morris water maze and radial maze tasks, behavioral strategy requirements, or to different emotional and physiological concomitants inherent in the two tasks are discussed. These task-unique characteristics might be differentially affected by the reported anxiogenic and hypertensional effects of eNOS gene inactivation. Post-mortem determination of acetylcholine concentrations in diverse brain structures revealed that acetylcholine and choline contents were not different between eNOS-/- and eNOS+/+ mice, but were increased in eNOS+/+ mice compared to C57BL/6 mice in the frontal cortex. Our findings demonstrate that phenotyping of learning and memory capacities should not rely on one learning task only, but should include tasks employing both negative and positive reinforcement contingencies in order to allow valid statements regarding differences in learning capacities between rodent strains.
ERIC Educational Resources Information Center
Peterson, Craig M.
A system of task analysis and positive reinforcement was used in the vocational training of a 19-year-old trainable retarded youth (MA=6 years). The task of polishing shoe skates was analyzed and programmed into 29 steps and was reinforced with praise and money. The trainee learned the task in 13 sessions (approximately 1 month) and was employed…
Deep imitation learning for 3D navigation tasks.
Hussein, Ahmed; Elyan, Eyad; Gaber, Mohamed Medhat; Jayne, Chrisina
2018-01-01
Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning is recently gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks and asynchronous advantage actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input while learning-from-experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P
2007-11-21
The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
Stimulus discriminability may bias value-based probabilistic learning.
Schutte, Iris; Slagter, Heleen A; Collins, Anne G E; Frank, Michael J; Kenemans, J Leon
2017-01-01
Reinforcement learning tasks are often used to assess participants' tendency to learn more from the positive or more from the negative consequences of one's action. However, this assessment often requires comparison in learning performance across different task conditions, which may differ in the relative salience or discriminability of the stimuli associated with more and less rewarding outcomes, respectively. To address this issue, in a first set of studies, participants were subjected to two versions of a common probabilistic learning task. The two versions differed with respect to the stimulus (Hiragana) characters associated with reward probability. The assignment of character to reward probability was fixed within version but reversed between versions. We found that performance was highly influenced by task version, which could be explained by the relative perceptual discriminability of characters assigned to high or low reward probabilities, as assessed by a separate discrimination experiment. Participants were more reliable in selecting rewarding characters that were more discriminable, leading to differences in learning curves and their sensitivity to reward probability. This difference in experienced reinforcement history was accompanied by performance biases in a test phase assessing ability to learn from positive vs. negative outcomes. In a subsequent large-scale web-based experiment, this impact of task version on learning and test measures was replicated and extended. Collectively, these findings imply a key role for perceptual factors in guiding reward learning and underscore the need to control stimulus discriminability when making inferences about individual differences in reinforcement learning.
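Tasks like the one above are often analyzed with models that learn at different rates from positive and negative prediction errors; a minimal simulation of such an asymmetric learner on probabilistic stimulus pairs is sketched below, with all parameter values being illustrative assumptions.

```python
import math, random

def simulate_learner(pairs, trials=100, alpha_gain=0.2, alpha_loss=0.2, beta=5.0):
    """pairs: list of (p_reward_A, p_reward_B) reward probabilities per stimulus pair.
    Separate learning rates for positive vs. negative prediction errors let the
    model over- or under-weight good and bad outcomes (all values are assumptions)."""
    Q = {(i, s): 0.5 for i in range(len(pairs)) for s in ("A", "B")}
    history = []
    for _ in range(trials):
        i = random.randrange(len(pairs))                        # present a random pair
        qa, qb = Q[(i, "A")], Q[(i, "B")]
        p_a = 1.0 / (1.0 + math.exp(-beta * (qa - qb)))          # softmax over the pair
        choice = "A" if random.random() < p_a else "B"
        p_reward = pairs[i][0] if choice == "A" else pairs[i][1]
        reward = 1.0 if random.random() < p_reward else 0.0
        delta = reward - Q[(i, choice)]
        alpha = alpha_gain if delta > 0 else alpha_loss          # asymmetric update
        Q[(i, choice)] += alpha * delta
        history.append((i, choice, reward))
    return Q, history
```

In such analyses, stimulus pairs are assumed to be equally discriminable; the entry above shows why that assumption matters when comparing learning from positive versus negative outcomes.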
Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang
2015-05-01
Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions providing the trial initial timing. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings and resting) due to the efficient weight-updating. We apply AGREL on neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% in average with faster convergence and more stability to follow neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna
2016-10-05
Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.
Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.
Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M
2018-05-12
Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside of the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. Twenty-nine adult females (15 PTSD with exposure to assaultive violence, 14 controls) underwent a neutral reinforcement-learning task (i.e., a two-armed bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify timecourses for large-scale a priori brain networks. We found that an anticorrelated and risk-sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group demonstrated significantly less neural encoding of prediction errors in both a ventral striatum/mPFC and anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside of the context of fear stimuli, suggests the possibility of a broader conceptualization of learning differences in PTSD than proposed in current neurocircuitry models of PTSD. Copyright © 2018 Elsevier Ltd. All rights reserved.
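As a reference point for the Rescorla-Wagner modeling mentioned above, here is the basic delta-rule update for a two-armed bandit; the anticorrelated option shown is one simple reading of that model variant and is an assumption, as are the parameter values.

```python
def rescorla_wagner_update(values, chosen, reward, alpha=0.2, anticorrelated=False):
    """Basic delta-rule update for a two-armed bandit.
    values: dict arm -> expected reward; chosen: key of the selected arm.
    With anticorrelated=True the unchosen arm is pushed in the opposite direction,
    one simple reading of the 'anticorrelated' variant (an assumption here)."""
    delta = reward - values[chosen]              # prediction error
    values[chosen] += alpha * delta
    if anticorrelated:
        for arm in values:
            if arm != chosen:
                values[arm] -= alpha * delta
    return delta
```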
Infant Contingency Learning in Different Cultural Contexts
ERIC Educational Resources Information Center
Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika
2012-01-01
Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…
The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.
ERIC Educational Resources Information Center
Dockstader, Steven L.
The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…
Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia
Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.
2014-01-01
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
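A hedged sketch of the kind of RL-plus-working-memory mixture the task above is designed to dissect: a slow incremental learner combined with a fast, capacity-limited memory whose influence shrinks with set size. The mixture rule, capacity handling, and parameters are assumptions modeled loosely on the description, not the authors' fitted model.

```python
import random

class RLWM:
    """Mixture of a slow delta-rule learner and a fast, capacity-limited memory.
    The weight given to the memory module scales with capacity / set size (an assumption)."""
    def __init__(self, n_actions=3, alpha=0.1, capacity=3):
        self.n_actions = n_actions
        self.alpha = alpha
        self.capacity = capacity
        self.Q = {}        # incremental RL values: stimulus -> list of action values
        self.wm = {}       # one-shot memory of the last rewarded action per stimulus

    def act(self, stimulus, set_size):
        q = self.Q.setdefault(stimulus, [1.0 / self.n_actions] * self.n_actions)
        w_wm = min(1.0, self.capacity / set_size)          # WM reliability falls with load
        if stimulus in self.wm and random.random() < w_wm:
            return self.wm[stimulus]                       # fast but capacity-limited route
        return max(range(self.n_actions), key=lambda a: q[a])

    def update(self, stimulus, action, reward):
        q = self.Q.setdefault(stimulus, [1.0 / self.n_actions] * self.n_actions)
        q[action] += self.alpha * (reward - q[action])     # slow, cumulative RL update
        if reward > 0:
            self.wm[stimulus] = action                     # fast one-shot storage
            if len(self.wm) > self.capacity:               # crude capacity limit
                self.wm.pop(next(iter(self.wm)))
```

Because the memory route degrades with set size and delay while the RL route does not, fitting such a mixture is what lets the study above attribute patients' deficits to the WM parameters rather than to the RL update.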
Reusable Reinforcement Learning via Shallow Trails.
Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua
2018-06-01
Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused in accomplishing test tasks from the same distribution. However, in practice, we face two major obstacles to train and reuse metapolicies well. First, how to identify tasks that are unrelated or even opposite with each other, in order to avoid their mutual interference in the training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes the two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several controlling tasks verify that MAPLE can train metapolicies well and receives high reward on test tasks.
Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo
2016-02-01
Attention includes processes that evaluate stimulus relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors once behavior was at asymptote. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
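To make the stickiness mechanism above concrete, here is a minimal softmax choice rule over feature values with a value-independent bonus for repeating the previous selection; the function name and parameter values are illustrative assumptions.

```python
import math, random

def choose_feature(values, previous_choice, beta=4.0, stickiness=0.5):
    """Softmax choice over task-relevant feature values plus a value-independent
    bonus for repeating the previously selected feature (parameters are illustrative)."""
    scores = {}
    for feature, value in values.items():
        bonus = stickiness if feature == previous_choice else 0.0
        scores[feature] = math.exp(beta * value + bonus)
    total = sum(scores.values())
    r, cumulative = random.uniform(0.0, total), 0.0
    for feature, score in scores.items():
        cumulative += score
        if r <= cumulative:
            return feature
    return feature  # fallback for floating-point edge cases
```

Because the bonus is added outside the learned values, perseverative choices can occur even when value estimates clearly favor switching, which is the kind of asymptotic selection error the entry above describes.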
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ
2014-01-01
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.
Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J
2014-06-01
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
Racial bias shapes social reinforcement learning.
Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas
2014-03-01
Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial-in-group and out-group individuals.
Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals
Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan
2017-01-01
Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976
Effects of Dopamine Medication on Sequence Learning with Stochastic Feedback in Parkinson's Disease
Seo, Moonsang; Beigi, Mazda; Jahanshahi, Marjan; Averbeck, Bruno B.
2010-01-01
A growing body of evidence suggests that the midbrain dopamine system plays a key role in reinforcement learning and disruption of the midbrain dopamine system in Parkinson's disease (PD) may lead to deficits on tasks that require learning from feedback. We examined how changes in dopamine levels (“ON” and “OFF” their dopamine medication) affect sequence learning from stochastic positive and negative feedback using Bayesian reinforcement learning models. We found deficits in sequence learning in patients with PD when they were “ON” and “OFF” medication relative to healthy controls, but smaller differences between patients “OFF” and “ON”. The deficits were mainly due to decreased learning from positive feedback, although across all participant groups learning was more strongly associated with positive than negative feedback in our task. The learning in our task is likely mediated by the relatively depleted dorsal striatum and not the relatively intact ventral striatum. Therefore, the changes we see in our task may be due to a strong loss of phasic dopamine signals in the dorsal striatum in PD. PMID:20740077
Martínez-Velázquez, Eduardo S; Ramos-Loyo, Julieta; González-Garrido, Andrés A; Sequeira, Henrique
2015-01-21
Feedback-related negativity (FRN) is a negative deflection over frontocentral regions that appears around 250 ms after feedback signaling the gain or loss associated with chosen alternatives in a gambling task. Few studies have reported FRN enhancement in adolescents compared with adults in a gambling task without probabilistic reinforcement learning, despite the fact that learning from positive or negative consequences is crucial for decision-making during adolescence. Therefore, the aim of the present research was to identify differences in FRN amplitude and latency between adolescents and adults on a gambling task with favorable and unfavorable probabilistic reinforcement learning conditions, in addition to a nonlearning condition with monetary gains and losses. Higher rate scores of high-magnitude choices during the final 30 trials compared with the first 30 trials were observed during the favorable condition, whereas lower rates were observed during the unfavorable condition in both groups. Higher FRN amplitude in all conditions and longer latency in the nonlearning condition were observed in adolescents compared with adults and in relation to losses. Results indicate that both the adolescents and the adults improved their performance in relation to positive and negative feedback. However, the FRN findings suggest an increased sensitivity to external feedback about losses in adolescents compared with adults, irrespective of the presence or absence of probabilistic reinforcement learning. These results reflect processing differences in the neural monitoring system and provide new perspectives on the dynamic development of an adolescent's brain.
Neural correlates of reinforcement learning and social preferences in competitive bidding.
van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M
2013-01-30
In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
Feedback-related brain activity predicts learning from feedback in multiple-choice testing.
Ernst, Benjamin; Steinhauser, Marco
2012-06-01
Different event-related potentials (ERPs) have been shown to correlate with learning from feedback in decision-making tasks and with learning in explicit memory tasks. In the present study, we investigated which ERPs predict learning from corrective feedback in a multiple-choice test, which combines elements from both paradigms. Participants worked through sets of multiple-choice items of a Swahili-German vocabulary task. Whereas the initial presentation of an item required the participants to guess the answer, corrective feedback could be used to learn the correct response. Initial analyses revealed that corrective feedback elicited components related to reinforcement learning (FRN), as well as to explicit memory processing (P300) and attention (early frontal positivity). However, only the P300 and early frontal positivity were positively correlated with successful learning from corrective feedback, whereas the FRN was even larger when learning failed. These results suggest that learning from corrective feedback crucially relies on explicit memory processing and attentional orienting to corrective feedback, rather than on reinforcement learning.
Understanding Optimal Decision-Making
2015-06-01
Task (IGT) (Bechara, Damasio, Damasio, & Anderson, 1994), a very common test of reinforcement learning that has been used in hundreds of psychology ... psychology task that elicits reinforcement learning (Bechara et al., 1994) and has been used in hundreds of studies (Krain et al., 2006). Subjects...
Human-level control through deep reinforcement learning.
Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis
2015-02-26
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Human-level control through deep reinforcement learning
NASA Astrophysics Data System (ADS)
Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis
2015-02-01
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
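To make the training target described above concrete, here is a heavily simplified sketch of the deep Q-network ingredients (experience replay plus a periodically synced target network), using a tiny linear Q-function in place of a deep network; the toy environment, feature size, and parameter values are assumptions for illustration, not the published architecture or hyperparameters.

# Minimal sketch of DQN-style learning: replay buffer, target network, TD target.
# The toy environment, linear Q-function and constants are illustrative assumptions.
import random
import numpy as np

n_features, n_actions, gamma = 4, 2, 0.99
W = np.zeros((n_actions, n_features))        # online Q weights (stand-in for the deep network)
W_target = W.copy()                          # periodically synced target network
replay = []                                  # experience replay buffer

def q_values(weights, state):
    return weights @ state

def learn_from(transition, lr=0.01):
    state, action, reward, next_state, done = transition
    target = reward if done else reward + gamma * q_values(W_target, next_state).max()
    td_error = target - q_values(W, state)[action]
    W[action] += lr * td_error * state       # semi-gradient step toward the TD target

for t in range(1000):
    s = np.random.rand(n_features)           # stand-in for a preprocessed observation
    a = random.randrange(n_actions)           # epsilon-greedy exploration omitted for brevity
    r = float(s[a] > 0.5)                     # toy reward
    s_next, done = np.random.rand(n_features), False
    replay.append((s, a, r, s_next, done))
    learn_from(random.choice(replay))         # learn from a randomly sampled past transition
    if t % 100 == 0:
        W_target = W.copy()                   # sync the target network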
Human Aided Reinforcement Learning in Complex Environments
learn to solve tasks through a trial-and-error process. As an agent takes ... task faster and more accurately, a human expert can be added to the system to guide an agent in solving the task. This project seeks to expand on current ... the environment, which works particularly well for reactive tasks. In more complex tasks, these systems do not work as intended. The manipulation
ERIC Educational Resources Information Center
Wilson, L. R.
1975-01-01
This study, which compared parental frequency ratings of physical punishment and of task completion for learning disabled and control children, hypothesized that parental indulgence is associated with learning disorders. Referral children with learning problems (N=18) were rated significantly lower on both measures than students referred for…
Working memory contributions to reinforcement learning impairments in schizophrenia.
Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J
2014-10-08
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. Copyright © 2014 the authors 0270-6474/14/3413747-10$15.00/0.
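The RL-versus-WM decomposition described above can be pictured with a much-simplified simulation in which a slow incremental learner is mixed with a fast but capacity-limited memory whose influence shrinks with set size; the capacity-based weighting rule and all parameter values below are assumptions for illustration, not the fitted model from the study.

# Simplified sketch: mix a slow RL learner with a fast, capacity-limited WM store.
# capacity, alpha and the mixing rule are illustrative assumptions.
import random

def simulate_block(stimuli, correct_action, n_actions=3, capacity=3, alpha=0.1, trials=60):
    q = {s: [1.0 / n_actions] * n_actions for s in stimuli}   # slow, cumulative RL values
    wm = {}                                                    # fast, one-shot working memory
    w = min(1.0, capacity / len(stimuli))                      # WM weight shrinks with set size
    n_correct = 0
    for _ in range(trials):
        s = random.choice(stimuli)
        rl_choice = max(range(n_actions), key=lambda a: q[s][a])
        choice = wm.get(s, rl_choice) if random.random() < w else rl_choice
        reward = 1.0 if choice == correct_action[s] else 0.0
        n_correct += reward
        q[s][choice] += alpha * (reward - q[s][choice])        # incremental RL update
        if reward:
            wm[s] = choice                                     # store the correct response in WM
    return n_correct / trials

for set_size in (2, 6):
    stimuli = list(range(set_size))
    print(set_size, simulate_block(stimuli, {s: s % 3 for s in stimuli}))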
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662
Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models
Najnin, Shamima; Banerjee, Bonny
2018-01-01
Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027
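For readers unfamiliar with the table-based algorithms named above, the following sketch contrasts the off-policy Q-learning update with the on-policy SARSA update on a single transition; the states, actions, and parameter values are placeholders, not the word-learning setup of the paper.

# Sketch of the two table-based updates named above (Q-learning vs SARSA).
# States, actions and parameter values are placeholders for illustration.
alpha, gamma = 0.1, 0.9
Q = {(s, a): 0.0 for s in range(3) for a in range(2)}

def q_learning_update(s, a, r, s_next):
    best_next = max(Q[(s_next, b)] for b in range(2))                   # greedy bootstrap (off-policy)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])  # bootstrap on the action taken (on-policy)

q_learning_update(0, 1, 1.0, 1)
sarsa_update(1, 0, 0.0, 2, 1)
print(Q)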
A Novel Task for the Investigation of Action Acquisition
Stafford, Tom; Thirkettle, Martin; Walton, Tom; Vautrelle, Nicolas; Hetherington, Len; Port, Michael; Gurney, Kevin; Redgrave, Pete
2012-01-01
We present a behavioural task designed for the investigation of how novel instrumental actions are discovered and learnt. The task consists of free movement with a manipulandum, during which the full range of possible movements can be explored by the participant and recorded. A subset of these movements, the ‘target’, is set to trigger a reinforcing signal. The task is to discover what movements of the manipulandum evoke the reinforcement signal. Targets can be defined in spatial, temporal, or kinematic terms, can be a combination of these aspects, or can represent the concatenation of actions into a larger gesture. The task allows the study of how the specific elements of behaviour which cause the reinforcing signal are identified, refined and stored by the participant. The task provides a paradigm where the exploratory motive drives learning and as such we view it as in the tradition of Thorndike [1]. Most importantly it allows for repeated measures, since when a novel action is acquired the criterion for triggering reinforcement can be changed requiring a new action to be discovered. Here, we present data using both humans and rats as subjects, showing that our task is easily scalable in difficulty, adaptable across species, and produces a rich set of behavioural measures offering new and valuable insight into the action learning process. PMID:22675490
Robust reinforcement learning.
Morimoto, Jun; Doya, Kenji
2005-02-01
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H(infinity) control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H(infinity) control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
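The min-max idea can be illustrated with a toy robust value iteration in which the controller maximizes value while a disturbance is assumed to pick the worst admissible outcome; this is a generic minimax dynamic-programming sketch over made-up two-state dynamics, not the H(infinity)-based online algorithm of the paper.

# Toy minimax value iteration: the controller maximizes, the disturbance minimizes.
# The two-state dynamics, rewards and disturbance set are made up for illustration.
gamma = 0.9
states, actions, disturbances = [0, 1], [0, 1], [-1, 0, 1]

def next_state(s, a, d):
    return max(0, min(1, s + a + d - 1))        # bounded toy dynamics

def reward(s, a):
    return 1.0 if s == 1 else 0.0

V = {s: 0.0 for s in states}
for _ in range(50):
    V = {s: max(min(reward(s, a) + gamma * V[next_state(s, a, d)]
                    for d in disturbances)      # worst-case disturbance
                for a in actions)               # best control action
         for s in states}
print(V)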
Gruber, Aaron J; Thapa, Rajat
2016-01-01
The propensity of animals to shift choices immediately after unexpectedly poor reinforcement outcomes is a pervasive strategy across species and tasks. We report here that the memory supporting such lose-shift responding in rats rapidly decays during the intertrial interval and persists throughout training and testing on a binary choice task, despite being a suboptimal strategy. Lose-shift responding is not positively correlated with the prevalence and temporal dependence of win-stay responding, and it is inconsistent with predictions of reinforcement learning on the task. These data provide further evidence that win-stay and lose-shift are mediated by dissociated neural mechanisms and indicate that lose-shift responding presents a potential confound for the study of choice in the many operant choice tasks with short intertrial intervals. We propose that this immediate lose-shift responding is an intrinsic feature of the brain's choice mechanisms that is engaged as a choice reflex and works in parallel with reinforcement learning and other control mechanisms to guide action selection.
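The behavioural measures discussed above can be computed directly from a choice and outcome sequence, as in the short sketch below; the example sequences are fabricated and the two-option coding is an assumption.

# Sketch: estimate win-stay and lose-shift probabilities from a binary choice task.
# The example choice and reward sequences are fabricated for illustration.
choices = [0, 0, 1, 1, 0, 0, 1, 0, 0, 1]
rewards = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]   # 1 = rewarded trial, 0 = unrewarded

win_stay = lose_shift = wins = losses = 0
for t in range(1, len(choices)):
    if rewards[t - 1]:
        wins += 1
        win_stay += choices[t] == choices[t - 1]
    else:
        losses += 1
        lose_shift += choices[t] != choices[t - 1]
print(win_stay / wins, lose_shift / losses)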
Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E
2014-03-01
Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.
Generalization of value in reinforcement learning by humans.
Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna
2012-04-01
Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
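A minimal way to express the augmented model contrasted with basic reinforcement learning above is to let the prediction error for a chosen option also update its correlated partner; the generalization parameter, its sign, and the option pairing below are assumptions made purely for illustration.

# Sketch: Q-learning in which feedback generalizes to a correlated partner option.
# alpha, alpha_gen and the option pairing are illustrative assumptions.
partner = {0: 1, 1: 0, 2: 3, 3: 2}            # options learned in correlated pairs
Q = [0.5, 0.5, 0.5, 0.5]
alpha, alpha_gen = 0.2, 0.1

def update(chosen, reward):
    delta = reward - Q[chosen]
    Q[chosen] += alpha * delta                # standard update for the chosen option
    Q[partner[chosen]] += alpha_gen * delta   # generalized update for its correlated partner

update(0, 1.0)
print(Q)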
Arnold, Megan A; Newland, M Christopher
2018-06-16
Behavioral inflexibility is often assessed using reversal learning tasks, which require a relatively low degree of response variability. No studies have assessed sensitivity to reinforcement contingencies that specifically select highly variable response patterns in mice, let alone in models of neurodevelopmental disorders involving limited response variation. Operant variability and incremental repeated acquisition (IRA) were used to assess unique aspects of behavioral variability of two mouse strains: BALB/c, a model of some deficits in ASD, and C57Bl/6. On the operant variability task, BALB/c mice responded more repetitively during adolescence than C57Bl/6 mice when reinforcement did not require variability but responded more variably when reinforcement required variability. During IRA testing in adulthood, both strains acquired an unchanging performance sequence equally well. Strain differences emerged, however, after novel learning sequences began alternating with the performance sequence: BALB/c mice substantially outperformed C57Bl/6 mice. Using litter-mate controls, it was found that adolescent experience with variability did not affect either learning or performance on the IRA task in adulthood. These findings constrain the use of BALB/c mice as a model of ASD, but once again reveal that this strain is highly sensitive to reinforcement contingencies and that these mice are fast and robust learners. Copyright © 2018. Published by Elsevier B.V.
Model-based reinforcement learning with dimension reduction.
Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi
2016-12-01
The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.
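The model-based pipeline described above (learn a transition model from data, then derive a policy from it) can be sketched in tabular form with a count-based model followed by value iteration; the three-state chain environment below is an assumption, and the LSCE-style dimension reduction that is the paper's contribution is beyond this sketch.

# Sketch of tabular model-based RL: estimate a transition model from counts, then plan on it.
# The chain environment is made up; LSCE-style dimension reduction is not shown.
import random
from collections import defaultdict

n_states, n_actions, gamma = 3, 2, 0.95
counts = defaultdict(lambda: defaultdict(int))
reward_sum = defaultdict(float)

def env_step(s, a):                            # hidden true dynamics (unknown to the agent)
    s_next = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

for _ in range(2000):                          # collect data with a random exploration policy
    s, a = random.randrange(n_states), random.randrange(n_actions)
    s_next, r = env_step(s, a)
    counts[(s, a)][s_next] += 1
    reward_sum[(s, a)] += r

V = [0.0] * n_states

def q_hat(s, a):                               # value of (s, a) under the learned model
    n = sum(counts[(s, a)].values()) or 1
    expected_next = sum(c / n * V[s2] for s2, c in counts[(s, a)].items())
    return reward_sum[(s, a)] / n + gamma * expected_next

for _ in range(100):                           # value iteration on the estimated model
    V = [max(q_hat(s, a) for a in range(n_actions)) for s in range(n_states)]
print(V)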
Pleasurable music affects reinforcement learning according to the listener
Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira
2013-01-01
Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875
FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.
Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie
2017-06-01
In this paper, we propose a multiagent reinforcement learning algorithm dealing with fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward instead of immediate reward is used as the reinforcement signal. With FMRQ each agent does not need the observation of the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action and one case of three-player two-action. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted. One is box-pushing and the other one is distributed sensor network problem. Experimental results show that the proposed algorithm outperforms others with higher performance.
Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.
2009-01-01
Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and BOLD activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces. PMID:19864565
Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D
2009-10-28
Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
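The contrast between a unitary value for each bimanual action and a decomposition into effector-specific values can be written compactly, as below; the learning rate and the two-by-two action space are assumptions for illustration.

# Sketch: unitary vs effector-decomposed value learning for bimanual choices.
# The learning rate and 2x2 action space are illustrative assumptions.
alpha = 0.2
Q_joint = {(l, r): 0.0 for l in range(2) for r in range(2)}   # unitary model: one value per joint action
Q_left = [0.0, 0.0]                                           # decomposed model: separate values per effector
Q_right = [0.0, 0.0]

def update(left, right, reward_left, reward_right):
    # unitary: a single summed outcome credits the whole bimanual action
    Q_joint[(left, right)] += alpha * (reward_left + reward_right - Q_joint[(left, right)])
    # decomposed: each effector learns from its own feedback
    Q_left[left] += alpha * (reward_left - Q_left[left])
    Q_right[right] += alpha * (reward_right - Q_right[right])

update(0, 1, 1.0, 0.0)
print(Q_joint, Q_left, Q_right)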
Learning to Predict Consequences as a Method of Knowledge Transfer in Reinforcement Learning.
Chalmers, Eric; Contreras, Edgar Bermudez; Robertson, Brandon; Luczak, Artur; Gruber, Aaron
2017-04-17
The reinforcement learning (RL) paradigm allows agents to solve tasks through trial-and-error learning. To be capable of efficient, long-term learning, RL agents should be able to apply knowledge gained in the past to new tasks they may encounter in the future. The ability to predict actions' consequences may facilitate such knowledge transfer. We consider here domains where an RL agent has access to two kinds of information: agent-centric information with constant semantics across tasks, and environment-centric information, which is necessary to solve the task, but with semantics that differ between tasks. For example, in robot navigation, environment-centric information may include the robot's geographic location, while agent-centric information may include sensor readings of various nearby obstacles. We propose that these situations provide an opportunity for a very natural style of knowledge transfer, in which the agent learns to predict actions' environmental consequences using agent-centric information. These predictions contain important information about the affordances and dangers present in a novel environment, and can effectively transfer knowledge from agent-centric to environment-centric learning systems. Using several example problems including spatial navigation and network routing, we show that our knowledge transfer approach can allow faster and lower cost learning than existing alternatives.
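One way to picture the proposed transfer mechanism is a consequence predictor keyed on agent-centric features that is trained in one task and consulted in another; the feature coding and the lookup-table predictor below are simplifications assumed for illustration, not the authors' implementation.

# Sketch: learn action consequences from agent-centric features, reuse them in a new task.
# The feature coding and lookup-table predictor are simplifying assumptions.
from collections import defaultdict

consequence_model = defaultdict(dict)    # (agent-centric features, action) -> predicted outcome

def observe(agent_features, action, outcome):
    consequence_model[agent_features][action] = outcome      # e.g. "collision", "clear"

def predict(agent_features, action):
    return consequence_model[agent_features].get(action, "unknown")

# training task: the sensor pattern "wall_ahead" with action "forward" led to a collision
observe(("wall_ahead",), "forward", "collision")
# novel task: the same agent-centric reading reappears, so the prediction transfers
print(predict(("wall_ahead",), "forward"))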
Hippocampus NMDA receptors selectively mediate latent extinction of place learning.
Goodman, Jarid; Gabriele, Amanda; Packard, Mark G
2016-09-01
Extinction of maze learning may be achieved with or without the animal performing the previously acquired response. In typical "response extinction," animals are given the opportunity to make the previously acquired approach response toward the goal location of the maze without reinforcement. In "latent extinction," animals are not given the opportunity to make the previously acquired response and instead are confined to the previous goal location without reinforcement. Previous evidence indicates that the effectiveness of these protocols may depend on the type of memory being extinguished. Thus, one aim of the present study was to further examine the effectiveness of response and latent extinction protocols across dorsolateral striatum (DLS)-dependent response learning and hippocampus-dependent place learning tasks. In addition, previous neural inactivation experiments indicate a selective role for the hippocampus in latent extinction, but have not investigated the precise neurotransmitter mechanisms involved. Thus, the present study also examined whether latent extinction of place learning might depend on NMDA receptor activity in the hippocampus. In experiment 1, adult male Long-Evans rats were trained in a response learning task in a water plus-maze, in which animals were reinforced to make a consistent body-turn response to reach an invisible escape platform. Results indicated that response extinction, but not latent extinction, was effective at extinguishing memory in the response learning task. In experiment 2, rats were trained in a place learning task, in which animals were reinforced to approach a consistent spatial location containing the hidden escape platform. In experiment 2, animals also received intra-hippocampal infusions of the NMDA receptor antagonist 2-amino-5-phosphonopentanoic acid (AP5; 5.0 or 7.5 µg/0.5 µl) or saline vehicle immediately before response or latent extinction training. Results indicated that both extinction protocols were effective at extinguishing memory in the place learning task. In addition, intra-hippocampal AP5 (7.5 µg) impaired latent extinction, but not response extinction, suggesting that hippocampal NMDA receptors are selectively involved in latent extinction. © 2016 Wiley Periodicals, Inc.
Shephard, E; Jackson, G M; Groom, M J
2014-01-01
This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.
Reinforcement learning state estimator.
Morimoto, Jun; Doya, Kenji
2007-03-01
In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single pendulum dynamics and show that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.
Deep Gate Recurrent Neural Network
2016-11-22
Schmidhuber. A system for robotic heart surgery that learns to tie knots using recurrent neural networks. In IEEE International Conference on ... tasks, such as Machine Translation (Bahdanau et al. (2015)) or Robot Reinforcement Learning (Bakker (2001)). The main idea behind these networks is to ... and J. Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32:1238–1274, 2013. ISSN 0278-3649. doi
Neural Basis of Reinforcement Learning and Decision Making
Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan
2012-01-01
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543
Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
La Camera, Giancarlo; Richmond, Barry J.
2008-01-01
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys. PMID:18688266
Modeling the violation of reward maximization and invariance in reinforcement schedules.
La Camera, Giancarlo; Richmond, Barry J
2008-08-08
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as "schedule length effect"). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: "framing," wherein equivalent options are treated differently depending on the context in which they are presented, and the "sunk cost" effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.
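For reference, the "standard method of temporal differences" that the schedule-length effect is said to escape can be sketched as a TD(0) value update over the cued states of a schedule; the state coding, reward placement, and parameters below are illustrative assumptions, and the heuristic modification introduced in the paper is not reproduced here.

# Sketch of standard TD(0) value learning over the states of a cued reinforcement schedule.
# The three-trial schedule coding and parameter values are illustrative assumptions.
alpha, gamma = 0.1, 0.95
schedule = ["2-to-go", "1-to-go", "rewarded"]       # states of a length-3 schedule
V = {s: 0.0 for s in schedule + ["end"]}

for _ in range(200):                                # repeated passes through the schedule
    for i, s in enumerate(schedule):
        reward = 1.0 if s == "rewarded" else 0.0
        s_next = schedule[i + 1] if i + 1 < len(schedule) else "end"
        V[s] += alpha * (reward + gamma * V[s_next] - V[s])
print(V)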
A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning
Balcarras, Matthew; Womelsdorf, Thilo
2016-01-01
Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context-specific selections to drive responses. PMID:27064794
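A bare-bones version of the contrast drawn above is a learner that carries values on stimulus features (colors and shapes) and, under a feature rule, restricts credit to a single dimension; the feature sets, the rule, and the parameters below are illustrative assumptions rather than the fitted hierarchical model.

# Sketch: feature-based value learning with an optional rule restricting credit to one dimension.
# Feature sets, the active rule and the learning rate are illustrative assumptions.
alpha = 0.2
values = {"red": 0.0, "blue": 0.0, "circle": 0.0, "square": 0.0}

def stimulus_value(features, rule=None):
    used = [f for f in features if rule is None or f in rule]
    return sum(values[f] for f in used)

def update(chosen_features, reward, rule=None):
    delta = reward - stimulus_value(chosen_features, rule)
    for f in chosen_features:
        if rule is None or f in rule:            # a feature rule restricts learning to one dimension
            values[f] += alpha * delta

update(("red", "circle"), 1.0)                          # rule-free: credit spreads to both features
update(("red", "square"), 1.0, rule={"red", "blue"})    # color rule: only the color is credited
print(values)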
Predictive Movements and Human Reinforcement Learning of Sequential Action
ERIC Educational Resources Information Center
de Kleijn, Roy; Kachergis, George; Hommel, Bernhard
2018-01-01
Sequential action makes up the bulk of human daily activity, and yet much remains unknown about how people learn such actions. In one motor learning paradigm, the serial reaction time (SRT) task, people are taught a consistent sequence of button presses by cueing them with the next target response. However, the SRT task only records keypress…
Fisher, Simon D.; Gray, Jason P.; Black, Melony J.; Davies, Jennifer R.; Bednark, Jeffery G.; Redgrave, Peter; Franz, Elizabeth A.; Abraham, Wickliffe C.; Reynolds, John N. J.
2014-01-01
Action discovery and selection are critical cognitive processes that are understudied at the cellular and systems neuroscience levels. Presented here is a new rodent joystick task suitable to test these processes due to the range of action possibilities that can be learnt while performing the task. Rats learned to manipulate a joystick while progressing through task milestones that required increasing degrees of movement accuracy. In a switching phase designed to measure action discovery, rats were repeatedly required to discover new target positions to meet changing task demands. Behavior was compared using both food and electrical brain stimulation reward (BSR) of the substantia nigra as reinforcement. Rats reinforced with food and those with BSR performed similarly overall, although BSR-treated rats exhibited greater vigor in responding. In the switching phase, rats learnt new actions to adapt to changing task demands, reflecting action discovery processes. Because subjects are required to learn different goal-directed actions, this task could be employed in further investigations of the cellular mechanisms of action discovery and selection. Additionally, this task could be used to assess the behavioral flexibility impairments seen in conditions such as Parkinson's disease and obsessive-compulsive disorder. The versatility of the task will enable cross-species investigations of these impairments. PMID:25477795
Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.
Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R
2015-01-01
Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement, and this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process and also assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.
Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris
2017-01-01
Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling, and event-related brain potential (ERP) data. MDD patients showed learning rates comparable to those of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.
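A hedged sketch of the kind of trial-level quantities typically extracted from such probabilistic learning tasks: a delta-rule learner with a free learning rate, plus a lose-shift rate computed from the simulated choices. The task probabilities and parameters below are illustrative, not taken from the study.

```python
import math
import random

def simulate(n_trials=200, alpha=0.2, beta=3.0, p_reward=(0.8, 0.2)):
    """Two-choice probabilistic learning with a delta-rule learner; also tallies
    lose-shift behaviour (switching choice after a non-rewarded trial)."""
    q = [0.5, 0.5]
    lose_shift, losses = 0, 0
    prev_choice, prev_reward = None, None
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if random.random() < p0 else 1
        reward = 1.0 if random.random() < p_reward[choice] else 0.0
        if prev_reward == 0.0:
            losses += 1
            lose_shift += (choice != prev_choice)
        q[choice] += alpha * (reward - q[choice])   # reward prediction error update
        prev_choice, prev_reward = choice, reward
    return lose_shift / max(losses, 1)

if __name__ == "__main__":
    random.seed(1)
    print("lose-shift rate:", round(simulate(), 2))
```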
Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective
Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.
2009-01-01
Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527
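A minimal sketch of the options idea at the heart of hierarchical reinforcement learning: temporally extended actions treated as reusable subroutines, learned over with an SMDP-style Q-learning update. The corridor task, the hand-coded option, and all constants are invented for illustration.

```python
import random

GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1
N_STATES, GOAL = 10, 9          # illustrative 1-D corridor; reward only at GOAL

def step(state, action):        # primitive actions: -1 (left) or +1 (right)
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def run_option(state, option):
    """Execute an option to termination; return (next_state, discounted return, duration)."""
    if option in (-1, +1):                       # primitives are one-step options
        nxt, r = step(state, option)
        return nxt, r, 1
    # 'dash_right': keep moving right until the goal (a hand-coded subroutine)
    total, discount, k = 0.0, 1.0, 0
    while state != GOAL:
        state, r = step(state, +1)
        total += discount * r
        discount *= GAMMA
        k += 1
    return state, total, k

OPTIONS = [-1, +1, "dash_right"]
Q = {(s, o): 0.0 for s in range(N_STATES) for o in OPTIONS}

for episode in range(200):                       # SMDP Q-learning over options
    s = 0
    while s != GOAL:
        greedy = max(OPTIONS, key=lambda x: Q[(s, x)])
        o = random.choice(OPTIONS) if random.random() < EPS else greedy
        s2, ret, k = run_option(s, o)
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, x)] for x in OPTIONS)
        Q[(s, o)] += ALPHA * (ret + GAMMA ** k * best_next - Q[(s, o)])
        s = s2

print("preferred option in the start state:", max(OPTIONS, key=lambda x: Q[(0, x)]))
```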
Concurrent Learning of Control in Multi-agent Sequential Decision Tasks
2018-04-17
The overall objective of this project was to develop multi-agent reinforcement learning (MARL) approaches for intelligent agents to autonomously learn distributed control policies in decentralized, partially observable…
Finger, Elizabeth C; Marsh, Abigail A; Blair, Karina S; Reid, Marguerite E; Sims, Courtney; Ng, Pamela; Pine, Daniel S; Blair, R James R
2011-02-01
Dysfunction in the amygdala and orbitofrontal cortex has been reported in youths and adults with psychopathic traits. The specific nature of the functional irregularities within these structures remains poorly understood. The authors used a passive avoidance task to examine the responsiveness of these systems to early stimulus-reinforcement exposure, when prediction errors are greatest and learning maximized, and to reward in youths with psychopathic traits and comparison youths. While performing the passive avoidance learning task, 15 youths with conduct disorder or oppositional defiant disorder plus a high level of psychopathic traits and 15 healthy subjects completed a 3.0-T fMRI scan. Relative to the comparison youths, the youths with a disruptive behavior disorder plus psychopathic traits showed less orbitofrontal responsiveness both to early stimulus-reinforcement exposure and to rewards, as well as less caudate response to early stimulus-reinforcement exposure. There were no group differences in amygdala responsiveness to these two task measures, but amygdala responsiveness throughout the task was lower in the youths with psychopathic traits. Compromised sensitivity to early reinforcement information in the orbitofrontal cortex and caudate and to reward outcome information in the orbitofrontal cortex of youths with conduct disorder or oppositional defiant disorder plus psychopathic traits suggests that the integrated functioning of the amygdala, caudate, and orbitofrontal cortex may be disrupted. This provides a functional neural basis for why such youths are more likely to repeat disadvantageous decisions. New treatment possibilities are raised, as pharmacologic modulations of serotonin and dopamine can affect this form of learning.
ERIC Educational Resources Information Center
Steingroever, Helen; Wetzels, Ruud; Wagenmakers, Eric-Jan
2013-01-01
The Iowa gambling task (IGT) is one of the most popular tasks used to study decision-making deficits in clinical populations. In order to decompose performance on the IGT in its constituent psychological processes, several cognitive models have been proposed (e.g., the Expectancy Valence (EV) and Prospect Valence Learning (PVL) models). Here we…
Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D
2013-05-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.
2013-01-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545
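The hybrid account underlying these experiments is often summarized as a weighted mixture of the two systems' action values; sliding the weight toward the model-free system mimics behavior under concurrent load. A schematic sketch with invented values:

```python
def combined_value(q_mf, q_mb, w):
    """Weighted mixture of model-based and model-free action values.
    w = 1 -> fully model-based; w = 0 -> fully model-free (e.g. under cognitive load)."""
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mf}

# Illustrative values for two actions; under a demanding secondary task w might drop.
q_mf = {"left": 0.6, "right": 0.4}      # cached, habit-like values
q_mb = {"left": 0.3, "right": 0.7}      # values derived from a learned task model

for w in (0.8, 0.2):                    # low load vs. high load (assumed weights)
    q = combined_value(q_mf, q_mb, w)
    print(f"w={w}: choose {max(q, key=q.get)}")
```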
Negative Reinforcement Impairs Overnight Memory Consolidation
ERIC Educational Resources Information Center
Stamm, Andrew W.; Nguyen, Nam D.; Seicol, Benjamin J.; Fagan, Abigail; Oh, Angela; Drumm, Michael; Lundt, Maureen; Stickgold, Robert; Wamsley, Erin J.
2014-01-01
Post-learning sleep is beneficial for human memory. However, it may be that not all memories benefit equally from sleep. Here, we manipulated a spatial learning task using monetary reward and performance feedback, asking whether enhancing the salience of the task would augment overnight memory consolidation and alter its incorporation into…
Explicit and implicit reinforcement learning across the psychosis spectrum.
Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E
2017-07-01
Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar disorder did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.
Costa, Vincent D; Dal Monte, Olga; Lucas, Daniel R; Murray, Elisabeth A; Averbeck, Bruno B
2016-10-19
Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy trade-off that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. Published by Elsevier Inc.
Amygdala and ventral striatum make distinct contributions to reinforcement learning
Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.
2016-01-01
Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys’ choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488
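A hedged sketch of the kind of model used to characterize such lesion effects: a two-armed bandit learner in which separate learning rates govern updates after positive and negative feedback, and an inverse-temperature parameter captures choice consistency. The numbers are illustrative, not the fitted values from the study.

```python
import math
import random

def bandit_learner(n_trials=300, alpha_pos=0.4, alpha_neg=0.4, beta=4.0, p=(0.7, 0.3)):
    """Two-armed bandit learner with separate learning rates for positive and
    negative feedback; beta (inverse temperature) indexes choice consistency."""
    q = [0.5, 0.5]
    best_choices = 0
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        a = 0 if random.random() < p0 else 1
        r = 1.0 if random.random() < p[a] else 0.0
        delta = r - q[a]
        q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
        best_choices += (a == 0)
    return best_choices / n_trials

if __name__ == "__main__":
    random.seed(2)
    # a reduced alpha_pos mimics weaker learning from positive feedback
    print("intact:", round(bandit_learner(), 2),
          "| low alpha_pos:", round(bandit_learner(alpha_pos=0.05), 2))
```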
Negative reinforcement impairs overnight memory consolidation.
Stamm, Andrew W; Nguyen, Nam D; Seicol, Benjamin J; Fagan, Abigail; Oh, Angela; Drumm, Michael; Lundt, Maureen; Stickgold, Robert; Wamsley, Erin J
2014-11-01
Post-learning sleep is beneficial for human memory. However, it may be that not all memories benefit equally from sleep. Here, we manipulated a spatial learning task using monetary reward and performance feedback, asking whether enhancing the salience of the task would augment overnight memory consolidation and alter its incorporation into dreaming. Contrary to our hypothesis, we found that the addition of reward impaired overnight consolidation of spatial memory. Our findings seemingly contradict prior reports that enhancing the reward value of learned information augments sleep-dependent memory processing. Given that the reward followed a negative reinforcement paradigm, consolidation may have been impaired via a stress-related mechanism. © 2014 Stamm et al.; Published by Cold Spring Harbor Laboratory Press.
Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats
Lloyd, Kevin; Becker, Nadine; Jones, Matthew W.; Bogacz, Rafal
2012-01-01
Learning to form appropriate, task-relevant working memory representations is a complex process central to cognition. Gating models frame working memory as a collection of past observations and use reinforcement learning (RL) to solve the problem of when to update these observations. Investigation of how gating models relate to brain and behavior remains, however, at an early stage. The current study sought to explore the ability of simple RL gating models to replicate rule learning behavior in rats. Rats were trained in a maze-based spatial learning task that required animals to make trial-by-trial choices contingent upon their previous experience. Using an abstract version of this task, we tested the ability of two gating algorithms, one based on the Actor-Critic and the other on the State-Action-Reward-State-Action (SARSA) algorithm, to generate behavior consistent with the rats'. Both models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal. We also found that both gating models learned multiple strategies in solving the initial task, a property which highlights the multi-agent nature of such models and which is of importance in considering the neural basis of individual differences in behavior. PMID:23115551
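The gating idea can be reduced to a toy in which an agent learns, from delayed reward alone, when to overwrite a working-memory slot. The update below is a SARSA-like backup applied at the end of each trial; the task, the observation tags, and all constants are invented and do not reproduce the maze model from the study.

```python
import random

ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1
ACTIONS = ("keep", "update")            # gate closed vs. gate open
Q = {}                                  # Q[(observation_tag, action)]

def q(s, a):
    return Q.get((s, a), 0.0)

def choose(s):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(s, a))

def run_trial():
    """One trial: a to-be-remembered item, then a distractor, then a response probe.
    Reward is delivered only if the item (not the distractor) is held in memory."""
    item, distractor = random.sample(["A", "B", "C"], 2)
    stream = [("store", item), ("ignore", distractor)]
    wm = None
    sa_pairs = []
    for tag, value in stream:
        a = choose(tag)
        if a == "update":
            wm = value                  # gate open: overwrite working memory
        sa_pairs.append((tag, a))
    reward = 1.0 if wm == item else 0.0
    target = reward                     # backup the outcome through the gating decisions
    for s, a in reversed(sa_pairs):
        Q[(s, a)] = q(s, a) + ALPHA * (target - q(s, a))
        target = GAMMA * q(s, a)
    return reward

if __name__ == "__main__":
    random.seed(3)
    rewards = [run_trial() for _ in range(2000)]
    print("late-trial accuracy:", sum(rewards[-200:]) / 200)
```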
ERIC Educational Resources Information Center
Zrinzo, Michelle L.
2010-01-01
I tested the effects of the absence of an adult on the observational conditioning effect (Greer & Singer-Dudek, 2008). Neutral stimuli (metal washers) did not function to reinforce performance or learning tasks for three preschool age children as determined by a counterbalanced reversal design for the pre-intervention performance tasks and…
Matzel, Louis D.; Light, Kenneth R.; Wass, Christopher; Colas-Zelin, Danielle; Denman-Brice, Alexander; Waddel, Adam C.; Kolata, Stefan
2011-01-01
Learning, attentional, and perseverative deficits are characteristic of cognitive aging. In this study, genetically diverse CD-1 mice underwent longitudinal training in a task asserted to tax working memory capacity and its dependence on selective attention. Beginning at 3 mo of age, animals were trained for 12 d to perform in a dual radial-arm maze task that required the mice to remember and operate on two sets of overlapping guidance (spatial) cues. As previously reported, this training resulted in an immediate (at 4 mo of age) improvement in the animals' aggregate performance across a battery of five learning tasks. Subsequently, these animals received an additional 3 d of working memory training at 3-wk intervals for 15 mo (totaling 66 training sessions), and at 18 mo of age were assessed on a selective attention task, a second set of learning tasks, and variations of those tasks that required the animals to modify the previously learned response. Both attentional and learning abilities (on passive avoidance, active avoidance, and reinforced alternation tasks) were impaired in aged animals that had not received working memory training. Likewise, these aged animals exhibited consistent deficits when required to modify a previously instantiated learned response (in reinforced alternation, active avoidance, and spatial water maze). In contrast, these attentional, learning, and perseverative deficits were attenuated in aged animals that had undergone lifelong working memory exercise. These results suggest that general impairments of learning, attention, and cognitive flexibility may be mitigated by a cognitive exercise regimen that requires chronic attentional engagement. PMID:21521768
Matzel, Louis D; Light, Kenneth R; Wass, Christopher; Colas-Zelin, Danielle; Denman-Brice, Alexander; Waddel, Adam C; Kolata, Stefan
2011-01-01
Learning, attentional, and perseverative deficits are characteristic of cognitive aging. In this study, genetically diverse CD-1 mice underwent longitudinal training in a task asserted to tax working memory capacity and its dependence on selective attention. Beginning at 3 mo of age, animals were trained for 12 d to perform in a dual radial-arm maze task that required the mice to remember and operate on two sets of overlapping guidance (spatial) cues. As previously reported, this training resulted in an immediate (at 4 mo of age) improvement in the animals' aggregate performance across a battery of five learning tasks. Subsequently, these animals received an additional 3 d of working memory training at 3-wk intervals for 15 mo (totaling 66 training sessions), and at 18 mo of age were assessed on a selective attention task, a second set of learning tasks, and variations of those tasks that required the animals to modify the previously learned response. Both attentional and learning abilities (on passive avoidance, active avoidance, and reinforced alternation tasks) were impaired in aged animals that had not received working memory training. Likewise, these aged animals exhibited consistent deficits when required to modify a previously instantiated learned response (in reinforced alternation, active avoidance, and spatial water maze). In contrast, these attentional, learning, and perseverative deficits were attenuated in aged animals that had undergone lifelong working memory exercise. These results suggest that general impairments of learning, attention, and cognitive flexibility may be mitigated by a cognitive exercise regimen that requires chronic attentional engagement.
Bublitz, Alexander; Weinhold, Severine R.; Strobel, Sophia; Dehnhardt, Guido; Hanke, Frederike D.
2017-01-01
Octopuses (Octopus vulgaris) are generally considered to possess extraordinary cognitive abilities including the ability to successfully perform in a serial reversal learning task. During reversal learning, an animal is presented with a discrimination problem and after reaching a learning criterion, the signs of the stimuli are reversed: the former positive becomes the negative stimulus and vice versa. If an animal improves its performance over reversals, it is ascribed advanced cognitive abilities. Reversal learning has been tested in octopus in a number of studies. However, the experimental procedures adopted in these studies involved pre-training on the new positive stimulus after a reversal, strong negative reinforcement or might have enabled secondary cueing by the experimenter. These procedures could have all affected the outcome of reversal learning. Thus, in this study, serial visual reversal learning was revisited in octopus. We trained four common octopuses (O. vulgaris) to discriminate between 2-dimensional stimuli presented on a monitor in a simultaneous visual discrimination task and reversed the signs of the stimuli each time the animals reached the learning criterion of ≥80% in two consecutive sessions. The animals were trained using operant conditioning techniques including a secondary reinforcer, a rod that was pushed up and down the feeding tube, which signaled the correctness of a response and preceded the subsequent primary reinforcement of food. The experimental protocol did not involve negative reinforcement. One animal completed four reversals and showed progressive improvement, i.e., it decreased its errors to criterion the more reversals it experienced. This animal developed a generalized response strategy. In contrast, another animal completed only one reversal, whereas two animals did not learn to reverse during the first reversal. In conclusion, some octopus individuals can learn to reverse in a visual task demonstrating behavioral flexibility even with a refined methodology. PMID:28223940
Beyond Stimulus Cues and Reinforcement Signals: A New Approach to Animal Metacognition
Couchman, Justin J.; Coutinho, Mariana V. C.; Beran, Michael J.; Smith, J. David
2010-01-01
Some metacognition paradigms for nonhuman animals encourage the alternative explanation that animals avoid difficult trials based only on reinforcement history and stimulus aversion. To explore this possibility, we placed humans and monkeys in successive uncertainty-monitoring tasks that were qualitatively different, eliminating many associative cues that might support transfer across tasks. In addition, task transfer occurred under conditions of deferred and rearranged feedback—both species completed blocks of trials followed by summary feedback. This ensured that animals received no trial-by-trial reinforcement. Despite distancing performance from associative cues, humans and monkeys still made adaptive uncertainty responses by declining the most difficult trials. These findings suggest that monkeys’ uncertainty responses could represent a higher-level, decisional process of cognitive monitoring, though that process need not involve full self-awareness or consciousness. The dissociation of performance from reinforcement has theoretical implications concerning the status of reinforcement as the critical binding force in animal learning. PMID:20836592
Place preference and vocal learning rely on distinct reinforcers in songbirds.
Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H
2018-04-30
In reinforcement learning (RL), agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus, distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.
Autonomous reinforcement learning with experience replay.
Wawrzyński, Paweł; Tanwani, Ajay Kumar
2013-05-01
This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay, whose step-sizes are determined on-line by an enhanced fixed-point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
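A stripped-down, tabular rendering of actor-critic learning with experience replay on an invented chain task. Replayed samples are importance-weighted (and truncated) because the policy has moved on since they were collected; the paper's on-line step-size estimation is not reproduced, and fixed step-sizes are assumed.

```python
import math
import random

GAMMA, A_CRITIC, A_ACTOR = 0.95, 0.10, 0.05
N_STATES, GOAL = 6, 5
V = [0.0] * N_STATES                        # critic: state values
H = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: preferences for (left, right)
replay = []                                 # stored samples: (s, a, r, s2, pi_old)

def policy(s):
    e0, e1 = math.exp(H[s][0]), math.exp(H[s][1])
    return e0 / (e0 + e1), e1 / (e0 + e1)

def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def update(s, a, r, s2, pi_old):
    """Actor-critic update; replayed samples are importance-weighted (and truncated)
    because the policy has changed since the sample was collected."""
    p = policy(s)
    w = min(p[a] / max(pi_old, 1e-6), 2.0)
    delta = r + (0.0 if s2 == GOAL else GAMMA * V[s2]) - V[s]
    V[s] += A_CRITIC * w * delta
    H[s][a] += A_ACTOR * w * delta * (1 - p[a])        # policy-gradient step, chosen action
    H[s][1 - a] -= A_ACTOR * w * delta * p[1 - a]      # and the complementary action

random.seed(4)
for episode in range(300):
    s = 0
    while s != GOAL:
        p = policy(s)
        a = 0 if random.random() < p[0] else 1
        s2, r = env_step(s, a)
        replay.append((s, a, r, s2, p[a]))
        update(s, a, r, s2, p[a])
        for sample in random.sample(replay, min(len(replay), 4)):   # replay a few old samples
            update(*sample)
        s = s2

print("P(move right) in the start state:", round(policy(0)[1], 2))
```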
Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning
Ramayya, Ashwin G.; Misra, Amrit
2014-01-01
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643
Social reinforcement can regulate localized brain activity.
Mathiak, Krystyna A; Koush, Yury; Dyck, Miriam; Gaber, Tilman J; Alawi, Eliza; Zepf, Florian D; Zvyagintsev, Mikhail; Mathiak, Klaus
2010-11-01
Social learning is essential for adaptive behavior in humans. Neurofeedback based on functional magnetic resonance imaging (fMRI) trains control over localized brain activity. It can disentangle learning processes at the neural level and thus investigate the mechanisms of operant conditioning with explicit social reinforcers. In a pilot study, a computer-generated face provided positive feedback (smiling) when activity in the anterior cingulate cortex (ACC) increased and gradually returned to a neutral expression when the activity dropped. One female volunteer without previous experience in fMRI underwent training based on a social reinforcer. Directly before and after the neurofeedback runs, neural responses to a cognitive interference task (Simon task) were recorded. We observed a significant increase in activity within the ACC during the neurofeedback blocks, corresponding to the a priori defined anatomical region of interest. In the course of the neurofeedback training, the subject learned to regulate ACC activity and could maintain this control even without direct feedback. Moreover, ACC activation during the Simon task was significantly stronger after the neurofeedback training than before. Localized brain activity can be controlled by social reward. The increased ACC activity transferred to a cognitive task with the potential to reduce cognitive interference. Systematic studies are required to explore long-term effects on social behavior and clinical applications.
Lei, Yuming; Binder, Jeffrey R.
2015-01-01
The extent to which motor learning is generalized across the limbs is typically very limited. Here, we investigated how two motor learning hypotheses could be used to enhance the extent of interlimb transfer. According to one hypothesis, we predicted that reinforcement of successful actions by providing binary error feedback regarding task success or failure, in addition to terminal error feedback, during initial training would increase the extent of interlimb transfer following visuomotor adaptation (experiment 1). According to the other hypothesis, we predicted that performing a reaching task repeatedly with one arm without providing performance feedback (which prevented learning the task with this arm), while concurrently adapting to a visuomotor rotation with the other arm, would increase the extent of transfer (experiment 2). Results indicate that providing binary error feedback, compared with continuous visual feedback that provided movement direction and amplitude information, had no influence on the extent of transfer. In contrast, repeatedly performing (but not learning) a specific task with one arm while visuomotor adaptation occurred with the other arm led to nearly complete transfer. This suggests that the absence of motor instances associated with specific effectors and task conditions is the major reason for limited interlimb transfer and that reinforcement of successful actions during initial training is not beneficial for interlimb transfer. These findings indicate crucial contributions of effector- and task-specific motor instances, which are thought to underlie (a type of) model-free learning, to optimal motor learning and interlimb transfer. PMID:25632082
Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.
Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C
2013-12-01
Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.
Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne
2016-05-01
We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to stimuli foreground over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.
Impedance learning for robotic contact tasks using natural actor-critic algorithm.
Kim, Byungchan; Park, Jooyoung; Park, Shinsuk; Kang, Sungchul
2010-04-01
Compared with their robotic counterparts, humans excel at various tasks by using their ability to adaptively modulate arm impedance parameters. This ability allows us to successfully perform contact tasks even in uncertain environments. This paper considers a learning strategy of motor skill for robotic contact tasks based on a human motor control theory and machine learning schemes. Our robot learning method employs impedance control based on the equilibrium point control theory and reinforcement learning to determine the impedance parameters for contact tasks. A recursive least-square filter-based episodic natural actor-critic algorithm is used to find the optimal impedance parameters. The effectiveness of the proposed method was tested through dynamic simulations of various contact tasks. The simulation results demonstrated that the proposed method optimizes the performance of the contact tasks in uncertain conditions of the environment.
McDonald, R J; Hong, N S
2004-01-01
This experiment tested the idea that the amygdala-based learning and memory system covertly acquires a stimulus-reward (stimulus-outcome) association during acquisition of a stimulus-response (S-R) habit task developed for the eight-arm radial maze. Groups of rats were given dorso-lateral striatal or amygdala lesions and then trained on the S-R habit task on the eight-arm radial maze. Rats with neurotoxic damage to the dorso-lateral striatum were severely impaired on the acquisition of the S-R habit task but showed a conditioned-cue preference for the stimulus reinforced during S-R habit training. Rats with neurotoxic damage to the amygdala were able to acquire the S-R habit task but did not show a conditioned-cue preference for the stimulus reinforced during S-R habit training. This pattern of results represents a dissociation of learning and memory functions of the dorsal striatum and amygdala on the same task.
Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis
2013-01-01
Background: Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods: Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results: MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion: Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis to dissecting anhedonic behavior. PMID:23782813
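The two components the meta-analysis separates map onto two parameters of a standard delta-rule model: a learning rate that scales how fast values are updated, and a reward-sensitivity term that scales the subjective value of the reinforcer itself. A toy version follows (not the authors' hierarchical Bayesian model; all values are illustrative).

```python
import math
import random

def simulate(alpha=0.3, rho=1.0, beta=2.0, n_trials=200, p=(0.75, 0.3)):
    """Delta-rule learner: alpha = learning rate, rho = reward sensitivity
    (scales the subjective value of the reinforcer), beta = choice stochasticity."""
    q = [0.0, 0.0]
    rich_choices = 0
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        a = 0 if random.random() < p0 else 1
        r = rho * (1.0 if random.random() < p[a] else 0.0)   # scaled reinforcer
        q[a] += alpha * (r - q[a])
        rich_choices += (a == 0)
    return rich_choices / n_trials

if __name__ == "__main__":
    random.seed(5)
    print("baseline:", round(simulate(), 2))
    print("reduced reward sensitivity (anhedonia-like):", round(simulate(rho=0.3), 2))
    print("reduced learning rate:", round(simulate(alpha=0.05), 2))
```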
Microstimulation of the human substantia nigra alters reinforcement learning.
Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J
2014-05-14
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning.
Zhang, Yunfeng; Paik, Jaehyon; Pirolli, Peter
2015-04-01
Animals routinely adapt to changes in the environment in order to survive. Though reinforcement learning may play a role in such adaptation, it is not clear that it is the only mechanism involved, as it is not well suited to producing rapid, relatively immediate changes in strategies in response to environmental changes. This research proposes that counterfactual reasoning might be an additional mechanism that facilitates change detection. An experiment is conducted in which a task state changes over time and the participants have to detect the changes in order to perform well and gain monetary rewards. A cognitive model is constructed that incorporates reinforcement learning with counterfactual reasoning to help quickly adjust the utility of task strategies in response to changes. The results show that the model can accurately explain human data and that counterfactual reasoning is key to reproducing the various effects observed in this change detection paradigm. Copyright © 2015 Cognitive Science Society, Inc.
Enhanced appetitive learning and reversal learning in a mouse model for Prader-Willi syndrome.
Relkovic, Dinko; Humby, Trevor; Hagan, Jim J; Wilkinson, Lawrence S; Isles, Anthony R
2012-06-01
Prader-Willi syndrome (PWS) is caused by lack of paternally derived gene expression from the imprinted gene cluster on human chromosome 15q11-q13. PWS is characterized by severe hypotonia, a failure to thrive in infancy and, on emerging from infancy, evidence of learning disabilities and overeating behavior due to an abnormal satiety response and increased motivation by food. We have previously shown that an imprinting center deletion mouse model (PWS-IC) is quicker to acquire a preference for, and consumes more of, a palatable food. Here we examined how the use of this palatable food as a reinforcer influences learning in PWS-IC mice performing a simple appetitive learning task. On a nonspatial maze-based task, PWS-IC mice reached criterion much more quickly, making fewer errors during both initial acquisition and reversal learning. A manipulation where the reinforcer was devalued impaired wild-type performance but had no effect on PWS-IC mice. This suggests that increased motivation for the reinforcer in PWS-IC mice may underlie their enhanced learning. This supports previous findings in PWS patients and is the first behavioral study of an animal model of PWS in which the motivation of behavior by food rewards has been examined. © 2012 American Psychological Association
The Interaction of Temporal Generalization Gradients Predicts the Context Effect
ERIC Educational Resources Information Center
de Castro, Ana Catarina; Machado, Armando
2012-01-01
In a temporal double bisection task, animals learn two discriminations. In the presence of Red and Green keys, responses to Red are reinforced after 1-s samples and responses to Green are reinforced after 4-s samples; in the presence of Blue and Yellow keys, responses to Blue are reinforced after 4-s samples and responses to Yellow are reinforced…
Policy Transfer via Markov Logic Networks
NASA Astrophysics Data System (ADS)
Torrey, Lisa; Shavlik, Jude
We propose using a statistical-relational model, the Markov Logic Network, for knowledge transfer in reinforcement learning. Our goal is to extract relational knowledge from a source task and use it to speed up learning in a related target task. We show that Markov Logic Networks are effective models for capturing both source-task Q-functions and source-task policies. We apply them via demonstration, which involves using them for decision making in an initial stage of the target task before continuing to learn. Through experiments in the RoboCup simulated-soccer domain, we show that transfer via Markov Logic Networks can significantly improve early performance in complex tasks, and that transferring policies is more effective than transferring Q-functions.
Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D
2017-05-01
Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlies poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.
GA-based fuzzy reinforcement learning for control of a magnetic bearing system.
Lin, C T; Jou, C P
2000-01-01
This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
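The coupling described here, reduced to toys: a TD critic converts sparse external reinforcement into a denser internal signal, which then serves as the fitness used to evolve controller parameters. The plant, the linear controllers, and all constants below are invented stand-ins for the networks in the paper, so this only illustrates the idea of internal reinforcement as GA fitness.

```python
import random

def run_episode(gain, critic, alpha=0.1, gamma=0.9):
    """Roll out a linear controller; the TD critic learns from the sparse external
    reward, and its predictions accumulate as a denser internal reinforcement."""
    x, internal = random.uniform(-1, 1), 0.0
    for _ in range(30):
        a = -gain * x                                  # candidate controller ("chromosome")
        x_new = max(-1.0, min(1.0, x + 0.2 * a + random.gauss(0, 0.02)))
        r_ext = 1.0 if abs(x_new) < 0.1 else 0.0       # sparse external reinforcement
        s, s_new = round(x * 5), round(x_new * 5)      # coarse state discretisation
        td_error = r_ext + gamma * critic.get(s_new, 0.0) - critic.get(s, 0.0)
        critic[s] = critic.get(s, 0.0) + alpha * td_error
        internal += critic.get(s_new, 0.0)             # internal reinforcement as fitness
        x = x_new
    return internal

random.seed(6)
critic = {}
population = [random.uniform(0.0, 2.0) for _ in range(10)]    # gains evolved by the GA
for generation in range(30):
    fitness = [(run_episode(g, critic), g) for g in population]
    fitness.sort(reverse=True)
    parents = [g for _, g in fitness[:5]]
    population = parents + [g + random.gauss(0, 0.1) for g in parents]   # mutate survivors

print("best gain after evolution:", round(max(population, key=lambda g: run_episode(g, critic)), 2))
```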
Fernie, Gordon; Tunney, Richard J
2006-02-01
The Iowa Gambling Task (Bechara, Damasio, Damasio, & Anderson, 1994) has become widely used as a laboratory test of "real-life" decision-making. However, aspects of its administration that have been varied by researchers may differentially affect performance and the conclusions researchers can draw. Some researchers have used facsimile money reinforcers while others have used real money reinforcers. More importantly, the instructions participants receive have also been varied. While no differences have been reported in performance dependent on reinforcer type, no previous comparison of participants' instructions has been conducted. This is despite one set of instructions giving participants a clear hint about the nature of the task. Additionally, in previous research one set of instructions has not been used exclusively with one reinforcer type, making any differential or cumulative effects of these factors difficult to interpret. The present study compared the effects of instruction and reinforcer type on IGT performance. When participants received instructions without a hint, performance was affected by reinforcer type. This was not the case when the instructions included a hint. In a second IGT session, performance was improved in participants who had received the hint instructions compared with those who had not.
ERIC Educational Resources Information Center
Lozano, J. H.; Hernandez, J. M.; Rubio, V. J.; Santacreu, J.
2011-01-01
Although intelligence has traditionally been identified as "the ability to learn" (Peterson, 1925), this relationship has been questioned in simple operant learning tasks (Spielberger, 1962). Nevertheless, recent pieces of research have demonstrated a strong and significant correlation between associative learning measures and intelligence…
ERIC Educational Resources Information Center
Keen, Deb; Pennell, Donna
2015-01-01
Identifying and using preferred items and activities to increase motivation and participation of children with autism spectrum disorder (ASD) has been an important and frequently used intervention strategy. Preferred objects, typically identified through a preference assessment, are most frequently used during instruction as reinforcers. These…
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian
2017-11-01
We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
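The learning loop in such studies is typically plain tabular Q-learning over a handful of sensed cues and steering actions. In the sketch below the fluid dynamics is replaced by a trivial stand-in, so it only illustrates the loop, not the physics; the cue names, rewards, and constants are invented.

```python
import random

ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1
CUES = ("down_flow", "up_flow", "still")        # coarse local flow measurements (stand-ins)
ACTIONS = ("steer_up", "steer_with_flow")
Q = {(c, a): 0.0 for c in CUES for a in ACTIONS}

def flow_step(cue, action):
    """Toy stand-in for the fluid environment: riding an upward flow gains altitude,
    fighting a downward flow is costly, and cues drift randomly between steps."""
    if cue == "up_flow":
        dz = 1.0 if action == "steer_with_flow" else 0.4
    elif cue == "down_flow":
        dz = 0.3 if action == "steer_up" else -0.5
    else:
        dz = 0.2 if action == "steer_up" else 0.0
    return random.choice(CUES), dz              # altitude gained is the reward

random.seed(7)
for episode in range(500):
    cue = random.choice(CUES)
    for _ in range(50):                          # finite time horizon per episode
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(cue, x)])
        nxt, reward = flow_step(cue, a)
        best_next = max(Q[(nxt, x)] for x in ACTIONS)
        Q[(cue, a)] += ALPHA * (reward + GAMMA * best_next - Q[(cue, a)])
        cue = nxt

for c in CUES:
    print(c, "->", max(ACTIONS, key=lambda x: Q[(c, x)]))
```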
The limits and motivating potential of sensory stimuli as reinforcers for autistic children.
Ferrari, M; Harris, S L
1981-01-01
This study investigated the reinforcing properties, limits, and motivating potentials of sensory stimuli with autistic children. In the first phase of the study, four intellectually retarded autistic children were exposed to three different types of sensory stimulation (vibration, music, and strobe light) as well as edible and social reinforcers for ten-second intervals contingent upon six simple bar pressing responses. In the second phase, the same events were used as reinforcers for correct responses in learning object labels. The results indicated that: (a) sensory stimuli can be used effectively as reinforcers to maintain high, durable rates of responding in a simple pressing task; (b) ranked preferences for sensory stimuli revealed a unique configuration of responding for each child; and (c) sensory stimuli have motivating potentials comparable to those of the traditional food and social reinforcers even when training receptive language tasks.
Satterthwaite, Theodore D; Ruparel, Kosha; Loughead, James; Elliott, Mark A; Gerraty, Raphael T; Calkins, Monica E; Hakonarson, Hakon; Gur, Ruben C; Gur, Raquel E; Wolf, Daniel H
2012-07-02
The ventral striatum (VS) is a critical brain region for reinforcement learning and motivation. Intrinsically motivated subjects performing challenging cognitive tasks engage reinforcement circuitry including VS even in the absence of external feedback or incentives. However, little is known about how such VS responses develop with age, relate to task performance, and are influenced by task difficulty. Here we used fMRI to examine VS activation to correct and incorrect responses during a standard n-back working memory task in a large sample (n=304) of healthy children, adolescents and young adults aged 8-22. We found that bilateral VS activates more strongly to correct than incorrect responses, and that the VS response scales with the difficulty of the working memory task. Furthermore, VS response was correlated with discrimination performance during the task, and the magnitude of VS response peaked in mid-adolescence. These findings provide evidence for scalable intrinsic reinforcement signals during standard cognitive tasks, and suggest a novel link between motivation and cognition during adolescent development. Copyright © 2012 Elsevier Inc. All rights reserved.
Shivkumar, Sabyasachi; Muralidharan, Vignesh; Chakravarthy, V Srinivasa
2017-01-01
The basal ganglia circuit is an important subcortical system of the brain thought to be responsible for reward-based learning. The striatum, the largest nucleus of the basal ganglia, serves as an input port that maps cortical information. Microanatomical studies show that the striatum is a mosaic of specialized input-output structures called striosomes and regions of the surrounding matrix called the matrisomes. We have developed a computational model of the striatum using layered self-organizing maps to capture the center-surround structure seen experimentally and explain its functional significance. We believe that these structural components could build representations of state and action spaces in different environments. The striatum model is then integrated with other components of the basal ganglia, making it capable of solving reinforcement learning tasks. We have proposed a biologically plausible mechanism of action-based learning where the striosome biases the matrisome activity toward a preferred action. Several studies indicate that the striatum is critical in solving context-dependent problems. We build on this hypothesis, and the proposed model exploits the modularity of the striatum to efficiently solve such tasks.
Shivkumar, Sabyasachi; Muralidharan, Vignesh; Chakravarthy, V. Srinivasa
2017-01-01
The basal ganglia circuit is an important subcortical system of the brain thought to be responsible for reward-based learning. The striatum, the largest nucleus of the basal ganglia, serves as an input port that maps cortical information. Microanatomical studies show that the striatum is a mosaic of specialized input-output structures called striosomes and regions of the surrounding matrix called the matrisomes. We have developed a computational model of the striatum using layered self-organizing maps to capture the center-surround structure seen experimentally and explain its functional significance. We believe that these structural components could build representations of state and action spaces in different environments. The striatum model is then integrated with other components of the basal ganglia, making it capable of solving reinforcement learning tasks. We have proposed a biologically plausible mechanism of action-based learning where the striosome biases the matrisome activity toward a preferred action. Several studies indicate that the striatum is critical in solving context-dependent problems. We build on this hypothesis, and the proposed model exploits the modularity of the striatum to efficiently solve such tasks. PMID:28680395
Framework for robot skill learning using reinforcement learning
NASA Astrophysics Data System (ADS)
Wei, Yingzi; Zhao, Mingyang
2003-09-01
Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skills. The reinforcement function is the critical component, because it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way to incorporate prior knowledge and experience into the RL controller, and we consider the difference form of the augmented reward function carefully. The additional reward beyond the conventional reward provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot-shaping policy that decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal but practical framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control on line.
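As a minimal sketch of the augmented-reward idea described above (not the authors' implementation), the following adds a heuristic shaping term to a conventional sparse task reward inside a tabular Q-learning loop; the one-dimensional task, GOAL, shaping coefficient, and parameter values are all illustrative assumptions.

    import random

    def task_reward(state):
        # Conventional sparse reward: only the goal state pays off.
        return 1.0 if state == GOAL else 0.0

    def augmented_reward(state, next_state):
        # Augmented reward = task reward + heuristic shaping term encoding
        # prior knowledge (here: progress toward the goal).
        progress = abs(GOAL - state) - abs(GOAL - next_state)
        return task_reward(next_state) + 0.1 * progress

    GOAL = 9
    ACTIONS = (-1, +1)                       # move left / right on a 1-D track
    Q = {(s, a): 0.0 for s in range(10) for a in ACTIONS}
    alpha, gamma, epsilon = 0.1, 0.95, 0.1

    for episode in range(200):
        s = 0
        while s != GOAL:
            a = random.choice(ACTIONS) if random.random() < epsilon \
                else max(ACTIONS, key=lambda x: Q[(s, x)])
            s2 = min(max(s + a, 0), GOAL)
            r = augmented_reward(s, s2)      # extra heuristic information for RL
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2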
Credit assignment in movement-dependent reinforcement learning
McDougle, Samuel D.; Boggess, Matthew J.; Crossley, Matthew J.; Parvin, Darius; Ivry, Richard B.; Taylor, Jordan A.
2016-01-01
When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants’ explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem. PMID:27247404
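The gating hypothesis described above can be illustrated with a toy value update in which execution failures attenuate the reward prediction error, so a missed reach does not count as full evidence against the chosen option. This is an illustrative sketch, not the authors' fitted model; the bandit probabilities, gate value, and learning rate are assumptions.

    import random

    def update_value(V, choice, reward, executed_ok, alpha=0.2, gate=0.1):
        delta = reward - V[choice]                # standard reward prediction error
        # Execution failures gate (attenuate) the update, so a motor miss is not
        # treated as evidence that the option itself is bad.
        weight = 1.0 if executed_ok else gate
        V[choice] += alpha * weight * delta

    V = [0.5, 0.5]                                # values of the two "bandit" targets
    for trial in range(1000):
        choice = max(range(2), key=lambda i: V[i])
        executed_ok = random.random() > 0.2       # 20% simulated reaching errors
        reward = float(random.random() < (0.7 if choice == 0 else 0.3)) if executed_ok else 0.0
        update_value(V, choice, reward, executed_ok)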
Pavlovian to instrumental transfer of control in a human learning task.
Nadler, Natasha; Delgado, Mauricio R; Delamater, Andrew R
2011-10-01
Pavlovian learning tasks have been widely used as tools to understand basic cognitive and emotional processes in humans. The present studies investigated one particular task, Pavlovian-to-instrumental transfer (PIT), with human participants in an effort to examine potential cognitive and emotional effects of Pavlovian cues upon instrumentally trained performance. In two experiments, subjects first learned two separate instrumental response-outcome relationships (i.e., R1-O1 and R2-O2) and then were exposed to various stimulus-outcome relationships (i.e., S1-O1, S2-O2, S3-O3, and S4-) before the effects of the Pavlovian stimuli on instrumental responding were assessed during a non-reinforced test. In Experiment 1, instrumental responding was established using a positive-reinforcement procedure, whereas in Experiment 2, a quasi-avoidance learning task was used. In both cases, the Pavlovian stimuli exerted selective control over instrumental responding, whereby S1 and S2 selectively elevated the instrumental response with which it shared an outcome. In addition, in Experiment 2, S3 exerted a nonselective transfer of control effect, whereby both responses were elevated over baseline levels. These data identify two ways, one specific and one general, in which Pavlovian processes can exert control over instrumental responding in human learning paradigms, suggesting that this method may serve as a useful tool in the study of basic cognitive and emotional processes in human learning.
Schifani, Christin; Sukhanov, Ilya; Dorofeikova, Mariia; Bespalov, Anton
2017-07-28
There is a need to develop cognitive tasks that address valid neuropsychological constructs implicated in disease mechanisms and can be used in animals and humans to guide novel drug discovery. Present experiments aimed to characterize a novel reinforcement learning task based on a classical operant behavioral phenomenon observed in multiple species - differences in response patterning under variable (VI) vs fixed interval (FI) schedules of reinforcement. Wistar rats were trained to press a lever for food under VI30s and later weekly test sessions were introduced with reinforcement schedule switched to FI30s. During the FI30s test session, post-reinforcement pauses (PRPs) gradually grew towards the end of the session reaching 22-43% of the initial values. Animals could be retrained under VI30s conditions, and FI30s test sessions were repeated over a period of several months without appreciable signs of a practice effect. Administration of the non-competitive N-methyl-d-aspartate (NMDA) receptor antagonist MK-801 ((5S,10R)-(+)-5-Methyl-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5,10-imine maleate) prior to FI30s sessions prevented adjustment of PRPs associated with the change from VI to FI schedule. This effect was most pronounced at the highest tested dose of MK-801 and appeared to be independent of the effects of this dose on response rates. These results provide initial evidence for the possibility to use different response patterning under VI and FI schedules with equivalent reinforcement density for studying effects of drug treatment on reinforcement learning. Copyright © 2017 Elsevier B.V. All rights reserved.
Discrimination Learning in Children
ERIC Educational Resources Information Center
Ochocki, Thomas E.; And Others
1975-01-01
Examined the learning performance of 192 fourth-, fifth-, and sixth-grade children on either a two- or four-choice simultaneous color discrimination task. Compared the use of verbal reinforcement and/or punishment, under conditions of either complete or incomplete instructions. (Author/SDH)
Makowiecki, Kalina; Hammond, Geoff; Rodger, Jennifer
2012-01-01
In behavioural experiments, motivation to learn can be achieved using food rewards as positive reinforcement in food-restricted animals. Previous studies reduce animal weights to 80–90% of free-feeding body weight as the criterion for food restriction. However, effects of different degrees of food restriction on task performance have not been assessed. We compared learning task performance in mice food-restricted to 80 or 90% body weight (BW). We used adult wildtype (WT; C57Bl/6j) and knockout (ephrin-A2−/−) mice, previously shown to have a reverse learning deficit. Mice were trained in a two-choice visual discrimination task with food reward as positive reinforcement. When mice reached criterion for one visual stimulus (80% correct in three consecutive 10 trial sets) they began the reverse learning phase, where the rewarded stimulus was switched to the previously incorrect stimulus. For the initial learning and reverse phase of the task, mice at 90%BW took almost twice as many trials to reach criterion as mice at 80%BW. Furthermore, WT 80 and 90%BW groups significantly differed in percentage correct responses and learning strategy in the reverse learning phase, whereas no differences between weight restriction groups were observed in ephrin-A2−/− mice. Most importantly, genotype-specific differences in reverse learning strategy were only detected in the 80%BW groups. Our results indicate that increased food restriction not only results in better performance and a shorter training period, but may also be necessary for revealing behavioural differences between experimental groups. This has important ethical and animal welfare implications when deciding extent of diet restriction in behavioural studies. PMID:23144936
Towards a genetics-based adaptive agent to support flight testing
NASA Astrophysics Data System (ADS)
Cribbs, Henry Brown, III
Although the benefits of aircraft simulation have been known since the late 1960s, simulation almost always entails interaction with a human test pilot. This "pilot-in-the-loop" simulation process provides useful evaluative information to the aircraft designer and provides a training tool to the pilot. Emulation of a pilot during the early phases of the aircraft design process might provide designers a useful evaluative tool. Machine learning might emulate a pilot in a simulated aircraft/cockpit setting. Preliminary work in the application of machine learning techniques, such as reinforcement learning, to aircraft maneuvering has shown promise. These studies used simplified interfaces between the machine learning agent and the aircraft simulation, and the simulations employed low-order equivalent system models. High-fidelity aircraft simulations exist, such as the simulations developed by NASA at its Dryden Flight Research Center. To expand the application domain of reinforcement learning to aircraft designs, this study presents a series of experiments that examine a reinforcement learning agent in the role of test pilot. The NASA X-31 and F-106 high-fidelity simulations provide realistic aircraft for the agent to maneuver. The approach of the study is to examine an agent possessing a genetic-based, artificial neural network to approximate long-term, expected cost (Bellman value) in a basic maneuvering task. The experiments evaluate different learning methods based on a common feedback function and an identical task. The learning methods evaluated are: Q-learning, Q(lambda)-learning, SARSA learning, and SARSA(lambda) learning. Experimental results indicate that, while prediction error remains quite high, similar, repeatable behaviors occur in both aircraft. This similarity demonstrates the portability of the agent between aircraft with different handling qualities (dynamics). Besides the adaptive behavior aspects of the study, the genetic algorithm used in the agent is shown to play an additive role in the shaping of the artificial neural network to the prediction task.
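For reference, the two basic one-step updates contrasted in the abstract (Q-learning vs SARSA) can be written side by side as below; this is a generic tabular sketch under assumed parameter names, not the study's neural-network value approximator, and the eligibility-trace variants Q(lambda)/SARSA(lambda) extend these updates with decaying traces.

    def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
        # Off-policy: bootstrap from the greedy action in the next state.
        target = r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
        # On-policy: bootstrap from the action actually taken next.
        target = r + gamma * Q[(s2, a2)]
        Q[(s, a)] += alpha * (target - Q[(s, a)])

The only difference is the bootstrap target, which is why the two methods can produce similar behavior when the behavior policy is close to greedy.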
Mesolimbic confidence signals guide perceptual learning in the absence of external feedback
Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp
2016-01-01
It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
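A schematic sketch of the core idea above, assuming a simple signal-detection observer: internally generated decision confidence replaces external reward, a confidence prediction error is computed, and that error drives learning of perceptual sensitivity. Variable names, the logistic confidence model, and the learning rates are illustrative assumptions, not the published computational model.

    import math, random

    def confidence_prediction_error(expected_conf, confidence, lr=0.1):
        # "Reward" here is internally generated decision confidence, not feedback.
        delta = confidence - expected_conf        # confidence prediction error
        expected_conf += lr * delta               # anticipation of confidence
        return expected_conf, delta

    gain, expected_conf = 1.0, 0.5
    for trial in range(500):
        stimulus = random.gauss(0.3, 1.0)         # weak signal embedded in noise
        evidence = gain * stimulus
        confidence = 1.0 / (1.0 + math.exp(-abs(evidence)))   # internal confidence
        expected_conf, delta = confidence_prediction_error(expected_conf, confidence)
        gain += 0.05 * delta                      # confidence PE reinforces sensitivity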
Learning tactile skills through curious exploration
Pape, Leo; Oddo, Calogero M.; Controzzi, Marco; Cipriani, Christian; Förster, Alexander; Carrozza, Maria C.; Schmidhuber, Jürgen
2012-01-01
We present curiosity-driven, autonomous acquisition of tactile exploratory skills on a biomimetic robot finger equipped with an array of microelectromechanical touch sensors. Instead of building tailored algorithms for solving a specific tactile task, we employ a more general curiosity-driven reinforcement learning approach that autonomously learns a set of motor skills in absence of an explicit teacher signal. In this approach, the acquisition of skills is driven by the information content of the sensory input signals relative to a learner that aims at representing sensory inputs using fewer and fewer computational resources. We show that, from initially random exploration of its environment, the robotic system autonomously develops a small set of basic motor skills that lead to different kinds of tactile input. Next, the system learns how to exploit the learned motor skills to solve supervised texture classification tasks. Our approach demonstrates the feasibility of autonomous acquisition of tactile skills on physical robotic platforms through curiosity-driven reinforcement learning, overcomes typical difficulties of engineered solutions for active tactile exploration and underactuated control, and provides a basis for studying developmental learning through intrinsic motivation in robots. PMID:22837748
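A compact sketch of the curiosity signal described above: intrinsic reward is the improvement in a learner's model of its sensory input, so the agent is drawn to inputs it is currently getting better at representing. The running-mean predictor and sensor stand-in are deliberately trivial placeholders, not the robot's actual software.

    import random

    class Predictor:
        """Trivial running-mean model of the sensory input."""
        def __init__(self):
            self.mean = 0.0
        def error(self, x):
            return (x - self.mean) ** 2
        def update(self, x, lr=0.05):
            self.mean += lr * (x - self.mean)

    predictor = Predictor()

    def intrinsic_reward(observation):
        # Curiosity: reward equals the drop in prediction error caused by learning.
        before = predictor.error(observation)
        predictor.update(observation)
        after = predictor.error(observation)
        return before - after

    for step in range(100):
        obs = random.gauss(2.0, 0.5)      # stand-in for a tactile sensor reading
        r_int = intrinsic_reward(obs)     # would feed a standard RL update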
Dual learning processes in interactive skill acquisition.
Fu, Wai-Tat; Anderson, John R
2008-06-01
Acquisition of interactive skills involves the use of internal and external cues. Experiment 1 showed that when actions were interdependent, learning was effective with and without external cues in the single-task condition but was effective only in the presence of external cues in the dual-task condition. In the dual-task condition, actions closer to the feedback were learned faster than actions farther away, but this difference was reversed in the single-task condition. Experiment 2 tested how knowledge acquired in single- and dual-task conditions would transfer to a new reward structure. Results confirmed two forms of learning mediated by the secondary task: a declarative memory encoding process that simultaneously assigned credits to actions and a reinforcement-learning process that slowly propagated credits backward from the feedback. The results showed that both forms of learning were engaged during training, but that at the response selection stage one form of knowledge may dominate over the other, depending on the availability of attentional resources. (c) 2008 APA, all rights reserved
Motor Learning Enhances Use-Dependent Plasticity
2017-01-01
Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961
Okada, Ken-ichi; Nakamura, Kae; Kobayashi, Yasushi
2011-01-01
Dopamine, acetylcholine, and serotonin, the main modulators of the central nervous system, have been proposed to play important roles in the execution of movement, control of several forms of attentional behavior, and reinforcement learning. While the response pattern of midbrain dopaminergic neurons and its specific role in reinforcement learning have been revealed, the role of the other neuromodulators remains rather elusive. Here, we review our recent studies using extracellular recording from neurons in the pedunculopontine tegmental nucleus, where many cholinergic neurons exist, and the dorsal raphe nucleus, where many serotonergic neurons exist, while monkeys performed eye movement tasks to obtain different reward values. The firing patterns of these neurons are often tonic throughout the task period, while dopaminergic neurons exhibited a phasic activity pattern to the task event. The different modulation patterns, together with the activity of dopaminergic neurons, reveal dynamic information processing between these different neuromodulator systems. PMID:22013541
Punishment insensitivity and impaired reinforcement learning in preschoolers.
Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S
2014-01-01
Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.
Incidental orthographic learning during a color detection task.
Protopapas, Athanassios; Mitsi, Anna; Koustoumbardis, Miltiadis; Tsitsopoulou, Sofia M; Leventi, Marianna; Seitz, Aaron R
2017-09-01
Orthographic learning refers to the acquisition of knowledge about specific spelling patterns forming words and about general biases and constraints on letter sequences. It is thought to occur by strengthening simultaneously activated visual and phonological representations during reading. Here we demonstrate that a visual perceptual learning procedure that leaves no time for articulation can result in orthographic learning evidenced in improved reading and spelling performance. We employed task-irrelevant perceptual learning (TIPL), in which the stimuli to be learned are paired with an easy task target. Assorted line drawings and difficult-to-spell words were presented in red color among sequences of other black-colored words and images presented in rapid succession, constituting a fast-TIPL procedure with color detection being the explicit task. In five experiments, Greek children in Grades 4-5 showed increased recognition of words and images that had appeared in red, both during and after the training procedure, regardless of within-training testing, and also when targets appeared in blue instead of red. Significant transfer to reading and spelling emerged only after increased training intensity. In a sixth experiment, children in Grades 2-3 showed generalization to words not presented during training that carried the same derivational affixes as in the training set. We suggest that reinforcement signals related to detection of the target stimuli contribute to the strengthening of orthography-phonology connections beyond earlier levels of visually-based orthographic representation learning. These results highlight the potential of perceptual learning procedures for the reinforcement of higher-level orthographic representations. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Incremental learning of skill collections based on intrinsic motivation
Metzen, Jan H.; Kirchner, Frank
2013-01-01
Life-long learning of reusable, versatile skills is a key prerequisite for embodied agents that act in a complex, dynamic environment and are faced with different tasks over their lifetime. We address the question of how an agent can learn useful skills efficiently during a developmental period, i.e., when no task is imposed on him and no external reward signal is provided. Learning of skills in a developmental period needs to be incremental and self-motivated. We propose a new incremental, task-independent skill discovery approach that is suited for continuous domains. Furthermore, the agent learns specific skills based on intrinsic motivation mechanisms that determine on which skills learning is focused at a given point in time. We evaluate the approach in a reinforcement learning setup in two continuous domains with complex dynamics. We show that an intrinsically motivated, skill learning agent outperforms an agent which learns task solutions from scratch. Furthermore, we compare different intrinsic motivation mechanisms and how efficiently they make use of the agent's developmental period. PMID:23898265
Interest Inventory Items as Reinforcing Stimuli: A Test of the A-R-D Theory.
ERIC Educational Resources Information Center
Staats, Arthur W.; And Others
An experiment was conducted to test the hypothesis that interest inventory items would function as reinforcing stimuli in a visual discrimination task. When previously rated liked and disliked items from the Strong Vocational Interest Blank were differentially presented following one of two responses, subjects learned to respond to the stimulus…
Operant conditioning of enhanced pain sensitivity by heat-pain titration.
Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert
2008-11-15
Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior and not verbal pain report was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.
Managing Learning Disabled Students' Academic Frustration through Self-Control.
ERIC Educational Resources Information Center
Ammer, Jerome J.
1982-01-01
Teachers can help learning and behavior disordered students in middle and secondary grades develop self control through a strategy in which students are taught to stop, look, listen, and think before carrying out a task. The final step is to reinforce themselves. (CL)
Hybrid computing using a neural network with dynamic external memory.
Graves, Alex; Wayne, Greg; Reynolds, Malcolm; Harley, Tim; Danihelka, Ivo; Grabska-Barwińska, Agnieszka; Colmenarejo, Sergio Gómez; Grefenstette, Edward; Ramalho, Tiago; Agapiou, John; Badia, Adrià Puigdomènech; Hermann, Karl Moritz; Zwols, Yori; Ostrovski, Georg; Cain, Adam; King, Helen; Summerfield, Christopher; Blunsom, Phil; Kavukcuoglu, Koray; Hassabis, Demis
2016-10-27
Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.
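One component of the external memory described above is content-based addressing, in which a read key is compared with every memory slot and a softmax-weighted read vector is returned. The numpy sketch below shows only that single operation under assumed shapes and a made-up sharpness parameter; it is not the full published DNC architecture.

    import numpy as np

    def content_read(memory, key, beta=5.0):
        # memory: (slots, width) matrix; key: (width,) read key; beta: sharpness.
        norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
        similarity = memory @ key / norms            # cosine similarity per slot
        weights = np.exp(beta * similarity)
        weights /= weights.sum()                     # softmax read weighting
        return weights @ memory                      # differentiable weighted read

    memory = np.random.randn(16, 8)                  # 16 slots, width-8 vectors
    key = memory[3] + 0.1 * np.random.randn(8)       # noisy query for slot 3
    vector = content_read(memory, key)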
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca
2017-04-01
Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.
Volkert, Valerie M; Lerman, Dorothea C; Trosclair, Nicole; Addison, Laura; Kodak, Tiffany
2008-01-01
Research has demonstrated that interspersing mastered tasks with new tasks facilitates learning under certain conditions; however, little is known about factors that influence the effectiveness of this treatment strategy. The initial purpose of the current investigation was to evaluate the effects of similar versus dissimilar interspersed tasks while teaching object labels to children diagnosed with autism or developmental delays. We then conducted a series of exploratory analyses involving the type of reinforcer delivered for correct responses on trials with unknown or known object labels. Performance was enhanced under the interspersal condition only when either brief praise was delivered for all correct responses or presumably more preferred reinforcers were provided for performance on known trials rather than on unknown trials.
Iijima, Yudai; Takano, Keisuke; Boddez, Yannick; Raes, Filip; Tanno, Yoshihiko
2017-01-01
Learning theories of depression have proposed that depressive cognitions, such as negative thoughts with reference to oneself, can develop through a reinforcement learning mechanism. This negative self-reference is considered to be positively reinforced by rewarding experiences such as genuine support from others after negative self-disclosure, and negatively reinforced by avoidance of potential aversive situations. The learning account additionally predicts that negative self-reference would be maintained by an inability to adjust one’s behavior when negative self-reference no longer leads to such reward. To test this prediction, we designed an adapted version of the reversal-learning task. In this task, participants were reinforced to choose and engage in either negative or positive self-reference by probabilistic economic reward and punishment. Although participants were initially trained to choose negative self-reference, the stimulus-reward contingencies were reversed to prompt a shift toward positive self-reference (Study 1) and a further shift toward negative self-reference (Study 2). Model-based computational analyses showed that depressive symptoms were associated with a low learning rate of negative self-reference, indicating a high level of reward expectancy for negative self-reference even after the contingency reversal. Furthermore, the difficulty in updating outcome predictions of negative self-reference was significantly associated with the extent to which one possesses negative self-images. These results suggest that difficulty in adjusting action-outcome estimates for negative self-reference increases the chance to be faced with negative aspects of self, which may result in depressive symptoms. PMID:28824511
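The link the abstract draws between a low learning rate and persistent reward expectancy after reversal can be illustrated with a toy greedy Q-learning simulation of a probabilistic reversal task; the option labels, reward probabilities, and trial counts are illustrative assumptions, not the authors' fitted model.

    import random

    def simulate(alpha, trials=200, reversal=100):
        Q = [0.5, 0.5]                       # two response options (e.g. two self-reference styles)
        reward_p = [0.8, 0.2]                # option 0 is initially reinforced
        choices = []
        for t in range(trials):
            if t == reversal:
                reward_p = [0.2, 0.8]        # unannounced contingency reversal
            choice = max(range(2), key=lambda i: Q[i])
            reward = float(random.random() < reward_p[choice])
            Q[choice] += alpha * (reward - Q[choice])   # learning-rate-scaled update
            choices.append(choice)
        return sum(choices[reversal:]) / (trials - reversal)   # post-reversal shift to option 1

    # A low learning rate keeps reward expectancy for the old option high after
    # the reversal, so the shift toward the newly reinforced option is slower.
    print("alpha=0.05:", simulate(0.05), " alpha=0.4:", simulate(0.4))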
Cella, Matteo; Bishara, Anthony J.; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til
2014-01-01
Objective: Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention—cognitive remediation. Method: Using computational modelling, 3 reinforcement learning parameters based on the Wisconsin Card Sorting Test (WCST) trial-by-trial performance were estimated: R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). Results: In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. R and P parameter change correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Conclusion: Schizophrenia reinforcement learning difficulties negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. PMID:24214932
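The three parameters named in the abstract (reward sensitivity R, punishment sensitivity P, choice consistency D) can be sketched schematically as separate learning rates for positive and negative feedback plus a softmax temperature over candidate sorting rules. This is an illustrative reconstruction under assumed values, not the authors' exact trial-by-trial model of the WCST.

    import math, random

    def softmax_choice(weights, d):
        # d = choice consistency: higher d -> more deterministic choices.
        exps = [math.exp(d * w) for w in weights]
        r = random.random() * sum(exps)
        for i, e in enumerate(exps):
            r -= e
            if r <= 0:
                return i
        return len(weights) - 1

    def feedback_update(weights, choice, correct, R=0.3, P=0.2):
        # R (reward sensitivity) and P (punishment sensitivity) scale learning
        # from positive and negative feedback separately.
        rate, target = (R, 1.0) if correct else (P, 0.0)
        weights[choice] += rate * (target - weights[choice])

    rules = ["colour", "form", "number"]              # candidate sorting rules
    weights = [1.0 / len(rules)] * len(rules)
    active_rule = 0                                   # experimenter-defined rule
    for trial in range(60):
        pick = softmax_choice(weights, d=3.0)
        feedback_update(weights, pick, correct=(pick == active_rule))
        if trial == 30:
            active_rule = 1                           # unannounced rule shift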
Investment Portfolio Simulation: An Assessment Task in Finance
ERIC Educational Resources Information Center
Parle, Gabrielle; Laing, Gregory K.
2017-01-01
The use of an investment portfolio simulation as an assessment task is intended to reinforce learning by involving students in practical application of theoretical principles in a real-time actual financial market. Simulation as a teaching pedagogy promotes individual involvement and provides students with a deeper understanding of the issues, and…
ERIC Educational Resources Information Center
Fernie, Gordon; Tunney, Richard J.
2006-01-01
The Iowa Gambling Task (Bechara, Damasio, Damasio, & Anderson, 1994) has become widely used as a laboratory test of "real-life" decision-making. However, aspects of its administration that have been varied by researchers may differentially affect performance and the conclusions researchers can draw. Some researchers have used facsimile money…
ERIC Educational Resources Information Center
Whitford, Denise K.; Liaupsin, Carl J.; Umbreit, John; Ferro, Jolenea B.
2013-01-01
A comprehensive function-based intervention was developed to address the chronic, high levels of off-task behavior by a 15-year-old ninth grade Caucasian male with learning disabilities and ADHD. A descriptive FBA identified that the student's off-task behavior was reinforced by peer attention and task avoidance. Intervention involved the…
Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S
2011-01-01
As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
Stress enhances model-free reinforcement learning only after negative outcome
Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung
2017-01-01
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one’s ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts. PMID:28723943
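The model-free/model-based distinction analysed above is often formalised as a hybrid valuation in which first-stage choice values mix a habit-like estimate and a transition-model-based estimate with a weight w, and a learning rate alpha controls how quickly new outcomes are incorporated. The sketch below is a minimal illustration under assumed transition and reward probabilities, not the authors' fitted model.

    import random

    # First-stage actions 0/1 commonly lead (probability p_common) to
    # second-stage states 0/1 respectively; rewards arrive at stage two.
    p_common = 0.7
    Q_mf = [0.0, 0.0]          # model-free values of first-stage actions
    Q_stage2 = [0.0, 0.0]      # values of the two second-stage states

    def model_based_values():
        # The model-based system combines the transition model with stage-2 values.
        return [p_common * Q_stage2[0] + (1 - p_common) * Q_stage2[1],
                p_common * Q_stage2[1] + (1 - p_common) * Q_stage2[0]]

    def hybrid_values(w=0.5):
        # w = relative contribution of the model-based system.
        mb = model_based_values()
        return [w * mb[a] + (1 - w) * Q_mf[a] for a in range(2)]

    def learn(action, state2, reward, alpha=0.3):
        # alpha: learning rate, i.e. how quickly new outcomes update the estimates.
        Q_stage2[state2] += alpha * (reward - Q_stage2[state2])
        Q_mf[action] += alpha * (reward - Q_mf[action])

    for trial in range(200):
        vals = hybrid_values(w=0.5)
        action = 0 if vals[0] >= vals[1] else 1
        state2 = action if random.random() < p_common else 1 - action
        reward = float(random.random() < (0.8 if state2 == 0 else 0.2))
        learn(action, state2, reward)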
Effect of Reinforcement History on Hand Choice in an Unconstrained Reaching Task
Stoloff, Rebecca H.; Taylor, Jordan A.; Xu, Jing; Ridderikhoff, Arne; Ivry, Richard B.
2011-01-01
Choosing which hand to use for an action is one of the most frequent decisions people make in everyday behavior. We developed a simple reaching task in which we vary the lateral position of a target and the participant is free to reach to it with either the right or left hand. While people exhibit a strong preference to use the hand ipsilateral to the target, there is a region of uncertainty within which hand choice varies across trials. We manipulated the reinforcement rates for the two hands, either by increasing the likelihood that a reach with the non-dominant hand would successfully intersect the target or decreasing the likelihood that a reach with the dominant hand would be successful. While participants had minimal awareness of these manipulations, we observed an increase in the use of the non-dominant hand for targets presented in the region of uncertainty. We modeled the shift in hand use using a Q-learning model of reinforcement learning. The results provided a good fit of the data and indicate that the effects of increasing and decreasing the rate of positive reinforcement are additive. These experiments emphasize the role of decision processes for effector selection, and may point to a novel approach for physical rehabilitation based on intrinsic reinforcement. PMID:21472031
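In the spirit of the Q-learning account described above, a minimal sketch: each hand's value for targets in the ambiguous region is updated from reach success or failure, and hand choice follows a softmax of the value difference. The success rates, learning rate, and inverse temperature are illustrative assumptions, not the fitted parameters.

    import math, random

    Q = {"left": 0.5, "right": 0.5}      # value of using each hand in the uncertain region
    alpha, beta = 0.2, 5.0               # learning rate, softmax inverse temperature

    def choose_hand():
        p_right = 1.0 / (1.0 + math.exp(-beta * (Q["right"] - Q["left"])))
        return "right" if random.random() < p_right else "left"

    # Experimenter-controlled manipulation: raise the non-dominant (left) hand's
    # reinforcement rate and lower the dominant (right) hand's.
    success_rate = {"left": 0.9, "right": 0.6}

    for trial in range(300):
        hand = choose_hand()
        success = random.random() < success_rate[hand]
        Q[hand] += alpha * (float(success) - Q[hand])    # reinforcement update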
Wong, Scott A; Randolph, Sienna H; Ivan, Victorita E; Gruber, Aaron J
2017-09-29
Δ-9-Tetrahydrocannabinol (THC) is the main psychoactive component of marijuana and has potent effects on decision-making, including a proposed reduction in cognitive flexibility. We demonstrate here that acute THC administration differentially affects some of the processes that contribute to cognitive flexibility. Specifically, THC reduces lose-shift responding in which female rats tend to immediately shift choice responses away from options that result in reward omission on the previous trial. THC, however, did not impair the ability of rats to flexibly bias responses toward feeders with higher probability of reward in a reversal task. This response adaptation developed over several trials, suggesting that THC did not impair slower forms of reinforcement learning needed to choose among options with unequal utility. This dissociation of THC's effects on innate/rapid and learned/gradual decision-making processes was unexpected, but is supported by emerging evidence that lose-shift responding is mediated by neural mechanisms distinct from those involved in other forms of reinforcement learning. The present data suggest that, at least in some tasks, the apparent reductions in cognitive flexibility by THC may be explained by the immediate effects on loss sensitivity, rather than impairments of all processes used for choice adaptation. Copyright © 2017 Elsevier B.V. All rights reserved.
Learning alternative movement coordination patterns using reinforcement feedback.
Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv
2018-05-01
One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.
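The abrupt versus gradual reinforcement schedules compared above can be sketched as a single criterion function that is either imposed in full from the first trial or ramped up over early practice, with binary feedback (cursor visible or not) derived from it. The threshold values and ramp length below are hypothetical placeholders, not the experimental settings.

    def threshold(trial, schedule, final=10.0, ramp_trials=100):
        # Abrupt: the full criterion from the first trial.
        # Gradual: the criterion ramps up linearly over early practice.
        if schedule == "abrupt":
            return final
        return final * min(1.0, trial / ramp_trials)

    def feedback(trunk_motion, trial, schedule):
        # Binary reinforcement: the cursor stays visible only if the reach
        # involved enough trunk motion to cross the current criterion.
        return trunk_motion >= threshold(trial, schedule)

    visible = feedback(trunk_motion=6.0, trial=20, schedule="gradual")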
Instructional control of reinforcement learning: A behavioral and neurocomputational investigation
Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.
2011-01-01
Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
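A toy sketch of the best-fitting "confirmation bias" idea described above: prediction errors on the instructed option are amplified when they confirm the instruction and diminished when they contradict it, so the instruction resists disconfirmation despite contrary experience. The gain parameters, reinforcement probabilities, and greedy choice rule are illustrative assumptions, not the fitted model.

    import random

    def instructed_q_update(Q, choice, reward, instructed_choice,
                            alpha=0.2, amplify=1.5, diminish=0.5):
        delta = reward - Q[choice]
        if choice == instructed_choice:
            # Confirmation bias: wins on the instructed option are over-weighted,
            # losses on it are under-weighted.
            delta *= amplify if reward == 1.0 else diminish
        Q[choice] += alpha * delta

    Q = [0.5, 0.5]
    instructed = 0                        # participants (incorrectly) told option 0 is best
    true_p = [0.3, 0.7]                   # actual reinforcement probabilities
    for trial in range(300):
        choice = max(range(2), key=lambda i: Q[i])
        reward = float(random.random() < true_p[choice])
        instructed_q_update(Q, choice, reward, instructed)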
Emotion-based learning systems and the development of morality.
Blair, R J R
2017-10-01
In this paper it is proposed that important components of moral development and moral judgment rely on two forms of emotional learning: stimulus-reinforcement and response-outcome learning. Data in support of this position will be primarily drawn from work with individuals with the developmental condition of psychopathy as well as fMRI studies with healthy individuals. Individuals with psychopathy show impairment on moral judgment tasks and a pronounced increased risk for instrumental antisocial behavior. It will be argued that these impairments are developmental consequences of impaired stimulus-aversive conditioning on the basis of distress cue reinforcers and response-outcome learning in individuals with this disorder. Copyright © 2017. Published by Elsevier B.V.
ERIC Educational Resources Information Center
Nyce, Peggy A.; And Others
1977-01-01
Forty-four third graders were given a two-choice conceptual discrimination learning task. The two major factors were (1) four treatment groups varying at the extremes on two personality measures, approval motivation and locus of control and (2) sex. (MS)
ERIC Educational Resources Information Center
Le Pelley, M. E.
2012-01-01
Monkeys will selectively and adaptively learn to avoid the most difficult trials of a perceptual discrimination learning task. Couchman, Coutinho, Beran, and Smith (2010) have recently demonstrated that this pattern of responding does not depend on animals receiving trial-by-trial feedback for their responses; it also obtains if experience of the…
A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI
NASA Astrophysics Data System (ADS)
Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro
Recently, brain-computer interfaces (BCI), which provide a direct pathway between the human brain and an external device such as a computer or a robot, have attracted a great deal of attention. Because a BCI can control machines such as robots from brain activity alone, without the use of voluntary muscles, it may become a useful communication tool for handicapped persons, for instance, amyotrophic lateral sclerosis patients. However, in order to realize a BCI system that can perform precise tasks in various environments, it is necessary to design control rules that adapt to dynamic environments. Reinforcement learning is one approach to the design of such control rules. If this reinforcement learning can be driven by brain activity, it would lead to a BCI with general versatility. In this research, we focused on the P300 event-related potential as an alternative signal for the reward of reinforcement learning. We discriminated between success and failure trials from the P300 of single-trial EEG using a proposed discrimination algorithm based on a support vector machine. The possibility of reinforcement learning was examined from the viewpoint of the number of correctly discriminated trials. The results showed a possibility of learning in most subjects.
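A minimal sketch of the pipeline step described above, decoding single trials with a support vector machine and treating the decoded label as a stand-in reward signal; the synthetic "EEG" feature vectors, dimensions, and class separation are entirely hypothetical, and the SVM here is scikit-learn's generic SVC rather than the authors' algorithm.

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical single-trial feature vectors (e.g. post-stimulus amplitudes)
    # labelled as failure (0) or success (1) trials.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (40, 16)),
                   rng.normal(0.8, 1.0, (40, 16))])
    y = np.array([0] * 40 + [1] * 40)

    clf = SVC(kernel="linear").fit(X, y)
    # The decoded label of a new trial would stand in for the reward signal
    # of a reinforcement learning update.
    reward = int(clf.predict(rng.normal(0.8, 1.0, (1, 16)))[0])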
Akam, Thomas; Costa, Rui; Dayan, Peter
2015-12-01
The recently developed 'two-step' behavioural task promises to differentiate model-based from model-free reinforcement learning, while generating neurophysiologically-friendly decision datasets with parametric variation of decision variables. These desirable features have prompted its widespread adoption. Here, we analyse the interactions between a range of different strategies and the structure of transitions and outcomes in order to examine constraints on what can be learned from behavioural performance. The task involves a trade-off between the need for stochasticity, to allow strategies to be discriminated, and a need for determinism, so that it is worth subjects' investment of effort to exploit the contingencies optimally. We show through simulation that under certain conditions model-free strategies can masquerade as being model-based. We first show that seemingly innocuous modifications to the task structure can induce correlations between action values at the start of the trial and the subsequent trial events in such a way that analysis based on comparing successive trials can lead to erroneous conclusions. We confirm the power of a suggested correction to the analysis that can alleviate this problem. We then consider model-free reinforcement learning strategies that exploit correlations between where rewards are obtained and which actions have high expected value. These generate behaviour that appears model-based under these, and also more sophisticated, analyses. Exploiting the full potential of the two-step task as a tool for behavioural neuroscience requires an understanding of these issues.
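The trial-by-trial comparison the abstract refers to is the standard stay-probability analysis: the probability of repeating the previous first-stage choice, split by previous reward and previous transition type (common vs rare). The sketch below computes it from a simulated trial log; the data generation is random and purely illustrative.

    import random
    from collections import defaultdict

    # Simulated trial log: (first_stage_choice, transition, rewarded)
    trials = [(random.randint(0, 1),
               random.choice(["common", "rare"]),
               random.random() < 0.5) for _ in range(1000)]

    counts = defaultdict(lambda: [0, 0])          # (stays, total) per condition
    for prev, cur in zip(trials, trials[1:]):
        prev_choice, transition, rewarded = prev
        key = (transition, rewarded)
        counts[key][1] += 1
        counts[key][0] += int(cur[0] == prev_choice)

    for (transition, rewarded), (stays, total) in sorted(counts.items()):
        # Model-based control predicts a reward-by-transition interaction here;
        # purely model-free control predicts a main effect of reward only.
        print(f"prev {transition:6s} rewarded={rewarded}: P(stay)={stays / total:.2f}")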
Akam, Thomas; Costa, Rui; Dayan, Peter
2015-01-01
The recently developed ‘two-step’ behavioural task promises to differentiate model-based from model-free reinforcement learning, while generating neurophysiologically-friendly decision datasets with parametric variation of decision variables. These desirable features have prompted its widespread adoption. Here, we analyse the interactions between a range of different strategies and the structure of transitions and outcomes in order to examine constraints on what can be learned from behavioural performance. The task involves a trade-off between the need for stochasticity, to allow strategies to be discriminated, and a need for determinism, so that it is worth subjects’ investment of effort to exploit the contingencies optimally. We show through simulation that under certain conditions model-free strategies can masquerade as being model-based. We first show that seemingly innocuous modifications to the task structure can induce correlations between action values at the start of the trial and the subsequent trial events in such a way that analysis based on comparing successive trials can lead to erroneous conclusions. We confirm the power of a suggested correction to the analysis that can alleviate this problem. We then consider model-free reinforcement learning strategies that exploit correlations between where rewards are obtained and which actions have high expected value. These generate behaviour that appears model-based under these, and also more sophisticated, analyses. Exploiting the full potential of the two-step task as a tool for behavioural neuroscience requires an understanding of these issues. PMID:26657806
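As a rough illustration of the kind of simulation described above, the sketch below runs a purely model-free temporal-difference learner with an eligibility trace on a generic two-step task with common (0.8) and rare (0.2) transitions; the task probabilities and learning parameters are assumptions for illustration, not the paper's exact settings.

```python
# Illustrative two-step task simulation with a purely model-free learner
# (assumed parameters; not the paper's exact task structure).
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, lam = 0.5, 5.0, 0.6           # learning rate, softmax inverse temp., eligibility
q_first = np.zeros(2)                       # values of the two first-step actions
q_second = np.zeros(2)                      # values of the two second-step states
p_reward = np.array([0.4, 0.6])             # drifting reward probabilities

def softmax_choice(q):
    p = np.exp(beta * q)
    p /= p.sum()
    return rng.choice(len(q), p=p)

for t in range(1000):
    a = softmax_choice(q_first)
    common = rng.random() < 0.8
    s2 = a if common else 1 - a             # common transitions preserve the mapping
    r = float(rng.random() < p_reward[s2])
    delta2 = r - q_second[s2]               # second-step prediction error
    delta1 = q_second[s2] - q_first[a]      # first-step prediction error
    q_second[s2] += alpha * delta2
    q_first[a] += alpha * (delta1 + lam * delta2)   # eligibility carries delta2 back
    p_reward = np.clip(p_reward + rng.normal(0, 0.025, size=2), 0.25, 0.75)

print("first-step values:", np.round(q_first, 2))
```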
ERIC Educational Resources Information Center
Kerfoot, Erin C.; Agarwal, Isha; Lee, Hongjoo J.; Holland, Peter C.
2007-01-01
Through associative learning, cues for biologically significant reinforcers such as food may gain access to mental representations of those reinforcers. Here, we used devaluation procedures, behavioral assessment of hedonic taste-reactivity responses, and measurement of immediate-early gene (IEG) expression to show that a cue for food engages…
Ren, Xi; Valle-Inclán, Fernando; Tukaiev, Sergii; Hackley, Steven A
2017-07-01
According to reinforcement learning theory, dopamine-dependent anticipatory processes play a critical role in learning from action outcomes such as feedback or reward. To better understand outcome anticipation, we examined variation in slow cortical potentials and assessed their changes over the course of motor-skill acquisition. Healthy young adults learned a series of precisely timed, key press sequences. Feedback was delivered at a delay of either 2.5 or 8 s, to encourage use of either the striatally mediated, habit learning system or the hippocampus-dependent, episodic memory system, respectively. During the 2.5-s delay, the stimulus-preceding negativity (SPN) was shown to decline in amplitude across trials, confirming previous results from a perceptual categorization task (Morís, Luque, & Rodríguez-Fornells, 2013). This falsifies the hypothesis that SPN reflects specific outcome predictions, on the assumption that the ability to make such predictions should improve as a task is mastered. An SPN was also evident during the 8-s delay, but it increased in amplitude across trials. At the conclusion of the 8-s but not the 2.5-s prefeedback interval, a reversed-polarity lateralized readiness potential (LRP) was noted. It was suggested that this might indicate maintenance of an action representation for comparison with the feedback display. If so, this would constitute the first direct psychophysiological evidence for a popular hypothetical construct in quantitative models of reinforcement learning, the so-called eligibility trace. © 2017 Society for Psychophysiological Research.
Reinforcement learning and episodic memory in humans and animals: an integrative framework
Gershman, Samuel J.; Daw, Nathaniel D.
2018-01-01
We review the psychology and neuroscience of reinforcement learning (RL), which has witnessed significant progress in the last two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, the simplicity of these tasks misses important aspects of reinforcement learning in the real world: (i) State spaces are high-dimensional, continuous, and partially observable; this implies that (ii) data are relatively sparse: indeed precisely the same situation may never be encountered twice; and also that (iii) rewards depend on long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, these theories have largely connected with procedural and semantic memory: how knowledge about action values or world models extracted gradually from many experiences can drive choice. This misses many aspects of memory related to traces of individual events, such as episodic memory. We suggest that these two gaps are related. In particular, the computational challenges can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (i) efficiently approximate value functions over complex state spaces, (ii) learn with very little data, and (iii) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system. PMID:27618944
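One way to make the episodic-memory proposal concrete, offered here only as an illustrative reading rather than the authors' model, is to store individual (state, return) traces and estimate the value of a new state by a similarity-weighted average over remembered episodes:

```python
# Toy sketch of episodic value estimation (one possible reading of the proposal,
# not the authors' model): store (state, return) traces and estimate the value
# of a new state by a similarity-weighted average over remembered episodes.
import numpy as np

class EpisodicValue:
    def __init__(self, bandwidth=1.0):
        self.states, self.returns = [], []
        self.bandwidth = bandwidth

    def store(self, state, ret):
        self.states.append(np.asarray(state, dtype=float))
        self.returns.append(float(ret))

    def value(self, state):
        if not self.states:
            return 0.0
        S = np.stack(self.states)
        d2 = ((S - np.asarray(state, dtype=float)) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * self.bandwidth ** 2))   # Gaussian similarity kernel
        return float(w @ np.asarray(self.returns) / (w.sum() + 1e-12))

memory = EpisodicValue()
memory.store([0.0, 0.0], ret=1.0)
memory.store([3.0, 3.0], ret=0.0)
print(memory.value([0.5, 0.2]))   # close to 1.0: generalizes from a single episode
```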
Reinforcement learning in professional basketball players
Neiman, Tal; Loewenstein, Yonatan
2011-01-01
Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3 point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. PMID:22146388
Vernetti, Angélina; Smith, Tim J; Senju, Atsushi
2017-03-15
While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.
Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning
Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien
2015-01-01
Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning content is held constant, different tutorial tactics make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques used in previous studies to induce tutorial tactics scale poorly to large problems and were therefore applied in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule-discovery task, generating new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics, suggesting that the GBML method is a favorable choice for developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018
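The abstract's combination of rule evolution and reinforcement can be illustrated, in a heavily simplified form, by a small genetic algorithm that evolves rule tables mapping a discretized student state to a tutor action and scores them against a toy simulated student; every name, parameter, and the simulated environment below are assumptions, not the published system.

```python
# Highly simplified sketch of the genetic-based idea (the simulated student and
# all parameters are assumptions): evolve rule tables mapping a discretized
# student state to a tutor action, scored by simulated learning gain.
import numpy as np

rng = np.random.default_rng(2)
N_STATES, N_ACTIONS, POP, GENS = 6, 3, 20, 30

def simulated_gain(policy):
    # Toy environment: each state has a (hidden) best action; reward is noisy.
    best = np.array([0, 1, 2, 0, 1, 2])
    return sum(1.0 + rng.normal(0, 0.1) if policy[s] == best[s] else rng.normal(0, 0.1)
               for s in range(N_STATES))

population = rng.integers(0, N_ACTIONS, size=(POP, N_STATES))
for _ in range(GENS):
    fitness = np.array([simulated_gain(p) for p in population])
    parents = population[np.argsort(fitness)[-POP // 2:]]        # truncation selection
    children = parents.copy()
    # crossover: splice pairs of parents at a random point
    for i in range(0, len(children) - 1, 2):
        cut = rng.integers(1, N_STATES)
        children[i, cut:], children[i + 1, cut:] = (children[i + 1, cut:].copy(),
                                                    children[i, cut:].copy())
    # mutation: occasionally replace a rule's action
    mask = rng.random(children.shape) < 0.1
    children[mask] = rng.integers(0, N_ACTIONS, size=mask.sum())
    population = np.vstack([parents, children])

best_policy = population[np.argmax([simulated_gain(p) for p in population])]
print("evolved tutorial policy:", best_policy)
```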
Volkert, Valerie M; Lerman, Dorothea C; Trosclair, Nicole; Addison, Laura; Kodak, Tiffany
2008-01-01
Research has demonstrated that interspersing mastered tasks with new tasks facilitates learning under certain conditions; however, little is known about factors that influence the effectiveness of this treatment strategy. The initial purpose of the current investigation was to evaluate the effects of similar versus dissimilar interspersed tasks while teaching object labels to children diagnosed with autism or developmental delays. We then conducted a series of exploratory analyses involving the type of reinforcer delivered for correct responses on trials with unknown or known object labels. Performance was enhanced under the interspersal condition only when either brief praise was delivered for all correct responses or presumably more preferred reinforcers were provided for performance on known trials rather than on unknown trials. PMID:18816973
Economic decision-making in the ultimatum game by smokers.
Takahashi, Taiki
2007-10-01
No study to date has compared degrees of inequity aversion in economic decision-making in the ultimatum game between non-addictive and addictive reinforcers. The comparison is potentially important for neuroeconomics and for reinforcement learning theories of addiction. We compared the degrees of inequity aversion in the ultimatum game between money and cigarettes in habitual smokers. Smokers avoided inequity in the ultimatum game more dramatically for money than for cigarettes; i.e., there was a "domain effect" in decision-making in the ultimatum game. Reward-processing neural activities in the brain for non-addictive and addictive reinforcers may be distinct, and insula activation due to cue-induced craving may conflict with insula activation induced by unfair offers. Future studies in the neuroeconomics of addiction should employ game-theoretic decision tasks to elucidate reinforcement learning processes in dopaminergic neural circuits.
Phillips, Benjamin U; Heath, Christopher J; Ossowska, Zofia; Bussey, Timothy J; Saksida, Lisa M
2017-09-01
Operant testing is a widely used and highly effective method of studying cognition in rodents. Performance on such tasks is sensitive to reinforcer strength. It is therefore advantageous to select effective reinforcers to minimize training times and maximize experimental throughput. To quantitatively investigate the control of behavior by different reinforcers, performance of mice was tested with either strawberry milkshake or a known powerful reinforcer, super saccharin (1.5% or 2% (w/v) saccharin/1.5% (w/v) glucose/water mixture). Mice were tested on fixed (FR)- and progressive-ratio (PR) schedules in the touchscreen-operant testing system. Under an FR schedule, both the rate of responding and number of trials completed were higher in animals responding for strawberry milkshake versus super saccharin. Under a PR schedule, mice were willing to emit similar numbers of responses for strawberry milkshake and super saccharin; however, analysis of the rate of responding revealed a significantly higher rate of responding by animals reinforced with milkshake versus super saccharin. To determine the impact of reinforcer strength on cognitive performance, strawberry milkshake and super saccharin-reinforced animals were compared on a touchscreen visual discrimination task. Animals reinforced by strawberry milkshake were significantly faster to acquire the discrimination than animals reinforced by super saccharin. Taken together, these results suggest that strawberry milkshake is superior to super saccharin for operant behavioral testing and further confirms that the application of response rate analysis to multiple ratio tasks is a highly sensitive method for the detection of behavioral differences relevant to learning and motivation.
Cella, Matteo; Bishara, Anthony J; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til
2014-11-01
Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention-cognitive remediation. Using computational modelling, 3 reinforcement learning parameters based on the Wisconsin Card Sorting Test (WCST) trial-by-trial performance were estimated: R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. R and P parameter change correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Schizophrenia reinforcement learning difficulties negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. © The Author 2013. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. All rights reserved. For permissions, please email: journals.permissions@oup.com.
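A minimal sketch of an R/P/D-style learner is given below; it is not the exact published model, but it shows how separate reward-sensitivity (R) and punishment-sensitivity (P) rates and a consistency parameter (D) can drive trial-by-trial choice in a WCST-like sorting task with periodic rule shifts (all parameters are assumptions).

```python
# Minimal sketch of an R/P/D-style learner (not the exact published model):
# attention weights over sorting dimensions are nudged up by reward with rate R,
# down by punishment with rate P, and choices follow a softmax with consistency D.
import numpy as np

rng = np.random.default_rng(3)
R, P, D = 0.3, 0.15, 3.0                    # reward sens., punishment sens., consistency
weights = np.ones(3) / 3                    # colour, shape, number
correct_dim = 0                             # current (hidden) sorting rule

for trial in range(120):
    probs = np.exp(D * weights)
    probs /= probs.sum()
    choice = rng.choice(3, p=probs)
    rewarded = (choice == correct_dim)
    if rewarded:
        weights[choice] += R * (1 - weights[choice])     # strengthen attended dimension
    else:
        weights[choice] -= P * weights[choice]           # weaken punished dimension
    weights = np.clip(weights, 1e-3, None)
    weights /= weights.sum()
    if trial % 40 == 39:                                  # rule shift, as in the WCST
        correct_dim = (correct_dim + 1) % 3

print("final attention weights:", np.round(weights, 2))
```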
ERIC Educational Resources Information Center
Murphy, Colleen; Martin, Garry L.; Yu, C. T.
2014-01-01
The Assessment of Basic Learning Abilities (ABLA) is an empirically validated clinical tool for assessing the learning ability of persons with intellectual disabilities and children with autism. An ABLA tester uses standardized prompting and reinforcement procedures to attempt to teach, individually, each of six tasks, called levels, to a testee,…
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.
Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L
2017-07-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
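The key manipulation, a shift distribution whose mean and mode differ, can be made concrete with a short numeric example; the distribution parameters below are illustrative rather than those used in the experiment.

```python
# The core manipulation, in miniature (illustrative numbers, not the paper's):
# with a skewed cursor-shift distribution the mean and mode differ, so an
# error-based learner and a reinforcement-based learner predict different aims.
import numpy as np

rng = np.random.default_rng(4)
# Skewed lateral shifts (cm): mostly small shifts to one side, occasional large ones.
shifts = np.concatenate([rng.normal(0.5, 0.2, 8000),     # frequent small shifts (mode)
                         rng.normal(2.5, 0.4, 2000)])    # rare large shifts (tail)

mean_shift = shifts.mean()
hist, edges = np.histogram(shifts, bins=100)
mode_shift = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])

print(f"error-based prediction: aim {-mean_shift:.2f} cm (compensate the mean)")
print(f"reinforcement-based prediction: aim {-mode_shift:.2f} cm (compensate the mode)")
```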
Robotic action acquisition with cognitive biases in coarse-grained state space.
Uragami, Daisuke; Kohno, Yu; Takahashi, Tatsuji
2016-07-01
Some of the authors have previously proposed a cognitively inspired reinforcement learning architecture (LS-Q) that mimics cognitive biases in humans. LS-Q adaptively learns under uniform, coarse-grained state division and performs well without parameter tuning in a giant-swing robot task. However, these results were shown only in simulations. In this study, we test the validity of the LS-Q implemented in a robot in a real environment. In addition, we analyze the learning process to elucidate the mechanism by which the LS-Q adaptively learns under the partially observable environment. We argue that the LS-Q may be a versatile reinforcement learning architecture, which is, despite its simplicity, easily applicable and does not require well-prepared settings. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
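The coarse-grained tabular setup that LS-Q builds on can be sketched generically as Q-learning over a uniformly binned continuous state; the toy task and parameters below are assumptions, and the cognitive-bias component of LS-Q itself is not reproduced.

```python
# Generic sketch of Q-learning over a uniformly coarse-grained state space
# (illustrates the tabular setup only; LS-Q's cognitive-bias value update is
# not reproduced). Toy task: drive a 1-D state toward a rewarded zone.
import numpy as np

rng = np.random.default_rng(5)
N_BINS, N_ACTIONS = 8, 3                      # coarse grid over x in [-1, 1]
Q = np.zeros((N_BINS, N_ACTIONS))
alpha, gamma, eps = 0.2, 0.95, 0.1
push = np.array([-0.1, 0.0, 0.1])             # effect of each action on x

def discretize(x):
    return int(np.clip((x + 1) / 2 * N_BINS, 0, N_BINS - 1))

for episode in range(300):
    x = rng.uniform(-1, 1)
    for step in range(50):
        s = discretize(x)
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
        x = np.clip(x + push[a] + rng.normal(0, 0.02), -1, 1)
        r = 1.0 if abs(x - 0.8) < 0.1 else 0.0            # reward near the target zone
        s_next = discretize(x)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

print(np.round(Q, 2))
```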
Mi, Misa; Gould, Douglas
2014-01-01
A Wiki group project was integrated into a neuroscience course for first-year medical students. The project was developed as a self-directed, collaborative learning task to help medical students review course content and make clinically important connections. The goals of the project were to enhance students' understanding of key concepts in neuroscience, promote active learning, and reinforce their information literacy skills. The objective of the exploratory study was to provide a formative evaluation of the Wiki group project and to examine how Wiki technology was utilized to enhance active and collaborative learning of first-year medical students in the course and to reinforce information literacy skills.
Katnani, Husam A; Patel, Shaun R; Kwon, Churl-Su; Abdel-Aziz, Samer; Gale, John T; Eskandar, Emad N
2016-01-04
The primate brain has the remarkable ability of mapping sensory stimuli into motor behaviors that can lead to positive outcomes. We have previously shown that during the reinforcement of visual-motor behavior, activity in the caudate nucleus is correlated with the rate of learning. Moreover, phasic microstimulation in the caudate during the reinforcement period was shown to enhance associative learning, demonstrating the importance of temporal specificity to manipulate learning related changes. Here we present evidence that extends upon our previous finding by demonstrating that temporally coordinated phasic deep brain stimulation across both the nucleus accumbens and caudate can further enhance associative learning. Monkeys performed a visual-motor associative learning task and received stimulation at time points critical to learning related changes. Resulting performance revealed an enhancement in the rate, ceiling, and reaction times of learning. Stimulation of each brain region alone or at different time points did not generate the same effect.
Temporal structure of motor variability is dynamically regulated and predicts motor learning ability
Wu, Howard G; Miyamoto, Yohsuke R; Castro, Luis Nicolas Gonzalez; Ölveczky, Bence P; Smith, Maurice A
2015-01-01
Individual differences in motor learning ability are widely acknowledged, yet little is known about the factors that underlie them. Here we explore whether movement-to-movement variability in motor output, a ubiquitous if often unwanted characteristic of motor performance, predicts motor learning ability. Surprisingly, we found that higher levels of task-relevant motor variability predicted faster learning both across individuals and across tasks in two different paradigms, one relying on reward-based learning to shape specific arm movement trajectories and the other relying on error-based learning to adapt movements in novel physical environments. We proceeded to show that training can reshape the temporal structure of motor variability, aligning it with the trained task to improve learning. These results provide experimental support for the importance of action exploration, a key idea from reinforcement learning theory, showing that motor variability facilitates motor learning in humans and that our nervous systems actively regulate it to improve learning. PMID:24413700
Shteingart, Hanan; Loewenstein, Yonatan
2016-01-01
There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the contribution of the most recent trial has only a weak effect on behavior, compared to more preceding trials, a result that seems irreconcilable with standard sequential effects that decay monotonously with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonous decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogenous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
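The logistic-regression analysis of sequential dependence can be sketched as follows, using synthetic choices from an over-alternating generator and an assumed five-trial lag structure; the lag coefficients then quantify how strongly recent choices predict the next one for a single simulated participant.

```python
# Sketch of the analysis idea (synthetic data, assumed lag structure): fit a
# logistic regression predicting each binary choice from the preceding choices,
# so the lag coefficients quantify sequential dependence for one "participant".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n, lags = 2000, 5
choices = np.zeros(n, dtype=int)
for t in range(1, n):
    # A "random"-sequence generator that over-alternates (negative recency bias).
    p_repeat = 0.4
    choices[t] = choices[t - 1] if rng.random() < p_repeat else 1 - choices[t - 1]

# Design matrix: column k holds the choice made k+1 trials earlier.
X = np.column_stack([choices[lags - k - 1: n - k - 1] for k in range(lags)])
y = choices[lags:]
model = LogisticRegression().fit(X, y)
print("lag coefficients (recent -> older):", np.round(model.coef_[0], 2))
```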
Age moderates the effect of acute dopamine depletion on passive avoidance learning.
Kelm, Mary Katherine; Boettiger, Charlotte Ann
2015-04-01
Despite extensive links between reinforcement-based learning and dopamine (DA), studies to date have not found consistent effects of acute DA reduction on reinforcement learning in both men and women. Here, we tested the effects of reducing DA on reward- and punishment-based learning using the deterministic passive avoidance learning (PAL) task. We tested 16 (5 female) adults (ages 22-40) in a randomized, cross-over design to determine whether reducing global DA by administering an amino acid beverage deficient in the DA precursors, phenylalanine and tyrosine (P/T[-]), would affect PAL task performance. We found that P/T[-] beverage effects on PAL performance were modulated by age. Specifically, we found that P/T depletion significantly improved learning from punishment with increasing participant age. Participants committed 1.49 fewer passive avoidance errors per additional year of age (95% CI, -0.71 - -2.27, r=-0.74, p=0.001). Moreover, P/T depletion improved learning from punishment in adults (ages 26-40) while it impaired learning from punishment in emerging adults (ages 22-25). We observed similar, but non-significant trends in learning from reward. While there was no overall effect of P/T-depletion on reaction time (RT), there was a relationship between the effect of P/T depletion on PAL performance and RT; those who responded more slowly on the P/T[-] beverage also made more errors on the P/T[-] beverage. When P/T-depletion slowed RT after a correct response, there was a worsening of PAL task performance; there was no similar relationship for the RT after an incorrect response and PAL task performance. Moreover, among emerging adults, changes in mood on the P/T[-] beverage negatively correlated with learning from reward on the P/T[-] beverage. Together, we found that both reward- and punishment-based learning are sensitive to central catecholamine levels, and that these effects of acute DA reduction vary with age. Copyright © 2015 Elsevier Inc. All rights reserved.
English and the Learning-Disabled Student: A Survey of Research.
ERIC Educational Resources Information Center
Siegel, Gerald
The author reviews literature on teaching the learning disabled (LD) in college English classrooms. He notes work by V. Davis which suggests the following methods and techniques: (1) reinforce coping techniques the students have already developed; (2) provide help with reading tasks through summaries of vocabulary; (3) allow taping of classes (to…
Reinforcement Learning Deficits in People with Schizophrenia Persist after Extended Trials
Cicero, David C.; Martin, Elizabeth A.; Becker, Theresa M.; Kerns, John G.
2014-01-01
Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited amount of acquisition trials to learn the reward contingencies and then tested about what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingences. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive and negative feedback. These reward learning deficits persisted even if people with schizophrenia are given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. PMID:25172610
The Effect of Context Change on Simple Acquisition Disappears with Increased Training
ERIC Educational Resources Information Center
Leon, Samuel P.; Abad, Maria J. F.; Rosas, Juan M.
2010-01-01
The goal of this experiment was to assess the impact that experience with a task has on the context specificity of the learning that occurs. Participants performed an instrumental task within a computer game where different responses were performed in the presence of discriminative stimuli to obtain reinforcers. The number of training trials (3,…
Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.
Khamassi, Mehdi; Enel, Pierre; Dominey, Peter Ford; Procyk, Emmanuel
2013-01-01
Converging evidence suggest that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that would affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activities encoding error likelihood, uncertainty, reward volatility, as well as neural responses categorizing different types of feedback, for instance, distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of MPFC in tracking the volatility of the task could contribute to the regulation of one of RL parameters called the learning rate. We extend this hypothesis by proposing that MPFC could contribute to the regulation of other RL parameters such as the exploration rate and default action values in case of task shifts. Here, we analyze the sensitivity to RL parameters of behavioral performance in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that there exist optimal parameter values specific to each of these tasks, that need to be found for optimal performance and that are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using some heuristics can help producing a good, although non-optimal, behavioral performance in each task. We finally describe our computational model of MPFC-LPFC interaction used for online regulation of the exploration rate and its application to a human-robot interaction scenario. There, unexpected uncertainties are produced by the human introducing cued task changes or by cheating. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events. The combined results provide concrete evidence specifying how prefrontal cortical subregions may cooperate to regulate RL parameters. It also shows how such neurophysiologically inspired mechanisms can control advanced robots in the real world. Finally, the model's learning mechanisms that were challenged in the last robotic scenario provide testable predictions on the way monkeys may learn the structure of the task during the pretraining phase of the previous laboratory experiments. Copyright © 2013 Elsevier B.V. All rights reserved.
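A crude stand-in for the proposed meta-regulation, offered only as an illustration and not as the authors' model, is a softmax Q-learner that raises its exploration when recent unsigned prediction errors are high (suggesting a task change) and lowers it as they subside:

```python
# Crude stand-in for meta-regulation of RL parameters (illustrative only): a
# softmax Q-learner increases exploration when recent unsigned prediction
# errors are high and reduces it when the task appears stable.
import numpy as np

rng = np.random.default_rng(7)
Q = np.zeros(2)
alpha = 0.3
beta = 5.0                                  # inverse temperature (exploitation)
recent_surprise = 0.5
p_reward = np.array([0.8, 0.2])             # arm 0 is best; will reverse later

for t in range(400):
    if t == 200:
        p_reward = p_reward[::-1]           # unsignalled task shift
    probs = np.exp(beta * Q)
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = float(rng.random() < p_reward[a])
    delta = r - Q[a]
    Q[a] += alpha * delta
    # meta-regulation: track unsigned prediction error and map it to exploration
    recent_surprise += 0.1 * (abs(delta) - recent_surprise)
    beta = 1.0 + 8.0 * (1.0 - recent_surprise)   # high surprise -> more exploration

print("final values:", np.round(Q, 2), "final beta:", round(beta, 2))
```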
What is the optimal task difficulty for reinforcement learning of brain self-regulation?
Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza
2016-09-01
The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent to β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons
Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram
2013-01-01
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. PMID:23592970
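A rate-based, Euler-discretized sketch of the continuous-time TD actor-critic idea is shown below on a toy one-dimensional reaching task; the spiking-neuron machinery and reward-modulated plasticity rule of the paper are not reproduced, and all parameters are assumptions.

```python
# Rate-based sketch of continuous-time TD actor-critic (Euler-discretized; the
# paper's spiking-neuron implementation is not reproduced). Toy task: move a
# point along a line toward a rewarded location.
import numpy as np

rng = np.random.default_rng(8)
dt, tau = 0.05, 1.0                      # time step and value discount time constant
centers = np.linspace(-1, 1, 21)         # radial-basis features over position

def features(x):
    phi = np.exp(-((x - centers) ** 2) / 0.02)
    return phi / phi.sum()

w_critic = np.zeros_like(centers)        # V(x) = w_critic . phi(x)
w_actor = np.zeros_like(centers)         # mean action = w_actor . phi(x)
alpha_c, alpha_a, sigma = 0.5, 0.2, 0.5

for episode in range(200):
    x = 0.5
    phi = features(x)
    for step in range(200):
        u = w_actor @ phi + sigma * rng.normal()        # noisy continuous action
        x_new = np.clip(x + u * dt, -1, 1)
        r = 1.0 if abs(x_new - 0.8) < 0.05 else 0.0
        phi_new = features(x_new)
        V, V_new = w_critic @ phi, w_critic @ phi_new
        # continuous-time TD error: delta = r - V/tau + dV/dt
        delta = r - V / tau + (V_new - V) / dt
        w_critic += alpha_c * delta * phi * dt
        w_actor += alpha_a * delta * (u - w_actor @ phi) * phi * dt   # reward-modulated noise
        x, phi = x_new, phi_new
        if r > 0:
            break

print("value function weights:", np.round(w_critic, 2))
```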
Decker, Johannes H.; Otto, A. Ross; Daw, Nathaniel D.; Hartley, Catherine A.
2016-01-01
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enables estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was evident in choice behavior across all age groups, evidence of a model-based strategy only emerged during adolescence and continued to increase into adulthood. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. PMID:27084852
Learning to cooperate in solving the traveling salesman problem.
Qi, Dehu; Sun, Ron
2005-01-01
A cooperative team of agents may perform many tasks better than single agents. The question is how cooperation among self-interested agents should be achieved. It is important that, while we encourage cooperation among agents in a team, we maintain the autonomy of individual agents as much as possible, so as to preserve flexibility and generality. This paper presents an approach based on bidding, utilizing reinforcement values acquired through reinforcement learning. We tested and analyzed this approach and demonstrated that a team indeed performed better than both the best single agent and the average of single agents.
ERIC Educational Resources Information Center
Firestone, Philip; Douglas, Virginia I.
1977-01-01
Impulsive and reflective children performed in a discrimination learning task which included four reinforcement conditions: verbal-reward, verbal-punishment, material-reward, and material-punishment. (SB)
Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework.
Gershman, Samuel J; Daw, Nathaniel D
2017-01-03
We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
Salvador, Alexandre; Worbe, Yulia; Delorme, Cécile; Coricelli, Giorgio; Gaillard, Raphaël; Robbins, Trevor W; Hartmann, Andreas; Palminteri, Stefano
2017-07-24
The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
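The contrast between factual and counterfactual learning can be written as two prediction-error updates, one for the obtained outcome and one for the forgone outcome of the unchosen option; the sketch below is illustrative only and does not reproduce the study's task or parameters.

```python
# Illustrative update rules only (not the study's task or parameters): a learner
# that, when the forgone outcome is shown, also updates the unchosen option with
# a counterfactual prediction error, speeding discrimination of the two options.
import numpy as np

rng = np.random.default_rng(9)
alpha = 0.2
p = np.array([0.75, 0.25])                 # true reward probabilities

def run(counterfactual):
    Q = np.zeros(2)
    for t in range(200):
        a = int(rng.random() < 0.5) if rng.random() < 0.1 else int(Q.argmax())
        r_obtained = float(rng.random() < p[a])
        Q[a] += alpha * (r_obtained - Q[a])                    # factual update
        if counterfactual:                                     # forgone outcome shown
            r_forgone = float(rng.random() < p[1 - a])
            Q[1 - a] += alpha * (r_forgone - Q[1 - a])         # counterfactual update
    return Q

print("factual only        :", np.round(run(False), 2))
print("with counterfactual :", np.round(run(True), 2))
```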
Rodríguez-Gironés, Miguel A.; Trillo, Alejandro; Corcobado, Guadalupe
2013-01-01
The results of behavioural experiments provide important information about the structure and information-processing abilities of the visual system. Nevertheless, if we want to infer from behavioural data how the visual system operates, it is important to know how different learning protocols affect performance and to devise protocols that minimise noise in the response of experimental subjects. The purpose of this work was to investigate how reinforcement schedule and individual variability affect the learning process in a colour discrimination task. Free-flying bumblebees were trained to discriminate between two perceptually similar colours. The target colour was associated with sucrose solution, and the distractor could be associated with water or quinine solution throughout the experiment, or with one substance during the first half of the experiment and the other during the second half. Both acquisition and final performance of the discrimination task (measured as proportion of correct choices) were determined by the choice of reinforcer during the first half of the experiment: regardless of whether bees were trained with water or quinine during the second half of the experiment, bees trained with quinine during the first half learned the task faster and performed better during the whole experiment. Our results confirm that the choice of stimuli used during training affects the rate at which colour discrimination tasks are acquired and show that early contact with a strongly aversive stimulus can be sufficient to maintain high levels of attention during several hours. On the other hand, bees which took more time to decide on which flower to alight were more likely to make correct choices than bees which made fast decisions. This result supports the existence of a trade-off between foraging speed and accuracy, and highlights the importance of measuring choice latencies during behavioural experiments focusing on cognitive abilities. PMID:23951186
Impairments in action-outcome learning in schizophrenia.
Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W
2018-03-03
Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.
Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang
2017-01-01
We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing a reward function for each step, the proposed method can pass information from entity extraction to relation extraction and obtain feedback, allowing entities and relations to be extracted simultaneously. First, we use a bidirectional LSTM to model the context information, which yields a preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that contain the target entity pair to generate the initial state of the decision process. We then use a Tree-LSTM to represent relation mentions and generate the transition state of the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π for the two-step decision process. Experiments on ACE2005 demonstrate that our method outperforms the state-of-the-art method, with a 2.4% increase in recall score.
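The two-step decision process itself can be illustrated with a tabular stand-in, shown below; the BiLSTM and Tree-LSTM state encoders are replaced by a toy discrete "sentence type", and all names and reward settings are assumptions rather than the published system.

```python
# Tabular stand-in for the two-step decision process (the paper's neural state
# encoders are replaced by a toy discrete state; names and rewards are assumptions).
import numpy as np

rng = np.random.default_rng(10)
N_SENT_TYPES, N_PAIRS, N_RELS = 4, 3, 3
Q1 = np.zeros((N_SENT_TYPES, N_PAIRS))          # step 1: pick an entity pair
Q2 = np.zeros((N_SENT_TYPES, N_PAIRS, N_RELS))  # step 2: pick a relation
alpha, eps = 0.2, 0.2
gold_pair = np.array([0, 1, 2, 1])              # correct pair per sentence type
gold_rel = np.array([2, 0, 1, 1])               # correct relation per sentence type

for t in range(3000):
    s = rng.integers(N_SENT_TYPES)
    pair = rng.integers(N_PAIRS) if rng.random() < eps else int(Q1[s].argmax())
    rel = rng.integers(N_RELS) if rng.random() < eps else int(Q2[s, pair].argmax())
    # step reward for the right pair; final reward only if pair and relation match
    r1 = 1.0 if pair == gold_pair[s] else 0.0
    r2 = 1.0 if (pair == gold_pair[s] and rel == gold_rel[s]) else 0.0
    Q2[s, pair, rel] += alpha * (r2 - Q2[s, pair, rel])
    Q1[s, pair] += alpha * (r1 + Q2[s, pair].max() - Q1[s, pair])

print("greedy extraction per sentence type:",
      [(int(Q1[s].argmax()), int(Q2[s, Q1[s].argmax()].argmax())) for s in range(N_SENT_TYPES)])
```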
Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.
Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi
2018-03-26
For natural social human-robot interaction, it is essential for a robot to learn human-like social skills. However, learning such skills is notoriously hard because of the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent receives intrinsic, motivation-based rewards through an action-conditional predictive model. Using the proposed method, the robot learned social skills from human-robot interaction experiences gathered in real, uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also made more human-like decisions, on a test dataset, than a robot that received direct rewards for task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.
Myers, Catherine E.; Moustafa, Ahmed A.; Sheynin, Jony; VanMeenen, Kirsten M.; Gilbertson, Mark W.; Orr, Scott P.; Beck, Kevin D.; Pang, Kevin C. H.; Servatius, Richard J.
2013-01-01
Post-traumatic stress disorder (PTSD) symptoms include behavioral avoidance which is acquired and tends to increase with time. This avoidance may represent a general learning bias; indeed, individuals with PTSD are often faster than controls on acquiring conditioned responses based on physiologically-aversive feedback. However, it is not clear whether this learning bias extends to cognitive feedback, or to learning from both reward and punishment. Here, male veterans with self-reported current, severe PTSD symptoms (PTSS group) or with few or no PTSD symptoms (control group) completed a probabilistic classification task that included both reward-based and punishment-based trials, where feedback could take the form of reward, punishment, or an ambiguous “no-feedback” outcome that could signal either successful avoidance of punishment or failure to obtain reward. The PTSS group outperformed the control group in total points obtained; the PTSS group specifically performed better than the control group on reward-based trials, with no difference on punishment-based trials. To better understand possible mechanisms underlying observed performance, we used a reinforcement learning model of the task, and applied maximum likelihood estimation techniques to derive estimated parameters describing individual participants’ behavior. Estimations of the reinforcement value of the no-feedback outcome were significantly greater in the control group than the PTSS group, suggesting that the control group was more likely to value this outcome as positively reinforcing (i.e., signaling successful avoidance of punishment). This is consistent with the control group’s generally poorer performance on reward trials, where reward feedback was to be obtained in preference to the no-feedback outcome. Differences in the interpretation of ambiguous feedback may contribute to the facilitated reinforcement learning often observed in PTSD patients, and may in turn provide new insight into how pathological behaviors are acquired and maintained in PTSD. PMID:24015254
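A sketch of the model-based analysis described above, assuming a basic delta-rule learner whose free parameters (a learning rate and the subjective value of the ambiguous no-feedback outcome) are recovered by maximum likelihood with SciPy; the toy choice and outcome sequences only illustrate the fitting call, not the study's task or model.

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(params, choices, outcomes, n_options=2, beta=3.0):
        # Delta-rule learner; 'outcomes' uses None for ambiguous no-feedback
        # trials, whose subjective value v_none is a free parameter.
        lr, v_none = params
        q = np.zeros(n_options)
        nll = 0.0
        for choice, outcome in zip(choices, outcomes):
            p = np.exp(beta * q) / np.exp(beta * q).sum()   # softmax choice rule
            nll -= np.log(p[choice] + 1e-12)
            r = v_none if outcome is None else outcome
            q[choice] += lr * (r - q[choice])               # prediction-error update
        return nll

    # Made-up data: fit per-participant parameters by maximum likelihood.
    choices = [0, 1, 0, 0, 1, 0]
    outcomes = [1.0, -1.0, None, 1.0, None, 1.0]
    fit = minimize(neg_log_likelihood, x0=[0.3, 0.0], args=(choices, outcomes),
                   bounds=[(0.01, 1.0), (-1.0, 1.0)])
    print(fit.x)   # estimated learning rate and value of the no-feedback outcome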
Deep Direct Reinforcement Learning for Financial Signal Representation and Trading.
Deng, Yue; Bao, Feng; Kong, Youyong; Ren, Zhiquan; Dai, Qionghai
2017-03-01
Can we train a computer to beat experienced traders at financial asset trading? In this paper, we address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model draws on two biologically inspired learning concepts: deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. Then, the RL module interacts with the deep representations and makes trading decisions to accumulate the ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both deep and recurrent structures. Hence, we propose a task-aware backpropagation-through-time method to cope with the vanishing-gradient issue in deep training. The robustness of the neural system is verified on both stock and commodity futures markets under broad testing conditions.
Phillips, Benjamin U; Dewan, Sigma; Nilsson, Simon R O; Robbins, Trevor W; Heath, Christopher J; Saksida, Lisa M; Bussey, Timothy J; Alsiö, Johan
2018-04-22
Dysregulation of the serotonin (5-HT) system is a pathophysiological component in major depressive disorder (MDD), a condition closely associated with abnormal emotional responsivity to positive and negative feedback. However, the precise mechanism through which 5-HT tone biases feedback responsivity remains unclear. 5-HT2C receptors (5-HT2CRs) are closely linked with aspects of depressive symptomatology, including abnormalities in reinforcement processes and response to stress. Thus, we aimed to determine the impact of 5-HT2CR function on response to feedback in biased reinforcement learning. We used two touchscreen assays designed to assess the impact of positive and negative feedback on probabilistic reinforcement in mice, including a novel valence-probe visual discrimination (VPVD) and a probabilistic reversal learning procedure (PRL). Systemic administration of a 5-HT2CR agonist and antagonist resulted in selective changes in the balance of feedback sensitivity bias on these tasks. Specifically, on VPVD, SB 242084, the 5-HT2CR antagonist, impaired acquisition of a discrimination dependent on appropriate integration of positive and negative feedback. On PRL, SB 242084 at 1 mg/kg resulted in changes in behaviour consistent with reduced sensitivity to positive feedback. In contrast, WAY 163909, the 5-HT2CR agonist, resulted in changes associated with increased sensitivity to positive feedback and decreased sensitivity to negative feedback. These results suggest that 5-HT2CRs tightly regulate feedback sensitivity bias in mice with consequent effects on learning and cognitive flexibility and specify a framework for the influence of 5-HT2CRs on sensitivity to reinforcement.
ERIC Educational Resources Information Center
Broomfield, Laura; McHugh, Louise; Reed, Phil
2010-01-01
Stimulus overselectivity occurs when only one of potentially many aspects of the environment controls behavior. Adult participants were trained and tested on a trial-and-error discrimination learning task while engaging in a concurrent load task, and overselectivity emerged. When responding to the overselected stimulus was reduced by reinforcing a…
Neural Modularity Helps Organisms Evolve to Learn New Skills without Forgetting Old Skills
Ellefsen, Kai Olav; Mouret, Jean-Baptiste; Clune, Jeff
2015-01-01
A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills more and because they have a separate reinforcement learning module. Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting. PMID:25837826
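A sketch, with made-up numbers, of the two ingredients described above: a fitness function that penalizes connections (the connection-cost technique that induces modularity) and a Hebbian-style weight update gated by a neuromodulatory signal. The function names and constants are illustrative, not the authors' implementation.

    import numpy as np

    def fitness(task_performance, n_connections, connection_cost=0.01):
        # Selection pressure used to evolve modular networks: good performance,
        # few connections.
        return task_performance - connection_cost * n_connections

    def neuromodulated_update(weights, pre, post, modulation, lr=0.1):
        # Hebbian-style update gated by a modulatory signal in [-1, 1]; learning
        # can be turned up, down, or off per connection depending on reward context.
        return weights + lr * modulation * np.outer(post, pre)

    w = np.zeros((2, 3))
    w = neuromodulated_update(w, pre=np.array([1.0, 0.0, 1.0]),
                              post=np.array([0.5, 1.0]), modulation=1.0)
    print(fitness(task_performance=0.9, n_connections=12))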
Katan, Pesia; Kahta, Shani; Sasson, Ayelet; Schiff, Rachel
2017-07-01
Graph complexity as measured by topological entropy has been previously shown to affect performance on artificial grammar learning tasks among typically developing children. The aim of this study was to examine the effect of graph complexity on implicit sequential learning among children with developmental dyslexia. Our goal was to determine whether children's performance depends on the complexity level of the grammar system learned. We conducted two artificial grammar learning experiments that compared performance of children with developmental dyslexia with that of age- and reading level-matched controls. Experiment 1 was a high topological entropy artificial grammar learning task that aimed to establish implicit learning phenomena in children with developmental dyslexia using previously published experimental conditions. Experiment 2 was a lower topological entropy variant of that task. Results indicated that, given a high topological entropy grammar system, children with developmental dyslexia, like the reading level-matched controls, had substantial difficulty performing the task compared with typically developing children, who exhibited intact implicit learning of the grammar. On the other hand, when tested on a lower topological entropy grammar system, all groups performed above chance level, indicating that children with developmental dyslexia were able to identify rules from a given grammar system. The results reinforced the significance of graph complexity when experimenting with artificial grammar learning tasks, particularly with dyslexic participants.
Zendehrouh, Sareh
2015-11-01
Recent work in the decision-making field offers an account of dual-system theory for the decision-making process. This theory holds that the process is conducted by two main controllers: a goal-directed system and a habitual system. In the reinforcement learning (RL) domain, habitual behaviors are connected with model-free methods, in which appropriate actions are learned through trial-and-error experiences. In contrast, goal-directed behaviors are associated with model-based methods of RL, in which actions are selected using a model of the environment. Studies on cognitive control also suggest that during processes like decision-making, some cortical and subcortical structures work in concert to monitor the consequences of decisions and to adjust control according to current task demands. Here a computational model is presented based on dual-system theory and the cognitive control perspective on decision-making. The proposed model is used to simulate human performance on a variant of a probabilistic learning task. The basic proposal is that the brain implements a dual controller, while an accompanying monitoring system detects several kinds of conflict, including a hypothetical cost conflict. The simulation results address existing theories about two event-related potentials, namely the error-related negativity (ERN) and the feedback-related negativity (FRN), and explore the best account of them. Based on the results, some testable predictions are also presented. Copyright © 2015 Elsevier Ltd. All rights reserved.
Learning outdoors: male lizards show flexible spatial learning under semi-natural conditions
Noble, Daniel W. A.; Carazo, Pau; Whiting, Martin J.
2012-01-01
Spatial cognition is predicted to be a fundamental component of fitness in many lizard species, and yet some studies suggest that it is relatively slow and inflexible. However, such claims are based on work conducted using experimental designs or in artificial contexts that may underestimate their cognitive abilities. We used a biologically realistic experimental procedure (using simulated predatory attacks) to study spatial learning and its flexibility in the lizard Eulamprus quoyii in semi-natural outdoor enclosures under similar conditions to those experienced by lizards in the wild. To evaluate the flexibility of spatial learning, we conducted a reversal spatial-learning task in which positive and negative reinforcements of learnt spatial stimuli were switched. Nineteen (32%) male lizards learnt both tasks within 10 days (spatial task mean: 8.16 ± 0.69 (s.e.) and reversal spatial task mean: 10.74 ± 0.98 (s.e.) trials). We demonstrate that E. quoyii are capable of flexible spatial learning and suggest that future studies focus on a range of lizard species which differ in phylogeny and/or ecology, using biologically relevant cognitive tasks, in an effort to bridge the cognitive divide between ecto- and endotherms. PMID:23075525
Dynamic Response-by-Response Models of Matching Behavior in Rhesus Monkeys
Lau, Brian; Glimcher, Paul W
2005-01-01
We studied the choice behavior of 2 monkeys in a discrete-trial task with reinforcement contingencies similar to those Herrnstein (1961) used when he described the matching law. In each session, the monkeys experienced blocks of discrete trials at different relative-reinforcer frequencies or magnitudes with unsignalled transitions between the blocks. Steady-state data following adjustment to each transition were well characterized by the generalized matching law; response ratios undermatched reinforcer frequency ratios but matched reinforcer magnitude ratios. We modelled response-by-response behavior with linear models that used past reinforcers as well as past choices to predict the monkeys' choices on each trial. We found that more recently obtained reinforcers more strongly influenced choice behavior. Perhaps surprisingly, we also found that the monkeys' actions were influenced by the pattern of their own past choices. It was necessary to incorporate both past reinforcers and past choices in order to accurately capture steady-state behavior as well as the fluctuations during block transitions and the response-by-response patterns of behavior. Our results suggest that simple reinforcement learning models must account for the effects of past choices to accurately characterize behavior in this task, and that models with these properties provide a conceptual tool for studying how both past reinforcers and past choices are integrated by the neural systems that generate behavior. PMID:16596980
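A sketch of this kind of response-by-response analysis, assuming a logistic regression in which lagged reinforcer and lagged choice histories both predict the current choice (scikit-learn is used for convenience); the random data and the five-trial lag window are placeholders, not the monkeys' data or the authors' exact model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def lagged_design(choices, reinforcers, n_lags=5):
        # Build predictors from the past n_lags choices and reinforcers.
        # choices: 0/1 per trial; reinforcers: signed reward per trial.
        X, y = [], []
        for t in range(n_lags, len(choices)):
            past_r = reinforcers[t - n_lags:t][::-1]   # most recent lag first
            past_c = choices[t - n_lags:t][::-1]
            X.append(np.concatenate([past_r, past_c]))
            y.append(choices[t])
        return np.array(X), np.array(y)

    rng = np.random.default_rng(1)
    choices = rng.integers(0, 2, 500)
    reinforcers = rng.choice([-1.0, 1.0], 500)
    X, y = lagged_design(choices, reinforcers)
    model = LogisticRegression().fit(X, y)
    # Coefficients split into a reinforcer-history kernel and a choice-history kernel.
    print(model.coef_[0][:5], model.coef_[0][5:])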
Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C
2014-03-01
Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors (discrepancies between the predicted and the actual reward). A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity and that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
Application of a model of instrumental conditioning to mobile robot control
NASA Astrophysics Data System (ADS)
Saksida, Lisa M.; Touretzky, D. S.
1997-09-01
Instrumental conditioning is a psychological process whereby an animal learns to associate its actions with their consequences. This type of learning is exploited in animal training techniques such as 'shaping by successive approximations,' which enables trainers to gradually adjust the animal's behavior by giving strategically timed reinforcements. While this is similar in principle to reinforcement learning, the real phenomenon includes many subtle effects not considered in the machine learning literature. In addition, a good deal of domain information is utilized by an animal learning a new task; it does not start from scratch every time it learns a new behavior. For these reasons, it is not surprising that mobile robot learning algorithms have yet to approach the sophistication and robustness of animal learning. A serious attempt to model instrumental learning could prove fruitful for improving machine learning techniques. In the present paper, we develop a computational theory of shaping at a level appropriate for controlling mobile robots. The theory is based on a series of mechanisms for 'behavior editing,' in which pre-existing behaviors, either innate or previously learned, can be dramatically changed in magnitude, shifted in direction, or otherwise manipulated so as to produce new behavioral routines. We have implemented our theory on Amelia, an RWI B21 mobile robot equipped with a gripper and color video camera. We provide results from training Amelia on several tasks, all of which were constructed as variations of one innate behavior, object-pursuit.
Strauss, Gregory P.; Frank, Michael J.; Waltz, James A.; Kasanova, Zuzana; Herbener, Ellen S.; Gold, James M.
2011-01-01
Background Negative symptoms are core features of schizophrenia; however, the cognitive and neural basis for individual negative symptom domains remains unclear. Converging evidence suggests a role for striatal and prefrontal dopamine in reward learning and the exploration of actions that might produce outcomes that are better than the status quo. The current study examines whether deficits in reinforcement learning and uncertainty-driven exploration predict specific negative symptom domains. Methods We administered a temporal decision-making task, which required trial-by-trial adjustment of reaction time (RT) to maximize reward receipt, to 51 patients with schizophrenia and 39 age-matched healthy controls. Task conditions were designed such that expected value (probability * magnitude) increased (IEV), decreased (DEV), or remained constant (CEV) with increasing response times. Computational analyses were applied to estimate the degree to which trial-by-trial responses are influenced by reinforcement history. Results Individuals with schizophrenia showed impaired Go learning, but intact NoGo learning, relative to controls. These effects were pronounced as a function of global measures of negative symptoms. Uncertainty-based exploration was substantially reduced in individuals with schizophrenia, and selectively correlated with clinical ratings of anhedonia. Conclusions Schizophrenia patients, particularly those with high negative symptoms, failed to speed RTs to increase positive outcomes and showed a reduced tendency to explore when alternative actions could lead to better outcomes than the status quo. Results are interpreted in the context of current computational, genetic, and pharmacological data supporting the roles of striatal and prefrontal dopamine in these processes. PMID:21168124
The effects of aging on the interaction between reinforcement learning and attention.
Radulescu, Angela; Daniel, Reka; Niv, Yael
2016-11-01
Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
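A sketch of a feature-level learner in which attention constrains value updates, assuming a softmax over absolute feature values as the attention rule; this illustrates the model class described, not the authors' fitted specification.

    import numpy as np

    class AttentionRL:
        # Feature-level values combined through attention weights; learning is
        # concentrated on the features that attention currently favors.
        def __init__(self, n_features, lr=0.2, attn_temp=2.0):
            self.v = np.zeros(n_features)   # one value per feature
            self.lr = lr
            self.attn_temp = attn_temp

        def attention(self):
            w = np.exp(self.attn_temp * np.abs(self.v))
            return w / w.sum()

        def stimulus_value(self, features_present):
            # features_present: binary NumPy vector marking the chosen stimulus's features.
            return float(np.dot(self.attention() * self.v, features_present))

        def update(self, features_present, reward):
            delta = reward - self.stimulus_value(features_present)
            # Attention gates how much each present feature learns from the error;
            # a narrower attention profile restricts learning to fewer features.
            self.v += self.lr * delta * self.attention() * features_present

    agent = AttentionRL(n_features=9)   # e.g., 3 dimensions x 3 features each
    agent.update(np.array([1, 0, 0, 0, 1, 0, 0, 0, 1], dtype=float), reward=1.0)
    print(agent.v)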
Teng, Cindy; Otero, Marcela; Geraci, Marilla; Blair, R J R; Pine, Daniel S; Grillon, Christian; Blair, Karina S
2016-03-30
There are preliminary data indicating that patients with generalized anxiety disorder (GAD) show impairment on decision-making tasks requiring the appropriate representation of reinforcement value. The current study aimed to extend this literature using the passive avoidance (PA) learning task, where the participant has to learn to respond to stimuli that engender reward and avoid responding to stimuli that engender punishment. Six stimuli engendering reward and six engendering punishment are presented once per block for 10 blocks of trials. Thirty-nine medication-free patients with GAD and 29 age-, IQ-, and gender-matched healthy comparison individuals performed the task. In addition, indexes of social functioning as assessed by the Global Assessment of Functioning (GAF) scale were obtained to allow for correlational analyses of potential relations between cognitive and social impairments. The results revealed a Group-by-Error Type-by-Block interaction; patients with GAD committed significantly more commission (passive avoidance) errors than comparison individuals in the later blocks (blocks 7, 8, and 9). In addition, the extent of impairment on these blocks was associated with their functional impairment as measured by the GAF scale. These results link GAD with anomalous decision-making and indicate that a potential problem in reinforcement representation may contribute to the severity of expression of their disorder. Copyright © 2016. Published by Elsevier Ireland Ltd.
Short Term Gains, Long Term Pains: How Cues About State Aid Learning in Dynamic Environments
Gureckis, Todd M.; Love, Bradley C.
2009-01-01
Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short-term may not turn out best in the long run. In this paper, we explore human learning in a dynamic decision-making task which places short- and long-term rewards in conflict. Our goal in these studies was to evaluate how people’s mental representation of a task affects their ability to discover an optimal decision strategy. We find that perceptual cues that readily align with the underlying state of the task environment help people overcome the impulsive appeal of short-term rewards. Our experimental manipulations, predictions, and analyses are motivated by current work in reinforcement learning which details how learners value delayed outcomes in sequential tasks and the importance that “state” identification plays in effective learning. PMID:19427635
ERIC Educational Resources Information Center
Polli, Frida E.; Barton, Jason J. S.; Thakkar, Katharine N.; Greve, Douglas N.; Goff, Donald C.; Rauch, Scott L.; Manoach, Dara S.
2008-01-01
To perform well on any challenging task, it is necessary to evaluate your performance so that you can learn from errors. Recent theoretical and experimental work suggests that the neural sequellae of error commission in a dorsal anterior cingulate circuit index a type of contingency- or reinforcement-based learning, while activation in a rostral…
ERIC Educational Resources Information Center
Lytle, Rebecca; Todd, Teri
2009-01-01
Shane, who is in Ms. Jones's third-grade class, has autism. Ms. Jones has provided him with a schedule, a picture communication system, and a positive reinforcement system for his learning tasks. He is demonstrating progress toward his individualized education program (IEP) goals, but he still struggles with attending for any length of time,…
ChemOkey: A Game to Reinforce Nomenclature
ERIC Educational Resources Information Center
Kavak, Nusret
2012-01-01
Learning the symbolic language of chemistry is a difficult task that can be frustrating for students. This article introduces a game, ChemOkey, that can help students learn the names and symbols of common ions and their compounds in a fun environment. ChemOkey, a game similar to Rummikub, is played with a set of 106 plastic or wooden tiles. The…
Safe Exploration Algorithms for Reinforcement Learning Controllers.
Mannucci, Tommaso; van Kampen, Erik-Jan; de Visser, Cornelis; Chu, Qiping
2018-04-01
Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.
van de Vijver, Irene; Ridderinkhof, K Richard; Harsay, Helga; Reneman, Liesbeth; Cavanagh, James F; Buitenweg, Jessika I V; Cohen, Michael X
2016-10-01
Reinforcement learning (RL) is supported by a network of striatal and frontal cortical structures that are connected through white-matter fiber bundles. With age, the integrity of these white-matter connections declines. The role of structural frontostriatal connectivity in individual and age-related differences in RL is unclear, although local white-matter density and diffusivity have been linked to individual differences in RL. Here we show that frontostriatal tract counts in young human adults (aged 18-28), as assessed noninvasively with diffusion-weighted magnetic resonance imaging and probabilistic tractography, positively predicted individual differences in RL when learning was difficult (70% valid feedback). In older adults (aged 63-87), in contrast, learning under both easy (90% valid feedback) and difficult conditions was predicted by tract counts in the same frontostriatal network. Furthermore, network-level analyses showed a double dissociation between the task-relevant networks in young and older adults, suggesting that older adults relied on different frontostriatal networks than young adults to obtain the same task performance. These results highlight the importance of successful information integration across striatal and frontal regions during RL, especially with variable outcomes. Copyright © 2016 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Donohue, Melanie M.; Casey, Laura Baylot; Bicard, David F.; Bicard, Sara E.
2012-01-01
Children with Autism Spectrum Disorder (ASD) are faced with many challenging behaviors that could impede their learning. One commonly reported problem behavior is noncompliance, which is often defined as a delay in response (latency), decrease in rate of responding (fluency), or failure to complete a task. This failure to comply in an appropriate…
ERIC Educational Resources Information Center
Comer, Debra R.; Holbrook, Robert L., Jr.
2012-01-01
The authors present an efficient and easy-to-implement experiential exercise that reinforces for students key concepts about task groups (i.e., group cohesiveness, conflict within groups, group effectiveness, group norms, and group roles). The exercise, which uses a documentary about the making of Fleetwood Mac's "Rumours" album to demonstrate the…
Goal-seeking neural net for recall and recognition
NASA Astrophysics Data System (ADS)
Omidvar, Omid M.
1990-07-01
Neural networks have been used to mimic cognitive processes that take place in animal brains. The learning capability inherent in neural networks makes them suitable candidates for adaptive tasks such as recall and recognition. The synaptic reinforcements create a proper condition for adaptation, which results in memorization, formation of perception, and higher-order information processing activities. In this research a model of a goal-seeking neural network is studied and the operation of the network with regard to recall and recognition is analyzed. In these analyses, recall is defined as retrieval of stored information where little or no matching is involved. On the other hand, recognition is recall with matching; therefore it involves memorizing a piece of information with complete presentation. This research takes the generalized view of reinforcement in which all the signals are potential reinforcers. The neuronal response is considered to be the source of the reinforcement. This local approach to adaptation leads to the goal-seeking nature of the neurons as network components. In the proposed model all the synaptic strengths are reinforced in parallel, while the reinforcement among the layers is done in a distributed, pipelined fashion from the last layer inward. A model of a complex neuron with a varying threshold is developed to account for the inhibitory and excitatory behavior of a real neuron. A goal-seeking model of a neural network is presented. This network is utilized to perform recall and recognition tasks. The performance of the model with regard to the assigned tasks is presented.
Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A
2016-06-01
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. © The Author(s) 2016.
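A sketch of how model-based and model-free valuations are commonly mixed at choice in analyses of such sequential tasks, assuming a single weighting parameter w and a softmax choice rule; the values and the estimation procedure used in the study are not reproduced here.

    import numpy as np

    def choice_probabilities(q_mf, q_mb, w, beta=3.0):
        # Combine model-free and model-based action values with weight w
        # (w = 0: purely model-free, w = 1: purely model-based), then softmax.
        q = w * np.asarray(q_mb) + (1.0 - w) * np.asarray(q_mf)
        e = np.exp(beta * (q - q.max()))
        return e / e.sum()

    # Hypothetical first-stage values: a child-like (w ~ 0) vs adult-like (w ~ 0.6) mix.
    q_mf, q_mb = [0.6, 0.4], [0.3, 0.7]
    print(choice_probabilities(q_mf, q_mb, w=0.0))
    print(choice_probabilities(q_mf, q_mb, w=0.6))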
Effect of Spatial Titration on Task Performance
ERIC Educational Resources Information Center
Glowacki, Lawrence
1976-01-01
A reinforcement schedule and spatial titration method were used to determine task-reinforcement area separation most preferred and effective in two third-grade boys. Errors in task performance decreased task-reinforcement area separation, while correct responses in task performance increased task-reinforcement area separation. (Author)
Fidelity of the representation of value in decision-making
Dowding, Ben A.
2017-01-01
The ability to make optimal decisions depends on evaluating the expected rewards associated with different potential actions. This process is critically dependent on the fidelity with which reward value information can be maintained in the nervous system. Here we directly probe the fidelity of value representation following a standard reinforcement learning task. The results demonstrate a previously-unrecognized bias in the representation of value: extreme reward values, both low and high, are stored significantly more accurately and precisely than intermediate rewards. The symmetry between low and high rewards pertained despite substantially higher frequency of exposure to high rewards, resulting from preferential exploitation of more rewarding options. The observed variation in fidelity of value representation retrospectively predicted performance on the reinforcement learning task, demonstrating that the bias in representation has an impact on decision-making. A second experiment in which one or other extreme-valued option was omitted from the learning sequence showed that representational fidelity is primarily determined by the relative position of an encoded value on the scale of rewards experienced during learning. Both variability and guessing decreased with the reduction in the number of options, consistent with allocation of a limited representational resource. These findings have implications for existing models of reward-based learning, which typically assume defectless representation of reward value. PMID:28248958
Age Moderates the Effect of Acute Dopamine Depletion on Passive Avoidance Learning
Kelm, Mary Katherine; Boettiger, Charlotte Ann
2015-01-01
Despite extensive links between reinforcement-based learning and dopamine (DA), studies to date have not found consistent effects of acute DA reduction on reinforcement learning in both men and women. Here, we tested the effects of reducing DA on reward- and punishment-based learning using the deterministic passive avoidance learning (PAL) task. We tested 16 (5 female) adults (ages 22–40) in a randomized, cross-over design to determine whether reducing global DA by administering an amino acid beverage deficient in the DA precursors, phenylalanine and tyrosine (P/T[−]), would affect performance on the PAL task. We found that P/T[−] beverage effects on PAL performance were modulated by age. In particular, we found that P/T depletion significantly improved learning from punishment with increasing participant age. Participants committed 1.49 fewer passive avoidance errors per additional year of age (95% CI, −0.71 – −2.27, r=−0.74, p=0.001). Moreover, in this small sample, P/T depletion improved learning from punishment in adults (ages 26–40) while it impaired learning from punishment in emerging adults (ages 22–25). We observed similar, but non-significant trends in learning from reward. While there was no overall effect of P/T-depletion on reaction time (RT), there was a relationship between the effect of P/T depletion on PAL performance and RT; those who responded more slowly on the P/T[−] beverage also made more errors on the P/T[−] beverage. When P/T-depletion slowed RT after a correct response, there was a worsening of PAL task performance; there was no similar relationship for the RT after an incorrect response and PAL task performance. Moreover, among emerging adults, changes in mood on the P/T[−] beverage negatively correlated with learning from reward on the P/T[−] beverage. Together, we found that both reward- and punishment-based learning are sensitive to central catecholamine levels, and that these effects of acute DA reduction vary with age. PMID:25636601
Wingard, Jeffrey C; Goodman, Jarid; Leong, Kah-Chung; Packard, Mark G
2015-09-01
Studies employing brain lesion or intracerebral drug infusions in rats have demonstrated a double dissociation between the roles of the hippocampus and dorsolateral striatum in place and response learning. The hippocampus mediates a rapid cognitive learning process underlying place learning, whereas the dorsolateral striatum mediates a relatively slower learning process in which stimulus-response habits underlying response learning are acquired in an incremental fashion. One potential implication of these findings is that hippocampus-dependent learning may benefit from a relative massing of training trials, whereas dorsal striatum-dependent learning may benefit from a relative distribution of training trials. In order to examine this hypothesis, the present study compared the effects of massed (30s inter-trial interval; ITI) or spaced (30min ITI) training on acquisition of a hippocampus-dependent place learning task, and a dorsolateral striatum-dependent response task in a plus-maze. In the place task rats swam from varying start points (N or S) to a hidden escape platform located in a consistent spatial location (W). In the response task rats swam from varying start points (N or S) to a hidden escape platform located in the maze arm consistent with a body-turn response (left). In the place task, rats trained with the massed trial schedule acquired the task quicker than rats trained with the spaced trial schedule. In the response task, rats trained with the spaced trial schedule acquired the task quicker than rats trained with the massed trial schedule. The double dissociation observed suggests that the reinforcement parameters most conducive to effective learning in hippocampus-dependent and dorsolateral striatum-dependent learning may have differential temporal characteristics. Copyright © 2015 Elsevier B.V. All rights reserved.
Tulip, Jennifer; Zimmermann, Jonas B; Farningham, David; Jackson, Andrew
2017-06-15
Behavioural training through positive reinforcement techniques is a well-recognised refinement to laboratory animal welfare. Behavioural neuroscience research requires subjects to be trained to perform repetitions of specific behaviours for food/fluid reward. Some animals fail to perform at a sufficient level, limiting the amount of data that can be collected and increasing the number of animals required for each study. We have implemented automated positive reinforcement training systems (comprising a button press task with variable levels of difficulty using LED cues and a fluid reward) at the breeding facility and research facility, to compare performance across these different settings, to pre-screen animals for selection and refine training protocols. Animals learned 1- and 4-choice button tasks within weeks of home enclosure training, with some inter-individual differences. High performance levels (∼200-300 trials per 60min session at ∼80% correct) were obtained without food or fluid restriction. Moreover, training quickly transferred to a laboratory version of the task. Animals that acquired the task at the breeding facility subsequently performed better both in early home enclosure sessions upon arrival at the research facility, and also in laboratory sessions. Automated systems at the breeding facility may be used to pre-screen animals for suitability for behavioural neuroscience research. In combination with conventional training, both the breeding and research facility systems facilitate acquisition and transference of learning. Automated systems have the potential to refine training protocols and minimise requirements for food/fluid control. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Miller, Rikki L A; Francoeur, Miranda J; Gibson, Brett M; Mair, Robert G
2017-01-01
The mediodorsal nucleus (MD) interacts with medial prefrontal cortex (mPFC) to support learning and adaptive decision-making. MD receives driver (layer 5) and modulatory (layer 6) projections from PFC and is the main source of driver thalamic projections to middle cortical layers of PFC. Little is known about the activity of MD neurons and their influence on PFC during decision-making. We recorded MD neurons in rats performing a dynamic delayed nonmatching to position (dDNMTP) task and compared results to a previous study of mPFC with the same task (Onos et al., 2016). Criterion event-related responses were observed for 22% (254/1179) of neurons recorded in MD, 237 (93%) of which exhibited activity consistent with mPFC response types. More MD than mPFC neurons exhibited responses related to movement (45% vs. 29%) and reinforcement (51% vs. 27%). MD had few responses related to lever presses, and none related to preparation or memory delay, which constituted 43% of event-related activity in mPFC. Comparison of averaged normalized population activity and population response times confirmed the broad similarity of common response types in MD and mPFC and revealed differences in the onset and offset of some response types. Our results show that MD represents information about actions and outcomes essential for decision-making during dDNMTP, consistent with evidence from lesion studies that MD supports reward-based learning and action-selection. These findings support the hypothesis that MD reinforces task-relevant neural activity in PFC that gives rise to adaptive behavior.
Sensitivity to value-driven attention is predicted by how we learn from value.
Jahfari, Sara; Theeuwes, Jan
2017-04-01
Reward learning is known to influence the automatic capture of attention. This study examined how the rate of learning, after high- or low-value reward outcomes, can influence future transfers into value-driven attentional capture. Participants performed an instrumental learning task that was directly followed by an attentional capture task. A hierarchical Bayesian reinforcement model was used to infer individual differences in learning from high or low reward. Results showed a strong relationship between high-reward learning rates (or the weight that is put on learning after a high reward) and the magnitude of attentional capture with high-reward colors. Individual differences in learning from high or low rewards were further related to performance differences when high- or low-value distractors were present. These findings provide novel insight into the development of value-driven attentional capture by showing how information updating after desired or undesired outcomes can influence future deployments of automatic attention.
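A sketch of a learner with separate learning rates after high- and low-value outcomes, the quantity whose individual estimates were related to attentional capture; the study used hierarchical Bayesian estimation, whereas this illustration only exposes the underlying update rule with made-up parameters.

    def update_value(q, reward, lr_high, lr_low, high_threshold=0.5):
        # Delta-rule update with an outcome-dependent learning rate:
        # lr_high is applied after high-value rewards, lr_low otherwise.
        lr = lr_high if reward >= high_threshold else lr_low
        return q + lr * (reward - q)

    # Example: a learner that weights high rewards heavily updates faster toward them.
    q = 0.0
    for r in [1.0, 0.1, 1.0, 1.0, 0.1]:
        q = update_value(q, r, lr_high=0.6, lr_low=0.1)
    print(round(q, 3))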
Warren, Christopher M.; Holroyd, Clay B.
2012-01-01
We applied the event-related brain potential (ERP) technique to investigate the involvement of two neuromodulatory systems in learning and decision making: The locus coeruleus–norepinephrine system (NE system) and the mesencephalic dopamine system (DA system). We have previously presented evidence that the N2, a negative deflection in the ERP elicited by task-relevant events that begins approximately 200 ms after onset of the eliciting stimulus and that is sensitive to low-probability events, is a manifestation of cortex-wide noradrenergic modulation recruited to facilitate the processing of unexpected stimuli. Further, we hold that the impact of DA reinforcement learning signals on the anterior cingulate cortex (ACC) produces a component of the ERP called the feedback-related negativity (FRN). The N2 and the FRN share a similar time range, a similar topography, and similar antecedent conditions. We varied factors related to the degree of cognitive deliberation across a series of experiments to dissociate these two ERP components. Across four experiments we varied the demand for a deliberative strategy, from passively watching feedback, to more complex/challenging decision tasks. Consistent with our predictions, the FRN was largest in the experiment involving active learning and smallest in the experiment involving passive learning whereas the N2 exhibited the opposite effect. Within each experiment, when subjects attended to color, the N2 was maximal at frontal–central sites, and when they attended to gender it was maximal over lateral-occipital areas, whereas the topology of the FRN was frontal–central in both task conditions. We conclude that both the DA system and the NE system act in concert when learning from rewards that vary in expectedness, but that the DA system is relatively more exercised when subjects are relatively more engaged by the learning task. PMID:22493568
Associability-modulated loss learning is increased in posttraumatic stress disorder
Brown, Vanessa M; Zhu, Lusha; Wang, John M; Frueh, B Christopher
2018-01-01
Disproportionate reactions to unexpected stimuli in the environment are a cardinal symptom of posttraumatic stress disorder (PTSD). Here, we test whether these heightened responses are associated with disruptions in distinct components of reinforcement learning. Specifically, using functional neuroimaging, a loss-learning task, and a computational model-based approach, we assessed the mechanistic hypothesis that overreactions to stimuli in PTSD arise from anomalous gating of attention during learning (i.e., associability). Behavioral choices of combat-deployed veterans with and without PTSD were fit to a reinforcement learning model, generating trial-by-trial prediction errors (signaling unexpected outcomes) and associability values (signaling attention allocation to the unexpected outcomes). Neural substrates of associability value and behavioral parameter estimates of associability updating, but not prediction error, increased with PTSD during loss learning. Moreover, the interaction of PTSD severity with neural markers of associability value predicted behavioral choices. These results indicate that increased attention-based learning may underlie aspects of PTSD and suggest potential neuromechanistic treatment targets. PMID:29313489
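A sketch of an associability-gated update in the Pearce-Hall spirit, where associability tracks recent unsigned prediction errors and scales learning; parameter names and values are illustrative rather than the study's fitted estimates.

    def associability_update(v, alpha, reward, kappa=0.5, eta=0.3):
        # One trial of associability-modulated learning.
        # v: current value; alpha: current associability (attention to surprise);
        # kappa: base learning rate; eta: how fast associability tracks |error|.
        delta = reward - v
        v_new = v + kappa * alpha * delta                   # learning scaled by associability
        alpha_new = eta * abs(delta) + (1 - eta) * alpha    # surprise updates associability
        return v_new, alpha_new

    v, alpha = 0.0, 1.0
    for r in [-1.0, -1.0, 0.0, -1.0]:   # a short run of loss-learning trials
        v, alpha = associability_update(v, alpha, r)
        print(round(v, 3), round(alpha, 3))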
Hassani, S. A.; Oemisch, M.; Balcarras, M.; Westendorff, S.; Ardid, S.; van der Meer, M. A.; Tiesinga, P.; Womelsdorf, T.
2017-01-01
Noradrenaline is believed to support cognitive flexibility through the alpha 2A noradrenergic receptor (a2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the a2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task evaluating performance using Bayesian and Reinforcement learning models. In an initial dose-testing phase we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose we examined the faster feature-based reversal learning with Guanfacine with single-subject computational modeling. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework. PMID:28091572
Adaptive optimal training of animal behavior
NASA Astrophysics Data System (ADS)
Bak, Ji Hyun; Choi, Jung Yoon; Akrami, Athena; Witten, Ilana; Pillow, Jonathan
Neuroscience experiments often require training animals to perform tasks designed to elicit various sensory, cognitive, and motor behaviors. Training typically involves a series of gradual adjustments of stimulus conditions and rewards in order to bring about learning. However, training protocols are usually hand-designed, and often require weeks or months to achieve a desired level of task performance. Here we combine ideas from reinforcement learning and adaptive optimal experimental design to formulate methods for efficient training of animal behavior. Our work addresses two intriguing problems at once: first, it seeks to infer the learning rules underlying an animal's behavioral changes during training; second, it seeks to exploit these rules to select stimuli that will maximize the rate of learning toward a desired objective. We develop and test these methods using data collected from rats during training on a two-interval sensory discrimination task. We show that we can accurately infer the parameters of a learning algorithm that describes how the animal's internal model of the task evolves over the course of training. We also demonstrate by simulation that our method can provide a substantial speedup over standard training methods.
Time-Extended Policies in Multi-Agent Reinforcement Learning
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Agogino, Adrian K.
2004-01-01
Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works towards maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall of having rewards that are aligned with the global reward but lead to utilities not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result they outperform both global (e.g., team games) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
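A sketch of one common way to construct agent rewards that stay aligned with a global utility while remaining learnable: the difference reward, which compares the global utility with and without agent i's contribution. The toy coverage utility below is a stand-in, not the Gridworld utility or the exact reward functions used in the paper.

    def global_utility(actions):
        # Toy global utility: reward coverage of distinct cells, so redundant
        # agents add nothing (a stand-in for the multi-agent objective).
        return float(len(set(actions)))

    def difference_reward(actions, i):
        # D_i = G(z) - G(z without agent i): aligned with the global utility,
        # yet mostly reflecting agent i's own contribution (more learnable).
        without_i = actions[:i] + actions[i + 1:]
        return global_utility(actions) - global_utility(without_i)

    actions = ["cell_a", "cell_b", "cell_a"]   # agents 0 and 2 chose the same cell
    print([difference_reward(actions, i) for i in range(len(actions))])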
Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.
Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J
2017-10-01
Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
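A sketch of a go/no-go choice rule with a Pavlovian bias term of the kind described, in which negative state value suppresses 'go' responding independently of instrumental learning; the parameterization and the simple logistic choice rule are illustrative, not the authors' full model.

    import math

    def p_go(w_go, w_nogo, state_value, pavlovian_bias, go_bias=0.0):
        # Probability of emitting a 'go' response. The Pavlovian term adds
        # pavlovian_bias * state_value to the 'go' weight, so aversive states
        # (negative value) suppress responding regardless of instrumental learning.
        act_go = w_go + go_bias + pavlovian_bias * state_value
        act_nogo = w_nogo
        return 1.0 / (1.0 + math.exp(-(act_go - act_nogo)))

    # With equal instrumental weights, a stronger Pavlovian bias suppresses 'go'
    # more in an aversive (value = -1) state.
    print(round(p_go(0.0, 0.0, state_value=-1.0, pavlovian_bias=0.5), 3))
    print(round(p_go(0.0, 0.0, state_value=-1.0, pavlovian_bias=2.0), 3))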
Implicit learning in cotton-top tamarins (Saguinus oedipus) and pigeons (Columba livia).
Locurto, Charles; Fox, Maura; Mazzella, Andrea
2015-06-01
There is considerable interest in the conditions under which human subjects learn patterned information without explicit instructions to learn that information. This form of learning, termed implicit or incidental learning, can be approximated in nonhumans by exposing subjects to patterned information but delivering reinforcement randomly, thereby not requiring the subjects to learn the information in order to be reinforced. Following acquisition, nonhuman subjects are queried as to what they have learned about the patterned information. In the present experiment, we extended the study of implicit learning in nonhumans by comparing two species, cotton-top tamarins (Saguinus oedipus) and pigeons (Columba livia), on an implicit learning task that used an artificial grammar to generate the patterned elements for training. We equated the conditions of training and testing as much as possible between the two species. The results indicated that both species demonstrated approximately the same magnitude of implicit learning, judged both by a random test and by choice tests between pairs of training elements. This finding suggests that the ability to extract patterned information from situations in which such learning is not demanded is of longstanding origin.
ERIC Educational Resources Information Center
Sweeney, John
1988-01-01
Punishment given in a caring, supportive environment can assist children to learn some tasks more quickly, when used in conjunction with programmed positive reinforcement. The manner in which a punishment is implemented impacts its effectiveness. Two experiments are presented in which teachers used creative punishment to produce classroom behavior…
A Novel Clustering Method Curbing the Number of States in Reinforcement Learning
NASA Astrophysics Data System (ADS)
Kotani, Naoki; Nunobiki, Masayuki; Taniguchi, Kenji
We propose an efficient state-space construction method for reinforcement learning. Our method controls the number of categories by improving the clustering method of Fuzzy ART, an autonomous state-space construction method. The proposed method represents each weight vector as the mean value of its input vectors in order to curb the number of new categories, and it eliminates categories whose state values are low to curb the total number of categories. As the state value is updated, the size of each category shrinks so that the policy can be learned more precisely. We verified the effectiveness of the proposed method with simulations of a reaching problem for a two-link robot arm. We confirmed that the number of categories was reduced and the agent achieved the complex task quickly.
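A minimal sketch of the category mechanism described above: each category's weight vector is maintained as the running mean of the inputs assigned to it, a new category is created only when no existing one is sufficiently close, and categories whose learned state values fall below a floor are pruned. The vigilance threshold, value floor, and class name below are illustrative assumptions, not the paper's parameter values.

```python
import numpy as np

class MeanClusterStateSpace:
    """Illustrative sketch (not the authors' exact algorithm): categories store
    the running mean of assigned inputs; a new category is created only when no
    existing one is within `vigilance`; low-value categories are pruned."""

    def __init__(self, vigilance=0.3, value_floor=-0.5):
        self.centers, self.counts, self.values = [], [], []
        self.vigilance = vigilance
        self.value_floor = value_floor

    def assign(self, x):
        x = np.asarray(x, dtype=float)
        if self.centers:
            d = [np.linalg.norm(x - c) for c in self.centers]
            j = int(np.argmin(d))
            if d[j] <= self.vigilance:
                self.counts[j] += 1
                # running mean keeps the category centered on its inputs
                self.centers[j] += (x - self.centers[j]) / self.counts[j]
                return j
        self.centers.append(x.copy())
        self.counts.append(1)
        self.values.append(0.0)
        return len(self.centers) - 1

    def update_value(self, j, td_target, lr=0.1):
        self.values[j] += lr * (td_target - self.values[j])

    def prune(self):
        keep = [i for i, v in enumerate(self.values) if v >= self.value_floor]
        self.centers = [self.centers[i] for i in keep]
        self.counts = [self.counts[i] for i in keep]
        self.values = [self.values[i] for i in keep]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    space = MeanClusterStateSpace()
    for _ in range(200):
        j = space.assign(rng.random(2))
        space.update_value(j, td_target=rng.normal())
    space.prune()
    print("categories after pruning:", len(space.centers))
```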
Compound Stimulus Presentation Does Not Deepen Extinction in Human Causal Learning
Griffiths, Oren; Holmes, Nathan; Westbrook, R. Fred
2017-01-01
Models of associative learning have proposed that cue-outcome learning critically depends on the degree of prediction error encountered during training. Two experiments examined the role of error-driven extinction learning in a human causal learning task. Target cues underwent extinction in the presence of additional cues, which differed in the degree to which they predicted the outcome, thereby manipulating outcome expectancy and, in the absence of any change in reinforcement, prediction error. These prediction error manipulations have each been shown to modulate extinction learning in aversive conditioning studies. While both manipulations resulted in increased prediction error during training, neither enhanced extinction in the present human learning task (one manipulation resulted in less extinction at test). The results are discussed with reference to the types of associations that are regulated by prediction error, the types of error terms involved in their regulation, and how these interact with parameters involved in training. PMID:28232809
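For reference, the error-driven account under test is the summed-error (Rescorla-Wagner) rule, which predicts that extinguishing a target cue in compound with another excitatory cue produces a larger negative prediction error and therefore deeper extinction, exactly the prediction the human data above failed to support. A minimal sketch of that textbook rule (not the authors' analysis code):

```python
# Minimal Rescorla-Wagner sketch: every cue present on a trial is updated by a
# shared prediction error delta = outcome - sum of the present cues' strengths,
# so extinguishing a target cue alongside another excitatory cue yields a
# larger negative delta and, under this model, deeper extinction of the target.

def rw_extinction(v_target, v_companion, n_trials=20, alpha=0.2, outcome=0.0):
    for _ in range(n_trials):
        delta = outcome - (v_target + v_companion)   # summed prediction error
        v_target += alpha * delta
        v_companion += alpha * delta
    return v_target

if __name__ == "__main__":
    alone = rw_extinction(v_target=1.0, v_companion=0.0)
    compound = rw_extinction(v_target=1.0, v_companion=1.0)
    print(f"V(target) after extinction alone:       {alone:.3f}")
    print(f"V(target) after extinction in compound: {compound:.3f}")
```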
Ketchum, Myles J; Weyand, Theodore G; Weed, Peter F; Winsauer, Peter J
2016-05-01
Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n = 5) implanted with electrodes (n = 14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks though could be completed using an identical response topography and used the same sensory stimuli and schedule of reinforcement. More important, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, and provided a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. © 2015 Wiley Periodicals, Inc.
Neural mechanisms of cue-approach training
Bakkour, Akram; Lewis-Peacock, Jarrod A.; Poldrack, Russell A.; Schonberg, Tom
2016-01-01
Biasing choices may prove a useful way to implement behavior change. Previous work has shown that a simple training task (the cue-approach task), which does not rely on external reinforcement, can robustly influence choice behavior by biasing choice toward items that were targeted during training. In the current study, we replicate previous behavioral findings and explore the neural mechanisms underlying the shift in preferences following cue-approach training. Given recent successes in the development and application of machine learning techniques to task-based fMRI data, which have advanced understanding of the neural substrates of cognition, we sought to leverage the power of these techniques to better understand neural changes during cue-approach training that subsequently led to a shift in choice behavior. Contrary to our expectations, we found that machine learning techniques applied to fMRI data during non-reinforced training were unsuccessful in elucidating the neural mechanism underlying the behavioral effect. However, univariate analyses during training revealed that the relationship between BOLD and choices for Go items increases as training progresses compared to choices of NoGo items primarily in lateral prefrontal cortical areas. This new imaging finding suggests that preferences are shifted via differential engagement of task control networks that interact with value networks during cue-approach training. PMID:27677231
Dunne, Simon; D'Souza, Arun; O'Doherty, John P
2016-06-01
A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning. Copyright © 2016 the American Physiological Society.
Iigaya, Kiyohito; Fonseca, Madalena S; Murakami, Masayoshi; Mainen, Zachary F; Dayan, Peter
2018-06-26
Serotonin has widespread, but computationally obscure, modulatory effects on learning and cognition. Here, we studied the impact of optogenetic stimulation of dorsal raphe serotonin neurons in mice performing a non-stationary, reward-driven decision-making task. Animals showed two distinct choice strategies. Choices after short inter-trial-intervals (ITIs) depended only on the last trial outcome and followed a win-stay-lose-switch pattern. In contrast, choices after long ITIs reflected outcome history over multiple trials, as described by reinforcement learning models. We found that optogenetic stimulation during a trial significantly boosted the rate of learning that occurred due to the outcome of that trial, but these effects were only exhibited on choices after long ITIs. This suggests that serotonin neurons modulate reinforcement learning rates, and that this influence is masked by alternate, unaffected, decision mechanisms. These results provide insight into the role of serotonin in treating psychiatric disorders, particularly its modulation of neural plasticity and learning.
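The two choice strategies described above can be written down compactly: a win-stay-lose-switch rule that depends only on the last outcome, and an incremental value update whose learning rate can be transiently boosted on selected trials (standing in for the optogenetic stimulation effect). The sketch below uses a two-armed bandit and assumed parameter values as placeholders for the task.

```python
import numpy as np

# Illustrative sketch of the two strategies: a win-stay / lose-switch rule
# driven only by the last outcome, and an incremental RL update whose learning
# rate is transiently boosted on selected trials. Task structure and parameter
# values are assumptions, not the paper's.

def win_stay_lose_switch(last_choice, last_rewarded):
    return last_choice if last_rewarded else 1 - last_choice

def softmax_choice(values, beta=3.0, rng=None):
    rng = rng or np.random.default_rng()
    z = beta * values - np.max(beta * values)
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(values), p=p))

def rl_update(values, choice, reward, alpha=0.2, boost=1.0):
    values[choice] += boost * alpha * (reward - values[choice])
    return values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    values = np.zeros(2)
    reward_probs = [0.7, 0.3]
    for t in range(100):
        c = softmax_choice(values, rng=rng)
        r = float(rng.random() < reward_probs[c])
        # every 10th trial mimics a "stimulated" trial with a boosted rate
        values = rl_update(values, c, r, boost=2.0 if t % 10 == 0 else 1.0)
    print("learned values:", np.round(values, 2))
    print("WSLS after an unrewarded choice of arm 0:", win_stay_lose_switch(0, False))
```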
Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.
Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E
2017-03-15
Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. Copyright © 2017 the authors 0270-6474/17/373009-09$15.00/0.
Chalmers, Eric; Luczak, Artur; Gruber, Aaron J.
2016-01-01
The mammalian brain is thought to use a version of Model-based Reinforcement Learning (MBRL) to guide “goal-directed” behavior, wherein animals consider goals and make plans to acquire desired outcomes. However, conventional MBRL algorithms do not fully explain animals' ability to rapidly adapt to environmental changes, or learn multiple complex tasks. They also require extensive computation, suggesting that goal-directed behavior is cognitively expensive. We propose here that key features of processing in the hippocampus support a flexible MBRL mechanism for spatial navigation that is computationally efficient and can adapt quickly to change. We investigate this idea by implementing a computational MBRL framework that incorporates features inspired by computational properties of the hippocampus: a hierarchical representation of space, “forward sweeps” through future spatial trajectories, and context-driven remapping of place cells. We find that a hierarchical abstraction of space greatly reduces the computational load (mental effort) required for adaptation to changing environmental conditions, and allows efficient scaling to large problems. It also allows abstract knowledge gained at high levels to guide adaptation to new obstacles. Moreover, a context-driven remapping mechanism allows learning and memory of multiple tasks. Simulating dorsal or ventral hippocampal lesions in our computational framework qualitatively reproduces behavioral deficits observed in rodents with analogous lesions. The framework may thus embody key features of how the brain organizes model-based RL to efficiently solve navigation and other difficult tasks. PMID:28018203
Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.
Uchibe, Eiji; Doya, Kenji
2008-12-01
Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.
Multi-Objective Reinforcement Learning for Cognitive Radio-Based Satellite Communications
NASA Technical Reports Server (NTRS)
Ferreira, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.
2016-01-01
Previous research on cognitive radios has addressed the performance of various machine-learning and optimization techniques for decision making of terrestrial link properties. In this paper, we present our recent investigations with respect to reinforcement learning that potentially can be employed by future cognitive radios installed onboard satellite communications systems specifically tasked with radio resource management. This work analyzes the performance of learning, reasoning, and decision making while considering multiple objectives for time-varying communications channels, as well as different cross-layer requirements. Based on the urgent demand for increased bandwidth, which is being addressed by the next generation of high-throughput satellites, the performance of cognitive radio is assessed considering links between a geostationary satellite and a fixed ground station operating at Ka-band (26 GHz). Simulation results show multiple objective performance improvements of more than 3.5 times for clear sky conditions and 6.8 times for rain conditions.
van den Bos, Wouter; Cohen, Michael X; Kahnt, Thorsten; Crone, Eveline A
2012-06-01
During development, children improve in learning from feedback to adapt their behavior. However, it is still unclear which neural mechanisms might underlie these developmental changes. In the current study, we used a reinforcement learning model to investigate neurodevelopmental changes in the representation and processing of learning signals. Sixty-seven healthy volunteers between ages 8 and 22 (children: 8-11 years, adolescents: 13-16 years, and adults: 18-22 years) performed a probabilistic learning task while in a magnetic resonance imaging scanner. The behavioral data demonstrated age differences in learning parameters with a stronger impact of negative feedback on expected value in children. Imaging data revealed that the neural representation of prediction errors was similar across age groups, but functional connectivity between the ventral striatum and the medial prefrontal cortex changed as a function of age. Furthermore, the connectivity strength predicted the tendency to alter expectations after receiving negative feedback. These findings suggest that the underlying mechanisms of developmental changes in learning are not related to differences in the neural representation of learning signals per se but rather in how learning signals are used to guide behavior and expectations.
Moustafa, Ahmed A; Gluck, Mark A; Herzallah, Mohammad M; Myers, Catherine E
2015-01-01
Previous research has shown that trial ordering affects cognitive performance, but this has not been tested using category-learning tasks that differentiate learning from reward and punishment. Here, we tested two groups of healthy young adults using a probabilistic category learning task of reward and punishment in which there are two types of trials (reward, punishment) and three possible outcomes: (1) positive feedback for correct responses in reward trials; (2) negative feedback for incorrect responses in punishment trials; and (3) no feedback for incorrect answers in reward trials and correct answers in punishment trials. Hence, trials without feedback are ambiguous, and may represent either successful avoidance of punishment or failure to obtain reward. In Experiment 1, the first group of subjects received an intermixed task in which reward and punishment trials were presented in the same block, as a standard baseline task. In Experiment 2, a second group completed the separated task, in which reward and punishment trials were presented in separate blocks. Additionally, in order to understand the mechanisms underlying performance in the experimental conditions, we fit individual data using a Q-learning model. Results from Experiment 1 show that subjects who completed the intermixed task paradoxically valued the no-feedback outcome as a reinforcer when it occurred on reinforcement-based trials, and as a punisher when it occurred on punishment-based trials. This is supported by patterns of empirical responding, where subjects showed more win-stay behavior following an explicit reward than following an omission of punishment, and more lose-shift behavior following an explicit punisher than following an omission of reward. In Experiment 2, results showed similar performance whether subjects received reward-based or punishment-based trials first. However, when the Q-learning model was applied to these data, there were differences between subjects in the reward-first and punishment-first conditions on the relative weighting of neutral feedback. Specifically, early training on reward-based trials led to omission of reward being treated as similar to punishment, but prior training on punishment-based trials led to omission of reward being treated more neutrally. This suggests that early training on one type of trials, specifically reward-based trials, can create a bias in how neutral feedback is processed, relative to those receiving early punishment-based training or training that mixes positive and negative outcomes.
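One way to see the modelling question in this abstract is as a single free parameter: the subjective value assigned to the no-feedback outcome inside a Q-learning update (negative if reward omission is treated like punishment, near zero if it is treated neutrally). The sketch below is an assumed minimal form of that idea, not the authors' fitted model.

```python
import numpy as np

# Illustrative Q-learning update in which the no-feedback outcome carries a
# free subjective value `neutral_value`. Outcome coding and parameter values
# are assumptions for illustration.

def q_update(q, action, outcome, alpha=0.2, neutral_value=0.0):
    r = {"reward": 1.0, "punish": -1.0, "none": neutral_value}[outcome]
    q[action] += alpha * (r - q[action])
    return q

if __name__ == "__main__":
    history = [(0, "reward"), (1, "none"), (0, "none"), (1, "punish")]
    for nv in (-0.5, 0.0, 0.5):   # omission treated as bad / neutral / good
        q = np.zeros(2)
        for action, outcome in history:
            q = q_update(q, action, outcome, neutral_value=nv)
        print(f"neutral_value={nv:+.1f} -> Q = {np.round(q, 3)}")
```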
Machine learning in cardiovascular medicine: are we there yet?
Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P
2018-01-19
Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strengths and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Assessing the Effects of Momentary Priming on Memory Retention During an Interference Task
NASA Technical Reports Server (NTRS)
Schutte, Paul C.
2007-01-01
A memory aid that used brief (33 ms) presentations of previously learned information (target words) was assessed on its ability to reinforce memory for the target words while the subject was performing an interference task. The interference task required subjects to learn new words and thus interfered with their memory of the target words. The brief presentation (momentary memory priming) was hypothesized to refresh the subjects' memory of the target words. In a within-subject design, 143 subjects were given a 33 ms presentation of the target memory words during the interference task in a treatment condition and a blank 33 ms presentation in the control condition. The primary dependent measure, memory loss over the interference trial, did not differ significantly between the two conditions. The memory prime did not appear to hinder the subjects' performance on the interference task. This paper describes the experiment and the results along with suggestions for future research.
Task Demands in OSCEs Influence Learning Strategies.
Lafleur, Alexandre; Laflamme, Jonathan; Leppink, Jimmie; Côté, Luc
2017-01-01
Models on pre-assessment learning effects confirmed that task demands stand out among the factors assessors can modify in an assessment to influence learning. However, little is known about which tasks in objective structured clinical examinations (OSCEs) improve students' cognitive and metacognitive processes. Research is needed to support OSCE designs that benefit students' metacognitive strategies when they are studying, reinforcing a hypothesis-driven approach. With that intent, hypothesis-driven physical examination (HDPE) assessments ask students to elicit and interpret findings of the physical exam to reach a diagnosis ("Examine this patient with a painful shoulder to reach a diagnosis"). When studying for HDPE, students will dedicate more time to hypothesis-driven discussions and practice than when studying for a part-task OSCE ("Perform the shoulder exam"). It is expected that the whole-task nature of HDPE will lead to a hypothesis-oriented use of the learning resources, a frequent use of adjustment strategies, and persistence with learning. In a mixed-methods study, 40 medical students were randomly paired and filmed while studying together for two hypothetical OSCE stations. Each 25-min study period began with video cues asking to study for either a part-task OSCE or an HDPE. In a crossover design, sequences were randomized for OSCEs and contents (shoulder or spine). Time-on-task for discussions or practice were categorized as "hypothesis-driven" or "sequence of signs and maneuvers." Content analysis of focus group interviews summarized students' perception of learning resources, adjustment strategies, and persistence with learning. When studying for HDPE, students allocate significantly more time for hypothesis-driven discussions and practice. Students use resources contrasting diagnoses and report persistence with learning. When studying for part-task OSCEs, time-on-task is reversed, spent on rehearsing a sequence of signs and maneuvers. OSCEs with similar contents but different task demands lead to opposite learning strategies regarding how students manage their study time. Measuring pre-assessment effects from a metacognitive perspective provides empirical evidence to redesign assessments for learning.
Somatosensory Contribution to the Initial Stages of Human Motor Learning
Bernardi, Nicolò F.; Darainy, Mohammad
2015-01-01
The early stages of motor skill acquisition are often marked by uncertainty about the sensory and motor goals of the task, as is the case in learning to speak or learning the feel of a good tennis serve. Here we present an experimental model of this early learning process, in which targets are acquired by exploration and reinforcement rather than sensory error. We use this model to investigate the relative contribution of motor and sensory factors to human motor learning. Participants make active reaching movements or matched passive movements to an unseen target using a robot arm. We find that learning through passive movements paired with reinforcement is comparable with learning associated with active movement, both in terms of magnitude and durability, with improvements due to training still observable at a 1 week retest. Motor learning is also accompanied by changes in somatosensory perceptual acuity. No stable changes in motor performance are observed for participants that train, actively or passively, in the absence of reinforcement, or for participants who are given explicit information about target position in the absence of somatosensory experience. These findings indicate that the somatosensory system dominates learning in the early stages of motor skill acquisition. SIGNIFICANCE STATEMENT The research focuses on the initial stages of human motor learning, introducing a new experimental model that closely approximates the key features of motor learning outside of the laboratory. The finding indicates that it is the somatosensory system rather than the motor system that dominates learning in the early stages of motor skill acquisition. This is important given that most of our computational models of motor learning are based on the idea that learning is motoric in origin. This is also a valuable finding for rehabilitation of patients with limited mobility as it shows that reinforcement in conjunction with passive movement results in benefits to motor learning that are as great as those observed for active movement training. PMID:26490869
NASA Astrophysics Data System (ADS)
Thomaz, Andrea; Breazeal, Cynthia
2008-06-01
We present a learning system, socially guided exploration, in which a social robot learns new tasks through a combination of self-exploration and social interaction. The system's motivational drives, along with social scaffolding from a human partner, bias behaviour to create learning opportunities for a hierarchical reinforcement learning mechanism. The robot is able to learn on its own, but can flexibly take advantage of the guidance of a human teacher. We report the results of an experiment that analyses what the robot learns on its own as compared to being taught by human subjects. We also analyse the video of these interactions to understand human teaching behaviour and the social dynamics of the human-teacher/robot-learner system. With respect to learning performance, human guidance results in a task set that is significantly more focused and efficient at the tasks the human was trying to teach, whereas self-exploration results in a more diverse set. Analysis of human teaching behaviour reveals insights of social coupling between the human teacher and robot learner, different teaching styles, strong consistency in the kinds and frequency of scaffolding acts across teachers and nuances in the communicative intent behind positive and negative feedback.
Butler, Kevin; Rusted, Jennifer; Gard, Paul; Jackson, Anne
2017-05-01
Impaired monitoring of errors and conflict (performance monitoring; PM) is well documented in substance dependence (SD) including nicotine dependence and may contribute to continued drug use. Contemporary models of PM and complementary behavioural evidence suggest that PM works by integrating recent reinforcement history rather than evaluating individual behaviours. Despite this, studies of PM in SD have typically used indices derived from reaction to task error or conflict on individual trials. Consequently, impaired integration of reinforcement history during action selection tasks requiring behavioural control in SD populations has been underexplored. A reinforcement learning task assessed the ability of abstinent, satiated, former and never smokers (N=60) to integrate recent reinforcement history alongside a more typical behavioural index of PM reflecting the degree of reaction time slowing following an error (post-punishment slowing; PPS). On both indices there was a consistent pattern in PM data: former smokers had the greatest and satiated smokers the poorest PM. Specifically, satiated smokers had poorer reinforcement integration than former (p=0.005) and never smokers (p=0.041) and had less post-punishment slowing than former (p<0.001), never (p=0.003) and abstinent smokers (p=0.026). These are the first data examining the effects of smoking status on PM that use an integration of reinforcement history metric. The concordance of the reinforcement integration and PPS data suggests that this could be a promising method to interrogate PM in future studies. PM is influenced by smoking status. As PM is associated with adapting behaviour, poor PM in satiated smokers may contribute towards continued smoking despite negative consequences. Former smokers show elevated PM, suggesting this may be a good relapse-prevention target for individuals struggling to remain abstinent; however, prospective and intervention studies are needed. A better understanding of PM deficits in terms of reinforcement integration failure may stimulate development of novel treatment approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
Striatal dysfunction during reversal learning in unmedicated schizophrenia patients☆
Schlagenhauf, Florian; Huys, Quentin J.M.; Deserno, Lorenz; Rapp, Michael A.; Beck, Anne; Heinze, Hans-Joachim; Dolan, Ray; Heinz, Andreas
2014-01-01
Subjects with schizophrenia are impaired at reinforcement-driven reversal learning from as early as their first episode. The neurobiological basis of this deficit is unknown. We obtained behavioral and fMRI data in 24 unmedicated, primarily first episode, schizophrenia patients and 24 age-, IQ- and gender-matched healthy controls during a reversal learning task. We supplemented our fMRI analysis, focusing on learning from prediction errors, with detailed computational modeling to probe task solving strategy including an ability to deploy an internal goal directed model of the task. Patients displayed reduced functional activation in the ventral striatum (VS) elicited by prediction errors. However, modeling task performance revealed that a subgroup did not adjust their behavior according to an accurate internal model of the task structure, and these were also the more severely psychotic patients. In patients who could adapt their behavior, as well as in controls, task solving was best described by cognitive strategies according to a Hidden Markov Model. When we compared patients and controls who acted according to this strategy, patients still displayed a significant reduction in VS activation elicited by informative errors that precede salient changes of behavior (reversals). Thus, our study shows that VS dysfunction in schizophrenia patients during reward-related reversal learning remains a core deficit even when controlling for task solving strategies. This result highlights VS dysfunction is tightly linked to a reward-related reversal learning deficit in early, unmedicated schizophrenia patients. PMID:24291614
Stimulus function in simultaneous discrimination1
Biederman, Gerald B.
1968-01-01
In discrimination learning, the negativity of the stimulus correlated with nonreinforcement (S−) declines after 100 training trials while the stimulus correlated with reinforcement (S+) is paradoxically more positive with lesser amounts of discrimination training. Training subjects on two simultaneous discrimination tasks revealed a within-subjects overlearning reversal effect, where a more-frequently presented discrimination problem was better learned in reversal than was a discrimination problem presented less frequently during training. PMID:5672254
Learning and Memory Enhancement by Neuropeptides
1989-09-27
autoshaping task, in which rats learn to touch a lever to obtain food. Substantial progress towards the goals of the project was made, especially... impairment of delayed reinforcement autoshaped behavior caused by low doses of trimethyltin, Psychopharmacology 93 (1987): 301-307. S.B. Sparber, C.A... impairs autoshaped learning. Behavioral and Neural Biology 51 (1989): 34-45. The following papers have been submitted for publication:
Nemirovsky, Sergio I; Avale, M Elena; Brunner, Daniela; Rubinstein, Marcelo
2009-11-01
The dopamine D4 receptor (D4R) is predominantly expressed in the prefrontal cortex, a brain area that integrates motor, rewarding, and cognitive information. Because participation of D4Rs in executive learning is largely unknown, we challenged D4R knockout mice (Drd4(-/-)) and their wild-type (WT) littermates, neonatally treated with 6-hydroxydopamine (6-OHDA; icv) or vehicle in two operant learning paradigms. A continuous reinforcement task, in which one food-pellet was delivered after every lever press, showed that 6-OHDA-treated (hypodopaminergic) WT mice pressed the reinforcing lever at much lower rates than normodopaminergic WT mice. In contrast, Drd4(-/-) mice displayed increased lever pressing rates, regardless of their dopamine content. In another study, mice were trained to solve an operant two-choice task in which a first showing lever was coupled to the delivery of one food pellet only after a second lever emerged. The interval between presentation of both levers was initially 12 s and progressively shortened to 6, 2, and finally 0.5 s. Normodopaminergic WT mice obtained a pellet reward in more than 75% of the trials at 12, 6, and 2 s, whereas hypodopaminergic WT mice were severely impaired at selecting the reward-paired lever. Absence of D4Rs was not detrimental in this task. Moreover, hypodopaminergic Drd4(-/-) mice were as efficient as their normodopaminergic Drd4(-/-) siblings in selecting the reward-paired lever. In summary, hypodopaminergic mice exhibit severe impairments in retrieving rewards in two operant positive reinforcement tasks, but these deleterious effects are totally prevented in the absence of functional D4Rs.
Silvetti, Massimo; Wiersema, Jan R; Sonuga-Barke, Edmund; Verguts, Tom
2013-10-01
Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts; probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood. Copyright © 2013 Elsevier Ltd. All rights reserved.
Moon, J; Ota, K T; Driscoll, L L; Levitsky, D A; Strupp, B J
2008-07-01
This study was designed to further assess cognitive and affective functioning in a mouse model of Fragile X syndrome (FXS), the Fmr1(tm1Cgr) or Fmr1 "knockout" (KO) mouse. Male KO mice and wild-type littermate controls were tested on learning set and reversal learning tasks. The KO mice were not impaired in associative learning, transfer of learning, or reversal learning, based on measures of learning rate. Analyses of videotapes of the reversal learning task revealed that both groups of mice exhibited higher levels of activity and wall-climbing during the initial sessions of the task than during the final sessions, a pattern also seen for trials following an error relative to those following a correct response. Notably, the increase in both behavioral measures seen early in the task was significantly more pronounced for the KO mice than for controls, as was the error-induced increase in activity level. This pattern of effects suggests that the KO mice reacted more strongly than controls to the reversal of contingencies and pronounced drop in reinforcement rate, and to errors in general. This pattern of effects is consistent with the heightened emotional reactivity frequently described for humans with FXS. (c) 2008 Wiley Periodicals, Inc.
Machado, Armando; Arantes, Joana
2006-06-01
To contrast two models of timing, Scalar Expectancy Theory (SET) and Learning to Time (LeT), pigeons were exposed to a double temporal bisection procedure. On half of the trials, they learned to choose a red key after a 1-s signal and a green key after a 4-s signal; on the other half of the trials, they learned to choose a blue key after a 4-s signal and a yellow key after a 16-s signal. This was Phase A of an ABA design. In Phase B, the pigeons were divided into two groups and exposed to a new bisection task in which the signals ranged from 1 to 16 s and the choice keys were blue and green. One group was reinforced for choosing blue after 1-s signals and green after 16-s signals and the other group was reinforced for the opposite mapping (green after 1-s signals and blue after 16-s signals). Whereas SET predicted no differences between the groups, LeT predicted that the former group would learn the new discrimination faster than the latter group. The results were consistent with LeT. Finally, the pigeons returned to Phase A. Only LeT made specific predictions regarding the reacquisition of the four temporal discriminations. These predictions were only partly consistent with the results.
Dopaminergic Contributions to Vocal Learning
Hoffmann, Lukas A.; Saravanan, Varun; Wood, Alynda N.; He, Li
2016-01-01
Although the brain relies on auditory information to calibrate vocal behavior, the neural substrates of vocal learning remain unclear. Here we demonstrate that lesions of the dopaminergic inputs to a basal ganglia nucleus in a songbird species (Bengalese finches, Lonchura striata var. domestica) greatly reduced the magnitude of vocal learning driven by disruptive auditory feedback in a negative reinforcement task. These lesions produced no measurable effects on the quality of vocal performance or the amount of song produced. Our results suggest that dopaminergic inputs to the basal ganglia selectively mediate reinforcement-driven vocal plasticity. In contrast, dopaminergic lesions produced no measurable effects on the birds' ability to restore song acoustics to baseline following the cessation of reinforcement training, suggesting that different forms of vocal plasticity may use different neural mechanisms. SIGNIFICANCE STATEMENT During skill learning, the brain relies on sensory feedback to improve motor performance. However, the neural basis of sensorimotor learning is poorly understood. Here, we investigate the role of the neurotransmitter dopamine in regulating vocal learning in the Bengalese finch, a songbird with an extremely precise singing behavior that can nevertheless be reshaped dramatically by auditory feedback. Our findings show that reduction of dopamine inputs to a region of the songbird basal ganglia greatly impairs vocal learning but has no detectable effect on vocal performance. These results suggest a specific role for dopamine in regulating vocal plasticity. PMID:26888928
Frontal Theta Reflects Uncertainty and Unexpectedness during Exploration and Exploitation
Figueroa, Christina M.; Cohen, Michael X; Frank, Michael J.
2012-01-01
In order to understand the exploitation/exploration trade-off in reinforcement learning, previous theoretical and empirical accounts have suggested that increased uncertainty may precede the decision to explore an alternative option. To date, the neural mechanisms that support the strategic application of uncertainty-driven exploration remain underspecified. In this study, electroencephalography (EEG) was used to assess trial-to-trial dynamics relevant to exploration and exploitation. Theta-band activities over middle and lateral frontal areas have previously been implicated in EEG studies of reinforcement learning and strategic control. It was hypothesized that these areas may interact during top-down strategic behavioral control involved in exploratory choices. Here, we used a dynamic reward–learning task and an associated mathematical model that predicted individual response times. This reinforcement-learning model generated value-based prediction errors and trial-by-trial estimates of exploration as a function of uncertainty. Mid-frontal theta power correlated with unsigned prediction error, although negative prediction errors had greater power overall. Trial-to-trial variations in response-locked frontal theta were linearly related to relative uncertainty and were larger in individuals who used uncertainty to guide exploration. This finding suggests that theta-band activities reflect prefrontal-directed strategic control during exploratory choices. PMID:22120491
Intrinsic motivation and learning in a schizophrenia spectrum sample.
Choi, Jimmy; Medalia, Alice
2010-05-01
Amotivation is a telling hallmark of negative symptomatology in schizophrenia, and it impacts nearly every facet of behavior, including inclination to attempt the difficult cognitive tasks involved in cognitive remediation therapy. Experiences of external reward, reinforcement, and hedonic anticipatory enjoyment are diminished in psychosis, so therapeutics which instead target intrinsic motivation for cognitive tasks may enhance task engagement, and subsequently, remediation outcome. We examined whether outpatients could attain benefits from an intrinsically motivating instructional approach which (a) presents learning materials in a meaningful game-like context, (b) personalizes elements of the learning materials into themes of high interest value, and (c) offers choices so patients can increase their control over the learning process. We directly compared one learning method that incorporated the motivational paradigm into an arithmetic learning program against another method that carefully manipulated out the motivational variables in the same learning program. Fifty-seven subjects with schizophrenia or schizoaffective disorder were randomly assigned to one of the two learning programs for 10 thirty-minute sessions while an intent-to-treat convenience subsample (n=15) was used to account for practice effect. Outcome measures were arithmetic learning, attention, motivation, self competency, and symptom severity. Results showed the motivational group (a) acquired more arithmetic skill, (b) possessed greater intrinsic motivation for the task, (c) reported greater feelings of self competency post-treatment, and (d) demonstrated better post-test attention. Interestingly, baseline perception of self competency was a significant predictor of post-test arithmetic scores. Results demonstrated that incorporating intrinsically motivating instructional techniques into a difficult cognitive task promoted greater learning of the material, higher levels of intrinsic motivation to attempt the demanding task, and greater feelings of self efficacy and achievement to learn. Copyright (c) 2009 Elsevier B.V. All rights reserved.
Witt, Karsten; Daniels, Christine; Daniel, Victoria; Schmitt-Eliassen, Julia; Volkmann, Jens; Deuschl, Günther
2006-01-01
Implicit memory and learning mechanisms are composed of multiple processes and systems. Previous studies demonstrated a basal ganglia involvement in purely cognitive tasks that form stimulus response habits by reinforcement learning such as implicit classification learning. We will test the basal ganglia influence on two cognitive implicit tasks previously described by Berry and Broadbent, the sugar production task and the personal interaction task. Furthermore, we will investigate the relationship between certain aspects of an executive dysfunction and implicit learning. To this end, we have tested 22 Parkinsonian patients and 22 age-matched controls on two implicit cognitive tasks, in which participants learned to control a complex system. They interacted with the system by choosing an input value and obtaining an output that was related in a complex manner to the input. The objective was to reach and maintain a specific target value across trials (dynamic system learning). The two tasks followed the same underlying complex rule but had different surface appearances. Subsequently, participants performed an executive test battery including the Stroop test, verbal fluency and the Wisconsin card sorting test (WCST). The results demonstrate intact implicit learning in patients, despite an executive dysfunction in the Parkinsonian group. They lead to the conclusion that the basal ganglia system affected in Parkinson's disease does not contribute to the implicit acquisition of a new cognitive skill. Furthermore, the Parkinsonian patients were able to reach a specific goal in an implicit learning context despite impaired goal directed behaviour in the WCST, a classic test of executive functions. These results demonstrate a functional independence of implicit cognitive skill learning and certain aspects of executive functions.
Optimizing microstimulation using a reinforcement learning framework.
Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C
2011-01-01
The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of the desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response from stimulation. We explore a preliminary architecture--treating the task as a k-armed bandit--using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best-matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
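A minimal epsilon-greedy rendering of the k-armed-bandit framing: each arm is one candidate stimulation parameter set, and the reward stands in for how closely the evoked response matches the natural-touch template. The similarity model, noise level, and epsilon value below are placeholders; only the counts of 68 parameter forms and 334 realizations come from the abstract.

```python
import numpy as np

# Illustrative epsilon-greedy k-armed bandit over candidate stimulation
# parameter sets. The reward here is a noisy stand-in for template similarity,
# not the measure used in the paper.

def run_bandit(true_similarity, n_trials=334, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_similarity)
    estimates, counts = np.zeros(k), np.zeros(k)
    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))         # explore a random parameter set
        else:
            arm = int(np.argmax(estimates))    # exploit the best estimate so far
        reward = true_similarity[arm] + rng.normal(0, 0.05)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return int(np.argmax(estimates)), counts

if __name__ == "__main__":
    sims = np.random.default_rng(1).uniform(0, 1, size=68)  # 68 candidate forms
    best, counts = run_bandit(sims)
    print("selected parameter set:", best, " true best:", int(np.argmax(sims)))
```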
Distributed reinforcement learning for adaptive and robust network intrusion response
NASA Astrophysics Data System (ADS)
Malialis, Kleanthis; Devlin, Sam; Kudenko, Daniel
2015-07-01
Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
The Effects of Non-Contingent Reinforcement on Children.
ERIC Educational Resources Information Center
Tramill, James L.; Kleinhammer, P. Jeannie
Typical learned helplessness research has involved the presentation of non-contingent, aversive events followed by measures of performance on subsequent tasks; recent investigations have focused on the effect of non-contingent rewards. To examine the effects of non-contingent rewards on children, two studies were conducted, in which children were…
Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N
2015-08-01
There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis. (c) 2015 APA, all rights reserved).
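The three model families compared above differ mainly in which learning signal drives choice: a pure actor-critic updates policy weights from a critic's prediction error, pure Q-learning tracks the expected reward value of each action, and the hybrid lets Q-values modulate the actor's weights. The sketch below is a deliberately minimal rendering of that contrast under assumed parameters, not the authors' fitted models.

```python
import numpy as np

# Deliberately minimal renderings of the three model families: a pure
# actor-critic, pure Q-learning, and a hybrid in which Q-values modulate the
# actor's weights. Parameter values are illustrative.

def actor_critic_update(actor_w, critic_v, action, reward, alpha_a=0.1, alpha_c=0.1):
    delta = reward - critic_v              # critic (state-value) prediction error
    critic_v += alpha_c * delta
    actor_w[action] += alpha_a * delta     # actor follows the same error signal
    return actor_w, critic_v

def q_learning_update(q, action, reward, alpha=0.1):
    q[action] += alpha * (reward - q[action])    # expected-value learning
    return q

def hybrid_weights(actor_w, q, mix=0.5):
    # hybrid: actor weights "augmented" by top-down expected-value estimates
    return (1 - mix) * actor_w + mix * q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actor_w, q, critic_v = np.zeros(2), np.zeros(2), 0.0
    for _ in range(200):
        w = hybrid_weights(actor_w, q)
        p = np.exp(3 * w - np.max(3 * w)); p /= p.sum()
        a = int(rng.choice(2, p=p))
        r = float(rng.random() < (0.8 if a == 0 else 0.2))
        actor_w, critic_v = actor_critic_update(actor_w, critic_v, a, r)
        q = q_learning_update(q, a, r)
    print("actor weights:", np.round(actor_w, 2), " Q-values:", np.round(q, 2))
```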
Ott, Derek V M; Ullsperger, Markus; Jocham, Gerhard; Neumann, Jane; Klein, Tilmann A
2011-07-15
The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of the dorsolateral prefrontal cortex (DLPFC) in reinforcement learning, we applied continuous theta-burst stimulation (cTBS) either to the left or right DLPFC, or to the vertex as a control region, respectively, prior to the performance of a probabilistic learning task in an fMRI environment. While there was no influence of cTBS on learning performance per se, we observed a stimulation-dependent modulation of reward vs. punishment sensitivity: Left-hemispherical DLPFC stimulation led to a more reward-guided performance, while right-hemispherical cTBS induced a more avoidance-guided behavior. FMRI results showed enhanced prediction error coding in the ventral striatum in subjects stimulated over the left as compared to the right DLPFC. Both behavioral and imaging results are in line with recent findings that left, but not right-hemispherical stimulation can trigger a release of dopamine in the ventral striatum, which has been suggested to increase the relative impact of rewards rather than punishment on behavior. Copyright © 2011 Elsevier Inc. All rights reserved.
Starosta, Sarah; Stüttgen, Maik C; Güntürkün, Onur
2014-06-02
While the subject of learning has attracted immense interest from both behavioral and neural scientists, only relatively few investigators have observed single-neuron activity while animals are acquiring an operantly conditioned response, or when that response is extinguished. But even in these cases, observation periods usually encompass only a single stage of learning, i.e. acquisition or extinction, but not both (exceptions include protocols employing reversal learning; see Bingman et al.(1) for an example). However, acquisition and extinction entail different learning mechanisms and are therefore expected to be accompanied by different types and/or loci of neural plasticity. Accordingly, we developed a behavioral paradigm which institutes three stages of learning in a single behavioral session and which is well suited for the simultaneous recording of single neurons' action potentials. Animals are trained on a single-interval forced choice task which requires mapping each of two possible choice responses to the presentation of different novel visual stimuli (acquisition). After having reached a predefined performance criterion, one of the two choice responses is no longer reinforced (extinction). Following a certain decrement in performance level, correct responses are reinforced again (reacquisition). By using a new set of stimuli in every session, animals can undergo the acquisition-extinction-reacquisition process repeatedly. Because all three stages of learning occur in a single behavioral session, the paradigm is ideal for the simultaneous observation of the spiking output of multiple single neurons. We use pigeons as model systems, but the task can easily be adapted to any other species capable of conditioned discrimination learning.
Stress attenuates the flexible updating of aversive value
Raio, Candace M.; Hartley, Catherine A.; Orederu, Temidayo A.; Li, Jian; Phelps, Elizabeth A.
2017-01-01
In a dynamic environment, sources of threat or safety can unexpectedly change, requiring the flexible updating of stimulus−outcome associations that promote adaptive behavior. However, aversive contexts in which we are required to update predictions of threat are often marked by stress. Acute stress is thought to reduce behavioral flexibility, yet its influence on the modulation of aversive value has not been well characterized. Given that stress exposure is a prominent risk factor for anxiety and trauma-related disorders marked by persistent, inflexible responses to threat, here we examined how acute stress affects the flexible updating of threat responses. Participants completed an aversive learning task, in which one stimulus was probabilistically associated with an electric shock, while the other stimulus signaled safety. A day later, participants underwent an acute stress or control manipulation before completing a reversal learning task during which the original stimulus−outcome contingencies switched. Skin conductance and neuroendocrine responses provided indices of sympathetic arousal and stress responses, respectively. Despite equivalent initial learning, stressed participants showed marked impairments in reversal learning relative to controls. Additionally, reversal learning deficits across participants were related to heightened levels of alpha-amylase, a marker of noradrenergic activity. Finally, fitting arousal data to a computational reinforcement learning model revealed that stress-induced reversal learning deficits emerged from stress-specific changes in the weight assigned to prediction error signals, disrupting the adaptive adjustment of learning rates. Our findings provide insight into how stress renders individuals less sensitive to changes in aversive reinforcement and have implications for understanding clinical conditions marked by stress-related psychopathology. PMID:28973957
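The reported link between prediction-error weighting and learning-rate adjustment is often formalised with a Pearce-Hall-style associability term; the sketch below is a minimal, hedged illustration of that idea, with parameter names and values assumed rather than taken from the fitted model in the study.

# Minimal sketch of a Pearce-Hall-style hybrid model: a dynamic associability
# term scales the learning rate, so large recent prediction errors speed up
# subsequent value updates. Parameter names and values are assumptions.

def hybrid_update(value, associability, outcome, kappa=0.3, eta=0.5):
    delta = outcome - value                       # prediction error
    value += kappa * associability * delta        # associability-gated update
    associability = (1 - eta) * associability + eta * abs(delta)
    return value, associability

# Example: a reversal after trial 10 is tracked faster when the effective
# learning rate is allowed to grow with surprise.
v, a = 0.5, 1.0
for shock in [1] * 10 + [0] * 10:
    v, a = hybrid_update(v, a, shock)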
A universal role of the ventral striatum in reward-based learning: Evidence from human studies
Daniel, Reka; Pollmann, Stefan
2014-01-01
Reinforcement learning enables organisms to adjust their behavior in order to maximize rewards. Electrophysiological recordings of dopaminergic midbrain neurons have shown that they code the difference between actual and predicted rewards, i.e., the reward prediction error, in many species. This error signal is conveyed to both the striatum and cortical areas and is thought to play a central role in learning to optimize behavior. However, in human daily life, rewards are diverse and often only indirect feedback is available. Here we explore the range of rewards that are processed by the dopaminergic system in human participants, and examine whether it is also involved in learning in the absence of explicit rewards. While results from electrophysiological recordings in humans are sparse, evidence linking dopaminergic activity to the metabolic signal recorded from the midbrain and striatum with functional magnetic resonance imaging (fMRI) is available. Results from fMRI studies suggest that the human ventral striatum (VS) receives valuation information for a diverse set of rewarding stimuli. These range from simple primary reinforcers such as juice rewards, through abstract social rewards, to internally generated signals of perceived correctness, suggesting that the VS is involved in learning from trial-and-error irrespective of the specific nature of the rewards provided. In addition, we summarize evidence that the VS can also be implicated when learning from observing others, and in tasks that go beyond simple stimulus-action-outcome learning, indicating that the reward system is also recruited in more complex learning tasks. PMID:24825620
Goal-oriented robot navigation learning using a multi-scale space representation.
Llofriu, M; Tejera, G; Contreras, M; Pelc, T; Fellous, J M; Weitzenfeld, A
2015-12-01
There has been extensive research in recent years on the multi-scale nature of hippocampal place cell and entorhinal grid cell encoding, which has led to much speculation about their role in spatial cognition. In this paper we focus on the multi-scale nature of place cells and how they contribute to faster learning during goal-oriented navigation when compared to a spatial cognition system composed of single-scale place cells. The task consists of a circular arena with a fixed goal location, in which a robot is trained to find the shortest path to the goal after a number of learning trials. Synaptic connections are modified using a reinforcement learning paradigm adapted to the multi-scale place cell architecture. The model is evaluated in both simulation and physical robots. We find that larger-scale and combined multi-scale representations favor learning of the goal-oriented navigation task. Copyright © 2015 Elsevier Ltd. All rights reserved.
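One way to picture the multi-scale place-cell representation described above is as Gaussian position features at several widths feeding a linear value function, where broader fields let value estimates generalise over larger regions. The Python sketch below is purely illustrative and does not reproduce the authors' architecture.

import numpy as np

# Purely illustrative: Gaussian place-cell features of position at two spatial
# scales feed a linear value function; broader fields generalise value over
# larger regions of the arena. This does not reproduce the authors' model.

def place_activity(pos, centres, sigma):
    return np.exp(-np.sum((centres - pos) ** 2, axis=1) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
centres = rng.uniform(0.0, 1.0, size=(20, 2))         # place field centres
pos = np.array([0.4, 0.6])                             # current robot position
features = np.concatenate([place_activity(pos, centres, 0.05),   # fine scale
                           place_activity(pos, centres, 0.20)])  # coarse scale
weights = np.zeros(features.size)                      # learned by a TD rule
value = weights @ features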
Optimized Assistive Human-Robot Interaction Using Reinforcement Learning.
Modares, Hamidreza; Ranatunga, Isura; Lewis, Frank L; Popa, Dan O
2016-03-01
An intelligent human-robot interaction (HRI) system with adjustable robot behavior is presented. The proposed HRI system assists the human operator to perform a given task with minimum workload demands and optimizes the overall human-robot system performance. Motivated by human factor studies, the presented control structure consists of two control loops. First, a robot-specific neuro-adaptive controller is designed in the inner loop to make the unknown nonlinear robot behave like a prescribed robot impedance model as perceived by a human operator. In contrast to existing neural network and adaptive impedance-based control methods, no information of the task performance or the prescribed robot impedance model parameters is required in the inner loop. Then, a task-specific outer-loop controller is designed to find the optimal parameters of the prescribed robot impedance model to adjust the robot's dynamics to the operator skills and minimize the tracking error. The outer loop includes the human operator, the robot, and the task performance details. The problem of finding the optimal parameters of the prescribed robot impedance model is transformed into a linear quadratic regulator (LQR) problem which minimizes the human effort and optimizes the closed-loop behavior of the HRI system for a given task. To obviate the requirement of the knowledge of the human model, integral reinforcement learning is used to solve the given LQR problem. Simulation results on an x - y table and a robot arm, and experimental implementation results on a PR2 robot confirm the suitability of the proposed method.
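For orientation, the sketch below solves a toy version of the kind of LQR problem the outer loop is said to reduce to, with full knowledge of the dynamics; integral reinforcement learning would instead reach an equivalent gain from measured data without a model of the human operator. All matrices are illustrative assumptions.

import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative only: a toy LQR solved with known dynamics (A, B). Integral
# reinforcement learning would obtain an equivalent gain from data alone.
# All matrices below are made up for the sketch.

A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
Qc = np.diag([10.0, 1.0])   # penalise tracking error
Rc = np.array([[0.1]])      # penalise control effort

P = solve_continuous_are(A, B, Qc, Rc)
K = np.linalg.solve(Rc, B.T @ P)   # optimal state-feedback gain, u = -K x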
When Does Model-Based Control Pay Off?
Kool, Wouter; Cushman, Fiery A; Gershman, Samuel J
2016-08-01
Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to "model-free" and "model-based" strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, do not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.
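The model-free/model-based distinction drawn above can be summarised in a few lines: a cached table lookup versus one-step planning over a learned transition and reward model. The sketch below is a generic illustration, not the two-step task analysis used in the paper.

import numpy as np

# Generic illustration: a model-free agent reads cached action values from a
# table, while a model-based agent plans by combining a learned transition
# model T[s, a, s'] with learned rewards R[s'].

def model_free_values(Q, state):
    return Q[state]                                    # cheap table lookup

def model_based_values(T, R, state):
    # One-step planning: expected reward of each action under the model.
    return np.array([T[state, a] @ R for a in range(T.shape[1])])

T = np.full((2, 2, 2), 0.5)        # two states, two actions, uniform transitions
R = np.array([0.2, 0.8])           # learned reward estimates per next state
Q = np.zeros((2, 2))               # cached model-free values
print(model_free_values(Q, 0), model_based_values(T, R, 0))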
Zanutto, B. Silvano
2017-01-01
Animals are proposed to learn the latent rules governing their environment in order to maximize their chances of survival. However, rules may change without notice, forcing animals to keep a memory of which one is currently at work. Rule switching can lead to situations in which the same stimulus/response pairing is positively and negatively rewarded in the long run, depending on variables that are not accessible to the animal. This fact raises questions about how neural systems are capable of reinforcement learning in environments where the reinforcement is inconsistent. Here we address this issue by asking which aspects of connectivity, neural excitability and synaptic plasticity are key for a very general, stochastic spiking neural network model to solve a task in which rules change without being cued, taking the serial reversal task (SRT) as a paradigm. Contrary to what might be expected, we found strong limitations on the ability of biologically plausible networks to solve the SRT. In particular, we proved that no network of neurons can learn an SRT if a single neural population both integrates stimulus information and is responsible for choosing the behavioural response. This limitation is independent of the number of neurons, neuronal dynamics or plasticity rules, and arises from the fact that plasticity is computed locally at each synapse, and that synaptic changes and neuronal activity are mutually dependent processes. We propose and characterize a spiking neural network model that solves the SRT, which relies on separating the functions of stimulus integration and response selection. The model suggests that experimental efforts to understand neural function should focus on the characterization of neural circuits according to their connectivity, neural dynamics, and the degree of modulation of synaptic plasticity with reward. PMID:29077735
Arslan, Burcu; Taatgen, Niels A; Verbrugge, Rineke
2017-01-01
Studies of second-order false belief reasoning have generally focused on investigating the roles of executive functions and language through correlational designs. In contrast, we focus on the question of how 5-year-olds select and revise reasoning strategies in second-order false belief tasks by constructing two computational cognitive models of this process: an instance-based learning model and a reinforcement learning model. Unlike the reinforcement learning model, the instance-based learning model predicted that children who fail second-order false belief tasks would give answers based on first-order theory of mind (ToM) reasoning as opposed to zero-order reasoning. This prediction was confirmed in an empirical study that we conducted with 72 5- to 6-year-old children. The results showed that 17% of the answers were correct and 83% of the answers were wrong. In line with our prediction, 65% of the wrong answers were based on a first-order ToM strategy, while only 29% of them were based on a zero-order strategy (the remaining 6% of subjects did not provide any answer). Based on our instance-based learning model, we propose that when children get the feedback "Wrong," they explicitly revise their strategy to a higher level instead of implicitly selecting one of the available ToM strategies. Moreover, we predict that children's failures are due to lack of experience and that with exposure to second-order false belief reasoning, children can revise their wrong first-order reasoning strategy to a correct second-order reasoning strategy.
Kruse, Lauren C; Schindler, Abigail G; Williams, Rapheal G; Weber, Sophia J; Clark, Jeremy J
2017-01-01
According to recent WHO reports, alcohol remains the number one substance used and abused by adolescents, despite public health efforts to curb its use. Adolescence is a critical period of biological maturation where brain development, particularly the mesocorticolimbic dopamine system, undergoes substantial remodeling. These circuits are implicated in complex decision making, incentive learning and reinforcement during substance use and abuse. An appealing theoretical approach has been to suggest that alcohol alters the normal development of these processes to promote deficits in reinforcement learning and decision making, which together make individuals vulnerable to developing substance use disorders in adulthood. Previously we have used a preclinical model of voluntary alcohol intake in rats to show that use in adolescence promotes risky decision making in adulthood that is mirrored by selective perturbations in dopamine network dynamics. Further, we have demonstrated that incentive learning processes in adulthood are also altered by adolescent alcohol use, again mirrored by changes in cue-evoked dopamine signaling. Indeed, we have proposed that these two processes, risk-based decision making and incentive learning, are fundamentally linked through dysfunction of midbrain circuitry where inputs to the dopamine system are disrupted by adolescent alcohol use. Here, we test the behavioral predictions of this model in rats and present the findings in the context of the prevailing literature with reference to the long-term consequences of early-life substance use on the vulnerability to develop substance use disorders. We utilize an impulsive choice task to assess the selectivity of alcohol's effect on decision-making profiles and conditioned reinforcement to parse out the effect of incentive value attribution, one mechanism of incentive learning. Finally, we use the differential reinforcement of low rates of responding (DRL) task to examine the degree to which behavioral disinhibition may contribute to an overall decision-making profile. The findings presented here support the proposition that early life alcohol use selectively alters risk-based choice behavior through modulation of incentive learning processes, both of which may be inexorably linked through perturbations in mesolimbic circuitry and may serve as fundamental vulnerabilities to the development of substance use disorders.
Efficacy of Multimedia Instruction and an Introduction to Digital Multimedia Technology
1992-07-01
performed by Bandura, Ross and Ross (1961). They found that children exposed to an adult displaying aggression toward a Bobo doll later also performed... and enjoy successful task performance. Bandura (1969) describes modeling as the ability of individuals to learn a behavior or attitude... Bandura argued that all learning involving direct reinforcement could also result from observation. A classic study of modeling is an experiment
Rule learning in autism: the role of reward type and social context.
Jones, E J H; Webb, S J; Estes, A; Dawson, G
2013-01-01
Learning abstract rules is central to social and cognitive development. Across two experiments, we used Delayed Non-Matching to Sample tasks to characterize the longitudinal development and nature of rule-learning impairments in children with Autism Spectrum Disorder (ASD). Results showed that children with ASD consistently experienced more difficulty learning an abstract rule from a discrete physical reward than children with DD. Rule learning was facilitated by the provision of more concrete reinforcement, suggesting an underlying difficulty in forming conceptual connections. Learning abstract rules about social stimuli remained challenging through late childhood, indicating the importance of testing executive functions in both social and non-social contexts.
Gaussian Processes for Data-Efficient Learning in Robotics and Control.
Deisenroth, Marc Peter; Fox, Dieter; Rasmussen, Carl Edward
2015-02-01
Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning makes it possible to reduce the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
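A minimal sketch of the central idea, learning a probabilistic transition model and retaining its predictive uncertainty, is given below using scikit-learn's Gaussian process regressor on synthetic data; the authors' own GP-based policy search is considerably more involved.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Minimal sketch on synthetic data: fit a probabilistic transition model
# f(state, action) -> next state and keep its predictive uncertainty, which a
# planner can propagate to account for model error.

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))                  # columns: state, action
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(50)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3)
gp.fit(X, y)

mean, std = gp.predict(np.array([[0.2, -0.4]]), return_std=True)
print(mean, std)   # predictive mean and uncertainty at a query (state, action)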
Mice lacking hippocampal left-right asymmetry show non-spatial learning deficits.
Shimbo, Akihiro; Kosaki, Yutaka; Ito, Isao; Watanabe, Shigeru
2018-01-15
Left-right asymmetry is known to exist at several anatomical levels in the brain and recent studies have provided further evidence to show that it also exists at a molecular level in the hippocampal CA3-CA1 circuit. The distribution of N-methyl-d-aspartate (NMDA) receptor NR2B subunits in the apical and basal synapses of CA1 pyramidal neurons is asymmetrical if the input arrives from the left or right CA3 pyramidal neurons. In the present study, we examined the role of hippocampal asymmetry in cognitive function using β2-microglobulin knock-out (β2m KO) mice, which lack hippocampal asymmetry. We tested β2m KO mice in a series of spatial and non-spatial learning tasks and compared the performances of β2m KO and C57BL6/J wild-type (WT) mice. The β2m KO mice appeared normal in both spatial reference memory and spatial working memory tasks but they took more time than WT mice in learning the two non-spatial learning tasks (i.e., a differential reinforcement of lower rates of behavior (DRL) task and a straight runway task). The β2m KO mice also showed less precision in their response timing in the DRL task and showed weaker spontaneous recovery during extinction in the straight runway task. These results indicate that hippocampal asymmetry is important for certain characteristics of non-spatial learning. Copyright © 2017 Elsevier B.V. All rights reserved.
Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.
Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne
2017-08-01
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
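The valence-dependent bias described above is typically captured by giving positive and negative prediction errors separate learning rates; the sketch below illustrates that construction with assumed parameter values, applied to a factual (chosen) and a counterfactual (forgone) outcome.

# Assumed-parameter sketch of a valence-dependent model: separate learning
# rates for positive and negative prediction errors.

def valence_update(value, outcome, alpha_pos, alpha_neg):
    delta = outcome - value                       # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta

# Factual (chosen) outcome: bias toward positive errors (alpha_pos > alpha_neg).
v_chosen = valence_update(0.5, 1.0, alpha_pos=0.4, alpha_neg=0.1)
# Counterfactual (forgone) outcome: the opposite bias.
v_unchosen = valence_update(0.5, 0.0, alpha_pos=0.1, alpha_neg=0.4)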
Parallel Online Temporal Difference Learning for Motor Control.
Caarls, Wouter; Schuitema, Erik
2016-07-01
Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. However, in real systems, this method is often avoided in favor of policy search methods because of its long learning time. But policy search suffers from its own drawbacks, such as the necessity of informed policy parameterization and initialization. In this paper, we show that TD learning can work effectively in real robotic systems as well, using parallel model learning and planning. Using locally weighted linear regression and trajectory sampled planning with 14 concurrent threads, we can achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60× , with a real-time learning speed of less than half a minute. The results are competitive with state-of-the-art policy search.
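For reference, the tabular TD(0) rule at the heart of the approach is shown below; the paper's actual contribution, parallel model learning and trajectory-sampled planning with locally weighted regression, is not reproduced here.

# The tabular TD(0) update; default step size and discount are illustrative.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

V = {0: 0.0, 1: 0.0}
td0_update(V, 0, r=1.0, s_next=1)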
Raufelder, Diana; Boehme, Rebecca; Romund, Lydia; Golde, Sabrina; Lorenz, Robert C.; Gleich, Tobias; Beck, Anne
2016-01-01
This multi-methodological study applied functional magnetic resonance imaging to investigate neural activation in a group of adolescent students (N = 88) during a probabilistic reinforcement learning task. We related patterns of emerging brain activity and individual learning rates to socio-motivational (in-)dependence manifested in four different motivation types (MTs): (1) peer-dependent MT, (2) teacher-dependent MT, (3) peer-and-teacher-dependent MT, (4) peer-and-teacher-independent MT. A multinomial regression analysis revealed that the individual learning rate predicts students’ membership to the independent MT, or the peer-and-teacher-dependent MT. Additionally, the striatum, a brain region associated with behavioral adaptation and flexibility, showed increased learning-related activation in students with motivational independence. Moreover, the prefrontal cortex, which is involved in behavioral control, was more active in students of the peer-and-teacher-dependent MT. Overall, this study offers new insights into the interplay of motivation and learning with (1) a focus on inter-individual differences in the role of peers and teachers as source of students’ individual motivation and (2) its potential neurobiological basis. PMID:27199873
Pechtel, Pia; Pizzagalli, Diego A.
2013-01-01
Context: Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite grave epidemiological data, the mechanisms underlying these maladaptive outcomes remain poorly understood. Objective: We examined whether CSA history, particularly in conjunction with past MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Design: Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Setting: Academic setting; participants recruited from the community. Participants: Fifteen remitted depressed females with CSA history (CSA+rMDD), 16 remitted depressed females without CSA history (rMDD), and 18 healthy females. Main Outcome Measures: Participants’ preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity (FRN) and error-related negativity (ERN), hypothesized to reflect activation in the anterior cingulate cortex, were used as electrophysiological indices of reinforcement learning. Results: No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA+rMDD group showed (1) lower accuracy (relative to both controls and rMDD), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to rMDD). CSA history was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance on previously rewarded, but not previously punished, trials. Conclusions: Irrespective of past MDD, women with CSA histories showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision-making in the absence of feedback (blunted “Go learning”). While the current study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine if disrupted positive reinforcement learning predicts high-risk behavior following CSA. PMID:23487253
Ryu, Vin; Ha, Ra Yeon; Lee, Su Jin; Ha, Kyooseob; Cho, Hyun-Sang
2017-03-01
Bipolar disorder is characterized by behavioral changes such as risk-taking and increasing goal-directed activities, which may result from altered reward processing. Patients with bipolar disorder show impaired reward learning in situations that require the integration of reinforced feedback over time. In this study, we examined the behavioral and electrophysiological characteristics of reward learning in manic and euthymic patients with bipolar disorder using a probabilistic reward task. Twenty-four manic and 20 euthymic patients with bipolar I disorder and 24 healthy control subjects performed the probabilistic reward task. We assessed response bias (RB) as a preference for the stimulus paired with the more frequent reward and feedback-related negativity (FRN) to correct identification of the rich stimulus. Both manic and euthymic patients showed significantly lower RB scores in the early learning stage (block 1) in comparison with the late learning stage (block 2 or block 3) of the task, as well as significantly lower RB scores in the early stage compared to healthy subjects. Relatively more negative FRN amplitude is elicited by no presentation of an expected reward, compared to that elicited by presentation of expected feedback. The FRN became significantly more negative from the early (block 1) to the later stages (blocks 2 and 3) in both manic and euthymic patients, but not in healthy subjects. Changes in RB scores and FRN amplitudes between blocks 2 and 3 and block 1 correlated positively in healthy controls, but correlated negatively in manic and euthymic patients. The severity of manic symptoms correlated positively with reward learning scores and negatively with the FRN. These findings suggest that patients with bipolar disorder during euthymic or manic states have behavioral and electrophysiological alterations in reward learning compared to healthy subjects. This dysfunctional reward processing may be related to the abnormal decision-making or altered goal-directed activities frequently seen in patients with bipolar disorder. © 2017 John Wiley & Sons Ltd.
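Response bias in this type of probabilistic reward task is commonly quantified with a signal-detection log-b statistic; the sketch below shows one standard form of that computation, though the study's exact formula and any correction for empty cells may differ.

import math

# One common form of the response-bias statistic for a probabilistic reward
# task; the study's exact formula and any empty-cell correction may differ.

def response_bias(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    return 0.5 * math.log((rich_correct * lean_incorrect) /
                          (rich_incorrect * lean_correct))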
van Duin, Esther D A; Kasanova, Zuzana; Hernaus, Dennis; Ceccarini, Jenny; Heinzel, Alexander; Mottaghy, Felix; Mohammadkhani-Shali, Siamak; Winz, Oliver; Frank, Michael; Beck, Merrit C H; Booij, Jan; Myin-Germeys, Inez; van Amelsvoort, Thérèse
2018-06-01
22q11.2 deletion syndrome (22q11DS) is a genetic disorder caused by a microdeletion on chromosome 22q11.2 and associated with an increased risk for developing psychosis. The catechol-O-methyltransferase (COMT) gene is located in the deleted region and involved in dopamine (DA) breakdown. Impaired reinforcement learning (RL) is a recurrent feature in psychosis and thought to be related to abnormal striatal DA function. This study aims to examine RL and its potential association with striatal DA-ergic neuromodulation in 22q11DS. Twelve non-psychotic adults with 22q11DS and 16 healthy controls (HC) were included. A dopamine D2/3 receptor [18F]fallypride positron emission tomography (PET) scan was acquired while participants performed a modified version of the probabilistic stimulus selection task. RL-task performance was significantly worse in 22q11DS compared to HC. There were no group differences in striatal nondisplaceable binding potential (BPND) or task-induced DA release. In HC, striatal task-induced DA release was positively associated with task performance, but no such relation was found in 22q11DS subjects. Moreover, higher caudate nucleus task-induced DA release was found in COMT Met hemizygotes relative to Val hemizygotes. This study is the first to show impairments in RL in 22q11DS. It suggests that motivational impairments may be present not only in psychosis but also in this genetic high-risk group. These deficits may be underlain by abnormal striatal task-induced DA release, perhaps as a consequence of COMT haplo-insufficiency. Copyright © 2018 Elsevier B.V. and ECNP. All rights reserved.
Baymann, Ulrike; Langbein, Jan; Siebert, Katrin; Nürnberg, Gerd; Manteuffel, Gerhard; Mohr, Elmar
2007-01-01
The influence of social rank and social environment on visual discrimination learning in small groups of Nigerian dwarf goats (Capra hircus, n = 79) was studied using a computer-controlled learning device integrated in the animals' home pen. The experiment was divided into three sections (LE1, LE1u, LE2; each 14 d). In LE1 the goats learned a discrimination task in a socially stable environment. In LE1u animals were mixed and relocated to another pen and given the same task as in LE1. In LE2 the animals were mixed and relocated again and given a new discrimination task. We used drinking water as a primary reinforcer. The rank category of the goats was analysed as alpha, omega or middle ranking for each section of the experiment. The rank category had an influence on daily learning success (percentage of successful trials per day) only in LE1u. Daily learning success decreased after mixing and relocation of the animals in LE1u and LE2 compared to LE1. That resulted in an undersupply of drinking water on the first day of both these tasks. We discuss social stress induced by agonistic interactions after mixing as a reason for that decline. The absolute learning performance (trials to reach the learning criterion) of the omega animals was lower in LE2 compared to the other rank categories. Furthermore, their absolute learning performance was lower in LE2 compared to LE1. For future application of similar automated learning devices in animal husbandry, we recommend against combining management routines such as mixing and relocation with changes in the learning task, because of the negative effects on learning performance, particularly for the omega animals.
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.
Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J
2016-11-16
As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. Copyright © 2016 the authors 0270-6474/16/3611682-11$15.00/0.
Reddy, Lena Felice; Waltz, James A; Green, Michael F; Wynn, Jonathan K; Horan, William P
2016-07-01
Although individuals with schizophrenia show impaired feedback-driven learning on probabilistic reversal learning (PRL) tasks, the specific factors that contribute to these deficits remain unknown. Recent work has suggested several potential causes including neurocognitive impairments, clinical symptoms, and specific types of feedback-related errors. To examine this issue, we administered a PRL task to 126 stable schizophrenia outpatients and 72 matched controls, and patients were retested 4 weeks later. The task involved an initial probabilistic discrimination learning phase and subsequent reversal phases in which subjects had to adjust their responses to sudden shifts in the reinforcement contingencies. Patients showed poorer performance than controls for both the initial discrimination and reversal learning phases of the task, and performance overall showed good test-retest reliability among patients. A subgroup analysis of patients (n = 64) and controls (n = 49) with good initial discrimination learning revealed no between-group differences in reversal learning, indicating that the patients who were able to achieve all of the initial probabilistic discriminations were not impaired in reversal learning. Regarding potential contributors to impaired discrimination learning, several factors were associated with poor PRL, including higher levels of neurocognitive impairment, poor learning from both positive and negative feedback, and higher levels of indiscriminate response shifting. The results suggest that poor PRL performance in schizophrenia can be the product of multiple mechanisms. © The Author 2016. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. All rights reserved. For permissions, please email: journals.permissions@oup.com.
On Adaptation, Maximization, and Reinforcement Learning among Cognitive Strategies
ERIC Educational Resources Information Center
Erev, Ido; Barron, Greg
2005-01-01
Analysis of binary choice behavior in iterated tasks with immediate feedback reveals robust deviations from maximization that can be described as indications of 3 effects: (a) a payoff variability effect, in which high payoff variability seems to move choice behavior toward random choice; (b) underweighting of rare events, in which alternatives…
Hierarchically Organized Behavior and Its Neural Foundations: A Reinforcement Learning Perspective
ERIC Educational Resources Information Center
Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.
2009-01-01
Research on human and animal behavior has long emphasized its hierarchical structure--the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely…
Auto Mechanics. Instructional System Development Model for Vermont Area Vocational Centers.
ERIC Educational Resources Information Center
The model curriculum guide was developed to teach automotive mechanics in secondary schools in Vermont. It is composed of a series of units related to tasks identified as skills, concepts, and values, which are stated in behavioral terms, supported by suggested learning activities, reinforced by teacher resource needs and suggested evaluation…
Pitch Systems and Curwen Hand Signs: A Review of Literature
ERIC Educational Resources Information Center
Frey-Clark, Marta
2017-01-01
Learning to sing from notation is a complex task, and accurately performing pitches without an external reference can be particularly challenging. As such, the use of mnemonic devices to reinforce tonal relationships is a long-standing practice among musicians. Chief among these mnemonic devices are pitch syllable systems and Curwen hand signs.…
NASA Astrophysics Data System (ADS)
Quirion, Nate
Unmanned Aerial Systems (UASs) today are fulfilling more roles than ever before. There is a general push to have these systems feature more advanced autonomous capabilities in the near future. Achieving autonomous behavior requires some unique approaches to control and decision making. More advanced versions of these approaches are able to adapt their own behavior and examine their past experiences to increase their future mission performance. To achieve adaptive behavior and decision-making capabilities, this study used reinforcement learning (RL) algorithms. In this research, the effects of sensor performance, as modeled through Signal Detection Theory (SDT), on the ability of RL algorithms to accomplish a target localization task are examined. Three levels of sensor sensitivity are simulated and compared to the results of the same system using a perfect sensor. To accomplish the target localization task, a hierarchical architecture with two distinct agents is used. A simulated human operator is assumed to be a perfect decision maker and is used in the system feedback. An evaluation of the system is performed using multiple metrics, including episodic reward curves and the time taken to locate all targets. Statistical analyses are employed to detect significant differences in the steady-state behavior of the different systems.
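A Signal Detection Theory sensor of the kind described above can be simulated by drawing "target present" reports from hit and false-alarm rates derived from a sensitivity d' and a criterion; the Python sketch below is an illustrative assumption about how such a sensor could feed observations to an RL agent, not the study's implementation.

import numpy as np
from scipy.stats import norm

# Hypothetical SDT sensor: hit and false-alarm rates follow from a sensitivity
# d' and a decision criterion, and each look at a location returns a noisy
# "target present" report that an RL agent could consume as an observation.

def sdt_sensor(target_present, d_prime, criterion, rng):
    hit_rate = norm.sf(criterion - d_prime)   # P(report present | target)
    false_alarm = norm.sf(criterion)          # P(report present | no target)
    p = hit_rate if target_present else false_alarm
    return rng.random() < p

rng = np.random.default_rng(1)
reports = [sdt_sensor(True, d_prime=1.5, criterion=0.5, rng=rng) for _ in range(10)]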
Moustafa, Ahmed A.; Gluck, Mark A.; Herzallah, Mohammad M.; Myers, Catherine E.
2015-01-01
Previous research has shown that trial ordering affects cognitive performance, but this has not been tested using category-learning tasks that differentiate learning from reward and punishment. Here, we tested two groups of healthy young adults using a probabilistic category learning task of reward and punishment in which there are two types of trials (reward, punishment) and three possible outcomes: (1) positive feedback for correct responses in reward trials; (2) negative feedback for incorrect responses in punishment trials; and (3) no feedback for incorrect answers in reward trials and correct answers in punishment trials. Hence, trials without feedback are ambiguous, and may represent either successful avoidance of punishment or failure to obtain reward. In Experiment 1, the first group of subjects received an intermixed task in which reward and punishment trials were presented in the same block, as a standard baseline task. In Experiment 2, a second group completed the separated task, in which reward and punishment trials were presented in separate blocks. Additionally, in order to understand the mechanisms underlying performance in the experimental conditions, we fit individual data using a Q-learning model. Results from Experiment 1 show that subjects who completed the intermixed task paradoxically valued the no-feedback outcome as a reinforcer when it occurred on reinforcement-based trials, and as a punisher when it occurred on punishment-based trials. This is supported by patterns of empirical responding, where subjects showed more win-stay behavior following an explicit reward than following an omission of punishment, and more lose-shift behavior following an explicit punisher than following an omission of reward. In Experiment 2, results showed similar performance whether subjects received reward-based or punishment-based trials first. However, when the Q-learning model was applied to these data, there were differences between subjects in the reward-first and punishment-first conditions on the relative weighting of neutral feedback. Specifically, early training on reward-based trials led to omission of reward being treated as similar to punishment, but prior training on punishment-based trials led to omission of reward being treated more neutrally. This suggests that early training on one type of trials, specifically reward-based trials, can create a bias in how neutral feedback is processed, relative to those receiving early punishment-based training or training that mixes positive and negative outcomes. PMID:26257616
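One way to implement the Q-learning analysis described above is to treat the subjective value of the ambiguous no-feedback outcome as a free parameter lying between explicit reward and punishment; the sketch below illustrates that assumption, with hypothetical names and values.

# Hypothetical sketch: the subjective value of the no-feedback outcome is a
# free parameter (neutral_value) between explicit reward (+1) and punishment (-1).

def outcome_value(feedback, neutral_value):
    # feedback is +1 (reward), -1 (punishment) or None (no feedback shown)
    return neutral_value if feedback is None else float(feedback)

def q_learn_trial(Q, action, feedback, alpha, neutral_value):
    r = outcome_value(feedback, neutral_value)
    Q[action] += alpha * (r - Q[action])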
Schiffino, Felipe L; Zhou, Vivian; Holland, Peter C
2014-02-01
Within most contemporary learning theories, reinforcement prediction error, the difference between the obtained and expected reinforcer value, critically influences associative learning. In some theories, this prediction error determines the momentary effectiveness of the reinforcer itself, such that the same physical event produces more learning when its presentation is surprising than when it is expected. In other theories, prediction error enhances attention to potential cues for that reinforcer by adjusting cue-specific associability parameters, biasing the processing of those stimuli so that they more readily enter into new associations in the future. A unique feature of these latter theories is that such alterations in stimulus associability must be represented in memory in an enduring fashion. Indeed, considerable data indicate that altered associability may be expressed days after its induction. Previous research from our laboratory identified brain circuit elements critical to the enhancement of stimulus associability by the omission of an expected event, and to the subsequent expression of that altered associability in more rapid learning. Here, for the first time, we identified a brain region, the posterior parietal cortex, as a potential site for a memorial representation of altered stimulus associability. In three experiments using rats and a serial prediction task, we found that intact posterior parietal cortex function was essential during the encoding, consolidation, and retrieval of an associability memory enhanced by surprising omissions. We discuss these new results in the context of our previous findings and additional plausible frontoparietal and subcortical networks. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
The Neural Foundations of Reaction and Action in Aversive Motivation.
Campese, Vincent D; Sears, Robert M; Moscarello, Justin M; Diaz-Mataix, Lorenzo; Cain, Christopher K; LeDoux, Joseph E
2016-01-01
Much of the early research in aversive learning concerned motivation and reinforcement in avoidance conditioning and related paradigms. When the field transitioned toward the focus on Pavlovian threat conditioning in isolation, this paved the way for the clear understanding of the psychological principles and neural and molecular mechanisms responsible for this type of learning and memory that has unfolded over recent decades. Currently, avoidance conditioning is being revisited, and with what has been learned about associative aversive learning, rapid progress is being made. We review, below, the literature on the neural substrates critical for learning in instrumental active avoidance tasks and conditioned aversive motivation.
Kumar, Poornima; Eickhoff, Simon B.; Dombrovski, Alexandre Y.
2015-01-01
Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments – prediction error – is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies suggest that mesostriatal prediction error signals control behavior through synaptic modification of cortico-striato-thalamic networks. Signals in the ventromedial prefrontal and orbitofrontal cortex are implicated in representing expected value. To obtain unbiased maps of these representations in the human brain, we performed a meta-analysis of functional magnetic resonance imaging studies that employed algorithmic reinforcement learning models, across a variety of experimental paradigms. We found that the ventral striatum (medial and lateral) and midbrain/thalamus represented reward prediction errors, consistent with animal studies. Prediction error signals were also seen in the frontal operculum/insula, particularly for social rewards. In Pavlovian studies, striatal prediction error signals extended into the amygdala, while instrumental tasks engaged the caudate. Prediction error maps were sensitive to the model-fitting procedure (fixed or individually-estimated) and to the extent of spatial smoothing. A correlate of expected value was found in a posterior region of the ventromedial prefrontal cortex, caudal and medial to the orbitofrontal regions identified in animal studies. These findings highlight a reproducible motif of reinforcement learning in the cortico-striatal loops and identify methodological dimensions that may influence the reproducibility of activation patterns across studies. PMID:25665667
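For reference, the two signals mapped in this meta-analysis are usually written in the standard temporal-difference form; the notation below is generic and only fixes terms, it is not taken from the paper.

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

Here \delta_t is the reward prediction error attributed to mesostriatal signals and V(s) is the expected value attributed to ventromedial prefrontal and orbitofrontal regions.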
Role of dopamine D2 receptors in optimizing choice strategy in a dynamic and uncertain environment
Kwak, Shinae; Huh, Namjung; Seo, Ji-Seon; Lee, Jung-Eun; Han, Pyung-Lim; Jung, Min W.
2014-01-01
In order to investigate roles of dopamine receptor subtypes in reward-based learning, we examined choice behavior of dopamine D1 and D2 receptor-knockout (D1R-KO and D2R-KO, respectively) mice in an instrumental learning task with progressively increasing reversal frequency and a dynamic two-armed bandit task. Performance of D2R-KO mice was progressively impaired in the former as the frequency of reversal increased and profoundly impaired in the latter even with prolonged training, whereas D1R-KO mice showed relatively minor performance deficits. Choice behavior in the dynamic two-armed bandit task was well explained by a hybrid model including win-stay-lose-switch and reinforcement learning terms. A model-based analysis revealed increased win-stay, but impaired value updating and decreased value-dependent action selection in D2R-KO mice, which were detrimental to maximizing rewards in the dynamic two-armed bandit task. These results suggest an important role of dopamine D2 receptors in learning from past choice outcomes for rapid adjustment of choice behavior in a dynamic and uncertain environment. PMID:25389395
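A sketch of the kind of hybrid choice rule described, mixing a win-stay-lose-switch kernel with a softmax over incrementally learned action values; the mixing rule and parameter names are illustrative assumptions rather than the authors' exact specification.

```python
import numpy as np

def hybrid_choice_prob(q, prev_action, prev_reward, beta=3.0, w_wsls=0.3):
    """Probability of choosing each of two arms under a hybrid model.

    Mixes (1) a win-stay-lose-switch kernel based on the previous trial with
    (2) a softmax over incrementally learned action values `q`.
    `w_wsls` weights the WSLS term; all choices here are illustrative.
    """
    softmax = np.exp(beta * q) / np.exp(beta * q).sum()
    wsls = np.zeros(2)
    stay = prev_action if prev_reward else 1 - prev_action  # stay after win, switch after loss
    wsls[stay] = 1.0
    return w_wsls * wsls + (1 - w_wsls) * softmax
```

Under such a model, a larger fitted weight on the WSLS term together with a lower learning rate or inverse temperature would correspond to the reported pattern of increased win-stay behaviour alongside impaired value updating and reduced value-dependent selection.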
Attention control learning in the decision space using state estimation
NASA Astrophysics Data System (ADS)
Gharaee, Zahra; Fatehi, Alireza; Mirian, Maryam S.; Nili Ahmadabadi, Majid
2016-05-01
The main goal of this paper is to model attention and use it for efficient path planning of mobile robots. The key challenge in pursuing these two goals concurrently is how to make an optimal, or near-optimal, decision despite the time and processing-power limitations inherent in a typical multi-sensor, real-world robotic application. To recognise the environment efficiently under these limitations, the attention of an intelligent agent is controlled within a reinforcement learning framework. We propose an estimation method that combines a mixture-of-experts approach with task and attention learning in the perceptual space. An agent learns how to employ its sensory resources, and when to stop observing, by estimating its perceptual space. In this paper, static estimation of the state space in a learning task is performed and examined in the Webots simulator. Simulation results show that the robot learns to achieve an optimal policy with a controlled cost by estimating the state space instead of continually updating sensory information.
NASA Astrophysics Data System (ADS)
Zhou, Changjiu; Meng, Qingchun; Guo, Zhongwen; Qu, Wiefen; Yin, Bo
2002-04-01
Robot learning in unstructured environments has proved to be an extremely challenging problem, mainly because of the many uncertainties always present in the real world. Human beings, on the other hand, seem to cope very well with uncertain and unpredictable environments, often relying on perception-based information. Furthermore, human beings can also utilize perceptions to guide their learning on those parts of the perception-action space that are actually relevant to the task. Therefore, we conducted research aimed at improving robot learning through the incorporation of both perception-based and measurement-based information. To this end, a fuzzy reinforcement learning (FRL) agent is proposed in this paper. Based on a neural-fuzzy architecture, different kinds of information can be incorporated into the FRL agent to initialise its action network, critic network and evaluation feedback module so as to accelerate its learning. By making use of the global optimisation capability of GAs (genetic algorithms), a GA-based FRL (GAFRL) agent is presented to solve the local minima problem in traditional actor-critic reinforcement learning. On the other hand, with the prediction capability of the critic network, GAs can perform a more effective global search. Different GAFRL agents are constructed and verified by using the simulation model of a physical biped robot. The simulation analysis shows that the biped learning rate for dynamic balance can be improved by incorporating perception-based information on biped balancing and walking evaluation. The biped robot can find application in ocean exploration, detection, sea rescue, and military maritime activities.
The transfer of category knowledge by macaques (Macaca mulatta) and humans (Homo sapiens).
Zakrzewski, Alexandria C; Church, Barbara A; Smith, J David
2018-02-01
Cognitive psychologists distinguish implicit, procedural category learning (stimulus-response associations learned outside declarative cognition) from explicit-declarative category learning (conscious category rules). These systems are dissociated by category learning tasks with either a multidimensional, information-integration (II) solution or a unidimensional, rule-based (RB) solution. In the present experiments, humans and two monkeys learned II and RB category tasks fostering implicit and explicit learning, respectively. Then they received occasional transfer trials, never directly reinforced, drawn from untrained regions of the stimulus space. We hypothesized that implicit-procedural category learning, allied to associative learning, would transfer weakly because it is yoked to the training stimuli. This result was confirmed for humans and monkeys. We hypothesized that explicit category learning, allied to abstract category rules, would transfer robustly. This result was confirmed only for humans. That is, humans displayed explicit category knowledge that transferred flawlessly. Monkeys did not. This result illuminates the distinctive abstractness, stimulus independence, and representational portability of humans' explicit category rules. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Molina-Hernández, Miguel; Téllez-Alcántara, N Patricia
2004-07-01
During the learning of instrumental tasks, rats are usually fasted to increase reinforced learning. However, fasting produces several undesirable side effects. The aim of this study was to test the hypothesis that control rats, i.e. full-fed and group-reared rats, will learn an autoshaping task to the same level as fasted or singly-reared rats. The interaction between fasting and single-rearing of rats was also tested. Results showed that control rats and fasted rats acquired the autoshaping task similarly, independently of rearing condition or gender. However, fasted or singly-reared rats produced fear-like behaviour, since male rats group-reared and fasted (85% body/wt, P <0.05), male rats singly-reared (full fed, P <0.05; 12 h fasted, P <0.05; 85% body/wt, P <0.05), female rats group-reared (12 h fasted, P <0.05; 85% body/wt, P <0.05) and female rats singly reared (full fed, P <0.05; 12 h fasted, P <0.05; 85% body/wt, P <0.05) displayed reduced amounts of time exploring the open arms of the elevated plus-maze. In conclusion, control rats learned the autoshaping task to the same level as fasted or singly-reared rats. However, fasting or single-rearing produced fear-like behaviour. Thus, the training of control rats in autoshaping tasks may be an option that improves animal welfare.
The effects of response cost and species-typical behaviors on a daily time-place learning task.
Deibel, Scott H; Thorpe, Christina M
2013-03-01
Two theories that have been hypothesized to mediate acquisition in daily time-place learning (TPL) tasks were investigated in a free operant daily TPL task: the response cost hypothesis and the species-typical behavior hypothesis. One lever at the end of one of the choice arms of a T-maze provided food in the morning, and 6 h later, a lever in the other choice arm provided food. Four groups were used to assess the effect of two possible sources of response cost: physical effort of the task and costs associated with foraging ecology. One group was used to assess the effect of explicitly allowing for species-typical behaviors. If only first arm choice data were considered, there was little evidence of learning. However, both first press and percentage of presses on the correct lever prior to the first reinforcement revealed evidence of TPL in most rats tested. Unexpectedly, the high response cost groups for both of the proposed sources did not perform better than the low response cost groups. The groups that allowed animals to display species-typical behaviors performed the worst. Skip session probe trials confirmed that the majority of the rats that acquired the task were using a circadian timing strategy. The results from the present study suggest that learning in free operant daily TPL tasks might not be dependent on response cost.
Ellwood, Ian T.; Patel, Tosha; Wadia, Varun; Lee, Anthony T.; Liptak, Alayna T.
2017-01-01
Dopamine neurons in the ventral tegmental area (VTA) encode reward prediction errors and can drive reinforcement learning through their projections to striatum, but much less is known about their projections to prefrontal cortex (PFC). Here, we studied these projections and observed phasic VTA–PFC fiber photometry signals after the delivery of rewards. Next, we studied how optogenetic stimulation of these projections affects behavior using conditioned place preference and a task in which mice learn associations between cues and food rewards and then use those associations to make choices. Neither phasic nor tonic stimulation of dopaminergic VTA–PFC projections elicited place preference. Furthermore, substituting phasic VTA–PFC stimulation for food rewards was not sufficient to reinforce new cue–reward associations nor maintain previously learned ones. However, the same patterns of stimulation that failed to reinforce place preference or cue–reward associations were able to modify behavior in other ways. First, continuous tonic stimulation maintained previously learned cue–reward associations even after they ceased being valid. Second, delivering phasic stimulation either continuously or after choices not previously associated with reward induced mice to make choices that deviated from previously learned associations. In summary, despite the fact that dopaminergic VTA–PFC projections exhibit phasic increases in activity that are time locked to the delivery of rewards, phasic activation of these projections does not necessarily reinforce specific actions. Rather, dopaminergic VTA–PFC activity can control whether mice maintain or deviate from previously learned cue–reward associations. SIGNIFICANCE STATEMENT Dopaminergic inputs from ventral tegmental area (VTA) to striatum encode reward prediction errors and reinforce specific actions; however, it is currently unknown whether dopaminergic inputs to prefrontal cortex (PFC) play similar or distinct roles. Here, we used bulk Ca2+ imaging to show that unexpected rewards or reward-predicting cues elicit phasic increases in the activity of dopaminergic VTA–PFC fibers. However, in multiple behavioral paradigms, we failed to observe reinforcing effects after stimulation of these fibers. In these same experiments, we did find that tonic or phasic patterns of stimulation caused mice to maintain or deviate from previously learned cue–reward associations, respectively. Therefore, although they may exhibit similar patterns of activity, dopaminergic inputs to striatum and PFC can elicit divergent behavioral effects. PMID:28739583
Rule Learning in Autism: The Role of Reward Type and Social Context
Jones, E. J. H.; Webb, S. J.; Estes, A.; Dawson, G.
2013-01-01
Learning abstract rules is central to social and cognitive development. Across two experiments, we used Delayed Non-Matching to Sample tasks to characterize the longitudinal development and nature of rule-learning impairments in children with Autism Spectrum Disorder (ASD). Results showed that children with ASD consistently experienced more difficulty learning an abstract rule from a discrete physical reward than children with DD. Rule learning was facilitated by the provision of more concrete reinforcement, suggesting an underlying difficulty in forming conceptual connections. Learning abstract rules about social stimuli remained challenging through late childhood, indicating the importance of testing executive functions in both social and non-social contexts. PMID:23311315
Obayashi, Chihiro; Tamei, Tomoya; Shibata, Tomohiro
2014-05-01
This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive, inspired by the assist-as-needed principle well known in the field of physical therapy. Most previous studies in the field of robotic assistance of motor skill learning have used predetermined desired trajectories, and it has not been examined intensively whether these trajectories were optimal for each user. Furthermore, the guidance hypothesis states that humans tend to rely too much on external assistive feedback, resulting in interference with the internal feedback necessary for motor skill learning. A few studies have proposed a system that adjusts its assistive strength according to the user's performance in order to prevent the user from relying too much on the robotic assistance. There are, however, problems in these studies, in that a physical model of the user's motor system is required, which is inherently difficult to construct. In this paper, we propose a framework for a robotic trainer that is user-adaptive and that requires neither a specific desired trajectory nor a physical model of the user's motor system, and we achieve this using model-free reinforcement learning. We chose dart-throwing as an example motor-learning task as it is one of the simplest throwing tasks, and its performance can be easily and quantitatively measured. Training experiments with novices, aiming at maximizing the score with the darts and minimizing the physical robotic assistance, demonstrate the feasibility and plausibility of the proposed framework. Copyright © 2014 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Zaman, Maliha
2010-01-01
Students may avoid working on difficult tasks because it takes them longer to complete those tasks, which results in a delay to reinforcement. Research studies show that reinforcer and response dimensions can be manipulated within a concurrent operants framework to bias choice allocation toward more difficult tasks. The current study extends…
Effects of Ventral Striatum Lesions on Stimulus-Based versus Action-Based Reinforcement Learning.
Rothenhoefer, Kathryn M; Costa, Vincent D; Bartolo, Ramón; Vicario-Feliciano, Raquel; Murray, Elisabeth A; Averbeck, Bruno B
2017-07-19
Learning the values of actions versus stimuli may depend on separable neural circuits. In the current study, we evaluated the performance of rhesus macaques with ventral striatum (VS) lesions on a two-arm bandit task that had randomly interleaved blocks of stimulus-based and action-based reinforcement learning (RL). Compared with controls, monkeys with VS lesions had deficits in learning to select rewarding images but not rewarding actions. We used a RL model to quantify learning and choice consistency and found that, in stimulus-based RL, the VS lesion monkeys were more influenced by negative feedback and had lower choice consistency than controls. Using a Bayesian model to parse the groups' learning strategies, we also found that VS lesion monkeys defaulted to an action-based choice strategy. Therefore, the VS is involved specifically in learning the value of stimuli, not actions. SIGNIFICANCE STATEMENT Reinforcement learning models of the ventral striatum (VS) often assume that it maintains an estimate of state value. This suggests that it plays a general role in learning whether rewards are assigned based on a chosen action or stimulus. In the present experiment, we examined the effects of VS lesions on monkeys' ability to learn that choosing a particular action or stimulus was more likely to lead to reward. We found that VS lesions caused a specific deficit in the monkeys' ability to discriminate between images with different values, whereas their ability to discriminate between actions with different values remained intact. Our results therefore suggest that the VS plays a specific role in learning to select rewarded stimuli. Copyright © 2017 the authors 0270-6474/17/376902-13$15.00/0.
Pavlovian to Instrumental Transfer of Control in a Human Learning Task
Nadler, Natasha; Delgado, Mauricio R.; Delamater, Andrew R.
2011-01-01
Pavlovian learning tasks have been widely used as tools to understand basic cognitive and emotional processes in humans. The present studies investigated one particular task, Pavlovian-to-instrumental transfer (PIT), with human participants in an effort to examine potential cognitive and emotional effects of Pavlovian cues upon instrumentally-trained performance. In two experiments, subjects first learned two separate instrumental response-outcome relationships (R1-O1, R2-O2) and then were exposed to various stimulus-outcome relationships (S1-O1, S2-O2, S3-O3, S4-) before the effects of the Pavlovian stimuli on instrumental responding were assessed during a nonreinforced test. In Experiment 1, instrumental responding was established using a positive reinforcement procedure, whereas in Experiment 2 a quasi-avoidance learning task was used. In both cases the Pavlovian stimuli exerted selective control over instrumental responding, whereby S1 and S2 each selectively elevated the instrumental response with which they shared an outcome. In addition, in Experiment 2, S3 exerted a nonselective transfer of control effect, whereby both responses were elevated over baseline levels. These data identify two ways, one specific and one general, in which Pavlovian processes can exert control over instrumental responding in human learning paradigms, and suggest that this method may serve as a useful tool in the study of basic cognitive and emotional processes in human learning. PMID:21534664
Relational Learning in a Context of Transposition: A Review
ERIC Educational Resources Information Center
Lazareva, Olga F.
2012-01-01
In a typical transposition task, an animal is presented with a single pair of stimuli (for example, S3+S4-, where plus and minus denote reward and nonreward and digits denote stimulus location on a sensory dimension such as size). Subsequently, an animal is presented with a testing pair that contains a previously reinforced or nonreinforced…
Reversal Learning Task in Children with Autism Spectrum Disorder: A Robot-Based Approach
ERIC Educational Resources Information Center
Costescu, Cristina A.; Vanderborght, Bram; David, Daniel O.
2015-01-01
Children with autism spectrum disorder (ASD) engage in highly perseverative and inflexible behaviours. Technological tools, such as robots, have received increased attention as social reinforcers and/or assisting tools for improving the performance of children with ASD. The aim of our study is to investigate the role of the robotic toy Keepon in a…
DeVido, Jeffrey; Jones, Matthew; Geraci, Marilla; Hollon, Nick; Blair, R. J. R.; Pine, Daniel S.; Blair, Karina
2010-01-01
Background Generalized Social Phobia (GSP) involves the fear/avoidance of social situations while Generalized Anxiety Disorder (GAD) involves an intrusive worry about everyday life circumstances. It remains unclear whether these highly comorbid conditions represent distinct disorders or alternative presentations of a single underlying pathology. In this study, we examined stimulus-reinforcement based decision-making in GSP and GAD. Methods Twenty unmedicated patients with GSP, sixteen unmedicated patients with GAD and nineteen age-, IQ-, and gender-matched healthy comparison individuals completed the Differential Reward/Punishment Learning Task (DRPLT). In this task, the subject chooses between two objects associated with different levels of reward or punishment. Thus, response choice indexes not only reward/punishment sensitivity but also sensitivity to reward/punishment level according to between-object reinforcement distance. Results We found that patients with GAD committed a significantly greater number of errors compared to both the patients with GSP and the healthy comparison individuals. In contrast, the patients with GSP and the healthy comparison individuals did not differ in performance on this task. Conclusions These results link GAD with anomalous non-affective decision-making. Further, they indicate that GSP and GAD are associated with distinct pathophysiologies. PMID:19102795
Collins, Anne G E; Albrecht, Matthew A; Waltz, James A; Gold, James M; Frank, Michael J
2017-09-15
When studying learning, researchers directly observe only the participants' choices, which are often assumed to arise from a unitary learning process. However, a number of separable systems, such as working memory (WM) and reinforcement learning (RL), contribute simultaneously to human learning. Identifying each system's contributions is essential for mapping the neural substrates contributing in parallel to behavior; computational modeling can help to design tasks that allow such a separable identification of processes and infer their contributions in individuals. We present a new experimental protocol that separately identifies the contributions of RL and WM to learning, is sensitive to parametric variations in both, and allows us to investigate whether the processes interact. In experiments 1 and 2, we tested this protocol with healthy young adults (n = 29 and n = 52, respectively). In experiment 3, we used it to investigate learning deficits in medicated individuals with schizophrenia (n = 49 patients, n = 32 control subjects). Experiments 1 and 2 established WM and RL contributions to learning, as evidenced by parametric modulations of choice by load and delay and reward history, respectively. They also showed interactions between WM and RL, where RL was enhanced under high WM load. Moreover, we observed a cost of mental effort when controlling for reinforcement history: participants preferred stimuli they encountered under low WM load. Experiment 3 revealed selective deficits in WM contributions and preserved RL value learning in individuals with schizophrenia compared with control subjects. Computational approaches allow us to disentangle contributions of multiple systems to learning and, consequently, to further our understanding of psychiatric diseases. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Collins, Anne G. E.; Frank, Michael J.
2012-01-01
Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report effects of two genetic polymorphisms having relative specificity for prefrontal and basal ganglia functions. Whereas the COMT gene coding for catechol-O-methyl transferase selectively influenced model estimates of WM capacity, the GPR6 gene coding for G-protein-coupled receptor 6 influenced the RL learning rate. Thus, this study allowed us to specify distinct influences of the high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models. PMID:22487033
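A compressed sketch of an RL-plus-working-memory mixture of the kind described: choices arise from a weighted combination of a slow incremental RL policy and a fast but capacity-limited, decaying WM policy. The capacity weighting and decay rule below are simplified assumptions, not the published model.

```python
import numpy as np

class RLWMAgent:
    """Toy mixture of reinforcement-learning and working-memory policies."""

    def __init__(self, n_stimuli, n_actions, alpha=0.1, beta=8.0,
                 capacity=3, wm_decay=0.1, wm_weight=0.8):
        self.q = np.ones((n_stimuli, n_actions)) / n_actions    # slow RL values
        self.wm = np.ones((n_stimuli, n_actions)) / n_actions   # fast WM store
        self.alpha, self.beta = alpha, beta
        # WM contributes less once the set size exceeds its capacity
        self.w = wm_weight * min(1.0, capacity / n_stimuli)
        self.wm_decay = wm_decay
        self.n_actions = n_actions

    def choose(self, s, rng):
        p_rl = np.exp(self.beta * self.q[s]); p_rl /= p_rl.sum()
        p_wm = np.exp(self.beta * self.wm[s]); p_wm /= p_wm.sum()
        p = self.w * p_wm + (1 - self.w) * p_rl                  # policy mixture
        return rng.choice(self.n_actions, p=p)

    def update(self, s, a, r):
        self.q[s, a] += self.alpha * (r - self.q[s, a])          # slow RL update
        self.wm *= (1 - self.wm_decay)                           # WM decays toward uniform
        self.wm += self.wm_decay / self.n_actions
        self.wm[s, a] = r                                        # WM stores the last outcome
```

Because the WM term decays with intervening trials and is discounted by set size, such a mixture produces the load and delay effects on learning curves that a pure RL model cannot capture.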
Reversal Learning Task in Children with Autism Spectrum Disorder: A Robot-Based Approach.
Costescu, Cristina A; Vanderborght, Bram; David, Daniel O
2015-11-01
Children with autism spectrum disorder (ASD) engage in highly perseverative and inflexible behaviours. Technological tools, such as robots, have received increased attention as social reinforcers and/or assisting tools for improving the performance of children with ASD. The aim of our study is to investigate the role of the robotic toy Keepon in a cognitive flexibility task performed by children with ASD and typically developing (TD) children. Eighty-one children participated in this study: 40 TD children and 41 children with ASD. Each participant completed the reversal learning task in two conditions: robot interaction and human interaction. Our primary outcomes were the number of errors in the acquisition and reversal phases of the task; as secondary outcomes we measured attentional engagement and positive affect. The results of this study showed that children with ASD are more engaged in the task and seem to enjoy it more when interacting with the robot than when interacting with the adult. On the other hand, their cognitive flexibility performance is, in general, similar in the robot and human conditions, with the exception of the learning phase, where the robot can interfere with performance. Implications for future research and practice are discussed.
Reward-based training of recurrent neural networks for cognitive and value-based tasks
Song, H Francis; Yang, Guangyu R; Wang, Xiao-Jing
2017-01-01
Trained neural network models, which exhibit features of neural activity recorded from behaving animals, may provide insights into the circuit mechanisms of cognitive functions through systematic analysis of network activity and connectivity. However, in contrast to the graded error signals commonly used to train networks through supervised learning, animals learn from reward feedback on definite actions through reinforcement learning. Reward maximization is particularly relevant when optimal behavior depends on an animal’s internal judgment of confidence or subjective preferences. Here, we implement reward-based training of recurrent neural networks in which a value network guides learning by using the activity of the decision network to predict future reward. We show that such models capture behavioral and electrophysiological findings from well-known experimental paradigms. Our work provides a unified framework for investigating diverse cognitive and value-based computations, and predicts a role for value representation that is essential for learning, but not executing, a task. DOI: http://dx.doi.org/10.7554/eLife.21492.001 PMID:28084991
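The training scheme described, in which a value estimate of future reward guides policy updates, belongs to the family of policy-gradient methods with a learned baseline. Below is a minimal, non-recurrent sketch of that idea on a two-armed bandit; it is not the authors' recurrent architecture, and the task probabilities are assumed for illustration.

```python
import numpy as np

def reinforce_with_baseline(n_episodes=2000, alpha_pi=0.05, alpha_v=0.1, seed=0):
    """Two-armed bandit trained by REINFORCE with a learned value baseline.

    The 'value network' here is a single scalar baseline b estimating expected
    reward; subtracting it from the reward reduces the variance of the policy
    gradient, mirroring (in simplified form) the role of the value network.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)                  # policy logits for the two actions
    b = 0.0                              # learned baseline
    p_reward = np.array([0.2, 0.8])      # assumed task: arm 1 pays off more often
    for _ in range(n_episodes):
        p = np.exp(theta) / np.exp(theta).sum()
        a = rng.choice(2, p=p)
        r = float(rng.random() < p_reward[a])
        grad_logp = -p; grad_logp[a] += 1.0        # d log pi(a) / d theta
        theta += alpha_pi * (r - b) * grad_logp    # policy-gradient step
        b += alpha_v * (r - b)                     # baseline tracks mean reward
    return theta, b
```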
Reinforcing and timing properties of water in the schedule-induced drinking situation.
Ruiz, Jorge A; López-Tolsa, Gabriela E; Pellón, Ricardo
2016-06-01
A series of recent studies from our laboratory have added to the preceding literature on the potential role of water (in addition to food) as a positive reinforcer in the schedule-induced drinking situation, thus suggesting that adjunctive behaviors might have motivational properties that make their engagement a preferable alternative. It has also been suggested that adjunctive behaviors serve as a behavioral clock that helps organisms to estimate time, making their engagement motivational, so that they enable more accurate time adjustment under temporal schedules. Here, we review some of these experiments on conditioned reinforcement and concurrent chains, as well as on temporal learning. Data presented in this article suggest that adjunctive behaviors may be a part of the behavior patterns maintained by reinforcement, thus serving towards a better performance in temporal tasks. Copyright © 2016 Elsevier B.V. All rights reserved.
LaCrosse, Amber L.; Burrows, Brian T.; Angulo, Rachel M.; Conrad, Phoebe R.; Himes, Sarah M.; Mathews, Nordia; Wegner, Scott A.; Taylor, Sara B.; Olive, M. Foster
2014-01-01
Rationale Positive allosteric modulators (PAMs) of type 5 metabotropic glutamate receptors (mGluR5) exert pro-cognitive effects in animal models of various neuropsychiatric diseases. However, few studies to date have examined the ability of mGluR5 PAMs to reverse cognitive deficits in operant delayed matching/non-matching-to-sample (DMS/DNMS) tasks. Objectives To determine the ability of the mGluR5 PAM 3-cyano-N-(1,3-diphenyl-1H-pyrazol-5-yl)benzamide (CDPPB) to reverse set-shifting deficits induced by the NMDA receptor antagonist MK-801. Methods Male Sprague-Dawley rats were initially trained to lever press for sucrose reinforcement under either DMS or DNMS conditions. Following successful acquisition of the task, reinforcement conditions were reversed (DNMS→DMS or DMS→DNMS). In Experiment 1, rats were treated daily prior to each session with either vehicle/vehicle, vehicle/MK-801 (0.06 mg/kg) simultaneously, CDPPB (20 mg/kg)/MK-801 simultaneously, or CDPPB 30 min prior to MK-801. In Experiment 2, rats were treated with either vehicle/vehicle, vehicle/MK-801, or CDPPB 30 min prior to MK-801 only prior to sessions that followed task reversal. Results In Experiment 1, no group differences in initial task acquisition were observed. Rats treated with vehicle+MK-801 showed significant set-shifting impairments following task reversal, which were partially attenuated by simultaneous administration of CDPPB/MK-801, and completely precluded by administration of CDPPB 30 min prior to MK-801. In Experiment 2, MK-801 did not impair reversal learning and no other group differences were observed. Conclusions MK-801-induced deficits in operant set-shifting ability were prevented by pretreatment with CDPPB. MK-801 did not produce deficits in initial task learning or when treatment was initiated following task reversal. PMID:24973895
Moran, Erin K.; Culbreth, Adam J.; Barch, Deanna M.
2017-01-01
Negative symptoms are a core clinical feature of schizophrenia, but conceptual and methodological problems with current instruments can make their assessment challenging. One hypothesis is that current symptom assessments may be influenced by impairments in memory and may not be fully reflective of actual functioning outside of the laboratory. The present study sought to investigate the validity of assessing negative symptoms using ecological momentary assessment (EMA). Participants with schizophrenia (N=31) completed electronic questionnaires on smartphones four times a day for one week. Participants also completed Effort-Based Decision Making and Reinforcement Learning (RL) tasks to assess the relationship between EMA and laboratory measures, which tap into negative symptom relevant domains. Hierarchical linear modeling analyses revealed that clinician-rated and self-report measures of negative symptoms were significantly related to negative symptoms assessed via EMA. However, working memory moderated the relationship between EMA and retrospective measures of negative symptoms, such that there was a stronger relationship between EMA and retrospective negative symptom measures among individuals with better working memory. We also found that negative symptoms assessed via EMA were related to poor performance on the Effort task, while clinician-rated symptoms and self-reports were not. Further, we found that negative symptoms were related to poorer performance on learning reward contingencies. Our findings suggest that negative symptoms can be assessed through EMA and that working memory impairments frequently seen in schizophrenia may affect recall of symptoms. Moreover, these findings suggest the importance of examining the relationship between laboratory tasks and symptoms assessed during daily life. PMID:27893230
ERIC Educational Resources Information Center
Beaver, Brittany N.; Reeve, Sharon A.; Reeve, Kenneth F.; DeBar, Ruth M.
2017-01-01
The current study assessed whether four 15- to 17-year-old individuals diagnosed with autism would remain on-task for more intervals and complete tasks independently as a function of using self-reinforcement or teacher-delivered reinforcement. An adapted alternating-treatments design with teacher-delivered reinforcement, self-reinforcement, and a…
Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.
Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp
2017-04-01
According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.
Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W
2012-01-01
Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
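The learning rule described, a global three-valued reinforcement signal gating changes accumulated in spike-timing-dependent eligibility traces, can be sketched at the level of a single feedforward weight matrix; the trace dynamics and constants below are illustrative simplifications, not the published model.

```python
import numpy as np

def reward_modulated_update(w, elig, pre_spikes, post_spikes,
                            reward_signal, lr=0.01, tau_e=0.9):
    """One plasticity step for feedforward weights w (shape: post x pre).

    `elig` accumulates pre/post spike coincidences and decays with `tau_e`;
    the global signal (+1 reward, 0 no learning, -1 punishment) decides
    whether the accumulated trace becomes potentiation, nothing, or
    depression. Shapes and constants are illustrative assumptions.
    """
    elig = tau_e * elig + np.outer(post_spikes, pre_spikes)  # credit-assignment trace
    w = w + lr * reward_signal * elig                         # gate by global signal
    return np.clip(w, 0.0, 1.0), elig
```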
Cheng, Zhenbo; Deng, Zhidong; Hu, Xiaolin; Zhang, Bo; Yang, Tianming
2015-12-01
The brain often has to make decisions based on information stored in working memory, but the neural circuitry underlying working memory is not fully understood. Many theoretical efforts have been focused on modeling the persistent delay period activity in the prefrontal areas that is believed to represent working memory. Recent experiments reveal that the delay period activity in the prefrontal cortex is neither static nor homogeneous as previously assumed. Models based on reservoir networks have been proposed to model such a dynamical activity pattern. The connections between neurons within a reservoir are random and do not require explicit tuning. Information storage does not depend on the stable states of the network. However, it is not clear how the encoded information can be retrieved for decision making with a biologically realistic algorithm. We therefore built a reservoir-based neural network to model the neuronal responses of the prefrontal cortex in a somatosensory delayed discrimination task. We first illustrate that the neurons in the reservoir exhibit a heterogeneous and dynamical delay period activity observed in previous experiments. Then we show that a cluster population circuit decodes the information from the reservoir with a winner-take-all mechanism and contributes to the decision making. Finally, we show that the model achieves a good performance rapidly by shaping only the readout with reinforcement learning. Our model reproduces important features of previous behavior and neurophysiology data. We illustrate for the first time how task-specific information stored in a reservoir network can be retrieved with a biologically plausible reinforcement learning training scheme. Copyright © 2015 the American Physiological Society.
Differential effects of reinforcement on the self-monitoring of on-task behavior.
Otero, Tiffany L; Haut, Jillian M
2016-03-01
In the current study, the differential effects of reinforcement on a self-monitoring intervention were evaluated. Three students nominated by their teachers for having marked difficulty maintaining on-task behavior participated in the study. Using an alternating treatments single-case design to assess self-monitoring with and without reinforcement, students self-monitored their on-task behavior while being prompted by a vibrating timer at 1-min intervals for 20-min sessions. The investigators collected data regarding the students' percentage of intervals on-task and the accuracy of their recordings. Accuracy was measured by calculating the percent of agreement between the observer and student. For half of the self-monitoring sessions, students were provided reinforcement for matching at least 80% of their self-monitored ratings with those of the observer. Results indicated that self-monitoring alone was effective for 2 students in increasing their on-task behaviors in a general education classroom and self-monitoring with reinforcement was effective for all 3. Two students demonstrated an increase in on-task behavior when self-monitoring was paired with the opportunity to receive reinforcement compared to self-monitoring alone. Percentage of nonoverlapping data ranged from 16.6% to 100% for self-monitoring without reinforcement, and from 83% to 100% for self-monitoring with reinforcement. Additionally, the opportunity to receive reinforcement impacted students' accuracy in self-monitoring, resulting in more accurate self-recording of on-task behavior. Including reinforcement as a component of a self-monitoring intervention package is an important consideration as it may impact the effectiveness of the intervention for students with significant difficulties maintaining attention to tasks. (c) 2016 APA, all rights reserved.
Sethi, Arjun; Voon, Valerie; Critchley, Hugo D; Cercignani, Mara; Harrison, Neil A
2018-05-01
Computational models of reinforcement learning have helped dissect discrete components of reward-related function and characterize neurocognitive deficits in psychiatric illnesses. Stimulus novelty biases decision-making, even when unrelated to choice outcome, acting as if possessing intrinsic reward value to guide decisions toward uncertain options. Heightened novelty seeking is characteristic of attention deficit hyperactivity disorder, yet how this influence on reward-related decision-making is computationally encoded, or whether it is altered by stimulant medication, is currently uncertain. Here we used an established reinforcement-learning task to model effects of novelty on reward-related behaviour during functional MRI in 30 adults with attention deficit hyperactivity disorder and 30 age-, sex- and IQ-matched control subjects. Each participant was tested on two separate occasions, once ON and once OFF stimulant medication. OFF medication, patients with attention deficit hyperactivity disorder showed significantly impaired task performance (P = 0.027), and greater selection of novel options (P = 0.004). Moreover, persistence in selecting novel options predicted impaired task performance (P = 0.025). These behavioural deficits were accompanied by a significantly lower learning rate (P = 0.011) and heightened novelty signalling within the substantia nigra/ventral tegmental area (family-wise error corrected P < 0.05). Compared to effects in controls, stimulant medication improved attention deficit hyperactivity disorder participants' overall task performance (P = 0.011), increased reward-learning rates (P = 0.046) and enhanced their ability to differentiate optimal from non-optimal novel choices (P = 0.032). It also reduced substantia nigra/ventral tegmental area responses to novelty. Preliminary cross-sectional evidence additionally suggested an association between long-term stimulant treatment and a reduction in the rewarding value of novelty. These data suggest that aberrant substantia nigra/ventral tegmental area novelty processing plays an important role in the suboptimal reward-related decision-making characteristic of attention deficit hyperactivity disorder. Compared to effects in controls, abnormalities in novelty processing and reward-related learning were improved by stimulant medication, suggesting that they may be disorder-specific targets for the pharmacological management of attention deficit hyperactivity disorder symptoms.
Brown, Elliot C; Hack, Samantha M; Gold, James M; Carpenter, William T; Fischer, Bernard A; Prentice, Kristen P; Waltz, James A
2015-01-01
The Iowa Gambling Task (IGT; Bechara et al., 1994) has frequently been used to assess risky decision making in clinical populations, including patients with schizophrenia (SZ). Poor performance on the IGT is often attributed to reduced sensitivity to punishment, which contrasts with recent findings from reinforcement learning studies in schizophrenia. In order to investigate possible sources of IGT performance deficits in SZ patients, we combined data from the IGT from 59 SZ patients and 43 demographically-matched controls with data from the Balloon Analog Risk Task (BART) in the same participants. Our analyses sought to specifically uncover the role of punishment sensitivity and delineate the capacity to integrate frequency and magnitude information in decision-making under risk. Although SZ patients, on average, made more choices from disadvantageous decks than controls did on the IGT, they avoided decks with frequent punishments at a rate similar to controls. Patients also exhibited excessive loss-avoidance behavior on the BART. We argue that, rather than stemming from reduced sensitivity to negative consequences, performance deficits on the IGT in SZ patients are more likely the result of a reinforcement learning deficit, specifically involving the integration of frequencies and magnitudes of rewards and punishments in the trial-by-trial estimation of expected value. Copyright © 2015 Elsevier Ltd. All rights reserved.
Jonkman, Sietse; Everitt, Barry J.
2009-01-01
The integrity of the rodent anterior cingulate cortex (ACC) is essential for various aspects of instrumental behavior, but it is not clear if the ACC is important for the acquisition of a simple instrumental response. Here, it was demonstrated that post-session infusions of anisomycin into the rat ACC completely prevented the acquisition of instrumental responding. The experimental use of post-session intracranial infusions of plasticity inhibitors is assumed to affect local consolidation of plasticity, but not behavioral task performance. However, in associative appetitive conditioning, post-session intracranial infusion of pharmaco-active compounds could actually interfere with subsequent task performance indirectly through retrospective effects on the valuation of ingested rewards. Thus, it was subsequently demonstrated that the intracranial infusion of anisomycin into the ACC after sucrose pellet consumption significantly reduced subsequent pellet consumption, suggesting that the infusion of anisomycin into the ACC produced conditioned taste avoidance. In the third experiment, an innovative procedure was introduced that dissociated the effects of intracranial infusions after conditioning sessions on task-learning and unconditioned stimulus valuation. With this procedure, the infusion of anisomycin into the ACC after instrumental sessions did not affect instrumental reinforcer valuation or the acquisition of instrumental responding, suggesting that plasticity in the ACC is not necessary for the acquisition of instrumental behavior. PMID:19864297
Higher incentives can impair performance: neural evidence on reinforcement and rationality
Achtziger, Anja; Hügelschäfer, Sabine; Steinhauser, Marco
2015-01-01
Standard economic thinking postulates that increased monetary incentives should increase performance. Human decision makers, however, frequently focus on past performance, a form of reinforcement learning occasionally at odds with rational decision making. We used an incentivized belief-updating task from economics to investigate this conflict through measurements of neural correlates of reward processing. We found that higher incentives fail to improve performance when immediate feedback on decision outcomes is provided. Subsequent analysis of the feedback-related negativity, an early event-related potential following feedback, revealed the mechanism behind this paradoxical effect. As incentives increase, the win/lose feedback becomes more prominent, leading to an increased reliance on reinforcement and more errors. This mechanism is relevant for economic decision making and the debate on performance-based payment. PMID:25816816
Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji
2016-12-01
Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only really works well with binary, or close to binary, state input, where the number of active states is fewer than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and the robustness of the FERL method can be improved by scaling the free energy by a constant that is related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL), instead of the negative free energy, as well as being able to handle continuous state input. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
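For an RBM with binary hidden units, the negative free energy used as the FERL value estimate has a closed form, and the negative expected energy proposed here also reduces to a simple expression. The sketch below shows generic RBM algebra for both quantities; whether it matches the paper's exact EERL formulation is an assumption.

```python
import numpy as np

def negative_free_energy(x, W, b_vis, b_hid):
    """Value estimate V(x) = -F(x) for an RBM given a visible vector x.

    x: state(-action) input vector, W: (n_hidden, n_visible),
    b_vis / b_hid: visible and hidden biases. Standard RBM algebra.
    """
    pre = b_hid + W @ x
    return float(b_vis @ x + np.sum(np.logaddexp(0.0, pre)))   # softplus terms

def negative_expected_energy(x, W, b_vis, b_hid):
    """Value estimate V(x) = -E[E(x,h)] under p(h|x), the expected-energy variant."""
    pre = b_hid + W @ x
    h_mean = 1.0 / (1.0 + np.exp(-pre))     # E[h_j | x] = sigmoid(pre_j)
    return float(b_vis @ x + h_mean @ pre)
```

Unlike the free-energy form, the expected-energy form does not require x to be near-binary to behave well, which is the motivation stated in the abstract.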
Konovalov, Arkady; Krajbich, Ian
2016-01-01
Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process between two differentially valued items, consistent with previous work on sequential-sampling models of decision making. These findings illustrate a problem with assuming that experimental subjects make their decisions at the same prescribed time. PMID:27511383
Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning.
Morimura, Tetsuro; Uchibe, Eiji; Yoshimoto, Junichiro; Peters, Jan; Doya, Kenji
2010-02-01
Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution that corresponds to the sensitivity of its distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate gamma for the value functions close to 1, these algorithms do not permit gamma to be set exactly at gamma = 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD) as a useful form of the derivative of the stationary state distribution through backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with an LSD is also proposed, in which the average reward gradient can be estimated by setting gamma = 0, so it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms using simple benchmark tasks and show that these can improve the performances of existing PG methods.
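The neglected term can be made explicit by differentiating the average reward directly; the following restatement uses standard average-reward notation (chosen here for clarity, not copied from the paper).

```latex
\eta(\theta) = \sum_{s} d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s)\, r(s,a),
\qquad
\nabla_\theta \eta(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta}
\left[\big(\nabla_\theta \log d^{\pi_\theta}(s) + \nabla_\theta \log \pi_\theta(a \mid s)\big)\, r(s,a)\right]
```

The first term inside the expectation is the log stationary distribution derivative (LSD) that the proposed method estimates via a backward Markov chain and temporal-difference learning; with it, the average reward gradient can be estimated at gamma = 0, so value functions need not be learned.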
Bai, Yu; Katahira, Kentaro; Ohira, Hideki
2014-01-01
Humans are capable of correcting their actions based on actions performed in the past, and this ability enables them to adapt to a changing environment. The computational field of reinforcement learning (RL) has provided a powerful explanation for understanding such processes. Recently, the dual learning system, modeled as a hybrid model that incorporates value update based on reward-prediction error and learning rate modulation based on the surprise signal, has gained attention as a model for explaining various neural signals. However, the functional significance of the hybrid model has not been established. In the present study, we used computer simulation to address the functional significance of the hybrid model in a probabilistic reversal learning task. The hybrid model was found to perform better than the standard RL model over a wide range of parameter settings. These results suggest that the hybrid model is more robust against the mistuning of parameters compared with the standard RL model when decision-makers must keep learning stimulus-reward contingencies that can change abruptly. The parameter fitting results also indicated that the hybrid model fit better than the standard RL model for more than 50% of the participants, which suggests that the hybrid model has more explanatory power for the behavioral data than the standard RL model. PMID:25161635
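A minimal sketch of the hybrid scheme described, where the effective learning rate is itself adjusted by the magnitude of recent prediction errors (a Pearce-Hall-style associability term); the constants and exact update form are illustrative, not the values fitted in the paper.

```python
def hybrid_update(q, a, r, assoc, kappa=0.3, eta=0.3):
    """One trial of a hybrid RL update for a two-choice reversal task.

    q      : list/array of current action values
    assoc  : associability (effective learning rate), driven by |surprise|
    kappa  : scales how strongly associability translates into value learning
    eta    : how quickly associability tracks recent unsigned prediction errors
    """
    delta = r - q[a]                                  # reward-prediction error
    q = list(q)
    q[a] += kappa * assoc * delta                     # value update, gain set by surprise
    assoc = (1 - eta) * assoc + eta * abs(delta)      # surprise raises future learning rate
    return q, assoc
```

After a contingency reversal, large unsigned prediction errors transiently raise the associability, which is what lets such a model re-adapt quickly where a fixed-learning-rate learner lags.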
ERIC Educational Resources Information Center
Reichle, Erik D.; Laurent, Patryk A.
2006-01-01
The eye movements of skilled readers are typically very regular (K. Rayner, 1998). This regularity may arise as a result of the perceptual, cognitive, and motor limitations of the reader (e.g., limited visual acuity) and the inherent constraints of the task (e.g., identifying the words in their correct order). To examine this hypothesis,…
Chin, Brian; Nelson, Brady D; Jackson, Felicia; Hajcak, Greg
2016-01-01
Fear conditioning research on threat predictability has primarily examined the impact of temporal (i.e., timing) predictability on the startle reflex. However, there are other key features of threat that can vary in predictability. For example, the reinforcement rate (i.e., frequency) of threat is a crucial factor underlying fear learning. The present study examined the impact of threat reinforcement rate on the startle reflex and self-reported anxiety during a fear conditioning paradigm. Forty-five participants completed a fear learning task in which the conditioned stimulus was reinforced with an electric shock to the forearm on 50% of trials in one block and 75% of trials in a second block, in counter-balanced order. The present study also examined whether intolerance of uncertainty (IU), the tendency to perceive or experience uncertainty as stressful or unpleasant, was associated with the startle reflex during conditions of low (50%) vs. high (75%) reinforcement. Results indicated that, across all participants, startle was greater during the 75% relative to the 50% reinforcement condition. IU was positively correlated with startle potentiation (i.e., increased startle response to the CS+ relative to the CS-) during the 50%, but not the 75%, reinforcement condition. Thus, despite receiving fewer electric shocks during the 50% reinforcement condition, individuals with high IU uniquely demonstrated greater defense system activation when impending threat was more uncertain. The association between IU and startle was independent of state anxiety. The present study adds to a growing literature on threat predictability and aversive responding, and suggests IU is associated with abnormal responding in the context of uncertain threat. Copyright © 2015 Elsevier B.V. All rights reserved.
Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.
Fernandez-Gauna, Borja; Etxeberria-Agiriano, Ismael; Graña, Manuel
2015-01-01
Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
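A minimal sketch of the round-robin idea follows; the environment, reward, and parameters are toy placeholders rather than the hose-transportation task or the full D-RR-QL algorithm (state-action vetoes and message passing are omitted).

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, n_states, n_actions = 3, 5, 2
Q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]   # one Q-table per agent
alpha, gamma, eps = 0.1, 0.95, 0.1

def env_step(state, action):
    # placeholder dynamics and reward, standing in for the multirobot environment
    next_state = (state + action) % n_states
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

state = 0
for t in range(5000):
    agent = t % n_agents                        # round-robin turn taking: one learner per step
    if rng.random() < eps:
        action = int(rng.integers(n_actions))
    else:
        action = int(Q[agent][state].argmax())
    next_state, reward = env_step(state, action)
    td_target = reward + gamma * Q[agent][next_state].max()
    Q[agent][state, action] += alpha * (td_target - Q[agent][state, action])
    state = next_state
```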
Pechtel, Pia; Pizzagalli, Diego A
2013-05-01
Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite the epidemiological data available, the mechanisms underlying these maladaptive outcomes remain poorly understood. We examined whether a history of CSA, particularly in conjunction with a past episode of MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Academic setting; participants recruited from the community. Fifteen women with a history of CSA and remitted MDD (CSA + rMDD), 16 women with remitted MDD with no history of CSA (rMDD), and 18 healthy women (controls). Three or more episodes of coerced sexual contact (mean [SD] duration, 3.00 [2.20] years) between the ages of 7 and 12 years by at least 1 male perpetrator. Participants' preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity and error-related negativity-hypothesized to reflect activation in the anterior cingulate cortex-were used as electrophysiological indices of reinforcement learning. No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA + rMDD group showed (1) lower accuracy (relative to both controls and the rMDD group), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to the rMDD group). A history of CSA was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance of previously rewarded, but not previously punished, trials. Irrespective of past MDD episodes, women with a history of CSA showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision making in the absence of feedback (blunted "Go learning"). Although our study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine if disrupted positive reinforcement learning predicts high-risk behavior following CSA.
Nature vs Nurture: Effects of Learning on Evolution
NASA Astrophysics Data System (ADS)
Nagrani, Nagina
In the field of Evolutionary Robotics, the design, development and application of artificial neural networks as controllers have derived their inspiration from biology. Biologists and artificial intelligence researchers are trying to understand the effects of neural network learning during the lifetime of the individuals on evolution of these individuals by qualitative and quantitative analyses. The conclusions of these analyses can help develop optimized artificial neural networks to perform any given task. The purpose of this thesis is to study the effects of learning on evolution. This has been done by applying Temporal Difference Reinforcement Learning methods to the evolution of an Artificial Neural Tissue controller. The controller has been assigned the task of collecting resources in a designated area in a simulated environment. The performance of the individuals is measured by the amount of resources collected. A comparison has been made between the results obtained by incorporating learning in evolution and evolution alone. The effects of the learning parameters (learning rate, training period, discount rate, and policy) on evolution have also been studied. It was observed that learning delays the performance of the evolving individuals over the generations. However, the non-zero learning rate throughout the evolution process signifies that natural selection prefers individuals possessing plasticity.
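For reference, the core temporal-difference update used in such lifetime learning might look like the following one-step TD(0) rule; this is a generic sketch, and the thesis' controller, state representation, and parameter values are not reproduced here.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update of the state-value table V (a dict keyed by state)."""
    td_error = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error
```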
NASA Technical Reports Server (NTRS)
Sitterley, T. E.; Zaitzeff, L. P.; Berge, W. A.
1972-01-01
Flight control and procedural task skill degradation, and the effectiveness of retraining methods were evaluated for a simulated space vehicle approach and landing under instrument and visual flight conditions. Fifteen experienced pilots were trained and then tested after 4 months either without the benefits of practice or with static rehearsal, dynamic rehearsal or with dynamic warmup practice. Performance on both the flight control and procedure tasks degraded significantly after 4 months. The rehearsal methods effectively countered procedure task skill degradation, while dynamic rehearsal or a combination of static rehearsal and dynamic warmup practice was required for the flight control tasks. The quality of the retraining methods appeared to be primarily dependent on the efficiency of visual cue reinforcement.
Learning in Noise: Dynamic Decision-Making in a Variable Environment
Gureckis, Todd M.; Love, Bradley C.
2009-01-01
In engineering systems, noise is a curse, obscuring important signals and increasing the uncertainty associated with measurement. However, the negative effects of noise and uncertainty are not universal. In this paper, we examine how people learn sequential control strategies given different sources and amounts of feedback variability. In particular, we consider people’s behavior in a task where short- and long-term rewards are placed in conflict (i.e., the best option in the short-term is worst in the long-term). Consistent with a model based on reinforcement learning principles (Gureckis & Love, in press), we find that learners differentially weight information predictive of the current task state. In particular, when cues that signal state are noisy and uncertain, we find that participants’ ability to identify an optimal strategy is strongly impaired relative to equivalent amounts of uncertainty that obscure the rewards/valuations of those states. In other situations, we find that noise and uncertainty in reward signals may paradoxically improve performance by encouraging exploration. Our results demonstrate how experimentally-manipulated task variability can be used to test predictions about the mechanisms that learners engage in dynamic decision making tasks. PMID:20161328
Risk-sensitive reinforcement learning.
Shen, Yun; Tobia, Michael J; Sommer, Tobias; Obermayer, Klaus
2014-07-01
We derive a family of risk-sensitive reinforcement learning methods for agents, who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979 ), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subject's responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
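A compressed sketch of the idea is shown below, assuming a simple piecewise-linear utility with loss aversion; the utility functions, convergence analysis, and fitting procedure in the paper differ.

```python
def utility(delta, loss_aversion=2.0):
    # asymmetric weighting of negative vs. positive TD errors (prospect-theory-like)
    return delta if delta >= 0 else loss_aversion * delta

def risk_sensitive_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Q: dict mapping state -> list of action values
    delta = r + gamma * max(Q[s_next]) - Q[s][a]    # standard TD error
    Q[s][a] += alpha * utility(delta)               # update on the utility-transformed error
    return delta
```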
Carl Aberg, Kristoffer; Doell, Kimberly C.; Schwartz, Sophie
2016-01-01
Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedbacks, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depend on associated reward values and internal motivational drives, possibly determined by personality traits. PMID:27851807
Molina, Michael; Plaza, Victoria; Fuentes, Luis J.; Estévez, Angeles F.
2015-01-01
Memory for medical recommendations is a prerequisite for good adherence to treatment, and therefore to ameliorate the negative effects of the disease, a problem that mainly affects people with memory deficits. We conducted a simulated study to test the utility of a procedure (the differential outcomes procedure, DOP) that may improve adherence to treatment by increasing the patient’s learning and retention of medical recommendations regarding medication. The DOP requires the structure of a conditional discriminative learning task in which correct choice responses to specific stimulus–stimulus associations are reinforced with a particular reinforcer or outcome. In two experiments, participants had to learn and retain in their memory the pills that were associated with particular disorders. To assess whether the DOP improved long-term retention of the learned disorder/pill associations, participants were asked to perform two recognition memory tests, 1 h and 1 week after completing the learning phase. The results showed that compared with the standard non-differential outcomes procedure, the DOP produced better learning and long-term retention of the previously learned associations. These findings suggest that the DOP can be used as a useful complementary technique in intervention programs targeted at increasing adherence to clinical recommendations. PMID:26913010
Quantum machine learning with glow for episodic tasks and decision games
NASA Astrophysics Data System (ADS)
Clausen, Jens; Briegel, Hans J.
2018-02-01
We consider a general class of models, where a reinforcement learning (RL) agent learns from cyclic interactions with an external environment via classical signals. Perceptual inputs are encoded as quantum states, which are subsequently transformed by a quantum channel representing the agent's memory, while the outcomes of measurements performed at the channel's output determine the agent's actions. The learning takes place via stepwise modifications of the channel properties. They are described by an update rule that is inspired by the projective simulation (PS) model and equipped with a glow mechanism that allows for a backpropagation of policy changes, analogous to the eligibility traces in RL and edge glow in PS. In this way, the model combines features of PS with the ability for generalization, offered by its physical embodiment as a quantum system. We apply the agent to various setups of an invasion game and a grid world, which serve as elementary model tasks allowing a direct comparison with a basic classical PS agent.
Generalization of value in reinforcement learning by humans
Wimmer, G. Elliott; Daw, Nathaniel D.; Shohamy, Daphna
2012-01-01
Research in decision making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well-described by reinforcement learning (RL) theories. However, basic RL is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used fMRI and computational model-based analyses to examine the joint contributions of these mechanisms to RL. Humans performed an RL task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about options’ values based on experience with the other options and to generalize across them. We observed BOLD activity related to learning in the striatum and also in the hippocampus. By comparing a basic RL model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of RL and striatal BOLD, both choices and striatal BOLD were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants’ choice. Our results thus point toward an interactive model in which striatal RL systems may employ relational representations typically associated with the hippocampus. PMID:22487039
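A minimal sketch of how such generalization could be added to a basic RL update is given below; the pairing structure, the coupling parameter w, and the update form are illustrative assumptions, not the authors' fitted model.

```python
# values: dict mapping option -> estimated value; pair_of: dict mapping each option
# to its correlated partner. With w = 0 this reduces to a basic delta-rule update.
def augmented_update(values, pair_of, chosen, reward, alpha=0.2, w=0.5):
    values[chosen] += alpha * (reward - values[chosen])          # standard RL update
    partner = pair_of[chosen]
    values[partner] += w * alpha * (reward - values[partner])    # feedback generalizes to the partner
    return values
```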
Behavioral research in pigeons with ARENA: An automated remote environmental navigation apparatus
Leising, Kenneth J.; Garlick, Dennis; Parenteau, Michael; Blaisdell, Aaron P.
2009-01-01
Three experiments established the effectiveness of an Automated Remote Environmental Navigation Apparatus (ARENA) developed in our lab to study behavioral processes in pigeons. The technology utilizes one or more wireless modules, each capable of presenting colored lights as visual stimuli to signal reward and of detecting subject peck responses. In Experiment 1, subjects were instrumentally shaped to peck at a single ARENA module following an unsuccessful autoshaping procedure. In Experiment 2, pigeons were trained with a simultaneous discrimination procedure during which two modules were illuminated different colors; pecks to one color (S+) were reinforced while pecks to the other color (S−) were not. Pigeons learned to preferentially peck the module displaying the S+. In Experiment 3, two modules were lit the same color concurrently from a set of six colors in a conditional discrimination task. For three of the colors pecks to the module in one location (e.g., upper quadrant) were reinforced while for the remaining colors pecks at the other module (e.g., lower quadrant) were reinforced. After learning this discrimination, the color-reinforced location assignments were reversed. Pigeons successfully acquired the reversal. ARENA is an automated system for open-field studies and a more ecologically valid alternative to the touchscreen. PMID:19429204
Use of Frontal Lobe Hemodynamics as Reinforcement Signals to an Adaptive Controller
DiStasio, Marcello M.; Francis, Joseph T.
2013-01-01
Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment. Then higher valued states can be pursued and lower (or negative) valued states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain, and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptable artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrate that both oxy- and deoxyhemoglobin concentrations in the frontal lobe show differences in response to both primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy. We also show that decoding state desirabilities is more accurate when using relative concentrations of both oxyhemoglobin and deoxyhemoglobin, rather than either species alone. PMID:23894500
Long-term memory of color stimuli in the jungle crow (Corvus macrorhynchos).
Bogale, Bezawork Afework; Sugawara, Satoshi; Sakano, Katsuhisa; Tsuda, Sonoko; Sugita, Shoei
2012-03-01
Wild-caught jungle crows (n = 20) were trained to discriminate between color stimuli in a two-alternative discrimination task. Next, crows were tested for long-term memory after 1-, 2-, 3-, 6-, and 10-month retention intervals. This preliminary study showed that jungle crows learn the task and reach a discrimination criterion (80% or more correct choices in two consecutive sessions of ten trials) in a few trials, and some even in a single session. Most, if not all, crows successfully remembered the constantly reinforced visual stimulus during training after all retention intervals. These results suggest that jungle crows have a high retention capacity for learned information, making no or very few errors at least up to a 10-month retention interval. This study is the first to show long-term memory capacity for color stimuli in corvids following brief training, indicating that memory rather than rehearsal was at work. Memory of visual color information is vital for the exploitation of biological resources in crows. We suspect that jungle crows could remember the learned color discrimination task even after a much longer retention interval.
Reinforcement Learning Performance and Risk for Psychosis in Youth.
Waltz, James A; Demro, Caroline; Schiffman, Jason; Thompson, Elizabeth; Kline, Emily; Reeves, Gloria; Xu, Ziye; Gold, James
2015-12-01
Early identification efforts for psychosis have thus far yielded many more individuals "at risk" than actually develop psychotic illness. Here, we test whether measures of reinforcement learning (RL), known to be impaired in chronic schizophrenia, are related to the severity of clinical risk symptoms. Because of the reliance of RL on dopamine-rich frontostriatal systems and evidence of dopamine system dysfunction in the psychosis prodrome, RL measures are of specific interest in this clinical population. The current study examines relationships between psychosis risk symptoms and RL task performance in a sample of adolescents and young adults (n = 70) receiving mental health services. We observed significant correlations between multiple measures of RL performance and measures of both positive and negative symptoms. These results suggest that RL measures may provide a psychosis risk signal in treatment-seeking youth. Further research is necessary to understand the potential predictive role of RL measures for conversion to psychosis.
Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems.
Kool, Wouter; Gershman, Samuel J; Cushman, Fiery A
2017-09-01
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits. To investigate this proposal, we conducted two experiments showing that people increase model-based control when it achieves greater accuracy than model-free control, and especially when the rewards of accurate performance are amplified. In contrast, they are insensitive to reward amplification when model-based and model-free control yield equivalent accuracy. This suggests that humans adaptively balance habitual and planned action through on-line cost-benefit analysis.
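One way to picture the proposed arbitration is as a mixing weight over model-based and model-free values that is nudged by a cost-benefit signal. The sketch below is a schematic reading of that idea, not the authors' computational model; the weight-update rule and its parameters are assumptions.

```python
def mixed_values(q_mb, q_mf, w):
    # per-option choice values as a weighted mixture of the two controllers
    return [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]

def update_weight(w, mb_advantage, reward_stakes, cost=0.1, lr=0.05):
    # mb_advantage: recent accuracy benefit of model-based over model-free control;
    # reward_stakes: how much accurate performance currently pays off
    benefit = mb_advantage * reward_stakes - cost
    return min(1.0, max(0.0, w + lr * benefit))
```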
Dynamical genetic programming in XCSF.
Preen, Richard J; Bull, Larry
2013-01-01
A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to artificial neural networks. This paper presents results from an investigation into using a temporally dynamic symbolic representation within the XCSF learning classifier system. In particular, dynamical arithmetic networks are used to represent the traditional condition-action production system rules to solve continuous-valued reinforcement learning problems and to perform symbolic regression, finding competitive performance with traditional genetic programming on a number of composite polynomial tasks. In addition, the network outputs are later repeatedly sampled at varying temporal intervals to perform multistep-ahead predictions of a financial time series.
Ide, Jaime S; Nedic, Sanja; Wong, Kin F; Strey, Shmuel L; Lawson, Elizabeth A; Dickerson, Bradford C; Wald, Lawrence L; La Camera, Giancarlo; Mujica-Parodi, Lilianne R
2018-07-01
Oxytocin (OT) is an endogenous neuropeptide that, while originally thought to promote trust, has more recently been found to be context-dependent. Here we extend experimental paradigms previously restricted to de novo decision-to-trust, to a more realistic environment in which social relationships evolve in response to iterative feedback over twenty interactions. In a randomized, double blind, placebo-controlled within-subject/crossover experiment of human adult males, we investigated the effects of a single dose of intranasal OT (40 IU) on Bayesian expectation updating and reinforcement learning within a social context, with associated brain circuit dynamics. Subjects participated in a neuroeconomic task (Iterative Trust Game) designed to probe iterative social learning while their brains were scanned using ultra-high field (7T) fMRI. We modeled each subject's behavior using Bayesian updating of belief-states ("willingness to trust") as well as canonical measures of reinforcement learning (learning rate, inverse temperature). Behavioral trajectories were then used as regressors within fMRI activation and connectivity analyses to identify corresponding brain network functionality affected by OT. Behaviorally, OT reduced feedback learning, without bias with respect to positive versus negative reward. Neurobiologically, reduced learning under OT was associated with muted communication between three key nodes within the reward circuit: the orbitofrontal cortex, amygdala, and lateral (limbic) habenula. Our data suggest that OT, rather than inspiring feelings of generosity, instead attenuates the brain's encoding of prediction error and therefore its ability to modulate pre-existing beliefs. This effect may underlie OT's putative role in promoting what has typically been reported as 'unjustified trust' in the face of information that suggests likely betrayal, while also resolving apparent contradictions with regard to OT's context-dependent behavioral effects. Copyright © 2018 Elsevier Inc. All rights reserved.
Learning and Memory Enhancement by Neuropeptides
1987-12-31
exposure to other toxicants (e.g., other heavy metals, organic solvents), or arising from disease states. We study learning in an autoshaping task...Sparber, S.B. Selective learning impairment of delayed reinforcement autoshaped behavior caused by low doses of trimethyltin. Psychopharmacology...impairs acquisition of autoshaped behavior, whether given before or after training sessions. A manuscript describing this work is in preparation, and an
Learning New Basic Movements for Robotics
NASA Astrophysics Data System (ADS)
Kober, Jens; Peters, Jan
Obtaining novel skills is one of the most important problems in robotics. Machine learning techniques may be a promising approach for automatic and autonomous acquisition of movement policies. However, this requires both an appropriate policy representation and suitable learning algorithms. Employing the most recent form of the dynamical systems motor primitives originally introduced by Ijspeert et al. [1], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning, and present our current best performing learning algorithms. Finally, we show that it is possible to include a start-up phase in rhythmic primitives. We apply our approach to two elementary movements, i.e., Ball-in-a-Cup and Ball-Paddling, which can be learned on a real Barrett WAM robot arm at a pace similar to human learning.
Reduction of Pavlovian Bias in Schizophrenia: Enhanced Effects in Clozapine-Administered Patients
Albrecht, Matthew A.; Waltz, James A.; Cavanagh, James F.; Frank, Michael J.; Gold, James M.
2016-01-01
The negative symptoms of schizophrenia (SZ) are associated with a pattern of reinforcement learning (RL) deficits likely related to degraded representations of reward values. However, the RL tasks used to date have required active responses to both reward and punishing stimuli. Pavlovian biases have been shown to affect performance on these tasks through invigoration of action to reward and inhibition of action to punishment, and may be partially responsible for the effects found in patients. Forty-five patients with schizophrenia and 30 demographically-matched controls completed a four-stimulus reinforcement learning task that crossed action (“Go” or “NoGo”) and the valence of the optimal outcome (reward or punishment-avoidance), such that all combinations of action and outcome valence were tested. Behaviour was modelled using a six-parameter RL model and EEG was simultaneously recorded. Patients demonstrated a reduction in Pavlovian performance bias that was evident in a reduced Go bias across the full group. In a subset of patients administered clozapine, the reduction in Pavlovian bias was enhanced. The reduction in Pavlovian bias in SZ patients was accompanied by feedback processing differences at the time of the P3a component. The reduced Pavlovian bias in patients is suggested to be due to reduced fidelity in the communication between striatal regions and frontal cortex. It may also partially account for previous findings of poorer “Go-learning” in schizophrenia where “Go” responses or Pavlovian consistent responses are required for optimal performance. An attenuated P3a component dynamic in patients is consistent with a view that deficits in operant learning are due to impairments in adaptively using feedback to update representations of stimulus value. PMID:27044008
Human-robot skills transfer interfaces for a flexible surgical robot.
Calinon, Sylvain; Bruno, Danilo; Malekzadeh, Milad S; Nanayakkara, Thrishantha; Caldwell, Darwin G
2014-09-01
In minimally invasive surgery, tools go through narrow openings and manipulate soft organs to perform surgical tasks. There are limitations in current robot-assisted surgical systems due to the rigidity of robot tools. The aim of the STIFF-FLOP European project is to develop a soft robotic arm to perform surgical tasks. The flexibility of the robot allows the surgeon to move within organs to reach remote areas inside the body and perform challenging procedures in laparoscopy. This article addresses the problem of designing learning interfaces enabling the transfer of skills from human demonstration. Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicking of the demonstrator's actions to the higher level imitation of the underlying intent extracted from the demonstrations. By focusing on this last form, we study the problem of extracting an objective function explaining the demonstrations from an over-specified set of candidate reward functions, and using this information for self-refinement of the skill. In contrast to inverse reinforcement learning strategies that attempt to explain the observations with reward functions defined for the entire task (or a set of pre-defined reward profiles active for different parts of the task), the proposed approach is based on context-dependent reward-weighted learning, where the robot can learn the relevance of candidate objective functions with respect to the current phase of the task or encountered situation. The robot then exploits this information for skills refinement in the policy parameters space. The proposed approach is tested in simulation with a cutting task performed by the STIFF-FLOP flexible robot, using kinesthetic demonstrations from a Barrett WAM manipulator. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Beyond adaptive-critic creative learning for intelligent mobile robots
NASA Astrophysics Data System (ADS)
Liao, Xiaoqun; Cao, Ming; Hall, Ernest L.
2001-10-01
Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that extending the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are potential candidates for these new tasks. The adaptive critic method has been shown to provide useful approximations or even optimal control policies to non-linear systems. The purpose of this paper is to explore the use of new learning methods that go beyond the adaptive critic method for unstructured environments. The adaptive critic is a form of reinforcement learning. A critic element provides only high level grading corrections to a cognition module that controls the action module. In the proposed system the critic's grades are modeled and forecasted, so that an anticipated set of sub-grades is available to the cognition model. The forecasted grades are interpolated and are available on the time scale needed by the action model. The success of the system is highly dependent on the accuracy of the forecasted grades and adaptability of the action module. Examples from the guidance of a mobile robot are provided to illustrate the method for simple line following and for the more complex navigation and control in an unstructured environment. The theory presented here, which goes beyond the adaptive critic, may be called creative theory. Creative theory is a form of learning that models the highest level of human learning - imagination. Creative theory appears applicable not only to mobile robots but also to many other forms of human endeavor, such as educational learning and business forecasting. Reinforcement learning such as the adaptive critic may be applied to known problems to aid in the discovery of their solutions. The significance of creative theory is that it permits the discovery of unknown problems, ones that are not yet recognized but may be critical to survival or success.
ERIC Educational Resources Information Center
FISCHER, EDWARD H.; HERSCHBERGER, AUSTIN C.
Use of the verbal reinforcement technique (VRT) in developmental, personality, and socialization studies often rests on tenuous and untested assumptions. This study examined five variables which hypothetically relate to performance under reinforcement: self-esteem of S, task-involvement, experimenter, ordinal position, and family size. The method…
Golob, Edward J; Taube, Jeffrey S
2002-10-17
Tasks using appetitive reinforcers show that following disorientation rats use the shape of an arena to reorient, and cannot distinguish two geometrically similar corners to obtain a reward, despite the presence of a prominent visual cue that provides information to differentiate the two corners. Other studies show that disorientation impairs performance on certain appetitive, but not aversive, tasks. This study evaluated whether rats would make similar geometric errors in a working memory task that used aversive reinforcement. We hypothesized that, in a task using aversive reinforcement, initially disoriented rats would not reorient by arena shape and thus would not make similar geometric errors. Tests were performed in a rectangular arena having one polarizing cue. In the appetitive condition water consumption was the reward. The aversive condition was a water maze task with reinforcement provided by escape to a hidden platform. In the aversive condition rats returned to the reinforced corner significantly more often than in the dry condition, and did not favor the diagonally opposite corner. Results show that rats can use cues besides arena shape to reorient in an aversive reinforcement condition. These findings may also reflect different strategies, with an escape/homing strategy in the wet condition and a foraging strategy in the dry condition.
Habitual control of goal selection in humans
Cushman, Fiery; Morris, Adam
2015-01-01
Humans choose actions based on both habit and planning. Habitual control is computationally frugal but adapts slowly to novel circumstances, whereas planning is computationally expensive but can adapt swiftly. Current research emphasizes the competition between habits and plans for behavioral control, yet many complex tasks instead favor their integration. We consider a hierarchical architecture that exploits the computational efficiency of habitual control to select goals while preserving the flexibility of planning to achieve those goals. We formalize this mechanism in a reinforcement learning setting, illustrate its costs and benefits, and experimentally demonstrate its spontaneous application in a sequential decision-making task. PMID:26460050
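Schematically, the hierarchy can be pictured as a cached value table over goals feeding a planner, as in the toy sketch below; all components are illustrative placeholders, not the authors' task or model.

```python
import numpy as np

goal_values = np.zeros(3)                    # cached (habitual) values of candidate goals
rng = np.random.default_rng(3)

def select_goal(beta=3.0):
    # habitual, model-free selection among goals via softmax over cached values
    p = np.exp(beta * goal_values)
    p /= p.sum()
    return int(rng.choice(len(goal_values), p=p))

def plan_to_goal(goal):
    # stand-in for model-based planning of the action sequence that achieves the goal
    return f"action-sequence-for-goal-{goal}"

def learn_goal_value(goal, reward, alpha=0.1):
    # model-free update of the cached goal value once the planned actions pay off
    goal_values[goal] += alpha * (reward - goal_values[goal])
```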
Effects of Intrinsic Motivation on Feedback Processing During Learning
DePasque, Samantha; Tricomi, Elizabeth
2015-01-01
Learning commonly requires feedback about the consequences of one's actions, which can drive learners to modify their behavior. Motivation may determine how sensitive an individual might be to such feedback, particularly in educational contexts where some students value academic achievement more than others. Thus, motivation for a task might influence the value placed on performance feedback and how effectively it is used to improve learning. To investigate the interplay between intrinsic motivation and feedback processing, we used functional magnetic resonance imaging (fMRI) during feedback-based learning before and after a novel manipulation based on motivational interviewing, a technique for enhancing treatment motivation in mental health settings. Because of its role in the reinforcement learning system, the striatum is situated to play a significant role in the modulation of learning based on motivation. Consistent with this idea, motivation levels during the task were associated with sensitivity to positive versus negative feedback in the striatum. Additionally, heightened motivation following a brief motivational interview was associated with increases in feedback sensitivity in the left medial temporal lobe. Our results suggest that motivation modulates neural responses to performance-related feedback, and furthermore that changes in motivation facilitate processing in areas that support learning and memory. PMID:26112370
Kobza, Stefan; Ferrea, Stefano; Schnitzler, Alfons; Pollok, Bettina; Südmeyer, Martin; Bellebaum, Christian
2012-01-01
Feedback to both actively performed and observed behaviour allows adaptation of future actions. Positive feedback leads to increased activity of dopamine neurons in the substantia nigra, whereas dopamine neuron activity is decreased following negative feedback. Dopamine level reduction in unmedicated Parkinson's Disease patients has been shown to lead to a negative learning bias, i.e. enhanced learning from negative feedback. Recent findings suggest that the neural mechanisms of active and observational learning from feedback might differ, with the striatum playing a less prominent role in observational learning. Therefore, it was hypothesized that unmedicated Parkinson's Disease patients would show a negative learning bias only in active but not in observational learning. In a between-group design, 19 Parkinson's Disease patients and 40 healthy controls engaged in either an active or an observational probabilistic feedback-learning task. For both tasks, transfer phases aimed to assess the bias to learn better from positive or negative feedback. As expected, actively learning patients showed a negative learning bias, whereas controls learned better from positive feedback. In contrast, no difference between patients and controls emerged for observational learning, with both groups showing better learning from positive feedback. These findings add to neural models of reinforcement-learning by suggesting that dopamine-modulated input to the striatum plays a minor role in observational learning from feedback. Future research will have to elucidate the specific neural underpinnings of observational learning.
Goltz, Sonia M.
1992-01-01
Reinforcement processes may underlie decisions frequently found in organizations to escalate investments of time, money and other resources in strategies (e.g., product development, capital investment, plant expansion) that do not result in immediate reinforcers. Whereas cognitive biases have been proffered in previous explanations, the present analysis suggested that this persistence is a form of resistance to extinction arising from experiences with past investments that were variably reinforced. This explanation was examined in two experiments by varying the pattern of returns and losses subjects experienced for investment decisions prior to experiencing a series of losses. Consistent with the proposed explanation, two conditions resulted in higher levels of recommitment during continuous losses: (a) training using a variable schedule of partial reinforcement, and (b) no training on the task. Results indicate that behavior analysis can be used to understand and control situations in organizations that are prone to escalation, such as investments in the research and development of new product lines and extensions of further loans to customers. PMID:16795785
West, Elizabeth A.; Forcelli, Patrick A.; McCue, David L.; Malkova, Ludise
2013-01-01
The orbitofrontal cortex (OFC) is critical for behavioral adaptation in response to changes in reward value. Here we investigated, in rats, the role of OFC and, specifically, serotonergic neurotransmission within OFC in a reinforcer devaluation task (which measures behavioral flexibility). This task used two visual cues, each predicting one of two foods, with the spatial position (left-right) of the cues above two levers pseudorandomized across trials. An instrumental action (lever press) was required for reinforcer delivery. After training, rats received either excitotoxic OFC lesions made by NMDA (N-methyl-D-aspartic acid), serotonin-specific OFC lesions made by 5,7-DHT (5,7-dihydroxytryptamine), or sham lesions. In sham-lesioned rats, devaluation of one food (by feeding to satiety) significantly decreased responding to the cue associated with that food, when both cues were presented simultaneously during extinction. Both types of OFC lesions disrupted the devaluation effect. In contrast, extinction learning was not affected by serotonin-specific lesions and was only mildly retarded in rats with excitotoxic lesions. Thus, serotonin within OFC is necessary for appropriately adjusting behavior towards cues that predict reward but not for reducing responses in the absence of reward. Our results are the first to demonstrate that serotonin in OFC is necessary for reinforcer devaluation, but not extinction. PMID:23458741
Delayed temporal discrimination in pigeons: A comparison of two procedures
Chatlosh, Diane L.; Wasserman, Edward A.
1987-01-01
A within-subjects comparison was made of pigeons' performance on two temporal discrimination procedures that were signaled by differently colored keylight samples. During stimulus trials, a peck on the key displaying a slanted line was reinforced following short keylight samples, and a peck on the key displaying a horizontal line was reinforced following long keylight samples, regardless of the location of the stimuli on those two choice keys. During position trials, a peck on the left key was reinforced following short keylight samples and a peck on the right key was reinforced following long keylight samples, regardless of which line stimulus appeared on the correct key. Thus, on stimulus trials, the correct choice key could not be discriminated prior to the presentation of the test stimuli, whereas on position trials, the correct choice key could be discriminated during the presentation of the sample stimulus. During Phase 1, with a 0-s delay between sample and choice stimuli, discrimination learning was faster on position trials than on stimulus trials for all 4 birds. During Phase 2, 0-, 0.5-, and 1.0-s delays produced differential loss of stimulus control under the two tasks for 2 birds. Response patterns during the delay intervals provided some evidence for differential mediation of the two delayed discriminations. These between-task differences suggest that the same processes may not mediate performance in each. PMID:16812483
Role of state-dependent learning in the cognitive effects of caffeine in mice.
Sanday, Leandro; Zanin, Karina A; Patti, Camilla L; Fernandes-Santos, Luciano; Oliveira, Larissa C; Longo, Beatriz M; Andersen, Monica L; Tufik, Sergio; Frussa-Filho, Roberto
2013-08-01
Caffeine is the most widely used psychoactive substance in the world and it is generally believed that it promotes beneficial effects on cognitive performance. However, there is also evidence suggesting that caffeine has inhibitory effects on learning and memory. Considering that caffeine may have anxiogenic effects, thus changing the emotional state of the subjects, state-dependent learning may play a role in caffeine-induced cognitive alterations. Mice were administered 20 mg/kg caffeine before training and/or before testing both in the plus-maze discriminative avoidance task (an animal model that concomitantly evaluates learning, memory, anxiety-like behaviour and general activity) and in the inhibitory avoidance task, a classic paradigm for evaluating memory in rodents. Pre-training caffeine administration did not modify learning, but produced an anxiogenic effect and impaired memory retention. While pre-test administration of caffeine did not modify retrieval on its own, the pre-test administration counteracted the memory deficit induced by the pre-training caffeine injection in both the plus-maze discriminative and inhibitory avoidance tasks. Our data demonstrate that caffeine-induced memory deficits are critically related to state-dependent learning, reinforcing the importance of considering the participation of state-dependency on the interpretation of the cognitive effects of caffeine. The possible participation of caffeine-induced anxiety alterations in state-dependent memory deficits is discussed.
Rational and Mechanistic Perspectives on Reinforcement Learning
ERIC Educational Resources Information Center
Chater, Nick
2009-01-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…
Scheich, Henning; Brechmann, André; Brosch, Michael; Budinger, Eike; Ohl, Frank W; Selezneva, Elena; Stark, Holger; Tischmeyer, Wolfgang; Wetzel, Wolfram
2011-01-01
Two phenomena of auditory cortex activity have recently attracted attention, namely that the primary field can show different types of learning-related changes of sound representation and that during learning even this early auditory cortex is under strong multimodal influence. Based on neuronal recordings in animal auditory cortex during instrumental tasks, in this review we put forward the hypothesis that these two phenomena serve to derive the task-specific meaning of sounds by associative learning. To understand the implications of this tenet, it is helpful to realize how a behavioral meaning is usually derived for novel environmental sounds. For this purpose, associations with other sensory, e.g. visual, information are mandatory to develop a connection between a sound and its behaviorally relevant cause and/or the context of sound occurrence. This makes it plausible that in instrumental tasks various non-auditory sensory and procedural contingencies of sound generation become co-represented by neuronal firing in auditory cortex. Information related to reward or to avoidance of discomfort during task learning, which is essentially non-auditory, is also co-represented. The reinforcement influence points to the dopaminergic internal reward system, the local role of which for memory consolidation in auditory cortex is well-established. Thus, during a trial of task performance, the neuronal responses to the sounds are embedded in a sequence of representations of such non-auditory information. The embedded auditory responses show task-related modulations falling into types that correspond to three basic logical classifications that may be performed with a perceptual item, i.e. from simple detection to discrimination and categorization. This hierarchy of classifications determines the semantic "same-different" relationships among sounds. Different cognitive classifications appear to be a consequence of the learning task and lead to a recruitment of different excitatory and inhibitory mechanisms and to distinct spatiotemporal metrics of map activation to represent a sound. The described non-auditory firing and modulations of auditory responses suggest that auditory cortex, by collecting all necessary information, functions as a "semantic processor" deducing the task-specific meaning of sounds by learning. © 2010. Published by Elsevier B.V.
Learning what matters: A neural explanation for the sparsity bias.
Hassall, Cameron D; Connor, Patrick C; Trappenberg, Thomas P; McDonald, John J; Krigolson, Olave E
2018-05-01
The visual environment is filled with complex, multi-dimensional objects that vary in their value to an observer's current goals. When faced with multi-dimensional stimuli, humans may rely on biases to learn to select those objects that are most valuable to the task at hand. Here, we show that decision making in a complex task is guided by the sparsity bias: the focusing of attention on a subset of available features. Participants completed a gambling task in which they selected complex stimuli that varied randomly along three dimensions: shape, color, and texture. Each dimension comprised three features (e.g., color: red, green, yellow). Only one dimension was relevant in each block (e.g., color), and a randomly-chosen value ranking determined outcome probabilities (e.g., green > yellow > red). Participants were faster to respond to infrequent probe stimuli that appeared unexpectedly within stimuli that possessed a more valuable feature than to probes appearing within stimuli possessing a less valuable feature. Event-related brain potentials recorded during the task provided a neurophysiological explanation for sparsity as a learning-dependent increase in optimal attentional performance (as measured by the N2pc component of the human event-related potential) and a concomitant learning-dependent decrease in prediction errors (as measured by the feedback-elicited reward positivity). Together, our results suggest that the sparsity bias guides human reinforcement learning in complex environments. Copyright © 2018 Elsevier B.V. All rights reserved.
Zeuner, Kirsten E; Knutzen, Arne; Granert, Oliver; Sablowsky, Simone; Götz, Julia; Wolff, Stephan; Jansen, Olav; Dressler, Dirk; Schneider, Susanne A; Klein, Christine; Deuschl, Günther; van Eimeren, Thilo; Witt, Karsten
2016-01-01
Previous receptor binding studies suggest dopamine function is altered in the basal ganglia circuitry in task-specific dystonia, a condition characterized by contraction of agonist and antagonist muscles while performing specific tasks. Dopamine plays a role in reward-based learning. Using fMRI, this study compared 31 right-handed writer's cramp patients to 35 controls in reward-based learning of a probabilistic reversal-learning task. All subjects chose between two stimuli and indicated their response with their left or right index finger. One stimulus response was rewarded on 80% of trials, the other on 20%. After contingency reversal, the second stimulus response was rewarded on 80% of trials. We further related the DRD2/ANKK1-TaqIa polymorphism, which is associated with a 30% reduction in striatal dopamine receptor density, to reward-based learning, and assumed impaired reversal learning in A+ subjects. Feedback learning in patients was normal. Blood-oxygen level dependent (BOLD) signal in controls increased with negative feedback in the insula, rostral cingulate cortex, middle frontal gyrus and parietal cortex (pFWE < 0.05). In comparison to controls, patients showed a greater increase in BOLD activity following negative feedback in the dorsal anterior cingulate cortex (BA32). Genetic status was not correlated with the BOLD activity. Brodmann area 32 (BA32) is part of the dorsal anterior cingulate cortex (dACC), which plays an important role in coordinating and integrating information to guide behavior and in reward-based learning. The dACC is connected with the basal ganglia-thalamo-cortical loop modulated by dopaminergic signaling. This finding suggests disturbed integration of reinforcement history in decision making and implies that the reward system might contribute to the pathogenesis of writer's cramp.
Morita, Kenji; Morishima, Mieko; Sakai, Katsuyuki; Kawaguchi, Yasuo
2013-05-15
Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
Frontal Theta Links Prediction Errors to Behavioral Adaptation in Reinforcement Learning
Cavanagh, James F.; Frank, Michael J.; Klein, Theresa J.; Allen, John J.B.
2009-01-01
Investigations into action monitoring have consistently detailed a fronto-central voltage deflection in the Event-Related Potential (ERP) following the presentation of negatively valenced feedback, sometimes termed the Feedback Related Negativity (FRN). The FRN has been proposed to reflect a neural response to prediction errors during reinforcement learning, yet the single trial relationship between neural activity and the quanta of expectation violation remains untested. Although ERP methods are not well suited to single trial analyses, the FRN has been associated with theta band oscillatory perturbations in the medial prefrontal cortex. Medio-frontal theta oscillations have been previously associated with expectation violation and behavioral adaptation and are well suited to single trial analysis. Here, we recorded EEG activity during a probabilistic reinforcement learning task and fit the performance data to an abstract computational model (Q-learning) for calculation of single-trial reward prediction errors. Single-trial theta oscillatory activities following feedback were investigated within the context of expectation (prediction error) and adaptation (subsequent reaction time change). Results indicate that interactive medial and lateral frontal theta activities reflect the degree of negative and positive reward prediction error in the service of behavioral adaptation. These different brain areas use prediction error calculations for different behavioral adaptations: with medial frontal theta reflecting the utilization of prediction errors for reaction time slowing (specifically following errors), but lateral frontal theta reflecting prediction errors leading to working memory-related reaction time speeding for the correct choice. PMID:19969093
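For readers unfamiliar with the modeling step described above, the sketch below shows one way single-trial reward prediction errors can be extracted from a fitted Q-learning model and then entered as a trial-wise regressor (e.g., against frontal theta power). The choice and feedback arrays and the learning rate are hypothetical placeholders, not the study's data or fitted parameters.

```python
import numpy as np

# Hypothetical trial sequence: chosen option (0/1) and binary feedback per trial.
choices  = np.array([0, 0, 1, 0, 1, 1, 0, 1])
feedback = np.array([1, 0, 1, 1, 0, 1, 0, 1])   # 1 = reward, 0 = no reward
alpha = 0.3                                      # learning rate (would be fit per subject)

Q = np.zeros(2)             # action values for the two options
rpe = np.zeros(len(choices))
for t, (a, r) in enumerate(zip(choices, feedback)):
    rpe[t] = r - Q[a]       # single-trial reward prediction error
    Q[a] += alpha * rpe[t]  # Q-learning update

# rpe can now serve as a trial-wise regressor against single-trial theta power.
print(np.round(rpe, 2))
```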
The time course of explicit and implicit categorization.
Smith, J David; Zakrzewski, Alexandria C; Herberger, Eric R; Boomer, Joseph; Roeder, Jessica L; Ashby, F Gregory; Church, Barbara A
2015-10-01
Contemporary theory in cognitive neuroscience distinguishes, among the processes and utilities that serve categorization, explicit and implicit systems of category learning that learn, respectively, category rules by active hypothesis testing or adaptive behaviors by association and reinforcement. Little is known about the time course of categorization within these systems. Accordingly, the present experiments contrasted tasks that fostered explicit categorization (because they had a one-dimensional, rule-based solution) or implicit categorization (because they had a two-dimensional, information-integration solution). In Experiment 1, participants learned categories under unspeeded or speeded conditions. In Experiment 2, they applied previously trained category knowledge under unspeeded or speeded conditions. Speeded conditions selectively impaired implicit category learning and implicit mature categorization. These results illuminate the processing dynamics of explicit/implicit categorization.
Kasanova, Zuzana; Ceccarini, Jenny; Frank, Michael J; Amelsvoort, Thérèse van; Booij, Jan; Heinzel, Alexander; Mottaghy, Felix; Myin-Germeys, Inez
2017-07-01
Much human behavior is driven by rewards. Preclinical neurophysiological and clinical positron emission tomography (PET) studies have implicated striatal phasic dopamine (DA) release as a primary modulator of reward processing. However, the relationship between experimental reward-induced striatal DA release and responsiveness to naturalistic rewards, and therefore the functional relevance of these findings, has been elusive. We therefore combined, for the first time, DA D2/3 receptor [18F]fallypride PET during a probabilistic reinforcement learning (RL) task with six days of ecological momentary assessment (EMA) of reward-related behavior in the everyday life of 16 healthy volunteers. We detected significant reward-induced DA release in the bilateral putamen, caudate nucleus and ventral striatum, the extent of which was associated with better behavioral performance on the RL task across all regions. Furthermore, individual variability in the extent of reward-induced DA release in the right caudate nucleus and ventral striatum modulated the tendency to be actively engaged in a behavior if the active engagement was previously deemed enjoyable. This study suggests a link between striatal reward-related DA release and ecologically relevant reward-oriented behavior, pointing to an avenue for inquiry into the DAergic basis of optimal and impaired motivational drive. Copyright © 2017 Elsevier B.V. All rights reserved.
Consideration of species differences in developing novel molecules as cognition enhancers.
Young, Jared W; Jentsch, J David; Bussey, Timothy J; Wallace, Tanya L; Hutcheson, Daniel M
2013-11-01
The NIH-funded CNTRICS initiative has coordinated efforts to promote the vertical translation of novel procognitive molecules from testing in mice, rats and non-human primates to clinical efficacy in patients with schizophrenia. CNTRICS highlighted improving the construct validation of tasks across species to increase the likelihood that the translation of a candidate molecule to humans will be successful. Other aspects of cross-species behavior remain important, however. This review describes cognitive tasks utilized across species, providing examples of differences and similarities of innate behavior between species, as well as convergent construct and predictive validity. Tests of attention, olfactory discrimination, reversal learning, and paired associate learning are discussed. Moreover, information on the practical implications of species differences in drug development research is also provided. The issues covered here will aid in task development and utilization across species as well as reinforcing the positive role preclinical research can have in developing procognitive treatments for psychiatric disorders. Copyright © 2012 Elsevier Ltd. All rights reserved.
Kangas, Brian D; Bergman, Jack; Coyle, Joseph T
2016-05-01
Recent developments in precision gene editing have led to the emergence of the marmoset as an experimental subject of considerable interest and translational value. A better understanding of behavioral phenotypes of the common marmoset will inform the extent to which forthcoming transgenic mutants are cognitively intact. Therefore, additional information regarding their learning, inhibitory control, and motivational abilities is needed. The present studies used touchscreen-based repeated acquisition and discrimination reversal tasks to examine basic dimensions of learning and response inhibition. Marmosets were trained daily to respond to one of two simultaneously presented novel stimuli. Subjects learned to discriminate the two stimuli (acquisition) and, subsequently, to reverse their responding when the contingencies were switched (reversal). In addition, progressive-ratio performance was used to measure the effort expended to obtain a highly palatable reinforcer varying in magnitude and, thereby, provide an index of relative motivational value. Results indicate that rates of both acquisition and reversal of novel discriminations increased across successive sessions, but that reversal learning remained slower than acquisition learning, i.e., more trials were needed for mastery. A positive correlation was observed between progressive-ratio break point and reinforcement magnitude. These results closely replicate previous findings with squirrel monkeys, thus providing evidence of similarity in learning processes across nonhuman primate species. Moreover, these data provide key information about the normative phenotype of wild-type marmosets using three relevant behavioral endpoints.
Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.
Kato, Ayaka; Morita, Kenji
2016-10-01
It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, where learning and forgetting are balanced.
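A minimal sketch of the value-decay idea described in this abstract is given below: a learner moves through a chain of 'Go' steps toward a rewarded goal while its stored values decay slightly on every step, so prediction errors stay elevated even after extended training. The chain length, decay rate, and other parameters are assumptions made for illustration, not the authors' fitted values.

```python
import numpy as np

# Toy illustration of the value-decay idea: a chain of 'Go' steps toward a goal,
# with learned values decaying a little on every step. Parameters are hypothetical.
n_steps, alpha, gamma, phi = 6, 0.5, 0.97, 0.02   # phi = decay (forgetting) rate
V = np.zeros(n_steps + 1)

for episode in range(500):
    for s in range(n_steps):
        r = 1.0 if s == n_steps - 1 else 0.0
        rpe = r + gamma * V[s + 1] - V[s]   # prediction error at each step
        V[s] += alpha * rpe
        V *= (1.0 - phi)                    # all stored values decay (forgetting)

# With phi > 0, prediction errors remain positive along the chain even after long
# training, mimicking a sustained dopamine-like signal toward a predictable reward.
rewards = [0.0] * (n_steps - 1) + [1.0]
print(np.round([r + gamma * V[s + 1] - V[s] for s, r in enumerate(rewards)], 3))
```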
Higher incentives can impair performance: neural evidence on reinforcement and rationality.
Achtziger, Anja; Alós-Ferrer, Carlos; Hügelschäfer, Sabine; Steinhauser, Marco
2015-11-01
Standard economic thinking postulates that increased monetary incentives should increase performance. Human decision makers, however, frequently focus on past performance, a form of reinforcement learning occasionally at odds with rational decision making. We used an incentivized belief-updating task from economics to investigate this conflict through measurements of neural correlates of reward processing. We found that higher incentives fail to improve performance when immediate feedback on decision outcomes is provided. Subsequent analysis of the feedback-related negativity, an early event-related potential following feedback, revealed the mechanism behind this paradoxical effect. As incentives increase, the win/lose feedback becomes more prominent, leading to an increased reliance on reinforcement and more errors. This mechanism is relevant for economic decision making and the debate on performance-based payment. © The Author (2015). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Representation of aversive prediction errors in the human periaqueductal gray
Roy, Mathieu; Shohamy, Daphna; Daw, Nathaniel; Jepma, Marieke; Wimmer, Elliott; Wager, Tor D.
2014-01-01
Pain is a primary driver of learning and motivated action. It is also a target of learning, as nociceptive brain responses are shaped by learning processes. We combined an instrumental pain avoidance task with an axiomatic approach to assessing fMRI signals related to prediction errors (PEs), which drive reinforcement-based learning. We found that pain PEs were encoded in the periaqueductal gray (PAG), an important structure for pain control and learning in animal models. Axiomatic tests combined with dynamic causal modeling suggested that ventromedial prefrontal cortex, supported by putamen, provides an expected value-related input to the PAG, which then conveys PE signals to prefrontal regions important for behavioral regulation, including orbitofrontal, anterior mid-cingulate, and dorsomedial prefrontal cortices. Thus, pain-related learning involves distinct neural circuitry, with implications for behavior and pain dynamics. PMID:25282614
[Interaction of immobilization stress and food-getting learning].
Levshina, I P; Stashkevich, I S; Shuĭkin, N N
2009-01-01
The behavioral effects of negative emotional stress (immobilization) were studied in Wistar rats, both intact rats and rats with previous positive emotional experience. Food-getting learning was chosen as the positive emotional experience: animals were trained in a food pellet-reaching task using their preferred paw. Immobilization of intact rats led to suppression of motor activity and an increase in the duration of grooming, effects that indicate an enhancement of passive-avoidance reactions. Motor learning with food reinforcement before immobilization significantly reduced the appearance of passive-avoidance reactions. Immobilization stress did not reverse the initial direction of limb preference in the majority of rats.
Reinforcement learning and decision making in monkeys during a competitive game.
Lee, Daeyeol; Conroy, Michelle L; McGreevy, Benjamin P; Barraclough, Dominic J
2004-12-01
Animals living in a dynamic environment must adjust their decision-making strategies through experience. To gain insights into the neural basis of such adaptive decision-making processes, we trained monkeys to play a competitive game against a computer in an oculomotor free-choice task. The animal selected one of two visual targets in each trial and was rewarded only when it selected the same target as the computer opponent. To determine how the animal's decision-making strategy can be affected by the opponent's strategy, the computer opponent was programmed with three different algorithms that exploited different aspects of the animal's choice and reward history. When the computer selected its targets randomly with equal probabilities, animals selected one of the targets more often, violating the prediction of probability matching, and their choices were systematically influenced by the choice history of the two players. When the computer exploited only the animal's choice history but not its reward history, the animal's choices became more independent of its own choice history but were still related to the choice history of the opponent. This bias was substantially reduced, but not completely eliminated, when the computer used the choice history of both players in making its predictions. These biases were consistent with the predictions of reinforcement learning, suggesting that the animals sought optimal decision-making strategies using reinforcement learning algorithms.
Vicario-Feliciano, Raquel; Murray, Elisabeth A; Averbeck, Bruno B
2017-10-01
A large body of work has implicated the ventral striatum (VS) in aspects of reinforcement learning (RL). However, less work has directly examined the effects of lesions in the VS, or other forms of inactivation, on 2-armed bandit RL tasks. We have recently found that lesions in the VS in macaque monkeys affect learning with stochastic schedules but have minimal effects with deterministic schedules. The reasons for this are not currently clear. Because our previous work used short intertrial intervals, one possibility is that the animals were using working memory to bridge stimulus-reward associations from 1 trial to the next. In the present study, we examined learning of 60 pairs of objects, in which the animals received only 1 trial per day with each pair. The large number of object pairs and the long interval (approximately 24 hr) between trials with a given pair minimized the chances that the animals could use working memory to bridge trials. We found that monkeys with VS lesions were unimpaired relative to controls, which suggests that animals with VS lesions can still learn to select rewarded objects even when they cannot make use of working memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
ERIC Educational Resources Information Center
Hogan, Lindsey C.; Bell, Matthew; Olson, Ryan
2009-01-01
The vigilance reinforcement hypothesis (VRH) asserts that errors in signal detection tasks are partially explained by operant reinforcement and extinction processes. VRH predictions were tested with a computerized baggage screening task. Our experiment evaluated the effects of signal schedule (extinction vs. variable interval 6 min) and visual…
Reinforcing value and hypothetical behavioral economic demand for food and their relation to BMI.
Epstein, Leonard H; Paluch, Rocco A; Carr, Katelyn A; Temple, Jennifer L; Bickel, Warren K; MacKillop, James
2018-04-01
Food is a primary reinforcer, and food reinforcement is related to obesity. The reinforcing value of food can be measured by establishing how hard someone will work to get food on progressive-ratio schedules. An alternative way to measure food reinforcement is a hypothetical purchase task which creates behavioral economic demand curves. This paper studies whether reinforcing value and hypothetical behavioral demand approaches are assessing the same or unique aspects of food reinforcement for low (LED) and high (HED) energy density foods using a combination of analytic approaches in females of varying BMI. Results showed absolute reinforcing value for LED and HED foods and relative reinforcing value were related to demand intensity (r's = 0.20-0.30, p's < 0.01), and demand elasticity (r's = 0.17-0.22, p's < 0.05). Correlations between demographic, BMI and restraint, disinhibition and hunger variables with the two measures of food reinforcement were different. Finally, the two measures provided unique contributions to predicting BMI. Potential reasons for differences between the reinforcing value and hypothetical purchase tasks were actual responding versus hypothetical purchasing, choice of reinforcers versus purchasing of individual foods in the demand task, and the differential role of effort in the two tasks. Examples of how a better understanding of food reinforcement may be useful to prevent or treat obesity are discussed, including engaging in alternative non-food reinforcers as substitutes for food, such as crafts or socializing in a non-food environment, and reducing the value of immediate food reinforcers by episodic future thinking. Copyright © 2018. Published by Elsevier Ltd.
Neural correlates of forward planning in a spatial decision task in humans
Simon, Dylan Alexander; Daw, Nathaniel D.
2011-01-01
Although reinforcement learning (RL) theories have been influential in characterizing the brain’s mechanisms for reward-guided choice, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model-based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used fMRI in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms employed. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and BOLD signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, though most often associated with TD learning, were better explained by the model-based theory. Further, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain. PMID:21471389
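As a schematic of the contrast drawn in this abstract, the sketch below pits a model-free TD learner, which caches values from sampled transitions, against a model-based learner that plans over an estimated transition model. The tiny chain environment and all parameters are invented for illustration and are unrelated to the study's maze.

```python
import numpy as np

# Invented 4-state toy environment: state 3 is rewarded. The model-free TD learner
# caches values from sampled transitions; the model-based learner plans over a
# transition model estimated from the same experience.
np.random.seed(1)
n_states = 4
reward = np.array([0.0, 0.0, 0.0, 1.0])
T = np.zeros((n_states, n_states))        # learned transition counts (the "internal model")
V_mf = np.zeros(n_states)                 # model-free (TD) values
alpha, gamma = 0.2, 0.9

def step(s):                              # hypothetical environment: drift toward state 3
    return min(s + np.random.choice([0, 1]), n_states - 1)

for _ in range(2000):
    s = 0
    while s != n_states - 1:
        s_next = step(s)
        T[s, s_next] += 1                 # update the internal model
        V_mf[s] += alpha * (reward[s_next] + gamma * V_mf[s_next] - V_mf[s])  # TD update
        s = s_next

# Model-based values: plan by iterating over the learned (normalized) model.
P = T / T.sum(axis=1, keepdims=True).clip(min=1)
V_mb = np.zeros(n_states)
for _ in range(50):
    V_mb = P @ (reward + gamma * V_mb)

print("model-free:", np.round(V_mf, 2), "model-based:", np.round(V_mb, 2))
```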
Potentiation of the early visual response to learned danger signals in adults and adolescents
Howsley, Philippa; Jordan, Jeff; Johnston, Pat
2015-01-01
The reinforcing effects of aversive outcomes on avoidance behaviour are well established. However, their influence on perceptual processes is less well explored, especially during the transition from adolescence to adulthood. Using electroencephalography, we examined whether learning to actively or passively avoid harm can modulate early visual responses in adolescents and adults. The task included two avoidance conditions, active and passive, where two different warning stimuli predicted the imminent, but avoidable, presentation of an aversive tone. To avoid the aversive outcome, participants had to learn to emit an action (active avoidance) for one of the warning stimuli and omit an action for the other (passive avoidance). Both adults and adolescents performed the task with a high degree of accuracy. For both adolescents and adults, increased N170 event-related potential amplitudes were found for both the active and the passive warning stimuli compared with control conditions. Moreover, the potentiation of the N170 to the warning stimuli was stable and long lasting. Developmental differences were also observed; adolescents showed greater potentiation of the N170 component to danger signals. These findings demonstrate, for the first time, that learned danger signals in an instrumental avoidance task can influence early visual sensory processes in both adults and adolescents. PMID:24652856
Lewon, Matthew; Peters, Christina M; Van Ry, Pam M; Burkin, Dean J; Hunter, Kenneth W; Hayes, Linda J
2017-09-01
The mdx mouse is an important nonhuman model for Duchenne muscular dystrophy (DMD) research. Characterizing the behavioral traits of the strain relative to congenic wild-type (WT) mice may enhance our understanding of the cognitive deficits observed in some humans with DMD and contribute to treatment development and evaluation. In this paper we report the results of a number of experiments comparing the behavior of mdx to WT mice in operant conditioning procedures designed to assess learning and memory. We found that mdx outperformed WT in all learning and memory tasks involving food reinforcement, and this appeared to be related to the differential effects of the food deprivation motivating operation on mdx mice. Conversely, WT outperformed mdx in an escape/avoidance learning task. These results suggest motivational differences between the strains and demonstrate the potential utility of operant conditioning procedures in the assessment of the behavioral characteristics of the mdx mouse. Copyright © 2017 Elsevier B.V. All rights reserved.
Autonomous learning based on cost assumptions: theoretical studies and experiments in robot control.
Ribeiro, C H; Hemerly, E M
2000-02-01
Autonomous learning techniques are based on experience acquisition. In most realistic applications, experience is time-consuming: it implies sensor reading, actuator control and algorithmic updates, constrained by the learning system dynamics. The crudeness of the information upon which classical learning algorithms operate makes such problems too difficult and unrealistic. Nonetheless, additional information for facilitating the learning process should ideally be embedded in such a way that the structural, well-studied characteristics of these fundamental algorithms are maintained. We investigate in this article a more general formulation of the Q-learning method that allows information derived from a single update to spread towards a neighbourhood of the currently visited state, and that converges to optimality. We show how this new formulation can be used as a mechanism to safely embed prior knowledge about the structure of the state space, and demonstrate it in a modified implementation of a reinforcement learning algorithm in a real robot navigation task.
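A rough sketch of the spreading idea is given below: each Q-update is propagated, with a weight that decays with distance, to states near the one just visited. The one-dimensional grid world, the Gaussian weighting, and all parameters are assumptions made for illustration and do not reproduce the authors' exact formulation.

```python
import numpy as np

# Rough sketch of "spreading" a Q-update to neighbouring states on a 1-D grid.
# The Gaussian weighting and all parameters are illustrative assumptions.
np.random.seed(2)
n_states, n_actions = 20, 2          # actions: 0 = left, 1 = right; goal at the right end
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps, sigma = 0.3, 0.95, 0.1, 1.5

def spread_weights(center):
    d = np.arange(n_states) - center
    return np.exp(-(d ** 2) / (2 * sigma ** 2))   # weight decays with distance from the visited state

for episode in range(200):
    s = 0
    for _ in range(2000):                          # step cap keeps episodes bounded
        greedy = np.random.rand() >= eps and Q[s].max() > Q[s].min()
        a = int(Q[s].argmax()) if greedy else np.random.randint(n_actions)
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * Q[s_next].max() - Q[s, a]
        Q[:, a] += alpha * spread_weights(s) * delta   # one update spread over a neighbourhood
        s = s_next
        if s == n_states - 1:
            break

print(np.round(Q.max(axis=1), 2))
```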
NASA Astrophysics Data System (ADS)
Sakai, Naoki; Kawabe, Naoto; Hara, Masayuki; Toyoda, Nozomi; Yabuta, Tetsuro
This paper describes how a compact humanoid robot can acquire a giant-swing motion, without any model of the robot, using the Q-Learning method. It is generally held that Q-Learning is not appropriate for learning dynamic motions because the Markov property is not necessarily guaranteed during the dynamic task. We addressed this problem by embedding the angular velocity in the state definition and by averaging the Q-Learning updates to reduce dynamic effects, although non-Markov effects remain in the learning results. The results show how the robot can acquire a giant-swing motion using the Q-Learning algorithm. The successfully acquired motions are analyzed from the viewpoint of dynamics in order to realize a functional giant-swing motion. Finally, the results show how this method can avoid the stagnant action loop around the bottom of the horizontal bar during the early stage of the giant-swing motion.
Learning relative values in the striatum induces violations of normative decision making
Klein, Tilmann A.; Ullsperger, Markus; Jocham, Gerhard
2017-01-01
To decide optimally between available options, organisms need to learn the values associated with these options. Reinforcement learning models offer a powerful explanation of how these values are learnt from experience. However, human choices often violate normative principles. We suggest that seemingly counterintuitive decisions may arise as a natural consequence of the learning mechanisms deployed by humans. Here, using fMRI and a novel behavioural task, we show that, when suddenly switched to novel choice contexts, participants’ choices are incongruent with values learnt by standard learning algorithms. Instead, behaviour is compatible with the decisions of an agent learning how good an option is relative to an option with which it had previously been paired. Striatal activity exhibits the characteristics of a prediction error used to update such relative option values. Our data suggest that choices can be biased by a tendency to learn option values with reference to the available alternatives. PMID:28631734
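One simple way to express the relative-value idea described above is to let the same prediction error push the two paired options apart, so each learned value encodes an option's standing within its pair; the sketch below does that. The coupled update, reward probabilities, and learning rate are illustrative assumptions, not the model fitted in the study.

```python
import numpy as np

# Toy contrast between an absolute learner and a "relative" learner whose update
# couples the paired options, so a value reflects how good an option is relative to
# its alternative. Reward probabilities and parameters are invented for illustration.
np.random.seed(3)
p_reward = {"A": 0.8, "B": 0.2}
alpha, n_trials = 0.1, 1000
Q_abs = {"A": 0.0, "B": 0.0}   # tracks absolute reward rates
Q_rel = {"A": 0.0, "B": 0.0}   # tracks value relative to the paired alternative

for _ in range(n_trials):
    chosen = np.random.choice(["A", "B"])
    other = "B" if chosen == "A" else "A"
    r = float(np.random.rand() < p_reward[chosen])
    Q_abs[chosen] += alpha * (r - Q_abs[chosen])
    # relative learner: the same prediction error pushes the two paired options apart
    delta_rel = r - Q_rel[chosen]
    Q_rel[chosen] += alpha * delta_rel
    Q_rel[other] -= alpha * delta_rel

# The relative learner ends up encoding each option's standing within its pair
# (roughly +0.3 vs -0.3 here), so re-pairing options in a new context can produce
# choices that look irrational with respect to absolute reward rates.
print({k: round(v, 2) for k, v in Q_abs.items()},
      {k: round(v, 2) for k, v in Q_rel.items()})
```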
An integrated utility-based model of conflict evaluation and resolution in the Stroop task.
Chuderski, Adam; Smolen, Tomasz
2016-04-01
Cognitive control allows humans to direct and coordinate their thoughts and actions in a flexible way, in order to reach internal goals regardless of interference and distraction. The hallmark test used to examine cognitive control is the Stroop task, which elicits both the weakly learned but goal-relevant and the strongly learned but goal-irrelevant response tendencies, and requires people to follow the former while ignoring the latter. After reviewing the existing computational models of cognitive control in the Stroop task, a novel, integrated utility-based model is proposed. The model uses 3 crucial control mechanisms: response utility reinforcement learning, utility-based conflict evaluation using the Festinger formula for assessing the conflict level, and top-down adaptation of response utility in the service of conflict resolution. Their complex, dynamic interaction led to replication of 18 experimental effects, the largest data set explained to date by a single Stroop model. The simulations cover the basic congruency effects (including the response latency distributions), performance dynamics and adaptation (including EEG indices of conflict), as well as the effects resulting from manipulations applied to stimulation and responding, which are yielded by the extant Stroop literature. (c) 2016 APA, all rights reserved.
Brain-Machine Interface control of a robot arm using actor-critic reinforcement learning.
Pohlmeyer, Eric A; Mahmoudi, Babak; Geng, Shijia; Prins, Noeline; Sanchez, Justin C
2012-01-01
Here we demonstrate how a marmoset monkey can use a reinforcement learning (RL) Brain-Machine Interface (BMI) to effectively control the movements of a robot arm for a reaching task. In this work, an actor-critic RL algorithm used neural ensemble activity in the monkey's motor cortex to control the robot movements during a two-target decision task. This novel approach to decoding offers unique advantages for BMI control applications. Compared to supervised learning decoding methods, the actor-critic RL algorithm does not require an explicit set of training data to create a static control model, but rather it incrementally adapts the model parameters according to its current performance, in this case requiring only a very basic feedback signal. We show how this algorithm achieved high performance when mapping the monkey's neural states to robot actions (94%), and needed to experience only a few trials before obtaining accurate real-time control of the robot arm. Since RL methods responsively adapt and adjust their parameters, they can provide a method to create BMIs that are robust against perturbations caused by changes in either the neural input space or the output actions they generate under different task requirements or goals.
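A stripped-down actor-critic loop of the kind summarized above is sketched below, driven only by a binary feedback signal rather than an explicit training set. The simulated 'neural state' features and all parameters are placeholders and do not represent the study's decoder.

```python
import numpy as np

# Stripped-down actor-critic driven by a binary feedback signal.
# The random "neural state" features and all parameters are placeholders.
np.random.seed(4)
n_features, n_actions = 16, 2
W_actor  = np.zeros((n_actions, n_features))   # maps neural state -> action preferences
w_critic = np.zeros(n_features)                # maps neural state -> value estimate
alpha_a, alpha_c = 0.05, 0.1

def trial(target):
    x = np.random.randn(n_features) + 0.5 * target            # toy neural state, depends on target
    prefs = W_actor @ x
    probs = np.exp(prefs - prefs.max()); probs /= probs.sum() # softmax policy over robot actions
    a = np.random.choice(n_actions, p=probs)
    reward = 1.0 if a == target else 0.0                       # binary feedback only
    delta = reward - w_critic @ x                              # critic's prediction error
    w_critic += alpha_c * delta * x                            # critic update
    grad = -probs[:, None] * x[None, :]; grad[a] += x          # softmax policy-gradient term
    W_actor += alpha_a * delta * grad                          # actor update scaled by the critic's error
    return reward

hits = [trial(np.random.randint(2)) for _ in range(2000)]
print("recent accuracy:", np.mean(hits[-200:]))
```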
Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto
2016-01-01
Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401
Kobza, Stefan; Ferrea, Stefano; Schnitzler, Alfons; Pollok, Bettina
2012-01-01
Feedback to both actively performed and observed behaviour allows adaptation of future actions. Positive feedback leads to increased activity of dopamine neurons in the substantia nigra, whereas dopamine neuron activity is decreased following negative feedback. Dopamine level reduction in unmedicated Parkinson’s Disease patients has been shown to lead to a negative learning bias, i.e. enhanced learning from negative feedback. Recent findings suggest that the neural mechanisms of active and observational learning from feedback might differ, with the striatum playing a less prominent role in observational learning. Therefore, it was hypothesized that unmedicated Parkinson’s Disease patients would show a negative learning bias only in active but not in observational learning. In a between-group design, 19 Parkinson’s Disease patients and 40 healthy controls engaged in either an active or an observational probabilistic feedback-learning task. For both tasks, transfer phases aimed to assess the bias to learn better from positive or negative feedback. As expected, actively learning patients showed a negative learning bias, whereas controls learned better from positive feedback. In contrast, no difference between patients and controls emerged for observational learning, with both groups showing better learning from positive feedback. These findings add to neural models of reinforcement-learning by suggesting that dopamine-modulated input to the striatum plays a minor role in observational learning from feedback. Future research will have to elucidate the specific neural underpinnings of observational learning. PMID:23185586
West, Elizabeth A.
2016-01-01
Nucleus accumbens (NAc) neurons encode features of stimulus learning and action selection associated with rewards. The NAc is necessary for using information about expected outcome values to guide behavior after reinforcer devaluation. Evidence suggests that core and shell subregions may play dissociable roles in guiding motivated behavior. Here, we recorded neural activity in the NAc core and shell during training and performance of a reinforcer devaluation task. Long–Evans male rats were trained that presses on a lever under an illuminated cue light delivered a flavored sucrose reward. On subsequent test days, each rat was given free access to one of two distinctly flavored foods to consume to satiation and were then immediately tested on the lever pressing task under extinction conditions. Rats decreased pressing on the test day when the reinforcer earned during training was the sated flavor (devalued) compared with the test day when the reinforcer was not the sated flavor (nondevalued), demonstrating evidence of outcome-selective devaluation. Cue-selective encoding during training by NAc core (but not shell) neurons reliably predicted subsequent behavioral performance; that is, the greater the percentage of neurons that responded to the cue, the better the rats suppressed responding after devaluation. In contrast, NAc shell (but not core) neurons significantly decreased cue-selective encoding in the devalued condition compared with the nondevalued condition. These data reveal that NAc core and shell neurons encode information differentially about outcome-specific cues after reinforcer devaluation that are related to behavioral performance and outcome value, respectively. SIGNIFICANCE STATEMENT Many neuropsychiatric disorders are marked by impairments in behavioral flexibility. Although the nucleus accumbens (NAc) is required for behavioral flexibility, it is not known how NAc neurons encode this information. Here, we recorded NAc neurons during a training session in which rats learned that a cue predicted a specific reward and during a test session when that reward value was changed. Although encoding in the core during training predicted the ability of rats to change behavior after the reward value was altered, the NAc shell encoded information about the change in reward value during the test session. These findings suggest differential roles of the core and shell in behavioral flexibility. PMID:26818502
Gold, James M.; Waltz, James A.; Matveeva, Tatyana M.; Kasanova, Zuzana; Strauss, Gregory P.; Herbener, Ellen S.; Collins, Anne G.E.; Frank, Michael J.
2015-01-01
Context: Negative symptoms are a core feature of schizophrenia, but their pathophysiology remains unclear. Objective: Negative symptoms are defined by the absence of normal function. However, there must be a productive mechanism that leads to this absence. Here, we test a reinforcement learning account suggesting that negative symptoms result from a failure to represent the expected value of rewards coupled with preserved loss-avoidance learning. Design: Subjects performed a probabilistic reinforcement learning paradigm involving stimulus pairs in which choices resulted in either reward or avoidance of loss. Following training, subjects indicated their valuation of the stimuli in a transfer task. Computational modeling was used to distinguish between alternative accounts of the data. Setting: A tertiary care research outpatient clinic. Patients: A total of 47 clinically stable patients with a diagnosis of schizophrenia or schizoaffective disorder and 28 healthy volunteers participated. Patients were divided into high and low negative symptom groups. Main Outcome Measures: (1) The number of choices leading to reward or loss avoidance and (2) performance in the transfer phase. Quantitative fits from three different models were examined. Results: High negative symptom patients demonstrated impaired learning from rewards but intact loss-avoidance learning, and failed to distinguish rewarding stimuli from loss-avoiding stimuli in the transfer phase. Model fits revealed that high negative symptom patients were better characterized by an “actor-critic” model, learning stimulus-response associations, whereas controls and low negative symptom patients incorporated the expected value of their actions (“Q-learning”) into the selection process. Conclusions: Negative symptoms are associated with a specific reinforcement learning abnormality: high negative symptom patients do not represent the expected value of rewards when making decisions but learn to avoid punishments through the use of prediction errors. This computational framework offers the potential to understand negative symptoms at a mechanistic level. PMID:22310503
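The two families of update rules compared in the model fits above can be written compactly, as in the generic sketch below; these are textbook forms, not the fitted models from the study, and the learning rate is a hypothetical value.

```python
# Generic sketch of the two update-rule families compared in such model fits
# (not the study's fitted models); alpha is a hypothetical learning rate.
alpha = 0.2

def q_learning_update(Q, stimulus, action, reward):
    # expected-value learner: the chosen action's value moves toward the obtained outcome
    Q[stimulus][action] += alpha * (reward - Q[stimulus][action])

def actor_critic_update(V, W, stimulus, action, reward):
    # critic tracks the value of the stimulus; the actor strengthens or weakens the
    # stimulus-response weight in proportion to the critic's prediction error
    delta = reward - V[stimulus]
    V[stimulus] += alpha * delta
    W[stimulus][action] += alpha * delta

# Example: a "gain" stimulus where choosing action 0 yields reward 1
Q = {"gain": [0.0, 0.0]}; V = {"gain": 0.0}; W = {"gain": [0.0, 0.0]}
for _ in range(20):
    q_learning_update(Q, "gain", 0, 1.0)
    actor_critic_update(V, W, "gain", 0, 1.0)
print(Q, V, W)
```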
Oliveira, Emileane C; Hunziker, Maria Helena
2014-07-01
In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement of lever pressing under a multiple FR/extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.
Network analysis of exploratory behaviors of mice in a spatial learning and memory task
Suzuki, Yusuke
2017-01-01
The Barnes maze is one of the main behavioral tasks used to study spatial learning and memory. The Barnes maze is a task conducted on “dry land” in which animals try to escape from a brightly lit exposed circular open arena to a small dark escape box located under one of several holes at the periphery of the arena. In comparison with another classical spatial learning and memory task, the Morris water maze, the negative reinforcements that motivate animals in the Barnes maze are less severe and less stressful. Furthermore, the Barnes maze is more compatible with recently developed cutting-edge techniques in neural circuit research, such as the miniature brain endoscope or optogenetics. For this study, we developed a lift-type task start system and equipped the Barnes maze with it. The subject mouse is raised up by the lift and released into the maze automatically so that it can start navigating the maze smoothly from exactly the same start position across repeated trials. We believe that a Barnes maze test with a lift-type task start system may be useful for behavioral experiments when combined with head-mounted or wire-connected devices for online imaging and intervention in neural circuits. Furthermore, we introduced a network analysis method for the analysis of the Barnes maze data. Each animal’s exploratory behavior in the maze was visualized as a network of nodes and their links, and spatial learning in the maze is described by systematic changes in network structures of search behavior. Network analysis was capable of visualizing and quantitatively analyzing subtle but significant differences in an animal’s exploratory behavior in the maze. PMID:28700627
A study on the effects of some reinforcers to improve performance of employees in a retail industry.
Raj, John Dilip; Nelson, John Abraham; Rao, K S P
2006-11-01
Two field experiments were conducted in the Business Information Technology Department of a major retail industry to analyze the impact of positive task performance reinforcers. The employees were divided into two broad groups - those performing complex tasks and those performing relatively simpler tasks. The first group was further divided into two subgroups, one being reinforced with money and paid leave and the other with feedback. Both the subgroups showed a significant improvement in performance behavior. However, feedback had a stronger effect on task performance even after the reinforcement was withdrawn. The second group of employees was allowed to choose reinforcers of their liking. Two simple techniques, a casual dress code and flexible working hours chosen by them, had a positive effect on their performance, which continued even after 6 months into the intervention. Besides, the procedure for the second group required no monetary or work-time loss to the employer.
Rational and mechanistic perspectives on reinforcement learning.
Chater, Nick
2009-12-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms--as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.
Multiple memory systems as substrates for multiple decision systems
Doll, Bradley B.; Shohamy, Daphna; Daw, Nathaniel D.
2014-01-01
It has recently become widely appreciated that value-based decision making is supported by multiple computational strategies. In particular, animal and human behavior in learning tasks appears to include habitual responses described by prominent model-free reinforcement learning (RL) theories, but also more deliberative or goal-directed actions that can be characterized by a different class of theories, model-based RL. The latter theories evaluate actions by using a representation of the contingencies of the task (as with a learned map of a spatial maze), called an “internal model.” Given the evidence of behavioral and neural dissociations between these approaches, they are often characterized as dissociable learning systems, though they likely interact and share common mechanisms. In many respects, this division parallels a longstanding dissociation in cognitive neuroscience between multiple memory systems, describing, at the broadest level, separate systems for declarative and procedural learning. Procedural learning has notable parallels with model-free RL: both involve learning of habits and both are known to depend on parts of the striatum. Declarative memory, by contrast, supports memory for single events or episodes and depends on the hippocampus. The hippocampus is thought to support declarative memory by encoding temporal and spatial relations among stimuli and thus is often referred to as a relational memory system. Such relational encoding is likely to play an important role in learning an internal model, the representation that is central to model-based RL. Thus, insofar as the memory systems represent more general-purpose cognitive mechanisms that might subserve performance on many sorts of tasks including decision making, these parallels raise the question whether the multiple decision systems are served by multiple memory systems, such that one dissociation is grounded in the other. Here we investigated the relationship between model-based RL and relational memory by comparing individual differences across behavioral tasks designed to measure either capacity. Human subjects performed two tasks, a learning and generalization task (acquired equivalence) which involves relational encoding and depends on the hippocampus; and a sequential RL task that could be solved by either a model-based or model-free strategy. We assessed the correlation between subjects’ use of flexible, relational memory, as measured by generalization in the acquired equivalence task, and their differential reliance on either RL strategy in the decision task. We observed a significant positive relationship between generalization and model-based, but not model-free, choice strategies. These results are consistent with the hypothesis that model-based RL, like acquired equivalence, relies on a more general-purpose relational memory system. PMID:24846190
Using Aberrant Behaviors as Reinforcers for Autistic Children.
ERIC Educational Resources Information Center
Charlop, Marjorie H.; And Others
1990-01-01
Three experiments assessed the efficacy of various reinforcers to increase correct task responding in a total of 10 autistic children, aged 6-9. Of the reinforcers used (stereotypy, delayed echolalia, perseverative behavior, and food), task performance was highest with opportunities to engage in aberrant behaviors, and lowest with edible…
Fagen, Ariel; Acharya, Narayan; Kaufman, Gretchen E
2014-01-01
Many trainers of animals in the zoo now rely on positive reinforcement training to teach animals to voluntarily participate in husbandry and veterinary procedures in an effort to improve behavioral reliability, captive management, and welfare. However, captive elephant handlers in Nepal still rely heavily on punishment- and aversion-based methods. The aim of this project was to determine the effectiveness of secondary positive reinforcement (SPR) in training free-contact elephants in Nepal to voluntarily participate in a trunk wash for the purpose of tuberculosis testing. Five female elephants, 4 juveniles and 1 adult, were enrolled in the project. Data were collected in the form of minutes of training, number of offers made for each training task, and success rate for each task in performance tests. Four out of 5 elephants, all juveniles, successfully learned the trunk wash in 35 sessions or fewer, with each session lasting a mean duration of 12 min. The elephants' performance improved from a mean success rate of 39.0% to 89.3% during the course of the training. This study proves that it is feasible to efficiently train juvenile, free-contact, traditionally trained elephants in Nepal to voluntarily and reliably participate in a trunk wash using only SPR techniques.
Modulation of habit formation by levodopa in Parkinson's disease.
Marzinzik, Frank; Wotka, Johann; Wahl, Michael; Krugel, Lea K; Kordsachia, Catarina; Klostermann, Fabian
2011-01-01
Dopamine promotes the execution of positively reinforced actions, but its role in the formation of behaviour when feedback is unavailable remains open. To study this issue, the performance of treated/untreated patients with Parkinson's disease and controls was analysed in an implicit learning task, hypothesising dopamine-dependent adherence to hidden task rules. Sixteen patients on/off levodopa and fourteen healthy subjects engaged in a Go/NoGo paradigm comprising four equiprobable stimuli. One of the stimuli was defined as the target, which was first consistently preceded by one of the three non-target stimuli (conditioning), whereas this coupling was dissolved thereafter (deconditioning). Two task versions were presented: in a 'Go version', only the target cue required the execution of a button press, whereas non-target stimuli were not instructive of a response; in a 'NoGo version', only the target cue demanded the inhibition of the button press that was demanded upon any non-target stimulus. Levodopa influenced in which task version errors grew from conditioning to deconditioning: in unmedicated patients, as in controls, errors rose only in the NoGo version, with an increase of incorrect responses to target cues. Conversely, in medicated patients errors went up only in the Go version, with an increase of response omissions to target cues. The error increases during deconditioning can be understood as a perpetuation of reaction tendencies acquired during conditioning. The levodopa-mediated modulation of this carry-over effect suggests that dopamine supports habit conditioning under the task demand of response execution, but dampens it when inhibition is required. However, unlike in reinforcement learning, the supporting dopaminergic action referred to the most frequent, i.e., non-target, behaviour. Since this behaviour is passive whenever selective actions are executed against an inactive background, dopaminergic treatment could in such scenarios contribute to passive behaviour in patients with Parkinson's disease.
Effects of intrinsic motivation on feedback processing during learning.
DePasque, Samantha; Tricomi, Elizabeth
2015-10-01
Learning commonly requires feedback about the consequences of one's actions, which can drive learners to modify their behavior. Motivation may determine how sensitive an individual might be to such feedback, particularly in educational contexts where some students value academic achievement more than others. Thus, motivation for a task might influence the value placed on performance feedback and how effectively it is used to improve learning. To investigate the interplay between intrinsic motivation and feedback processing, we used functional magnetic resonance imaging (fMRI) during feedback-based learning before and after a novel manipulation based on motivational interviewing, a technique for enhancing treatment motivation in mental health settings. Because of its role in the reinforcement learning system, the striatum is situated to play a significant role in the modulation of learning based on motivation. Consistent with this idea, motivation levels during the task were associated with sensitivity to positive versus negative feedback in the striatum. Additionally, heightened motivation following a brief motivational interview was associated with increases in feedback sensitivity in the left medial temporal lobe. Our results suggest that motivation modulates neural responses to performance-related feedback, and furthermore that changes in motivation facilitate processing in areas that support learning and memory. Copyright © 2015. Published by Elsevier Inc.
The Time Course of Explicit and Implicit Categorization
Zakrzewski, Alexandria C.; Herberger, Eric; Boomer, Joseph; Roeder, Jessica; Ashby, F. Gregory; Church, Barbara A.
2015-01-01
Contemporary theory in cognitive neuroscience distinguishes, among the processes and utilities that serve categorization, explicit and implicit systems of category learning that learn, respectively, category rules by active hypothesis testing or adaptive behaviors by association and reinforcement. Little is known about the time course of categorization within these systems. Accordingly, the present experiments contrasted tasks that fostered explicit categorization (because they had a one-dimensional, rule-based solution) or implicit categorization (because they had a two-dimensional, information-integration solution). In Experiment 1, participants learned categories under unspeeded or speeded conditions. In Experiment 2, they applied previously trained category knowledge under unspeeded or speeded conditions. Speeded conditions selectively impaired implicit category learning and implicit mature categorization. These results illuminate the processing dynamics of explicit/implicit categorization. PMID:26025556
Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning.
Iwata, Kazunori
2016-05-11
Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.
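For readers less familiar with the method discussed above, the following minimal Python sketch shows standard softmax (Boltzmann) action selection over estimated action values with a single temperature parameter. It illustrates the generic mechanism only; the paper's specific bandwidth-extending parameter setting is not reproduced, and all names and values are illustrative.

    import numpy as np

    def softmax_action(q_values, temperature=1.0, rng=None):
        """Sample an action with probability proportional to exp(Q / temperature)."""
        rng = rng or np.random.default_rng()
        q = np.asarray(q_values, dtype=float)
        # Subtract the maximum before exponentiating for numerical stability.
        prefs = np.exp((q - q.max()) / temperature)
        probs = prefs / prefs.sum()
        return rng.choice(len(q), p=probs), probs

    # Example: three actions; a lower temperature makes the choice greedier.
    action, probs = softmax_action([1.0, 0.5, -0.2], temperature=0.5)

In practice the single temperature parameter trades off exploration (high temperature, near-uniform choice) against exploitation (low temperature, near-greedy choice), which is why the width of the range of "good" settings matters.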
Decentralized reinforcement-learning control and emergence of motion patterns
NASA Astrophysics Data System (ADS)
Svinin, Mikhail; Yamada, Kazuyaki; Okhura, Kazuhiro; Ueda, Kanji
1998-10-01
In this paper we propose a system for studying the emergence of motion patterns in autonomous mobile robotic systems. The system implements an instance-based reinforcement learning control. Three spaces are of importance in the formulation of the control scheme. They are the work space, the sensor space, and the action space. An important feature of our system is that all these spaces are assumed to be continuous. The core part of the system is a classifier system. Based on the sensory state space analysis, the control is decentralized and is specified at the lowest level of the control system. However, the local controllers are implicitly connected through the perceived environment information. Therefore, they constitute a dynamic environment with respect to each other. The proposed control scheme is tested in simulation for a mobile robot in a navigation task. It is shown that some patterns of global behavior--such as collision avoidance, wall-following, and light-seeking--can emerge from the local controllers.
Knowledge-Based Reinforcement Learning for Data Mining
NASA Astrophysics Data System (ADS)
Kudenko, Daniel; Grzes, Marek
Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in the form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human experts have developed heuristics that help them in planning and scheduling resources in their work place. However, this domain knowledge is often rough and incomplete. When the domain knowledge is used directly by an automated expert system, the solutions are often sub-optimal, due to the incompleteness of the knowledge, the uncertainty of environments, and the possibility of encountering unexpected situations. RL, on the other hand, can overcome the weaknesses of the heuristic domain knowledge and produce optimal solutions. In the talk we propose two techniques, which represent first steps in the area of knowledge-based RL (KBRL). The first technique [1] uses high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping when it is used in its basic form. We showed that MDP-based reward shaping may fail, and successful experiments with STRIPS-based shaping suggest modifications that can overcome the encountered problems. The STRIPS-based method we propose allows expressing the same domain knowledge in a different way, and the domain expert can choose whether to define an MDP or a STRIPS planning task. We also evaluated the robustness of the proposed STRIPS-based technique to errors in the plan knowledge. If STRIPS knowledge is not available, we propose a second technique [2] that shapes the reward with hierarchical tile coding.
Where the Q-function is represented with low-level tile coding, a V-function with coarser tile coding can be learned in parallel and used to approximate the potential for ground states. In the context of data mining, our KBRL approaches can also be used for any data collection task where the acquisition of data may incur considerable cost. In addition, observing the data collection agent in specific scenarios may lead to new insights into optimal data collection behaviour in the respective domains. In future work, we intend to demonstrate and evaluate our techniques on concrete real-world data mining applications.
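As a rough illustration of the potential-based shaping idea discussed above (not the authors' STRIPS- or tile-coding-based implementation), the Python sketch below adds a shaping term F(s, s') = γΦ(s') − Φ(s) to the reward, where the potential Φ could come from a plan or from a coarser value function learned in parallel. The gridworld-style potential used here is purely hypothetical.

    import numpy as np

    GAMMA, ALPHA = 0.95, 0.1
    N_STATES, N_ACTIONS = 25, 4

    # Hypothetical potential function, e.g. a coarse value estimate or a
    # plan-based progress measure; higher potential = presumed closer to goal.
    phi = np.linspace(0.0, 1.0, N_STATES)

    def shaped_reward(r, s, s_next):
        """Potential-based shaping, which preserves the optimal policy."""
        return r + GAMMA * phi[s_next] - phi[s]

    # Standard tabular Q-learning update using the shaped reward.
    Q = np.zeros((N_STATES, N_ACTIONS))

    def q_update(s, a, r, s_next):
        target = shaped_reward(r, s, s_next) + GAMMA * Q[s_next].max()
        Q[s, a] += ALPHA * (target - Q[s, a])

The design point is that the shaping term only redistributes reward along trajectories, so the heuristic knowledge encoded in Φ can speed up convergence without changing which policy is optimal.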
Pigeons' Discrimination of Michotte's Launching Effect
Young, Michael E; Beckmann, Joshua S; Wasserman, Edward A
2006-01-01
We trained four pigeons to discriminate a Michotte launching animation from three other animations using a go/no-go task. The pigeons received food for pecking at one of the animations, but not for pecking at the others. The four animations featured two types of interactions among objects: causal (direct launching) and noncausal (delayed, distal, and distal & delayed). Two pigeons were reinforced for pecking at the causal interaction, but not at the noncausal interactions; two other pigeons were reinforced for pecking at the distal & delayed interaction, but not at the other interactions. Both discriminations proved difficult for the pigeons to master; later tests suggested that the pigeons often learned the discriminations by attending to subtle stimulus properties other than the intended ones. PMID:17002229
Liu, Chunming; Xu, Xin; Hu, Dewen
2013-04-29
Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.
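One of the naive MORL solutions alluded to in such overviews is linear scalarization, in which a vector of per-objective rewards is collapsed into a single scalar using a preference weight vector and then handled by any single-objective RL method. The sketch below is a generic illustration of that idea under assumed weights, not a specific algorithm from the survey.

    import numpy as np

    def scalarize(reward_vector, weights):
        """Collapse a vector of objective rewards into one scalar reward."""
        return float(np.dot(reward_vector, weights))

    # Example: two conflicting objectives (say, speed vs. energy use).
    weights = np.array([0.7, 0.3])           # preference over objectives
    reward_vector = np.array([1.0, -0.4])    # per-step reward for each objective
    scalar_reward = scalarize(reward_vector, weights)
    # scalar_reward can now drive any standard single-objective RL update.

Different weight vectors recover different points on the Pareto front, which is why scalarization is simple but cannot, on its own, represent all trade-offs among conflicting objectives.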
Parker, Jones G.; Wanat, Matthew J.; Soden, Marta E.; Ahmad, Kinza; Zweifel, Larry S.; Bamford, Nigel S.; Palmiter, Richard D.
2011-01-01
Phasic dopamine transmission encodes the value of reward-predictive stimuli and influences both learning and decision-making. Altered dopamine signaling is associated with psychiatric conditions characterized by risky choices such as pathological gambling. These observations highlight the importance of understanding how dopamine neuron activity is modulated. While excitatory drive onto dopamine neurons is critical for generating phasic dopamine responses, emerging evidence suggests that inhibitory signaling also modulates these responses. To address the functional importance of inhibitory signaling in dopamine neurons, we generated mice lacking the β3 subunit of the GABAA receptor specifically in dopamine neurons (β3-KO mice) and examined their behavior in tasks that assessed appetitive learning, aversive learning, and risk preference. Dopamine neurons in midbrain slices from β3-KO mice exhibited attenuated GABA-evoked inhibitory post-synaptic currents. Furthermore, electrical stimulation of excitatory afferents to dopamine neurons elicited more dopamine release in the nucleus accumbens of β3-KO mice as measured by fast-scan cyclic voltammetry. β3-KO mice were more active than controls when given morphine, which correlated with potential compensatory upregulation of GABAergic tone onto dopamine neurons. β3-KO mice learned faster in two food-reinforced learning paradigms, but extinguished their learned behavior normally. Enhanced learning was specific for appetitive tasks, as aversive learning was unaffected in β3-KO mice. Finally, we found that β3-KO mice had enhanced risk preference in a probabilistic selection task that required mice to choose between a small certain reward and a larger uncertain reward. Collectively, these findings identify a selective role for GABAA signaling in dopamine neurons in appetitive learning and decision-making. PMID:22114279
Pohlmeyer, Eric A.; Mahmoudi, Babak; Geng, Shijia; Prins, Noeline W.; Sanchez, Justin C.
2014-01-01
Brain-machine interface (BMI) systems give users direct neural control of robotic, communication, or functional electrical stimulation systems. As BMI systems begin transitioning from laboratory settings into activities of daily living, an important goal is to develop neural decoding algorithms that can be calibrated with a minimal burden on the user, provide stable control for long periods of time, and can be responsive to fluctuations in the decoder’s neural input space (e.g. neurons appearing or being lost amongst electrode recordings). These are significant challenges for static neural decoding algorithms that assume stationary input/output relationships. Here we use an actor-critic reinforcement learning architecture to provide an adaptive BMI controller that can successfully adapt to dramatic neural reorganizations, can maintain its performance over long time periods, and which does not require the user to produce specific kinetic or kinematic activities to calibrate the BMI. Two marmoset monkeys used the Reinforcement Learning BMI (RLBMI) to successfully control a robotic arm during a two-target reaching task. The RLBMI was initialized using random initial conditions, and it quickly learned to control the robot from brain states using only a binary evaluative feedback regarding whether previously chosen robot actions were good or bad. The RLBMI was able to maintain control over the system throughout sessions spanning multiple weeks. Furthermore, the RLBMI was able to quickly adapt and maintain control of the robot despite dramatic perturbations to the neural inputs, including a series of tests in which the neuron input space was deliberately halved or doubled. PMID:24498055
A questionnaire approach to measuring the relative reinforcing efficacy of snack foods
Epstein, Leonard H.; Dearing, Kelly K.; Roba, Lora G.
2010-01-01
Behavioral choice theory and laboratory choice paradigms can provide a framework to understand the reinforcing efficacy or reinforcing value of food. Reinforcing efficacy is measured in the laboratory by assessing how much effort one will engage in to gain access to food as the amount of work progressively increases. However, this method to establish demand curves as estimates of reinforcer efficacy is time consuming and limits the number of reinforcers that can be tested. The general aim of this study was to compare the reinforcing efficacy of snack foods using a behavioral task that requires subjects to respond to gain access to portions of food (LAB task) with a questionnaire version of a purchasing task designed to determine demand curves (QUES task) in nonobese and obese adults (n = 24). Results showed correlations between the maximal amount of money that individuals were willing to spend for food (QUES Omax) and the maximal amount of responses made on the highest reinforcement schedule completed (LAB Omax) (r = 0.45, p < 0.05), and between BMI and the LAB Omax (r = 0.43, p < 0.05) and the QUES Omax (r = 0.52, p < 0.05). The study suggests the questionnaire provides valid measures of reinforcing efficacy that can be used in place of or in conjunction with traditional laboratory paradigms to establish demand curves that describe the behavioral maintaining properties of food. PMID:20188288
DISTRIBUTED AND ACCUMULATED REINFORCEMENT ARRANGEMENTS: EVALUATIONS OF EFFICACY AND PREFERENCE
DELEON, ISER G.; CHASE, JULIE A.; FRANK-CRAWFORD, MICHELLE A.; CARREAU-WEBSTER, ABBEY B.; TRIGGS, MANDY M.; BULLOCK, CHRISTOPHER E.; JENNETT, HEATHER K.
2015-01-01
We assessed the efficacy of, and preference for, accumulated access to reinforcers, which allows uninterrupted engagement with the reinforcers but imposes an inherent delay required to first complete the task. Experiment 1 compared rates of task completion in 4 individuals who had been diagnosed with intellectual disabilities when reinforcement was distributed (i.e., 30-s access to the reinforcer delivered immediately after each response) and accumulated (i.e., 5-min access to the reinforcer after completion of multiple consecutive responses). Accumulated reinforcement produced response rates that equaled or exceeded rates during distributed reinforcement for 3 participants. Experiment 2 used a concurrent-chains schedule to examine preferences for each arrangement. All participants preferred delayed, accumulated access when the reinforcer was an activity. Three participants also preferred accumulated access to edible reinforcers. The collective results suggest that, despite the inherent delay, accumulated reinforcement is just as effective and is often preferred by learners over distributed reinforcement. PMID:24782203
Bayesian Cue Integration as a Developmental Outcome of Reward Mediated Learning
Weisswange, Thomas H.; Rothkopf, Constantin A.; Rodemann, Tobias; Triesch, Jochen
2011-01-01
Average human behavior in cue combination tasks is well predicted by Bayesian inference models. As this capability is acquired over developmental timescales, the question arises, how it is learned. Here we investigated whether reward dependent learning, that is well established at the computational, behavioral, and neuronal levels, could contribute to this development. It is shown that a model free reinforcement learning algorithm can indeed learn to do cue integration, i.e. weight uncertain cues according to their respective reliabilities and even do so if reliabilities are changing. We also consider the case of causal inference where multimodal signals can originate from one or multiple separate objects and should not always be integrated. In this case, the learner is shown to develop a behavior that is closest to Bayesian model averaging. We conclude that reward mediated learning could be a driving force for the development of cue integration and causal inference. PMID:21750717
Acute effects of caffeine on several operant behaviors in rhesus monkeys.
Buffalo, E A; Gillam, M P; Allen, R R; Paule, M G
1993-11-01
The acute effects of 1,3,7-trimethylxanthine (caffeine) were assessed using an operant test battery (OTB) of complex food-reinforced tasks that are thought to depend upon relatively specific brain functions, such as motivation to work for food (progressive ratio, PR), learning (incremental repeated acquisition, IRA), color and position discrimination (conditioned position responding, CPR), time estimation (temporal response differentiation, TRD), and short-term memory and attention (delayed matching-to-sample, DMTS). Endpoints included response rates (RR), accuracies (ACC), and percent task completed (PTC). Caffeine sulfate (0.175-20.0 mg/kg, IV), given 15 min pretesting, produced significant dose-dependent decreases in TRD percent task completed and accuracy at doses > or = 5.6 mg/kg. Caffeine produced no systematic effects on either DMTS or PR responding, but low doses tended to enhance performance in both IRA and CPR tasks. Thus, in monkeys, performance of an operant task designed to model time estimation is more sensitive to the disruptive effects of caffeine than is performance of the other tasks in the OTB.
Navigating complex decision spaces: Problems and paradigms in sequential choice
Walsh, Matthew M.; Anderson, John R.
2015-01-01
To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192
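To make the model-free/model-based distinction concrete, the sketch below contrasts a model-free temporal-difference (Q-learning) update, which caches values directly from sampled transitions, with a model-based agent that learns a transition/reward model and replans by value iteration. It is a minimal, generic caricature with illustrative sizes and parameters, not a model from the review.

    import numpy as np

    N_S, N_A, GAMMA, ALPHA = 5, 2, 0.9, 0.1

    # Model-free: cache action values, update only from sampled transitions.
    Q = np.zeros((N_S, N_A))
    def model_free_update(s, a, r, s_next):
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])

    # Model-based: learn T(s'|s,a) and R(s,a), then replan by value iteration.
    counts = np.ones((N_S, N_A, N_S))   # pseudo-counts for observed transitions
    R = np.zeros((N_S, N_A))
    def model_update(s, a, r, s_next):
        counts[s, a, s_next] += 1
        R[s, a] += 0.1 * (r - R[s, a])

    def plan(n_iters=50):
        T = counts / counts.sum(axis=2, keepdims=True)
        V = np.zeros(N_S)
        for _ in range(n_iters):
            V = (R + GAMMA * T @ V).max(axis=1)   # Bellman optimality backup
        return V

The model-free agent reacts only to cached values (habitual control in the review's terms), whereas the model-based agent can revalue actions immediately when the learned model changes (goal-directed control), at the cost of the extra planning computation.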
Effects of Reinforcement History and Instructions on the Persistence of Student Engagement.
ERIC Educational Resources Information Center
Martens, Brian K.; Bradley, Tracy A.; Eckert, Tanya L.
1997-01-01
This study examined the effects of three reinforcement histories on the persistence of task engagement by two students (ages 9-10) who were off task during independent seat work. Results found the reinforcement history that contained an instructional control component produced the greatest persistence in student engagement. (Author/CR)
ERIC Educational Resources Information Center
Bouxsein, Kelly J.; Roane, Henry S.; Harper, Tara
2011-01-01
Positive and negative reinforcement are effective for treating escape-maintained destructive behavior. The current study evaluated the separate and combined effects of these contingencies to increase task compliance. Results showed that a combination of positive and negative reinforcement was most effective for increasing compliance. (Contains 1…
Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents
ERIC Educational Resources Information Center
Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan
2009-01-01
Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…
Murray, C L; Fibiger, H C
1986-02-01
The effects of bilateral ibotenic acid-induced lesions of the nucleus basalis magnocellularis (nBM) on the acquisition and retention of several spatial memory tasks were studied in the rat. Maintenance of spatial memory in a food search task was impaired following nBM lesions. Acquisition of spontaneous alternation and reinforced alternation in a T-maze was also significantly impaired in animals with these lesions. In contrast, the animals with nBM lesions were not impaired in the acquisition of a position habit in a T-maze. In several of the tasks there was evidence of some learning in the lesion animals after substantial training, although they were significantly deficient when compared with the controls. Administration of the cholinergic agonists physostigmine sulfate or pilocarpine nitrate prior to behavioral testing resulted in a rapid and significant improvement in the performance of the lesion animals. The ibotenate-induced lesions significantly reduced the activity of choline acetyltransferase (CAT) in the anterior and the posterior neocortex. Hippocampal CAT activity was not changed. The results indicate that the cholinergic projections originating in the nBM are involved in the learning and memory of spatial tasks.
The dorsolateral striatum selectively mediates extinction of habit memory.
Goodman, Jarid; Ressler, Reed L; Packard, Mark G
2016-12-01
Previous research has indicated a role for the dorsolateral striatum (DLS) in acquisition and retrieval of habit memory. However, the neurobiological mechanisms guiding extinction of habit memory have not been extensively investigated. The present study examined whether the dorsolateral striatum (DLS) is involved in extinction of habit memory in a food-rewarded response learning version of the plus-maze in adult male Long-Evans rats (experiment 1). In addition, to determine whether the role of this brain region in extinction is selective to habit memory, we also examined whether the DLS is required for extinction of hippocampus-dependent spatial memory in a place learning version of the plus-maze (experiment 2). Following acquisition in either task, rats received two days of extinction training, in which the food reward was removed from the maze. The number of perseverative trials (a trial in which the rat made the same previously reinforced body-turn) and latency to reach the previously correct food well were used as measures of extinction. Animals were given immediate post-training intra-DLS administration of the sodium channel blocker bupivacaine or vehicle to determine the effect of DLS inactivation on consolidation of extinction memory in each task. In the response learning task, post-training DLS inactivation impaired consolidation of extinction memory. Injections of bupivacaine delayed 2 h post-training did not affect extinction, indicating a time-dependent effect of neural inactivation on consolidation of extinction memory in this task. In contrast, post-training DLS inactivation did not impair, but instead slightly enhanced, extinction memory in the place learning task. The present findings indicate a critical role for the DLS in extinction of habit memory in the response learning task, and may be relevant to understanding the neural mechanisms through which maladaptive habits in human psychopathologies (e.g. drug addiction) may be suppressed. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Guérin, Joris; Gibaru, Olivier; Thiery, Stéphane; Nyiri, Eric
2017-01-01
Recent methods of Reinforcement Learning have made it possible to solve difficult, high-dimensional robotic tasks under unknown dynamics using iterative Linear Quadratic Gaussian control theory. These algorithms are based on building a local time-varying linear model of the dynamics from data gathered through interaction with the environment. In such tasks, the cost function is often expressed directly in terms of the state and control variables so that it can be locally quadratized to run the algorithm. If the cost is expressed in terms of other variables, a model is required to compute the cost function from the variables manipulated. We propose a method to learn the cost function directly from the data, in the same way as for the dynamics. This way, the cost function can be defined in terms of any measurable quantity and thus can be chosen more appropriately for the task to be carried out. With our method, any sensor information can be used to design the cost function. We demonstrate the efficiency of this method by simulating, with the V-REP software, the learning of a Cartesian positioning task on several industrial robots with different characteristics. The robots are controlled in joint space and no model is provided a priori. Our results are compared with another model-free technique, which consists of writing the cost function as a state variable.
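The core idea of learning the cost directly from measured quantities can be illustrated, very loosely, by fitting a local quadratic model of the cost to logged samples and then using that fit in place of a hand-written state cost. The sketch below is a generic least-squares fit under that assumption and is not the authors' algorithm; the data, dimensions, and "true" cost are hypothetical.

    import numpy as np

    def fit_quadratic_cost(X, c):
        """Fit c(x) ~ sum_ij A_ij x_i x_j + b.x + c0 by least squares on (x, cost) pairs."""
        n = X.shape[1]
        feats = []
        for x in X:
            # Features: pairwise products x_i*x_j (i<=j), the raw x, and a bias term.
            quad = [x[i] * x[j] for i in range(n) for j in range(i, n)]
            feats.append(np.concatenate([quad, x, [1.0]]))
        theta, *_ = np.linalg.lstsq(np.asarray(feats), c, rcond=None)
        return theta

    # Example with hypothetical logged data: 3-dimensional measured quantities.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    c = (X ** 2).sum(axis=1) + 0.1 * rng.normal(size=200)   # unknown "true" cost
    theta = fit_quadratic_cost(X, c)

A locally quadratic fit of this kind is what iterative LQG-style methods ultimately need, which is why learning the cost from arbitrary sensor signals can slot into the same machinery as the learned dynamics.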
Automatic spin-chain learning to explore the quantum speed limit
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Ming; Cui, Zi-Wei; Wang, Xin; Yung, Man-Hong
2018-05-01
One of the ambitious goals of artificial intelligence is to build a machine that outperforms human intelligence, even if limited knowledge and data are provided. Reinforcement learning (RL) provides one such possibility to reach this goal. In this work, we consider a specific task from quantum physics, i.e., quantum state transfer in a one-dimensional spin chain. The mission for the machine is to find transfer schemes with the fastest speeds while maintaining high transfer fidelities. The first scenario we consider is when the Hamiltonian is time-independent. We update the coupling strength by minimizing a loss function dependent on both the fidelity and the speed. Compared with a scheme proven to be at the quantum speed limit for the perfect state transfer, the scheme provided by RL is faster while maintaining the infidelity below 5 × 10^-4. In the second scenario where a time-dependent external field is introduced, we convert the state transfer process into a Markov decision process that can be understood by the machine. We solve it with the deep Q-learning algorithm. After training, the machine successfully finds transfer schemes with high fidelities and speeds, which are faster than previously known ones. These results show that reinforcement learning can be a powerful tool for quantum control problems.
Fire Performance of Shipboard Electronic Space Materials
2006-09-15
Hoover, John B.; Whitehurst, Clarence L.; Chang, Eric B.; Williams, Frederick W.
...representative of current US Navy surface ship electronic spaces. It is expected that lessons learned from tests of this configuration will be applicable... (NGSS) for use on current construction DDG-51 class destroyers. The panels consist of a Nomex honeycomb core with a GRP (glass reinforced plastic...
Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Linares, R.; Furfaro, R.
2016-09-01
This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (pdf). The critic evaluates the policy by calculating the estimated total reward or the value function for the problem. The parameters of the policy action pdf are optimized using gradients with respect to the reward function. Both the critic and the actor are modeled using deep neural networks (multi-layer neural networks). The policy neural network takes the current state as input and outputs probabilities for each possible action. This policy is random, and can be evaluated by sampling random actions using the probabilities determined by the policy neural network's outputs. The critic approximates the total reward using a neural network. The estimated total reward is used to approximate the gradient of the policy network with respect to the network parameters. This approach is used to find the non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty for the overall catalog to below a user specified uncertainty threshold. This work uses a 30 km total position error for the uncertainty threshold. This work provides the RL method with a negative reward as long as any SO has a total position error above the uncertainty threshold. This penalizes policies that take longer to achieve the desired accuracy. A positive reward is provided when all SOs are below the catalog uncertainty threshold. An optimal policy is sought that takes actions to achieve the desired catalog uncertainty in minimum time. This work trains the policy in simulation by letting it task a single sensor to "learn" from its performance. The proposed approach for the SM problem is tested in simulation and good performance is found using the actor-critic policy gradient method.
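The actor-critic policy-gradient loop described above can be sketched generically as follows: the actor parameterizes a stochastic policy, the critic estimates value, and the critic's temporal-difference error drives both updates. This toy tabular/softmax version is a stand-in for the paper's deep-network implementation; sizes, learning rates, and names are illustrative.

    import numpy as np

    N_STATES, N_ACTIONS = 10, 4
    theta = np.zeros((N_STATES, N_ACTIONS))   # actor: action-preference parameters
    V = np.zeros(N_STATES)                     # critic: state-value estimates
    ALPHA_PI, ALPHA_V, GAMMA = 0.05, 0.1, 0.99

    def policy(s):
        prefs = np.exp(theta[s] - theta[s].max())
        return prefs / prefs.sum()

    def step_update(s, a, r, s_next, done):
        """One-step actor-critic: the critic's TD error scales the policy gradient."""
        td_target = r + (0.0 if done else GAMMA * V[s_next])
        delta = td_target - V[s]
        V[s] += ALPHA_V * delta                      # critic update
        grad_log_pi = -policy(s)
        grad_log_pi[a] += 1.0                        # d log pi(a|s) / d theta[s, :]
        theta[s] += ALPHA_PI * delta * grad_log_pi   # actor update

Replacing the tables with neural networks and the scalar reward with the catalog-uncertainty reward described above gives the general shape of the sensor-tasking approach, though the details of the paper's networks and training schedule are not reproduced here.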
Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-07-10
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.
Enhanced Experience Replay for Deep Reinforcement Learning
2015-11-01
ARL-TR-7538, NOV 2015, US Army Research Laboratory. Enhanced Experience Replay for Deep Reinforcement Learning, by David Doria, Bryan Dawson, and Manuel Vindiola, Computational and Information Sciences Directorate...
Adaptive effort investment in cognitive and physical tasks: a neurocomputational model
Verguts, Tom; Vassena, Eliana; Silvetti, Massimo
2015-01-01
Despite its importance in everyday life, the computational nature of effort investment remains poorly understood. We propose an effort model obtained from optimality considerations, and a neurocomputational approximation to the optimal model. Both are couched in the framework of reinforcement learning. It is shown that choosing when or when not to exert effort can be adaptively learned, depending on rewards, costs, and task difficulty. In the neurocomputational model, the limbic loop comprising anterior cingulate cortex (ACC) and ventral striatum in the basal ganglia allocates effort to cortical stimulus-action pathways whenever this is valuable. We demonstrate that the model approximates optimality. Next, we consider two hallmark effects from the cognitive control literature, namely proportion congruency and sequential congruency effects. It is shown that the model exerts both proactive and reactive cognitive control. Then, we simulate two physical effort tasks. In line with empirical work, impairing the model's dopaminergic pathway leads to apathetic behavior. Thus, we conceptually unify the exertion of cognitive and physical effort, studied across a variety of literatures (e.g., motivation and cognitive control) and animal species. PMID:25805978
Model-Based Reasoning in Humans Becomes Automatic with Training.
Economides, Marcos; Kurth-Nelson, Zeb; Lübbert, Annika; Guitart-Masip, Marc; Dolan, Raymond J
2015-09-01
Model-based and model-free reinforcement learning (RL) have been suggested as algorithmic realizations of goal-directed and habitual action strategies. Model-based RL is more flexible than model-free but requires sophisticated calculations using a learnt model of the world. This has led model-based RL to be identified with slow, deliberative processing, and model-free RL with fast, automatic processing. In support of this distinction, it has recently been shown that model-based reasoning is impaired by placing subjects under cognitive load--a hallmark of non-automaticity. Here, using the same task, we show that cognitive load does not impair model-based reasoning if subjects receive prior training on the task. This finding is replicated across two studies and a variety of analysis methods. Thus, task familiarity permits use of model-based reasoning in parallel with other cognitive demands. The ability to deploy model-based reasoning in an automatic, parallelizable fashion has widespread theoretical implications, particularly for the learning and execution of complex behaviors. It also suggests a range of important failure modes in psychiatric disorders.
Lucas, Morgan; Ilin, Yana; Anunu, Rachel; Kehat, Orli; Xu, Lin; Desmedt, Aline; Richter-Levin, Gal
2014-09-01
Findings suggest that stress-induced impaired learning and coping abilities may be attributed more to the psychological nature of the stressor than to its physical properties. It has been proposed that establishing controllability over stressors can ameliorate some of their effects on cognition and behavior. Gaining controllability was suggested to be associated with the development of stress resilience. Based on repeated exposure to the two-way shuttle avoidance task, we previously developed and validated a behavioral task that leads to a strict dissociation between gaining controllability (to the level that the associated fear is significantly reduced) and a fearful state of uncontrollability. Employing this protocol, we investigated here the impact of gaining or failing to gain emotional controllability on indices of anxiety and depression and on subsequent abilities to cope with positively or negatively reinforcing learning experiences. In agreement with previous studies, rats exposed to the uncontrollable protocol demonstrated high serum corticosterone concentrations, increased immobility, reduced duration of struggling in the forced swim test and impaired ability to acquire subsequent learning tasks. Achieving emotional controllability resulted in resilience to stress as was indicated by longer duration of struggling in the forced swim test, and enhanced learning abilities. Our prolonged training protocol, with the demonstrated ability of rats to gain emotional controllability, is proposed as a useful tool to study the neurobiological mechanisms of stress resilience.
Harvey, Roxann C; Jordan, Chloe J; Tassin, David H; Moody, Kayla R; Dwoskin, Linda P; Kantak, Kathleen M
2013-01-01
Research examining medication effects on set shifting in teens with attention deficit/hyperactivity disorder (ADHD) is lacking. An animal model of ADHD may be useful for exploring this gap. The Spontaneously Hypertensive Rat (SHR) is a commonly used animal model of ADHD. SHR and two comparator strains, Wistar-Kyoto (WKY) and Wistar (WIS), were evaluated during adolescence in a strategy set shifting task under conditions of a 0-sec or 15-sec delay to reinforcer delivery. The task had three phases: initial discrimination, set shift and reversal learning. Under 0-sec delays, SHR performed as well as or better than WKY and WIS. Treatment with 0.3 mg/kg/day atomoxetine had little effect, other than to modestly increase trials to criterion during set shifting in all strains. Under 15-sec delays, SHR had longer lever press reaction times, longer latencies to criterion and more trial omissions than WKY during set shifting and reversal learning. These deficits were not reduced systematically by 1.5 mg/kg/day methylphenidate or 0.3 mg/kg/day atomoxetine. Regarding learning in SHR, methylphenidate improved initial discrimination, whereas atomoxetine improved set shifting but disrupted initial discrimination. During reversal learning, both drugs were ineffective in SHR, and atomoxetine made reaction time and trial omissions greater in WKY. Overall, WIS performance differed from SHR or WKY, depending on phase. Collectively, a genetic model of ADHD in adolescent rats revealed that neither methylphenidate nor atomoxetine mitigated all deficits in SHR during the set shifting task. Thus, methylphenidate or atomoxetine monotherapy may not mitigate all set shift task-related deficits in teens with ADHD. PMID:23376704
Short-term total sleep deprivation alters delay-conditioned memory in the rat.
Tripathi, Shweta; Jha, Sushil K
2016-06-01
Short-term sleep deprivation soon after training may impair memory consolidation. Also, a particular sleep stage or its components increase after learning some tasks, such as negative and positive reinforcement tasks, avoidance tasks, and spatial learning tasks, and so forth. It suggests that discrete memory types may require specific sleep stage or its components for their optimal processing. The classical conditioning paradigms are widely used to study learning and memory but the role of sleep in a complex conditioned learning is unclear. Here, we have investigated the effects of short-term sleep deprivation on the consolidation of delay-conditioned memory and the changes in sleep architecture after conditioning. Rats were trained for the delay-conditioned task (for conditioning, house-light [conditioned stimulus] was paired with fruit juice [unconditioned stimulus]). Animals were divided into 3 groups: (a) sleep deprived (SD); (b) nonsleep deprived (NSD); and (c) stress control (SC) groups. Two-way ANOVA revealed a significant interaction between groups and days (training and testing) during the conditioned stimulus-unconditioned stimulus presentation. Further, Tukey post hoc comparison revealed that the NSD and SC animals exhibited significant increase in performances during testing. The SD animals, however, performed significantly less during testing. Further, we observed that wakefulness and NREM sleep did not change after training and testing. Interestingly, REM sleep increased significantly on both days compared to baseline more specifically during the initial 4-hr time window after conditioning. Our results suggest that the consolidation of delay-conditioned memory is sleep-dependent and requires augmented REM sleep during an explicit time window soon after training. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Prespeech motor learning in a neural network using reinforcement.
Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough
2013-02-01
Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.
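The reinforcement-gated learning rule described above can be caricatured in a few lines: a map unit spontaneously produces a (noisy) muscle activation, and its outgoing weights are pulled toward that activation only when the resulting vocalization meets the reinforcement criterion. The sketch below is a heavily simplified, hypothetical rendering; it omits the self-organizing map dynamics and the articulatory synthesizer of the actual model, and the "phonation" criterion is a placeholder.

    import numpy as np

    rng = np.random.default_rng(0)
    N_UNITS, N_MUSCLES, LR = 20, 5, 0.1
    W = rng.normal(scale=0.1, size=(N_UNITS, N_MUSCLES))   # map-to-muscle weights

    def produce_and_learn(meets_criterion):
        """Spontaneously activate one unit, emit a muscle pattern, learn if reinforced."""
        unit = rng.integers(N_UNITS)
        muscles = W[unit] + rng.normal(scale=0.2, size=N_MUSCLES)  # noisy exploration
        if meets_criterion(muscles):
            # Reinforcement-gated update: pull this unit's weights toward the
            # activation pattern that was just rewarded, so similar outputs recur.
            W[unit] += LR * (muscles - W[unit])
        return muscles

    # Hypothetical criterion standing in for "the vocalization was phonated".
    phonated = lambda m: m.sum() > 0.5
    for _ in range(1000):
        produce_and_learn(phonated)

Because learning happens only on reinforced productions, the distribution of spontaneously generated outputs drifts toward whatever acoustic criterion the (social) environment rewards, which is the qualitative effect reported above.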
Dopamine D3 Receptor Availability Is Associated with Inflexible Decision Making.
Groman, Stephanie M; Smith, Nathaniel J; Petrullli, J Ryan; Massi, Bart; Chen, Lihui; Ropchan, Jim; Huang, Yiyun; Lee, Daeyeol; Morris, Evan D; Taylor, Jane R
2016-06-22
Dopamine D2/3 receptor signaling is critical for flexible adaptive behavior; however, it is unclear whether D2, D3, or both receptor subtypes modulate precise signals of feedback and reward history that underlie optimal decision making. Here, PET with the radioligand [(11)C]-(+)-PHNO was used to quantify individual differences in putative D3 receptor availability in rodents trained on a novel three-choice spatial acquisition and reversal-learning task with probabilistic reinforcement. Binding of [(11)C]-(+)-PHNO in the midbrain was negatively related to the ability of rats to adapt to changes in rewarded locations, but not to the initial learning. Computational modeling of choice behavior in the reversal phase indicated that [(11)C]-(+)-PHNO binding in the midbrain was related to the learning rate and sensitivity to positive, but not negative, feedback. Administration of a D3-preferring agonist likewise impaired reversal performance by reducing the learning rate and sensitivity to positive feedback. These results demonstrate a previously unrecognized role for D3 receptors in select aspects of reinforcement learning and suggest that individual variation in midbrain D3 receptors influences flexible behavior. Our combined neuroimaging, behavioral, pharmacological, and computational approach implicates the dopamine D3 receptor in decision-making processes that are altered in psychiatric disorders. Flexible decision-making behavior is dependent upon dopamine D2/3 signaling in corticostriatal brain regions. However, the role of D3 receptors in adaptive, goal-directed behavior has not been thoroughly investigated. By combining PET imaging with the D3-preferring radioligand [(11)C]-(+)-PHNO, pharmacology, a novel three-choice probabilistic discrimination and reversal task and computational modeling of behavior in rats, we report that naturally occurring variation in [(11)C]-(+)-PHNO receptor availability relates to specific aspects of flexible decision making. We confirm these relationships using a D3-preferring agonist, thus identifying a unique role of midbrain D3 receptors in decision-making processes. Copyright © 2016 the authors 0270-6474/16/366732-10$15.00/0.
ERIC Educational Resources Information Center
Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb
2007-01-01
Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…
Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments.
Leong, Yuan Chang; Radulescu, Angela; Daniel, Reka; DeWoskin, Vivian; Niv, Yael
2017-01-18
Little is known about the relationship between attention and learning during decision making. Using eye tracking and multivariate pattern analysis of fMRI data, we measured participants' dimensional attention as they performed a trial-and-error learning task in which only one of three stimulus dimensions was relevant for reward at any given time. Analysis of participants' choices revealed that attention biased both value computation during choice and value update during learning. Value signals in the ventromedial prefrontal cortex and prediction errors in the striatum were similarly biased by attention. In turn, participants' focus of attention was dynamically modulated by ongoing learning. Attentional switches across dimensions correlated with activity in a frontoparietal attention network, which showed enhanced connectivity with the ventromedial prefrontal cortex between switches. Our results suggest a bidirectional interaction between attention and learning: attention constrains learning to relevant dimensions of the environment, while we learn what to attend to via trial and error. Copyright © 2017 Elsevier Inc. All rights reserved.
Dargis, Monika; Wolf, Richard C; Koenigs, Michael
2017-06-01
Deficits in reinforcement learning are presumed to underlie the impulsive and incorrigible behavior exhibited by psychopathic criminals. However, previous studies documenting reversal learning impairments in psychopathic individuals have not investigated this relationship across a continuous range of psychopathy severity, nor have they examined how reversal learning impairments relate to different psychopathic traits, such as the interpersonal-affective and lifestyle-antisocial dimensions. Furthermore, previous studies have not considered the role that childhood maltreatment and substance use may have in this specific cognitive deficit. Using a standard reversal learning task in a sample of N = 114 incarcerated male offenders, we demonstrate a significant relationship between psychopathy severity and reversal learning errors. Furthermore, we show a significant interaction between psychopathy and childhood maltreatment, but not substance use, such that individuals high in psychopathy with an extensive history of maltreatment committed the greatest number of reversal learning errors. These findings extend the current understanding of reversal learning performance among psychopathic individuals, and highlight the importance of considering childhood maltreatment when studying psychopathy.
Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni
2011-05-04
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and architectural limitations raise the question of whether investing effort in this direction is worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons "vote" independently ("democratic") for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that they provide a speed improvement of 5x to 42x over optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search for learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.
Design and Control of Large Collections of Learning Agents
NASA Technical Reports Server (NTRS)
Agogino, Adrian
2001-01-01
The intelligent control of multiple autonomous agents is an important yet difficult task. Previous methods used to address this problem have proved to be either too brittle, too hard to use, or not scalable to large systems. The 'Collective Intelligence' project at NASA/Ames provides an elegant, machine-learning approach to address these problems. This approach mathematically defines some essential properties that a reward system should have to promote coordinated behavior among reinforcement learners. This work has focused on creating additional key properties and algorithms within the mathematics of the Collective Intelligence framework. One of the additions will allow agents to learn more quickly, in a more coordinated manner. The other will let agents learn with less knowledge of their environment. These additions will allow the framework to be applied more easily, to a much larger domain of multi-agent problems.
Working Memory Load Strengthens Reward Prediction Errors.
Collins, Anne G E; Ciullo, Brittany; Frank, Michael J; Badre, David
2017-04-19
Reinforcement learning (RL) in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors (RPEs) are used to update expected values of choice options. This modeling ignores the different contributions of different memory and decision-making systems thought to contribute even to simple learning. In an fMRI experiment, we investigated how working memory (WM) and incremental RL processes interact to guide human learning. WM load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive WM process together with slower RL. Model-based analysis of fMRI data showed that striatum and lateral prefrontal cortex were sensitive to RPE, as shown previously, but, critically, these signals were reduced when the learning problem was within capacity of WM. The degree of this neural interaction related to individual differences in the use of WM to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning. SIGNIFICANCE STATEMENT Reinforcement learning (RL) theory has been remarkably productive at improving our understanding of instrumental learning as well as dopaminergic and striatal network function across many mammalian species. However, this neural network is only one contributor to human learning and other mechanisms such as prefrontal cortex working memory also play a key role. Our results also show that these other players interact with the dopaminergic RL system, interfering with its key computation of reward prediction errors. Copyright © 2017 the authors 0270-6474/17/374332-11$15.00/0.
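The two-process account described above can be caricatured as a weighted mixture of a fast but capacity-limited working-memory policy and a slow incremental RL policy, in the spirit of published RL+WM models. The sketch below is a simplified, hypothetical rendering with illustrative parameters, not the authors' fitted model.

    import numpy as np

    N_STIM, N_ACT = 6, 3
    ALPHA, RHO, CAPACITY, BETA = 0.1, 0.9, 3.0, 5.0
    Q = np.full((N_STIM, N_ACT), 1.0 / N_ACT)    # slow, incremental RL values
    WM = np.full((N_STIM, N_ACT), 1.0 / N_ACT)   # fast, one-shot working-memory store

    def softmax(x):
        e = np.exp(BETA * (x - x.max()))
        return e / e.sum()

    def choice_probs(stim, set_size):
        # WM contributes in proportion to overall reliance on it (RHO) and to
        # how much of the stimulus set fits within its limited capacity.
        w = RHO * min(1.0, CAPACITY / set_size)
        return w * softmax(WM[stim]) + (1 - w) * softmax(Q[stim])

    def update(stim, act, reward):
        Q[stim, act] += ALPHA * (reward - Q[stim, act])   # incremental RL update
        WM[stim, act] = reward                            # WM stores the last outcome

    # Example trial with six stimuli (high WM load).
    p = choice_probs(stim=0, set_size=6)
    update(stim=0, act=int(np.argmax(p)), reward=1.0)

The capacity term is what makes the WM contribution shrink at high set sizes, which is the behavioral signature the fMRI analysis above ties to changes in striatal reward-prediction-error signals.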
Behavioral and neural properties of social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ
2011-01-01
Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis - social preferences, response latencies and modeling neural responses – are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787
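Trial-by-trial learning of the kind modeled above is typically captured with a simple delta-rule (Rescorla-Wagner-style) update of the expected probability of positive feedback from each peer. The sketch below is a generic version of that update; the peer labels and learning rate are illustrative, not taken from the study.

    # Expected probability of acceptance from each (hypothetical) peer.
    expected = {"peer_A": 0.5, "peer_B": 0.5, "peer_C": 0.5}
    ALPHA = 0.2  # learning rate

    def update_expectation(peer, accepted):
        """Prediction error = outcome - expectation; expectations move toward outcomes."""
        outcome = 1.0 if accepted else 0.0
        prediction_error = outcome - expected[peer]
        expected[peer] += ALPHA * prediction_error
        return prediction_error

    # Example: peer_A accepts on most trials, so its expected value drifts upward.
    for accepted in [True, True, False, True]:
        update_expectation("peer_A", accepted)

In model-based fMRI analyses like the one above, the trial-by-trial expectations and prediction errors from such a model are regressed against the imaging data, which is how the striatal and orbitofrontal correlates reported here are identified.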
Ilango, A; Wetzel, W; Scheich, H; Ohl, F W
2010-03-31
Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated the speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective for maintaining a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.
Towards Contextualized Learning Services
NASA Astrophysics Data System (ADS)
Specht, Marcus
Personalization of feedback and instruction has often been considered a key feature in learning support. The adaptation of the instructional process to the individual and its different aspects has been investigated from different research perspectives, such as learner modelling, intelligent tutoring systems, adaptive hypermedia, and adaptive instruction. As early as the 1950s, the first commercial systems for adaptive instruction, used to train keyboard skills, were developed, utilizing adaptive configuration of feedback based on user performance and interaction footprints (Pask 1964). Around adaptive instruction there is a variety of research issues bringing together interdisciplinary research from computer science, engineering, psychology, psychotherapy, cybernetics, system dynamics, instructional design, and empirical research on technology-enhanced learning. When classifying best practices of adaptive instruction, different parameters of the instructional process that are adapted to the learner have been identified, such as the sequence and size of task difficulty, the timing of feedback, the pace of learning, the reinforcement plan, and others; these are often referred to as the adaptation target. Furthermore, Aptitude Treatment Interaction studies have explored the effect of adapting instructional parameters to different characteristics of the learner (Tennyson and Christensen 1988), such as task performance, personality characteristics, or cognitive abilities; this information is referred to as the adaptation means.
Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers
ERIC Educational Resources Information Center
Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.
2014-01-01
Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…
Functional MRI in Awake Unrestrained Dogs
Berns, Gregory S.; Brooks, Andrew M.; Spivak, Mark
2012-01-01
Because of dogs' prolonged evolution with humans, many of the canine cognitive skills are thought to represent a selection of traits that make dogs particularly sensitive to human cues. But how does the dog mind actually work? To develop a methodology to answer this question, we trained two dogs to remain motionless for the duration required to collect quality fMRI images by using positive reinforcement without sedation or physical restraints. The task was designed to determine which brain circuits differentially respond to human hand signals denoting the presence or absence of a food reward. Head motion within trials was less than 1 mm. Consistent with prior reinforcement learning literature, we observed caudate activation in both dogs in response to the hand signal denoting reward versus no-reward. PMID:22606363
Simple and conditional visual discrimination with wheel running as reinforcement in rats.
Iversen, I H
1998-09-01
Three experiments explored whether access to wheel running is sufficient as reinforcement to establish and maintain simple and conditional visual discriminations in nondeprived rats. In Experiment 1, 2 rats learned to press a lit key to produce access to running; responding was virtually absent when the key was dark, but latencies to respond were longer than for customary food and water reinforcers. Increases in the intertrial interval did not improve the discrimination performance. In Experiment 2, 3 rats acquired a go-left/go-right discrimination with a trial-initiating response and reached an accuracy that exceeded 80%; when two keys showed a steady light, pressing the left key produced access to running whereas pressing the right key produced access to running when both keys showed blinking light. Latencies to respond to the lights shortened when the trial-initiation response was introduced and became much shorter than in Experiment 1. In Experiment 3, 1 rat acquired a conditional discrimination task (matching to sample) with steady versus blinking lights at an accuracy exceeding 80%. A trial-initiation response allowed self-paced trials as in Experiment 2. When the rat was exposed to the task for 19 successive 24-hr periods with access to food and water, the discrimination performance settled in a typical circadian pattern and peak accuracy exceeded 90%. When the trial-initiation response was under extinction, without access to running, the circadian activity pattern determined the time of spontaneous recovery. The experiments demonstrate that wheel-running reinforcement can be used to establish and maintain simple and conditional visual discriminations in nondeprived rats.
The cerebellum: a neural system for the study of reinforcement learning.
Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F
2011-01-01
In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated, and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affective state, and even the state of its muscles (e.g., eyes open or closed); any neural system that supports reinforcement learning must therefore be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.
Incomplete Multisource Transfer Learning.
Ding, Zhengming; Shao, Ming; Fu, Yun
2018-02-01
Transfer learning is generally exploited to adapt well-established source knowledge for learning tasks in a weakly labeled or unlabeled target domain. Nowadays, it is common to see multiple sources available for knowledge transfer, each of which, however, may not include complete class information of the target domain. Naively merging multiple sources together would lead to inferior results due to the large divergence among multiple sources. In this paper, we attempt to utilize incomplete multiple sources for effective knowledge transfer to facilitate the learning task in the target domain. To this end, we propose an incomplete multisource transfer learning framework built on knowledge transfer in two directions, i.e., cross-domain transfer from each source to the target, and cross-source transfer. In particular, in the cross-domain direction, we deploy latent low-rank transfer learning guided by iterative structure learning to transfer knowledge from each single source to the target domain. This compensates for any missing data in each source using the complete target data. In the cross-source direction, an unsupervised manifold regularizer and effective multisource alignment are explored to jointly compensate for data missing in one source using the others. In this way, both marginal and conditional distribution discrepancies in the two directions are mitigated. Experimental results on standard cross-domain benchmarks and synthetic data sets demonstrate the effectiveness of our proposed model in knowledge transfer from incomplete multiple sources.
A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment.
Hung, Shao-Ming; Givigi, Sidney N
2017-01-01
In the past two decades, unmanned aerial vehicles (UAVs) have demonstrated their efficacy in supporting both military and civilian applications, where tasks can be dull, dirty, dangerous, or simply too costly with conventional methods. Many of the applications contain tasks that can be executed in parallel, hence the natural progression is to deploy multiple UAVs working together as a force multiplier. However, to do so requires autonomous coordination among the UAVs, similar to swarming behaviors seen in animals and insects. This paper looks at flocking with small fixed-wing UAVs in the context of a model-free reinforcement learning problem. In particular, Peng's Q(λ) with a variable learning rate is employed by the followers to learn a control policy that facilitates flocking in a leader-follower topology. The problem is structured as a Markov decision process, where the agents are modeled as small fixed-wing UAVs that experience stochasticity due to disturbances such as winds and control noises, as well as weight and balance issues. Learned policies are compared to ones solved using stochastic optimal control (i.e., dynamic programming) by evaluating the average cost incurred during flight according to a cost function. Simulation results demonstrate the feasibility of the proposed learning approach at enabling agents to learn how to flock in a leader-follower topology, while operating in a nonstationary stochastic environment.
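For readers unfamiliar with Q(λ), the toy sketch below shows a tabular eligibility-trace learner with a decaying learning rate. It conveys the flavor of the approach but is a simplification, not Peng's Q(λ) as implemented in the paper, and the environment interface (reset/step) is an assumption.

```python
import numpy as np

def q_lambda_follower(env, n_states, n_actions, episodes=500,
                      gamma=0.95, lam=0.9, alpha0=0.5, epsilon=0.1):
    """Simplified tabular Q(lambda) learner for a follower agent.

    A stand-in for Peng's Q(lambda) with a variable learning rate: here
    the learning rate simply decays across episodes, and eligibility
    traces propagate TD errors back to recently visited state-action
    pairs. `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    Q = np.zeros((n_states, n_actions))
    for ep in range(episodes):
        alpha = alpha0 / (1.0 + 0.01 * ep)   # variable (decaying) learning rate
        e = np.zeros_like(Q)                 # eligibility traces
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            td_error = r + gamma * np.max(Q[s2]) * (not done) - Q[s, a]
            e[s, a] += 1.0                    # accumulating trace
            Q += alpha * td_error * e         # credit recent state-action pairs
            e *= gamma * lam                  # decay all traces
            s = s2
    return Q
```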
A model for discriminating reinforcers in time and space.
Cowie, Sarah; Davison, Michael; Elliffe, Douglas
2016-06-01
Both the response-reinforcer and stimulus-reinforcer relation are important in discrimination learning; differential responding requires a minimum of two discriminably-different stimuli and two discriminably-different associated contingencies of reinforcement. When elapsed time is a discriminative stimulus for the likely availability of a reinforcer, choice over time may be modeled by an extension of the Davison and Nevin (1999) model that assumes that local choice strictly matches the effective local reinforcer ratio. The effective local reinforcer ratio may differ from the obtained local reinforcer ratio for two reasons: Because the animal inaccurately estimates times associated with obtained reinforcers, and thus incorrectly discriminates the stimulus-reinforcer relation across time; and because of error in discriminating the response-reinforcer relation. In choice-based timing tasks, the two responses are usually highly discriminable, and so the larger contributor to differences between the effective and obtained reinforcer ratio is error in discriminating the stimulus-reinforcer relation. Such error may be modeled either by redistributing the numbers of reinforcers obtained at each time across surrounding times, or by redistributing the ratio of reinforcers obtained at each time in the same way. We assessed the extent to which these two approaches to modeling discrimination of the stimulus-reinforcer relation could account for choice in a range of temporal-discrimination procedures. The version of the model that redistributed numbers of reinforcers accounted for more variance in the data. Further, this version provides an explanation for shifts in the point of subjective equality that occur as a result of changes in the local reinforcer rate. The inclusion of a parameter reflecting error in discriminating the response-reinforcer relation enhanced the ability of each version of the model to describe data. The ability of this class of model to account for a range of data suggests that timing, like other conditional discriminations, is choice under the joint discriminative control of elapsed time and differential reinforcement. Understanding the role of differential reinforcement is therefore critical to understanding control by elapsed time. Copyright © 2016 Elsevier B.V. All rights reserved.
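One way to make the two redistribution ideas concrete is sketched below: obtained reinforcers are smeared across neighboring time bins (error in discriminating the stimulus-reinforcer relation) and partially misattributed to the other response (error in discriminating the response-reinforcer relation), after which local choice strictly matches the effective local reinforcer ratio. The parameter names and exact parameterization are illustrative assumptions, not those used by Cowie et al.

```python
import numpy as np

def predicted_choice(r_left, r_right, timing_sd=1.0, p_response_error=0.0):
    """Sketch of a local-matching model with imperfect temporal discrimination.

    r_left, r_right: arrays of reinforcers obtained in each time bin.
    Reinforcer counts are redistributed across neighboring bins with a
    Gaussian kernel, and a fraction is misattributed to the other
    response. Predicted local choice strictly matches the resulting
    effective local reinforcer ratio.
    """
    n = len(r_left)
    t = np.arange(n)
    kernel = np.exp(-0.5 * ((t[:, None] - t[None, :]) / timing_sd) ** 2)
    kernel /= kernel.sum(axis=0, keepdims=True)   # spread each bin's reinforcers in time

    eff_left = kernel @ np.asarray(r_left, float)
    eff_right = kernel @ np.asarray(r_right, float)

    # misattribute a fraction of reinforcers to the other response
    mixed_left = (1 - p_response_error) * eff_left + p_response_error * eff_right
    mixed_right = (1 - p_response_error) * eff_right + p_response_error * eff_left

    return mixed_left / (mixed_left + mixed_right + 1e-12)   # predicted P(left) per bin
```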
Reinforcement Learning of Two-Joint Virtual Arm Reaching in a Computer Model of Sensorimotor Cortex
Neymotin, Samuel A.; Chadderdon, George L.; Kerr, Cliff C.; Francis, Joseph T.; Lytton, William W.
2014-01-01
Neocortical mechanisms of learning sensorimotor control involve a complex series of interactions at multiple levels, from synaptic mechanisms to cellular dynamics to network connectomics. We developed a model of sensory and motor neocortex consisting of 704 spiking model neurons. Sensory and motor populations included excitatory cells and two types of interneurons. Neurons were interconnected with AMPA/NMDA and GABAA synapses. We trained our model using spike-timing-dependent reinforcement learning to control a two-joint virtual arm to reach to a fixed target. For each of 125 trained networks, we used 200 training sessions, each involving 15 s reaches to the target from 16 starting positions. Learning altered network dynamics, with enhancements to neuronal synchrony and behaviorally relevant information flow between neurons. After learning, networks demonstrated retention of behaviorally relevant memories by using proprioceptive information to perform reach-to-target from multiple starting positions. Networks dynamically controlled which joint rotations to use to reach a target, depending on current arm position. Learning-dependent network reorganization was evident in both sensory and motor populations: learned synaptic weights showed target-specific patterning optimized for particular reach movements. Our model embodies an integrative hypothesis of sensorimotor cortical learning that could be used to interpret future electrophysiological data recorded in vivo from sensorimotor learning experiments. We used our model to make the following predictions: learning enhances synchrony in neuronal populations and behaviorally relevant information flow across neuronal populations; enhanced sensory processing aids task-relevant motor performance; and the relative ease of a particular movement in vivo depends on the amount of sensory information required to complete the movement. PMID:24047323
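The spike-timing-dependent reinforcement rule is not spelled out in the abstract. The sketch below illustrates the generic reward-modulated STDP idea, in which spike pairings build an eligibility trace that is committed to the weights only when a reward signal arrives; it is a schematic stand-in, not the specific rule of the 704-neuron model, and all names are illustrative.

```python
import numpy as np

def reward_modulated_stdp_step(w, pre_spikes, post_spikes, elig, reward,
                               a_plus=0.005, a_minus=0.005,
                               trace_decay=0.98, lr=0.1,
                               w_min=0.0, w_max=1.0):
    """One time step of a generic reward-modulated STDP rule.

    w: (n_pre, n_post) weight matrix.
    pre_spikes, post_spikes: 0/1 vectors of spikes this step.
    elig: dict of running traces, e.g.
      {'pre': np.zeros(n_pre), 'post': np.zeros(n_post),
       'synapse': np.zeros((n_pre, n_post))}.
    Candidate STDP weight changes accumulate in elig['synapse'] and are
    applied to the weights only in proportion to the reward signal.
    """
    elig['pre'] = elig['pre'] * trace_decay + pre_spikes
    elig['post'] = elig['post'] * trace_decay + post_spikes

    # STDP pairings: pre-before-post potentiates, post-before-pre depresses
    dw = (a_plus * np.outer(elig['pre'], post_spikes)
          - a_minus * np.outer(pre_spikes, elig['post']))
    elig['synapse'] = elig['synapse'] * trace_decay + dw

    # reward gates the actual weight change
    w = np.clip(w + lr * reward * elig['synapse'], w_min, w_max)
    return w
```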
Robust sensorimotor representation to physical interaction changes in humanoid motion learning.
Shimizu, Toshihiko; Saegusa, Ryo; Ikemoto, Shuhei; Ishiguro, Hiroshi; Metta, Giorgio
2015-05-01
This paper proposes a learning from demonstration system based on a motion feature, called phase transfer sequence. The system aims to synthesize the knowledge on humanoid whole body motions learned during teacher-supported interactions, and apply this knowledge during different physical interactions between a robot and its surroundings. The phase transfer sequence represents the temporal order of the changing points in multiple time sequences. It encodes the dynamical aspects of the sequences so as to absorb the gaps in timing and amplitude derived from interaction changes. The phase transfer sequence was evaluated in reinforcement learning of sitting-up and walking motions conducted by a real humanoid robot and compatible simulator. In both tasks, the robotic motions were less dependent on physical interactions when learned by the proposed feature than by conventional similarity measurements. Phase transfer sequence also enhanced the convergence speed of motion learning. Our proposed feature is original primarily because it absorbs the gaps caused by changes of the originally acquired physical interactions, thereby enhancing the learning speed in subsequent interactions.
Leue, Anja; Lange, Sebastian; Beauducel, André
2012-06-01
According to Botvinick's (2007) integrative account, conflict monitoring is aversive because individuals anticipate cognitive demand, whereas the revised reinforcement sensitivity theory (rRST) predicts that conflict processing is aversive because individuals anticipate aversive reinforcement of erroneous responses. Because these accounts give different reasons for the aversive aspects of conflict, we manipulated cognitive demand and the aversive reinforcement as a consequence of wrong choices in a go/no-go task. Thereby, we also aimed to investigate whether individual differences in conflict sensitivity (i.e., in trait anxiety, linked to high sensitivity of the behavioral inhibition system [trait-BIS]) represent the effects of aversive reinforcement and cognitive demand in conflict tasks. We expected that these manipulations would have effects on the frontal N2 component representing activity of the anterior cingulate cortex. Moreover, higher-trait-BIS individuals should be more sensitive than lower-trait-BIS individuals to aversive effects in conflict situations, resulting in a more negative frontal N2 for higher-trait-BIS individuals. In Study 1, with N = 104 students, and Study 2, with N = 47 students, aversive reinforcement was manipulated in three levels (within-subjects factor) and cognitive demand in two levels (between-subjects factor). The behavioral findings from the go/no-go task with noncounterbalanced reinforcement levels (Study 1) could be widely replicated in a task with counterbalanced reinforcement levels (Study 2). The frontal mean no-go N2 amplitude and the frontal no-go N2 dipole captured predicted reinforcement-related variations of conflict monitoring, indicating that the anticipation of aversive reinforcement induces variations in conflict monitoring intensity in frontal brain areas. The aversive nature of conflict was underlined by the more pronounced conflict monitoring in higher- than in lower-trait-BIS individuals.
Pérez-García, Georgina; Guzmán-Quevedo, Omar; Da Silva Aragão, Raquel; Bolaños-Jiménez, Francisco
2016-02-17
Numerous epidemiological studies indicate that malnutrition during in utero development and/or childhood induces long-lasting learning disabilities and an enhanced susceptibility to develop psychiatric disorders. However, animal studies aimed at addressing this question have yielded inconsistent results due to the use of learning tasks involving negative or positive reinforcers that interfere with the enduring changes in emotional reactivity and motivation produced by in utero and neonatal malnutrition. Consequently, the mechanisms underlying the learning deficits associated with malnutrition in early life remain unknown. Here we implemented a behavioural paradigm based on the combination of the novel object recognition and the novel object location tasks to define the impact of early protein restriction on the behavioural, cellular and molecular basis of memory processing. Adult rats born to dams fed a low-protein diet during pregnancy and lactation exhibited impaired encoding and consolidation of memory resulting from impaired pattern separation. This learning deficit was associated with reduced production of newly born hippocampal neurons and downregulation of BDNF gene expression. These data support the existence of a causal relationship between early malnutrition and impaired learning in adulthood and show that decreased adult neurogenesis is associated with the cognitive deficits induced by childhood exposure to poor nutrition.
The role of GABAB receptors in human reinforcement learning.
Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X
2014-10-01
Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50 mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and, furthermore, to test this effect in patients with addiction spectrum disorders. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.
Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-01-01
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson’s disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905
A Comparison of Self-Monitoring with and without Reinforcement to Improve On-Task Classroom Behavior
ERIC Educational Resources Information Center
Davis, Tonya N.; Dacus, Sharon; Bankhead, Jenna; Haupert, Megan; Fuentes, Lisa; Zoch, Tamara; Kang, Soyeon; Attai, Shanna; Lang, Russell
2014-01-01
In this study we analyzed the effects of a self-monitoring and self-monitoring plus reinforcement intervention on classroom behavior. A typically-developing high school student demonstrating difficulty staying on-task during classroom instruction was observed in three classroom settings associated with high levels of off-task behavior. During…
A Study on the Effects of Some Reinforcers to Improve Performance of Employees in a Retail Industry
ERIC Educational Resources Information Center
Raj, John Dilip; Nelson, John Abraham; Rao, K. S. P.
2006-01-01
Two field experiments were conducted in the Business Information Technology Department of a major retail industry to analyze the impact of positive task performance reinforcers. The employees were divided into two broad groups--those performing complex tasks and those performing relatively simpler tasks. The first group was further divided into…
Fear of losing money? Aversive conditioning with secondary reinforcers.
Delgado, M R; Labouliere, C D; Phelps, E A
2006-12-01
Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.
Reinforcement learning and Tourette syndrome.
Palminteri, Stefano; Pessiglione, Mathias
2013-01-01
In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report is preceded by an introduction aimed at providing the reader with the state of the art of knowledge concerning the neural bases of reinforcement learning at the time of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evidently maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.
Boedecker, Joschka; Lampe, Thomas; Riedmiller, Martin
2013-01-01
A common assumption in psychology, economics, and other fields holds that higher performance will result if extrinsic rewards (such as money) are offered as an incentive. While this principle seems to work well for tasks that require the execution of the same sequence of steps over and over, with little uncertainty about the process, in other cases, especially where creative problem solving is required due to the difficulty in finding the optimal sequence of actions, external rewards can actually be detrimental to task performance. Furthermore, they have the potential to undermine intrinsic motivation to do an otherwise interesting activity. In this work, we extend a computational model of the dorsomedial and dorsolateral striatal reinforcement learning systems to account for the effects of extrinsic and intrinsic rewards. The model assumes that the brain employs both a goal-directed and a habitual learning system, and competition between both is based on the trade-off between the cost of the reasoning process and value of information. The goal-directed system elicits internal rewards when its models of the environment improve, while the habitual system, being model-free, does not. Our results account for the phenomena that initial extrinsic reward leads to reduced activity after extinction compared to the case without any initial extrinsic rewards, and that performance in complex task settings drops when higher external rewards are promised. We also test the hypothesis that external rewards bias the competition in favor of the computationally efficient, but cruder and less flexible habitual system, which can negatively influence intrinsic motivation and task performance in the class of tasks we consider.
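The key mechanism described above, an internal reward generated when the goal-directed system's model of the environment improves, can be illustrated with a toy computation such as the one below. The specific form shown (reduction in prediction surprise of a count-based transition model) is an assumption for illustration only; the authors' model is defined over striatal learning systems, not this simplified scheme.

```python
import numpy as np

def intrinsic_reward_from_model_improvement(model_counts, s, a, s_next):
    """Sketch of the 'internal reward when the world model improves' idea.

    model_counts[s][a] is a count vector over successor states. The
    intrinsic reward is the reduction in the model's surprise (negative
    log-likelihood) at the observed transition achieved by updating the
    model, so transitions that teach the goal-directed system something
    are rewarding in themselves.
    """
    counts = model_counts[s][a]
    n_states = len(counts)
    # predictive probability of the observed next state before the update
    p_before = (counts[s_next] + 1.0) / (counts.sum() + n_states)   # Laplace smoothing
    counts[s_next] += 1.0                                           # update the model
    p_after = (counts[s_next] + 1.0) / (counts.sum() + n_states)
    return np.log(p_after) - np.log(p_before)   # decrease in surprise
```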
On the integration of reinforcement learning and approximate reasoning for control
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.
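As a concrete reminder of the machinery involved, the sketch below shows a generic actor-critic step in which a temporal-difference error is used to fine-tune the strength of the control rule that fired, in the spirit of refining approximately correct control knowledge through experience. It is an illustrative reconstruction, not the ARIC architecture itself, and the names are assumptions.

```python
import numpy as np

def actor_critic_rule_tuning(features, action_strengths, v_weights,
                             reward, next_features, chosen_rule,
                             alpha_critic=0.05, alpha_actor=0.01, gamma=0.95):
    """One actor-critic step of the kind ARIC-style controllers build on.

    A linear critic estimates state value from `features`; its TD error
    then adjusts the strength of the control rule that was applied
    (`chosen_rule`). Arrays are updated in place.
    """
    v = float(v_weights @ features)
    v_next = float(v_weights @ next_features)
    td_error = reward + gamma * v_next - v            # temporal-difference error

    v_weights += alpha_critic * td_error * features          # critic update
    action_strengths[chosen_rule] += alpha_actor * td_error  # reinforce or weaken the rule
    return td_error
```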
Intelligence moderates reinforcement learning: a mini-review of the neural evidence.
Chen, Chong
2015-06-01
Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.
Hodzic, Amra; Veit, Ralf; Karim, Ahmed A; Erb, Michael; Godde, Ben
2004-01-14
Perceptual learning can be induced by passive tactile coactivation without attention or reinforcement. We used functional MRI (fMRI) and psychophysics to investigate in detail the specificity of this type of learning for different tactile discrimination tasks and the underlying cortical reorganization. We found that a few hours of Hebbian coactivation evoked a significant enlargement of the primary (SI) and secondary (SII) somatosensory cortical areas representing the stimulated body parts. The amount of plastic change was strongly correlated with improvement in spatial discrimination performance. However, in the same subjects, frequency discrimination was impaired after coactivation, indicating that even maladaptive processes can be induced by intense passive sensory stimulation.
Prespeech motor learning in a neural network using reinforcement☆
Warlaumont, Anne S.; Westermann, Gert; Buder, Eugene H.; Oller, D. Kimbrough
2012-01-01
Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels, and vice versa. PMID:23275137
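A loose sketch of the reinforcement-gated self-organizing map update described above might look as follows; the map topology, neighborhood function, and parameter names are assumptions for illustration, not the published model.

```python
import numpy as np

def reinforce_som_step(weights, muscle_activation, reinforced,
                       lr=0.05, neighborhood_sd=1.0):
    """One learning step of a reinforcement-gated self-organizing map.

    weights: (n_units, n_muscles) map from SOM units to muscle activations.
    If the vocalization produced by `muscle_activation` is reinforced,
    the best-matching unit and its neighbors move toward that activation,
    making similar muscle activations more likely to recur.
    """
    n_units = weights.shape[0]
    # winner = unit whose weights best match the produced activation
    dists = np.linalg.norm(weights - muscle_activation, axis=1)
    winner = int(np.argmin(dists))

    if reinforced:
        # Gaussian neighborhood on an assumed 1-D map topology
        idx = np.arange(n_units)
        h = np.exp(-0.5 * ((idx - winner) / neighborhood_sd) ** 2)
        weights += lr * h[:, None] * (muscle_activation - weights)
    return weights
```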
Altered cingulo-striatal function underlies reward drive deficits in schizophrenia.
Park, Il Ho; Chun, Ji Won; Park, Hae-Jeong; Koo, Min-Seong; Park, Sunyoung; Kim, Seok-Hyeong; Kim, Jae-Jin
2015-02-01
Amotivation in schizophrenia is assumed to involve dysfunctional dopaminergic signaling of reward prediction or anticipation. It is unclear, however, whether the translation of neural representations of reward value into behavioral drive is affected in schizophrenia. In order to examine how abnormal neural processing of response valuation and initiation affects incentive motivation in schizophrenia, we conducted functional MRI using a deterministic reinforcement learning task with variable intervals of contingency reversals in 20 clinically stable patients with schizophrenia and 20 healthy controls. Behaviorally, the advantage of positive over negative reinforcers in reinforcement-related responsiveness was not observed in patients. Patients showed altered response valuation and initiation-related striatal activity and deficient rostro-ventral anterior cingulate cortex activation during reward approach initiation. Among these neural abnormalities, rostro-ventral anterior cingulate cortex activation was correlated with positive reinforcement-related responsiveness in controls and with social anhedonia and social amotivation subdomain scores in patients. Our findings indicate that the anterior cingulate cortex plays a central role in translating action value into the driving force of action, and underscore the role of the cingulo-striatal network in amotivation in schizophrenia. Copyright © 2014 Elsevier B.V. All rights reserved.
Aberg, Kristoffer Carl; Doell, Kimberly Crystal; Schwartz, Sophie
2016-08-01
Orienting biases refer to consistent, trait-like direction of attention or locomotion toward one side of space. Recent studies suggest that such hemispatial biases may determine how well people memorize information presented in the left or right hemifield. Moreover, lesion studies indicate that learning rewarded stimuli in one hemispace depends on the integrity of the contralateral striatum. However, the exact neural and computational mechanisms underlying the influence of individual orienting biases on reward learning remain unclear. Because reward-based behavioural adaptation depends on the dopaminergic system and prediction error (PE) encoding in the ventral striatum, we hypothesized that hemispheric asymmetries in dopamine (DA) function may determine individual spatial biases in reward learning. To test this prediction, we acquired fMRI in 33 healthy human participants while they performed a lateralized reward task. Learning differences between hemispaces were assessed by presenting stimuli, assigned to different reward probabilities, to the left or right of central fixation, i.e. presented in the left or right visual hemifield. Hemispheric differences in DA function were estimated through differential fMRI responses to positive vs. negative feedback in the left vs. right ventral striatum, and a computational approach was used to identify the neural correlates of PEs. Our results show that spatial biases favoring reward learning in the right (vs. left) hemifield were associated with increased reward responses in the left hemisphere and relatively better neural encoding of PEs for stimuli presented in the right (vs. left) hemifield. These findings demonstrate that trait-like spatial biases implicate hemisphere-specific learning mechanisms, with individual differences between hemispheres contributing to reinforcing spatial biases. Copyright © 2016 Elsevier Ltd. All rights reserved.
Reinforcement learning in complementarity game and population dynamics
NASA Astrophysics Data System (ADS)
Jost, Jürgen; Li, Wei
2014-02-01
We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
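For reference, a minimal version of the modified Roth-Erev scheme could look like the sketch below, with the exponent applied when accumulated propensities are converted into choice probabilities; whether this matches the exact parameterization used in the paper is an assumption, and the forgetting parameter is optional.

```python
import numpy as np

def roth_erev_choice_probs(propensities, power=1.5):
    """Choice probabilities in a (modified) Roth-Erev learner.

    Standard Roth-Erev chooses actions with probability proportional to
    accumulated propensities (power = 1); the modified version studied
    here raises propensities to the power 1.5 before normalizing.
    """
    p = np.asarray(propensities, float) ** power
    return p / p.sum()

def roth_erev_update(propensities, action, payoff, forgetting=0.0):
    """Reinforce the chosen action's propensity by the payoff obtained."""
    propensities = (1.0 - forgetting) * np.asarray(propensities, float)
    propensities[action] += payoff
    return propensities
```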
Intertrial interval duration and learning in autistic children.
Koegel, R L; Dunlap, G; Dyer, K
1980-01-01
This study investigated the influence of intertrial interval duration on the performance of autistic children during teaching situations. The children were taught under the same conditions existing in their regular programs, except that the length of time between trials was systematically manipulated. With both multiple baseline and repeated reversal designs, two lengths of intertrial interval were employed: short intervals, with the SD for any given trial presented approximately one second following the reinforcer for the previous trial, versus long intervals, with the SD presented four or more seconds following the reinforcer for the previous trial. The results showed that: (1) the short intertrial intervals always produced higher levels of correct responding than the long intervals; and (2) there were improving trends in performance and rapid acquisition with the short intertrial intervals, in contrast to minimal or no change with the long intervals. The results are discussed in terms of utilizing information about child and task characteristics when selecting optimal intertrial intervals. The data suggest that manipulations made between trials have a large influence on autistic children's learning. PMID:7364701
Cognitive control predicts use of model-based reinforcement learning.
Otto, A Ross; Skatova, Anya; Madlon-Kay, Seth; Daw, Nathaniel D
2015-02-01
Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggest that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior is a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information--in the service of overcoming habitual, stimulus-driven responses--in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior.
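The model-based/model-free mixture that such analyses typically fit can be summarized in a few lines. The sketch below shows only the weighting and choice steps, with illustrative parameter names; it is not the full two-step task model used to estimate individual differences.

```python
import numpy as np

def hybrid_action_values(q_model_free, q_model_based, w):
    """Weighted mixture of model-based and model-free action values.

    In standard analyses of sequential choice tasks, first-stage action
    values are a convex combination Q = w*Q_MB + (1-w)*Q_MF, and the
    fitted weight w indexes how model-based a participant's behavior is.
    """
    return w * np.asarray(q_model_based, float) + (1.0 - w) * np.asarray(q_model_free, float)

def softmax_choice(q_values, beta=3.0):
    """Softmax policy over the mixed action values."""
    z = beta * np.asarray(q_values, float)
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()
```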
Kappel, David; Legenstein, Robert; Habenschuss, Stefan; Hsieh, Michael; Maass, Wolfgang
2018-01-01
Synaptic connections between neurons in the brain are dynamic because of continuously ongoing spine dynamics, axonal sprouting, and other processes. In fact, it was recently shown that the spontaneous synapse-autonomous component of spine dynamics is at least as large as the component that depends on the history of pre- and postsynaptic neural activity. These data are inconsistent with common models for network plasticity and raise the following questions: how can neural circuits maintain a stable computational function in spite of these continuously ongoing processes, and what could be functional uses of these ongoing processes? Here, we present a rigorous theoretical framework for these seemingly stochastic spine dynamics and rewiring processes in the context of reward-based learning tasks. We show that spontaneous synapse-autonomous processes, in combination with reward signals such as dopamine, can explain the capability of networks of neurons in the brain to configure themselves for specific computational tasks, and to compensate automatically for later changes in the network or task. Furthermore, we show theoretically and through computer simulations that stable computational performance is compatible with continuously ongoing synapse-autonomous changes. After good computational performance has been reached, these ongoing changes cause primarily a slow drift of network architecture and dynamics in task-irrelevant dimensions, as observed for neural activity in motor cortex and other areas. On the more abstract level of reinforcement learning, the resulting model gives rise to an understanding of reward-driven network plasticity as continuous sampling of network configurations.
Saunders, Richard R; McEntee, Julie E; Saunders, Muriel D
2005-01-01
The effects of variable-interval (VI) and fixed-ratio (FR) schedules of reinforcement for work-related behavior and an organizer for the work materials (behavioral prosthesis) were evaluated with 3 adults with severe or profound mental retardation. The participants had been recommended for study because of high rates of off-task and aberrant behavior in their daily vocational training programs. For 2 participants, VI and FR schedules resulted in the same outcome: more aberrant behavior than on-task and off-task behavior combined. The FR schedule nearly eliminated emission of aberrant and off-task behavior by the 3rd participant. Combining the behavioral prosthesis with FR reinforcement (FR+O) increased the proportion of time spent in on-task behavior by all participants under certain FR schedule parameters. Second-by-second analyses of the observation records revealed that FR schedules reduced off-task and aberrant behavior during work sequences (i.e., ratio runs), and FR+O led to a further reduction of these behaviors during postreinforcement pauses. Overall, the results show how organizer and schedule parameters can be adjusted to produce an optimized balance between productivity and reinforcement while undesirable behavior is minimized.