Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J
2016-01-01
Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
Therrien, Amanda S.; Wolpert, Daniel M.
2016-01-01
Abstract See Miall and Galea (doi: 10.1093/awv343 ) for a scientific commentary on this article. Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368
ERIC Educational Resources Information Center
Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb
2007-01-01
Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…
Racial bias shapes social reinforcement learning.
Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas
2014-03-01
Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial-in-group and out-group individuals.
The drift diffusion model as the choice rule in reinforcement learning.
Pedersen, Mads Lund; Frank, Michael J; Biele, Guido
2017-08-01
Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
The drift diffusion model as the choice rule in reinforcement learning
Frank, Michael J.
2017-01-01
Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyper-activity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups. PMID:27966103
Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.
Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna
2016-09-01
Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals-estimated using two reinforcement learning models-tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.
Reinforcement learning in scheduling
NASA Technical Reports Server (NTRS)
Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad
1994-01-01
The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.
Framework for robot skill learning using reinforcement learning
NASA Astrophysics Data System (ADS)
Wei, Yingzi; Zhao, Mingyang
2003-09-01
Robot acquiring skill is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor critic method for a robot to develop its skill. The reinforcement function has become the critical component for its effect of evaluating the action and guiding the learning process. We present an augmented reward function that provides a new way for RL controller to incorporate prior knowledge and experience into the RL controller. Also, the difference form of augmented reward function is considered carefully. The additional reward beyond conventional reward will provide more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning. Automatic robot shaping policy is to dissolve the complex skill into a hierarchical learning process. The new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and also illustrate with an example the utility of method for learning skilled robot control on line.
Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo
2016-02-01
Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
Mastery Learning through Individualized Instruction: A Reinforcement Strategy
ERIC Educational Resources Information Center
Sagy, John; Ravi, R.; Ananthasayanam, R.
2009-01-01
The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…
Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris
2017-01-01
Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment resistant MDD patients and 44 age matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potentials (ERPs) data. MDD patients showed comparable learning rate compared to HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.
Adaptive Educational Software by Applying Reinforcement Learning
ERIC Educational Resources Information Center
Bennane, Abdellah
2013-01-01
The introduction of the intelligence in teaching software is the object of this paper. In software elaboration process, one uses some learning techniques in order to adapt the teaching software to characteristics of student. Generally, one uses the artificial intelligence techniques like reinforcement learning, Bayesian network in order to adapt…
Navigating complex decision spaces: Problems and paradigms in sequential choice
Walsh, Matthew M.; Anderson, John R.
2015-01-01
To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.
Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang
2017-01-01
We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q -Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning
Zhang, Hongjun; Chen, Gang
2017-01-01
We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score. PMID:28894463
Prespeech motor learning in a neural network using reinforcement.
Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough
2013-02-01
Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.
Neural Basis of Reinforcement Learning and Decision Making
Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan
2012-01-01
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543
Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.
Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E
2017-03-15
Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. Copyright © 2017 the authors 0270-6474/17/373009-09$15.00/0.
Reinforcement learning and Tourette syndrome.
Palminteri, Stefano; Pessiglione, Mathias
2013-01-01
In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report will be preceded by an introduction aimed to provide the reader with the state of the art of the knowledge concerning the neural bases of reinforcement learning at the moment of these studies and the scientific rationale beyond them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that the dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics, despite their evident inadaptive (negative) value. This idea, together with the implications of these results in Tourette therapy and the future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.
Walker, Brendan M.
2013-01-01
This article represents one of five contributions focusing on the topic “Plasticity and neuroadaptive responses within the extended amygdala in response to chronic or excessive alcohol exposure” that were developed by awardees participating in the Young Investigator Award Symposium at the “Alcoholism and Stress: A Framework for Future Treatment Strategies” conference in Volterra, Italy on May 3–6, 2011 that was organized/chaired by Drs. Antonio Noronha and Fulton Crews and sponsored by the National Institute on Alcohol Abuse and Alcoholism. This review discusses the dependence-induced neuroadaptations in affective systems that provide a basis for negative reinforcement learning and presents evidence demonstrating that escalated alcohol consumption during withdrawal is a learned, plasticity-dependent process. The review concludes by identifying changes within extended amygdala dynorphin/kappa-opioid receptor systems that could serve as the foundation for the occurrence of negative reinforcement processes. While some evidence contained herein may be specific to alcohol dependence-related learning and plasticity, much of the information will be of relevance to any addictive disorder involving negative reinforcement mechanisms. Collectively, the information presented within this review provides a framework to assess the negative reinforcing effects of alcohol in a manner that distinguishes neuroadaptations produced by chronic alcohol exposure from the actual plasticity that is associated with negative reinforcement learning in dependent organisms. PMID:22459874
Hisey, Erin; Kearney, Matthew Gene; Mooney, Richard
2018-04-01
The complex skills underlying verbal and musical expression can be learned without external punishment or reward, indicating their learning is internally guided. The neural mechanisms that mediate internally guided learning are poorly understood, but a circuit comprising dopamine-releasing neurons in the midbrain ventral tegmental area (VTA) and their targets in the basal ganglia are important to externally reinforced learning. Juvenile zebra finches copy a tutor song in a process that is internally guided and, in adulthood, can learn to modify the fundamental frequency (pitch) of a target syllable in response to external reinforcement with white noise. Here we combined intersectional genetic ablation of VTA neurons, reversible blockade of dopamine receptors in the basal ganglia, and singing-triggered optogenetic stimulation of VTA terminals to establish that a common VTA-basal ganglia circuit enables internally guided song copying and externally reinforced syllable pitch learning.
Improving the Science Excursion: An Educational Technologist's View
ERIC Educational Resources Information Center
Balson, M.
1973-01-01
Analyzes the nature of the learning process and attempts to show how the three components of a reinforcement contingency, the stimulus, the response and the reinforcement can be utilized to increase the efficiency of a typical science learning experience, the excursion. (JR)
Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E
2014-03-01
Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.
Comparative learning theory and its application in the training of horses.
Cooper, J J
1998-11-01
Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment, with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species, but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by removal of, or decrease in, intensity of a unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.
Reinforcement learning in multidimensional environments relies on attention mechanisms.
Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C
2015-05-27
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors 0270-6474/15/358145-13$15.00/0.
A neural model of hierarchical reinforcement learning.
Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris
2017-01-01
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.
Prespeech motor learning in a neural network using reinforcement☆
Warlaumont, Anne S.; Westermann, Gert; Buder, Eugene H.; Oller, D. Kimbrough
2012-01-01
Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one’s language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the differ-ent conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network’s post learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network’s post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model’s post-learning productions were more likely to resemble the English vowels and vice versa. PMID:23275137
Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D
2017-05-01
Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlies poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.
Reinforcement learning in computer vision
NASA Astrophysics Data System (ADS)
Bernstein, A. V.; Burnaev, E. V.
2018-04-01
Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn state of surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, Reinforcement learning has been used both for solving such applied tasks as processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper describes shortly the Reinforcement learning technology and its use for solving computer vision problems.
Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms
Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.
2015-01-01
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331
Utilising reinforcement learning to develop strategies for driving auditory neural implants.
Lee, Geoffrey W; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G
2016-08-01
In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based off real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provides a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model's function. We show the ability to effectively learn stimulation patterns which mimic the cochlea's ability to covert acoustic frequencies to neural activity. Time taken to learn effective replication using neural stimulation takes less than 20 min under continuous testing. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipients current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
Suppression of Striatal Prediction Errors by the Prefrontal Cortex in Placebo Hypoalgesia.
Schenk, Lieven A; Sprenger, Christian; Onat, Selim; Colloca, Luana; Büchel, Christian
2017-10-04
Classical learning theories predict extinction after the discontinuation of reinforcement through prediction errors. However, placebo hypoalgesia, although mediated by associative learning, has been shown to be resistant to extinction. We tested the hypothesis that this is mediated by the suppression of prediction error processing through the prefrontal cortex (PFC). We compared pain modulation through treatment cues (placebo hypoalgesia, treatment context) with pain modulation through stimulus intensity cues (stimulus context) during functional magnetic resonance imaging in 48 male and female healthy volunteers. During acquisition, our data show that expectations are correctly learned and that this is associated with prediction error signals in the ventral striatum (VS) in both contexts. However, in the nonreinforced test phase, pain modulation and expectations of pain relief persisted to a larger degree in the treatment context, indicating that the expectations were not correctly updated in the treatment context. Consistently, we observed significantly stronger neural prediction error signals in the VS in the stimulus context compared with the treatment context. A connectivity analysis revealed negative coupling between the anterior PFC and the VS in the treatment context, suggesting that the PFC can suppress the expression of prediction errors in the VS. Consistent with this, a participant's conceptual views and beliefs about treatments influenced the pain modulation only in the treatment context. Our results indicate that in placebo hypoalgesia contextual treatment information engages prefrontal conceptual processes, which can suppress prediction error processing in the VS and lead to reduced updating of treatment expectancies, resulting in less extinction of placebo hypoalgesia. SIGNIFICANCE STATEMENT In aversive and appetitive reinforcement learning, learned effects show extinction when reinforcement is discontinued. This is thought to be mediated by prediction errors (i.e., the difference between expectations and outcome). Although reinforcement learning has been central in explaining placebo hypoalgesia, placebo hypoalgesic effects show little extinction and persist after the discontinuation of reinforcement. Our results support the idea that conceptual treatment beliefs bias the neural processing of expectations in a treatment context compared with a more stimulus-driven processing of expectations with stimulus intensity cues. We provide evidence that this is associated with the suppression of prediction error processing in the ventral striatum by the prefrontal cortex. This provides a neural basis for persisting effects in reinforcement learning and placebo hypoalgesia. Copyright © 2017 the authors 0270-6474/17/379715-09$15.00/0.
Evolution with Reinforcement Learning in Negotiation
Zou, Yi; Zhan, Wenjie; Shao, Yuan
2014-01-01
Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stableness than their counterparts using classic evolutionary algorithm. PMID:25048108
Evolution with reinforcement learning in negotiation.
Zou, Yi; Zhan, Wenjie; Shao, Yuan
2014-01-01
Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stableness than their counterparts using classic evolutionary algorithm.
Shephard, E; Jackson, G M; Groom, M J
2014-01-01
This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.
Taylor, Jordan A; Ivry, Richard B
2014-01-01
Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.
Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning
Taylor, Jordan A.; Ivry, Richard B.
2014-01-01
Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.
Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane
2017-01-01
Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning
Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane
2017-01-01
Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004
A neural model of hierarchical reinforcement learning
Rasmussen, Daniel; Eliasmith, Chris
2017-01-01
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions. PMID:28683111
An intelligent agent for optimal river-reservoir system management
NASA Astrophysics Data System (ADS)
Rieker, Jeffrey D.; Labadie, John W.
2012-09-01
A generalized software package is presented for developing an intelligent agent for stochastic optimization of complex river-reservoir system management and operations. Reinforcement learning is an approach to artificial intelligence for developing a decision-making agent that learns the best operational policies without the need for explicit probabilistic models of hydrologic system behavior. The agent learns these strategies experientially in a Markov decision process through observational interaction with the environment and simulation of the river-reservoir system using well-calibrated models. The graphical user interface for the reinforcement learning process controller includes numerous learning method options and dynamic displays for visualizing the adaptive behavior of the agent. As a case study, the generalized reinforcement learning software is applied to developing an intelligent agent for optimal management of water stored in the Truckee river-reservoir system of California and Nevada for the purpose of streamflow augmentation for water quality enhancement. The intelligent agent successfully learns long-term reservoir operational policies that specifically focus on mitigating water temperature extremes during persistent drought periods that jeopardize the survival of threatened and endangered fish species.
Feature Reinforcement Learning: Part I. Unstructured MDPs
NASA Astrophysics Data System (ADS)
Hutter, Marcus
2009-12-01
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II (Hutter, 2009c). The role of POMDPs is also considered there.
Local Service Learning in Teacher Preparation Program
ERIC Educational Resources Information Center
Nuangchalerm, Prasart
2016-01-01
The local knowledge is simply integrated in education and learning process. This study aims to promote local knowledge in school through service learning. The learning process is employed herbal plants to reinforce students learn how to sustain local knowledge with modern life and 21st century classroom. Participants consisted of 42 pre-service…
Human-level control through deep reinforcement learning.
Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis
2015-02-26
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Human-level control through deep reinforcement learning
NASA Astrophysics Data System (ADS)
Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis
2015-02-01
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.
2012-01-01
Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989
Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals
Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan
2017-01-01
Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976
Brain Research: Implications for Learning.
ERIC Educational Resources Information Center
Soares, Louise M.; Soares, Anthony T.
Brain research has illuminated several areas of the learning process: (1) learning as association; (2) learning as reinforcement; (3) learning as perception; (4) learning as imitation; (5) learning as organization; (6) learning as individual style; and (7) learning as brain activity. The classic conditioning model developed by Pavlov advanced…
Economic decision-making in the ultimatum game by smokers.
Takahashi, Taiki
2007-10-01
No study to date compared degrees of inequity aversion in economic decision-making in the ultimatum game between non-addictive and addictive reinforcers. The comparison is potentially important in neuroeconomics and reinforcement learning theory of addiction. We compared the degrees of inequity aversion in the ultimatum game between money and cigarettes in habitual smokers. Smokers avoided inequity in the ultimatum game more dramatically for money than for cigarettes; i.e., there was a "domain effect" in decision-making in the ultimatum game. Reward-processing neural activities in the brain for non-addictive and addictive reinforcers may be distinct and the insula activation due to cue-induced craving may conflict with unfair offer-induced insula activation. Future studies in neuroeconomics of addiction should employ game-theoretic decision tasks for elucidating reinforcement learning processes in dopaminergic neural circuits.
Cooperative Education Is a Superior Strategy for Using Basic Learning Processes.
ERIC Educational Resources Information Center
Reed, V. Gerald
Cooperative education is a learning strategy that fits very well with basic laws of learning. In fact, several basic important learning processes are far better adapted to the cooperative education strategy than to methods that lean entirely on classroom instruction. For instance, cooperative education affords more opportunities for reinforcement,…
The nature of sexual reinforcement.
Crawford, L L; Holloway, K S; Domjan, M
1993-01-01
Sexual reinforcers are not part of a regulatory system involved in the maintenance of critical metabolic processes, they differ for males and females, they differ as a function of species and mating system, and they show ontogenetic and seasonal changes related to endocrine conditions. Exposure to a member of the opposite sex without copulation can be sufficient for sexual reinforcement. However, copulatory access is a stronger reinforcer, and copulatory opportunity can serve to enhance the reinforcing efficacy of stimulus features of a sexual partner. Conversely, under certain conditions, noncopulatory exposure serves to decrease reinforcer efficacy. Many common learning phenomena such as acquisition, extinction, discrimination learning, second-order conditioning, and latent inhibition have been demonstrated in sexual conditioning. These observations extend the generality of findings obtained with more conventional reinforcers, but the mechanisms of these effects and their gender and species specificity remain to be explored. PMID:8354970
Better or Worse than Expected? Aging, Learning, and the ERN
ERIC Educational Resources Information Center
Eppinger, Ben; Kray, Jutta; Mock, Barbara; Mecklinger, Axel
2008-01-01
This study examined age differences in error processing and reinforcement learning. We were interested in whether the electrophysiological correlates of error processing, the error-related negativity (ERN) and the feedback-related negativity (FRN), reflect learning-related changes in younger and older adults. To do so, we applied a probabilistic…
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662
Mesolimbic confidence signals guide perceptual learning in the absence of external feedback
Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp
2016-01-01
It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics
NASA Astrophysics Data System (ADS)
Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj
Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.
Context transfer in reinforcement learning using action-value functions.
Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid
2014-01-01
This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task.
Context Transfer in Reinforcement Learning Using Action-Value Functions
Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid
2014-01-01
This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task. PMID:25610457
Learning the specific quality of taste reinforcement in larval Drosophila.
Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram
2015-01-27
The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing-in any brain.
Feedback-related brain activity predicts learning from feedback in multiple-choice testing.
Ernst, Benjamin; Steinhauser, Marco
2012-06-01
Different event-related potentials (ERPs) have been shown to correlate with learning from feedback in decision-making tasks and with learning in explicit memory tasks. In the present study, we investigated which ERPs predict learning from corrective feedback in a multiple-choice test, which combines elements from both paradigms. Participants worked through sets of multiple-choice items of a Swahili-German vocabulary task. Whereas the initial presentation of an item required the participants to guess the answer, corrective feedback could be used to learn the correct response. Initial analyses revealed that corrective feedback elicited components related to reinforcement learning (FRN), as well as to explicit memory processing (P300) and attention (early frontal positivity). However, only the P300 and early frontal positivity were positively correlated with successful learning from corrective feedback, whereas the FRN was even larger when learning failed. These results suggest that learning from corrective feedback crucially relies on explicit memory processing and attentional orienting to corrective feedback, rather than on reinforcement learning.
Impact of Computer Animations in Cognitive Learning: Differentiation
ERIC Educational Resources Information Center
Altiparmak, Kemal
2014-01-01
In mathematic courses, construction of some concepts by the students in a meaningful way may be complicated. In such circumstances, to embody the concepts application of the required technologies may reinforce learning process. Onset of learning process over daily life events of the student's environment may lure their attention and may…
Apprenticeship Learning: Learning to Schedule from Human Experts
2016-06-09
approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning (Busoniu, Babuska, and De...via inverse reinforcement learning. In ICML. Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete...of tasks with temporal constraints. In Proc. AAAI, 2110–2116. Odom, P., and Natarajan, S. 2015. Active advice seeking for inverse reinforcement
Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia
Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.
2014-01-01
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
Stress enhances model-free reinforcement learning only after negative outcome
Lee, Daeyeol
2017-01-01
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one’s ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts. PMID:28723943
Stress enhances model-free reinforcement learning only after negative outcome.
Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung
2017-01-01
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one's ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts.
A reinforcement learning-based architecture for fuzzy logic control
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1992-01-01
This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.
Motor Learning Enhances Use-Dependent Plasticity
2017-01-01
Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961
Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M; Lara, Juan A; Lizcano, David
2017-01-19
Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and it outperformed traditional methods existing in the literature with respect to learning reliability and efficiency.
Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M.; Lara, Juan A.; Lizcano, David
2017-01-01
Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and it outperformed traditional methods existing in the literature with respect to learning reliability and efficiency. PMID:28106849
Smith, Tim J.; Senju, Atsushi
2017-01-01
While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue–reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue–reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. PMID:28250186
Vernetti, Angélina; Smith, Tim J; Senju, Atsushi
2017-03-15
While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.
What Does a Cultural Approach Offer to Research on Learning?
ERIC Educational Resources Information Center
Hatano, Giyoo; Miyake, Naomi
1991-01-01
The articles of this issue argue that taking culture into consideration in research on learning can enhance understanding of a variety of phenomena related to learning. Discussion reinforces this position and further considers learning processes that are or are not determined by culture. (SLD)
Collins, Anne G E; Albrecht, Matthew A; Waltz, James A; Gold, James M; Frank, Michael J
2017-09-15
When studying learning, researchers directly observe only the participants' choices, which are often assumed to arise from a unitary learning process. However, a number of separable systems, such as working memory (WM) and reinforcement learning (RL), contribute simultaneously to human learning. Identifying each system's contributions is essential for mapping the neural substrates contributing in parallel to behavior; computational modeling can help to design tasks that allow such a separable identification of processes and infer their contributions in individuals. We present a new experimental protocol that separately identifies the contributions of RL and WM to learning, is sensitive to parametric variations in both, and allows us to investigate whether the processes interact. In experiments 1 and 2, we tested this protocol with healthy young adults (n = 29 and n = 52, respectively). In experiment 3, we used it to investigate learning deficits in medicated individuals with schizophrenia (n = 49 patients, n = 32 control subjects). Experiments 1 and 2 established WM and RL contributions to learning, as evidenced by parametric modulations of choice by load and delay and reward history, respectively. They also showed interactions between WM and RL, where RL was enhanced under high WM load. Moreover, we observed a cost of mental effort when controlling for reinforcement history: participants preferred stimuli they encountered under low WM load. Experiment 3 revealed selective deficits in WM contributions and preserved RL value learning in individuals with schizophrenia compared with control subjects. Computational approaches allow us to disentangle contributions of multiple systems to learning and, consequently, to further our understanding of psychiatric diseases. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Collins, Anne G. E.; Frank, Michael J.
2012-01-01
Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report effects of two genetic polymorphisms having relative specificity for prefrontal and basal ganglia functions. Whereas the COMT gene coding for catechol-O-methyl transferase selectively influenced model estimates of WM capacity, the GPR6 gene coding for G-protein-coupled receptor 6 influenced the RL learning rate. Thus, this study allowed us to specify distinct influences of the high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models. PMID:22487033
Mechanisms and time course of vocal learning and consolidation in the adult songbird.
Warren, Timothy L; Tumer, Evren C; Charlesworth, Jonathan D; Brainard, Michael S
2011-10-01
In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry.
Mechanisms and time course of vocal learning and consolidation in the adult songbird
Tumer, Evren C.; Charlesworth, Jonathan D.; Brainard, Michael S.
2011-01-01
In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry. PMID:21734110
Toward a Science of Learning Games
ERIC Educational Resources Information Center
Howard-Jones, Paul; Demetriou, Skevi; Bogacz, Rafal; Yoo, Jee H.; Leonards, Ute
2011-01-01
Reinforcement learning involves a tight coupling of reward-associated behavior and a type of learning that is very different from that promoted by education. However, the emerging understanding of its underlying processes may help derive principles for effective learning games that have, until now, been elusive. This article first reviews findings…
Improving Collaborative Learning by Supporting Casual Encounters in Distance Learning.
ERIC Educational Resources Information Center
Contreras, Juan; Llamas, Rafael; Vizcaino, Aurora; Vavela, Jesus
Casual encounters in a learning environment are very useful in reinforcing previous knowledge and acquiring new knowledge. Such encounters are very common in traditional learning environments and can be used successfully in social environments in which students can discover and construct knowledge through a process of dialogue, negotiation, or…
Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.
2015-01-01
In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
Wong, Scott A; Randolph, Sienna H; Ivan, Victorita E; Gruber, Aaron J
2017-09-29
Δ-9-Tetrahydrocannabinol (THC) is the main psychoactive component of marijuana and has potent effects on decision-making, including a proposed reduction in cognitive flexibility. We demonstrate here that acute THC administration differentially affects some of the processes that contribute to cognitive flexibility. Specifically, THC reduces lose-shift responding in which female rats tend to immediately shift choice responses away from options that result in reward omission on the previous trial. THC, however, did not impair the ability of rats to flexibly bias responses toward feeders with higher probability of reward in a reversal task. This response adaptation developed over several trials, suggesting that THC did not impair slower forms of reinforcement learning needed to choose among options with unequal utility. This dissociation of THC's effects on innate/rapid and learned/gradual decision-making processes was unexpected, but is supported by emerging evidence that lose-shift responding is mediated by neural mechanisms distinct from those involved in other forms of reinforcement learning. The present data suggest that, at least in some tasks, the apparent reductions in cognitive flexibility by THC may be explained by the immediate effects on loss sensitivity, rather than impairments of all processes used for choice adaptation. Copyright © 2017 Elsevier B.V. All rights reserved.
Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.
Chan, C K J; Harris, Justin A
2017-08-01
Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction. Copyright © 2017 Elsevier B.V. All rights reserved.
Reinforcement active learning in the vibrissae system: optimal object localization.
Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud
2013-01-01
Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.
Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C
2014-03-01
Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
Working memory contributions to reinforcement learning impairments in schizophrenia.
Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J
2014-10-08
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. Copyright © 2014 the authors 0270-6474/14/3413747-10$15.00/0.
Structure identification in fuzzy inference using reinforcement learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1993-01-01
In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to backup a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.
The Effects of Basolateral Amygdala Lesions on Unblocking
Chang, Stephen E.; McDannald, Michael A.; Wheeler, Daniel S.; Holland, Peter C.
2012-01-01
Prior reinforcement of a neutral stimulus often blocks subsequent conditioning of a new stimulus if a compound of the original and new cues is paired with the same reinforcer. However, if the value of the reinforcer is altered when the compound is presented, the new cue typically acquires conditioning, a result called unblocking. Blocking, unblocking and related phenomena have been attributed to variations in processing of either the reinforcer, for example, the Rescorla-Wagner (1972) model, or cues, for example, the Pearce-Hall (1980) model. Here, we examined the effects of lesions of the basolateral amygdala on the occurrence of unblocking when the food reinforcer was increased in quantity at the time of introduction of the new cue. The lesions had no effects on unblocking in a simple design (Experiment 1), which did not distinguish between unblocking produced by variations in reward or cue processing. However, in a procedure that distinguished between unblocking due to direct conditioning by the added reinforcer, consistent with the Rescorla-Wagner (1972) model, and that due to increases in conditioning to the original reinforcer, consistent with the Pearce-Hall (1980) and other models of learning, the lesions prevented unblocking of the latter type. These results were discussed in the context of roles of the basolateral amygdala in coding and using reward prediction error information in associative learning. PMID:22448857
Resurgence of Integrated Behavioral Units
Bachá-Méndez, Gustavo; Reid, Alliston K; Mendoza-Soylovna, Adela
2007-01-01
Two experiments with rats examined the dynamics of well-learned response sequences when reinforcement contingencies were changed. Both experiments contained four phases, each of which reinforced a 2-response sequence of lever presses until responding was stable. The contingencies then were shifted to a new reinforced sequence until responding was again stable. Extinction-induced resurgence of previously reinforced, and then extinguished, heterogeneous response sequences was observed in all subjects in both experiments. These sequences were demonstrated to be integrated behavioral units, controlled by processes acting at the level of the entire sequence. Response-level processes were also simultaneously operative. Errors in sequence production were strongly influenced by the terminal, not the initial, response in the currently reinforced sequence, but not by the previously reinforced sequence. These studies demonstrate that sequence-level and response-level processes can operate simultaneously in integrated behavioral units. Resurgence and the development of integrated behavioral units may be dissociated; thus the observation of one does not necessarily imply the other. PMID:17345948
Hierarchical extreme learning machine based reinforcement learning for goal localization
NASA Astrophysics Data System (ADS)
AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini
2017-03-01
The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Few regions need to be processed by the agent to reduce the computational effort and increase the speed of convergence. In this paper, reinforcement learning (RL) method was utilized to find optimal series of actions to localize the goal region. The visual data, a set of images, is high dimensional unstructured data and needs to be represented efficiently to get a robust detector. Different deep Reinforcement models have already been used to localize a goal but most of them take long time to learn the model. This long learning time results from the weights fine tuning stage that is applied iteratively to find an accurate model. Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that doesn’t fine tune the weights. In other words, hidden weights are generated randomly and output weights are calculated analytically. H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of Hierarchical Extreme learning machine and Reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed by using MATLAB.
Rational and Mechanistic Perspectives on Reinforcement Learning
ERIC Educational Resources Information Center
Chater, Nick
2009-01-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…
Salvador, Alexandre; Worbe, Yulia; Delorme, Cécile; Coricelli, Giorgio; Gaillard, Raphaël; Robbins, Trevor W; Hartmann, Andreas; Palminteri, Stefano
2017-07-24
The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
NASA Technical Reports Server (NTRS)
Yen, John; Wang, Haojin; Daugherity, Walter C.
1992-01-01
Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.
Chang, Li-Chiu; Chen, Pin-An; Chang, Fi-John
2012-08-01
A reliable forecast of future events possesses great value. The main purpose of this paper is to propose an innovative learning technique for reinforcing the accuracy of two-step-ahead (2SA) forecasts. The real-time recurrent learning (RTRL) algorithm for recurrent neural networks (RNNs) can effectively model the dynamics of complex processes and has been used successfully in one-step-ahead forecasts for various time series. A reinforced RTRL algorithm for 2SA forecasts using RNNs is proposed in this paper, and its performance is investigated by two famous benchmark time series and a streamflow during flood events in Taiwan. Results demonstrate that the proposed reinforced 2SA RTRL algorithm for RNNs can adequately forecast the benchmark (theoretical) time series, significantly improve the accuracy of flood forecasts, and effectively reduce time-lag effects.
The effects of aging on the interaction between reinforcement learning and attention.
Radulescu, Angela; Daniel, Reka; Niv, Yael
2016-11-01
Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Negative reinforcement learning is affected in substance dependence.
Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody
2012-06-01
Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
ERIC Educational Resources Information Center
Munoz-Organero, M.; Munoz-Merino, P. J.; Kloos, Carlos Delgado
2011-01-01
The use of technology in learning environments should be targeted at improving the learning outcome of the process. Several technology enhanced techniques can be used for maximizing the learning gain of particular students when having access to learning resources. One of them is content adaptation. Adapting content is especially important when…
Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna
2016-10-05
Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.
Analysis of Modeling Processes
ERIC Educational Resources Information Center
Bandura, Albert
1975-01-01
Traditional learning theories stress that people are either conditioned through reward and punishment or by close association with neutral or evocative stimuli. These direct experience theories do not account for people's learning complex behavior through observation. Attentional, retention, motoric reproduction, reinforcement, and motivational…
Design issues for a reinforcement-based self-learning fuzzy controller
NASA Technical Reports Server (NTRS)
Yen, John; Wang, Haojin; Dauherity, Walter
1993-01-01
Fuzzy logic controllers have some often cited advantages over conventional techniques such as PID control: easy implementation, its accommodation to natural language, the ability to cover wider range of operating conditions and others. One major obstacle that hinders its broader application is the lack of a systematic way to develop and modify its rules and as result the creation and modification of fuzzy rules often depends on try-error or pure experimentation. One of the proposed approaches to address this issue is self-learning fuzzy logic controllers (SFLC) that use reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of self-learning fuzzy controller is highly contingent on the design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for the application to chemical process are discussed and its performance is compared with that of PID and self-tuning fuzzy logic controller.
ERIC Educational Resources Information Center
Biederman, Gerald B.; Davey, Valerie A.
1995-01-01
This counterpoint responds to Phillip Ward's commentary on a study by Gerald Biederman and others which concluded that verbal reinforcement in combination with interactive modeling strategies may produce confusion in children with language/learning difficulties. The counterpoint argues that the commentary indicates problems with semantics and with…
The Computational Development of Reinforcement Learning during Adolescence
Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne
2016-01-01
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574
Decker, Johannes H.; Otto, A. Ross; Daw, Nathaniel D.; Hartley, Catherine A.
2016-01-01
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults, performed a sequential reinforcement-learning task that enables estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was evident in choice behavior across all age groups, evidence of a model-based strategy only emerged during adolescence and continued to increase into adulthood. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. PMID:27084852
Robotic action acquisition with cognitive biases in coarse-grained state space.
Uragami, Daisuke; Kohno, Yu; Takahashi, Tatsuji
2016-07-01
Some of the authors have previously proposed a cognitively inspired reinforcement learning architecture (LS-Q) that mimics cognitive biases in humans. LS-Q adaptively learns under uniform, coarse-grained state division and performs well without parameter tuning in a giant-swing robot task. However, these results were shown only in simulations. In this study, we test the validity of the LS-Q implemented in a robot in a real environment. In addition, we analyze the learning process to elucidate the mechanism by which the LS-Q adaptively learns under the partially observable environment. We argue that the LS-Q may be a versatile reinforcement learning architecture, which is, despite its simplicity, easily applicable and does not require well-prepared settings. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control
NASA Technical Reports Server (NTRS)
Bernstein, Daniel S.; Zilberstein, Shlomo
2003-01-01
Weakly-coupled Markov decision processes can be decomposed into subprocesses that interact only through a small set of bottleneck states. We study a hierarchical reinforcement learning algorithm designed to take advantage of this particular type of decomposability. To test our algorithm, we use a decision-making problem faced by autonomous planetary rovers. In this problem, a Mars rover must decide which activities to perform and when to traverse between science sites in order to make the best use of its limited resources. In our experiments, the hierarchical algorithm performs better than Q-learning in the early stages of learning, but unlike Q-learning it converges to a suboptimal policy. This suggests that it may be advantageous to use the hierarchical algorithm when training time is limited.
Dissociable Learning Processes Underlie Human Pain Conditioning
Zhang, Suyi; Mano, Hiroaki; Ganesh, Gowrishankar; Robbins, Trevor; Seymour, Ben
2016-01-01
Summary Pavlovian conditioning underlies many aspects of pain behavior, including fear and threat detection [1], escape and avoidance learning [2], and endogenous analgesia [3]. Although a central role for the amygdala is well established [4], both human and animal studies implicate other brain regions in learning, notably ventral striatum and cerebellum [5]. It remains unclear whether these regions make different contributions to a single aversive learning process or represent independent learning mechanisms that interact to generate the expression of pain-related behavior. We designed a human parallel aversive conditioning paradigm in which different Pavlovian visual cues probabilistically predicted thermal pain primarily to either the left or right arm and studied the acquisition of conditioned Pavlovian responses using combined physiological recordings and fMRI. Using computational modeling based on reinforcement learning theory, we found that conditioning involves two distinct types of learning process. First, a non-specific “preparatory” system learns aversive facial expressions and autonomic responses such as skin conductance. The associated learning signals—the learned associability and prediction error—were correlated with fMRI brain responses in amygdala-striatal regions, corresponding to the classic aversive (fear) learning circuit. Second, a specific lateralized system learns “consummatory” limb-withdrawal responses, detectable with electromyography of the arm to which pain is predicted. Its related learned associability was correlated with responses in ipsilateral cerebellar cortex, suggesting a novel computational role for the cerebellum in pain. In conclusion, our results show that the overall phenotype of conditioned pain behavior depends on two dissociable reinforcement learning circuits. PMID:26711494
Reinforcement learning of periodical gaits in locomotion robots
NASA Astrophysics Data System (ADS)
Svinin, Mikhail; Yamada, Kazuyaki; Ushio, S.; Ueda, Kanji
1999-08-01
Emergence of stable gaits in locomotion robots is studied in this paper. A classifier system, implementing an instance- based reinforcement learning scheme, is used for sensory- motor control of an eight-legged mobile robot. Important feature of the classifier system is its ability to work with the continuous sensor space. The robot does not have a prior knowledge of the environment, its own internal model, and the goal coordinates. It is only assumed that the robot can acquire stable gaits by learning how to reach a light source. During the learning process the control system, is self-organized by reinforcement signals. Reaching the light source defines a global reward. Forward motion gets a local reward, while stepping back and falling down get a local punishment. Feasibility of the proposed self-organized system is tested under simulation and experiment. The control actions are specified at the leg level. It is shown that, as learning progresses, the number of the action rules in the classifier systems is stabilized to a certain level, corresponding to the acquired gait patterns.
Biases in probabilistic category learning in relation to social anxiety
Abraham, Anna; Hermann, Christiane
2015-01-01
Instrumental learning paradigms are rarely employed to investigate the mechanisms underlying acquired fear responses in social anxiety. Here, we adapted a probabilistic category learning paradigm to assess information processing biases as a function of the degree of social anxiety traits in a sample of healthy individuals without a diagnosis of social phobia. Participants were presented with three pairs of neutral faces with differing probabilistic accuracy contingencies (A/B: 80/20, C/D: 70/30, E/F: 60/40). Upon making their choice, negative and positive feedback was conveyed using angry and happy faces, respectively. The highly socially anxious group showed a strong tendency to be more accurate at learning the probability contingency associated with the most ambiguous stimulus pair (E/F: 60/40). Moreover, when pairing the most positively reinforced stimulus or the most negatively reinforced stimulus with all the other stimuli in a test phase, the highly socially anxious group avoided the most negatively reinforced stimulus significantly more than the control group. The results are discussed with reference to avoidance learning and hypersensitivity to negative socially evaluative information associated with social anxiety. PMID:26347685
Green Schools as High Performance Learning Facilities
ERIC Educational Resources Information Center
Gordon, Douglas E.
2010-01-01
In practice, a green school is the physical result of a consensus process of planning, design, and construction that takes into account a building's performance over its entire 50- to 60-year life cycle. The main focus of the process is to reinforce optimal learning, a goal very much in keeping with the parallel goals of resource efficiency and…
Kerr, Robert R.; Grayden, David B.; Thomas, Doreen A.; Gilson, Matthieu; Burkitt, Anthony N.
2014-01-01
A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments. PMID:24475240
Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto
2016-01-01
Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401
Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto
2016-07-14
Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects.
Oliveira, Emileane C; Hunziker, Maria Helena
2014-07-01
In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases, the (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. This article is part of a Special Issue entitled: insert SI title. Copyright © 2014. Published by Elsevier B.V.
Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.
Costa, Vincent D; Dal Monte, Olga; Lucas, Daniel R; Murray, Elisabeth A; Averbeck, Bruno B
2016-10-19
Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy trade-off that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. Published by Elsevier Inc.
Amygdala and ventral striatum make distinct contributions to reinforcement learning
Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.
2016-01-01
Summary Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with a RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys’ choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488
Rational and mechanistic perspectives on reinforcement learning.
Chater, Nick
2009-12-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms--as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.
Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa
2017-01-01
The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities. PMID:28475581
Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P
2003-01-01
The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement respective reward system reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning dependent on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydrophenylacetic acid concentrations and a higher dihydrophenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydrophenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory promoting and suppressive effects via distinct and independent mechanisms and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.
2010-02-01
multi-agent reputation management. State abstraction is a technique used to allow machine learning technologies to cope with problems that have large...state abstrac- tion process to enable reinforcement learning in domains with large state spaces. State abstraction is vital to machine learning ...across a collective of independent platforms. These individual elements, often referred to as agents in the machine learning community, should exhibit both
Policy improvement by a model-free Dyna architecture.
Hwang, Kao-Shing; Lo, Chia-Yue
2013-05-01
The objective of this paper is to accelerate the process of policy improvement in reinforcement learning. The proposed Dyna-style system combines two learning schemes, one of which utilizes a temporal difference method for direct learning; the other uses relative values for indirect learning in planning between two successive direct learning cycles. Instead of establishing a complicated world model, the approach introduces a simple predictor of average rewards to actor-critic architecture in the simulation (planning) mode. The relative value of a state, defined as the accumulated differences between immediate reward and average reward, is used to steer the improvement process in the right direction. The proposed learning scheme is applied to control a pendulum system for tracking a desired trajectory to demonstrate its adaptability and robustness. Through reinforcement signals from the environment, the system takes the appropriate action to drive an unknown dynamic to track desired outputs in few learning cycles. Comparisons are made between the proposed model-free method, a connectionist adaptive heuristic critic, and an advanced method of Dyna-Q learning in the experiments of labyrinth exploration. The proposed method outperforms its counterparts in terms of elapsed time and convergence rate.
Chronic Exposure to Methamphetamine Disrupts Reinforcement-Based Decision Making in Rats.
Groman, Stephanie M; Rich, Katherine M; Smith, Nathaniel J; Lee, Daeyeol; Taylor, Jane R
2018-03-01
The persistent use of psychostimulant drugs, despite the detrimental outcomes associated with continued drug use, may be because of disruptions in reinforcement-learning processes that enable behavior to remain flexible and goal directed in dynamic environments. To identify the reinforcement-learning processes that are affected by chronic exposure to the psychostimulant methamphetamine (MA), the current study sought to use computational and biochemical analyses to characterize decision-making processes, assessed by probabilistic reversal learning, in rats before and after they were exposed to an escalating dose regimen of MA (or saline control). The ability of rats to use flexible and adaptive decision-making strategies following changes in stimulus-reward contingencies was significantly impaired following exposure to MA. Computational analyses of parameters that track choice and outcome behavior indicated that exposure to MA significantly impaired the ability of rats to use negative outcomes effectively. These MA-induced changes in decision making were similar to those observed in rats following administration of a dopamine D2/3 receptor antagonist. These data use computational models to provide insight into drug-induced maladaptive decision making that may ultimately identify novel targets for the treatment of psychostimulant addiction. We suggest that the disruption in utilization of negative outcomes to adaptively guide dynamic decision making is a new behavioral mechanism by which MA rigidly biases choice behavior.
Kruse, Lauren C; Schindler, Abigail G; Williams, Rapheal G; Weber, Sophia J; Clark, Jeremy J
2017-01-01
According to recent WHO reports, alcohol remains the number one substance used and abused by adolescents, despite public health efforts to curb its use. Adolescence is a critical period of biological maturation where brain development, particularly the mesocorticolimbic dopamine system, undergoes substantial remodeling. These circuits are implicated in complex decision making, incentive learning and reinforcement during substance use and abuse. An appealing theoretical approach has been to suggest that alcohol alters the normal development of these processes to promote deficits in reinforcement learning and decision making, which together make individuals vulnerable to developing substance use disorders in adulthood. Previously we have used a preclinical model of voluntary alcohol intake in rats to show that use in adolescence promotes risky decision making in adulthood that is mirrored by selective perturbations in dopamine network dynamics. Further, we have demonstrated that incentive learning processes in adulthood are also altered by adolescent alcohol use, again mirrored by changes in cue-evoked dopamine signaling. Indeed, we have proposed that these two processes, risk-based decision making and incentive learning, are fundamentally linked through dysfunction of midbrain circuitry where inputs to the dopamine system are disrupted by adolescent alcohol use. Here, we test the behavioral predictions of this model in rats and present the findings in the context of the prevailing literature with reference to the long-term consequences of early-life substance use on the vulnerability to develop substance use disorders. We utilize an impulsive choice task to assess the selectivity of alcohol's effect on decision-making profiles and conditioned reinforcement to parse out the effect of incentive value attribution, one mechanism of incentive learning. Finally, we use the differential reinforcement of low rates of responding (DRL) task to examine the degree to which behavioral disinhibition may contribute to an overall decision-making profile. The findings presented here support the proposition that early life alcohol use selectively alters risk-based choice behavior through modulation of incentive learning processes, both of which may be inexorably linked through perturbations in mesolimbic circuitry and may serve as fundamental vulnerabilities to the development of substance use disorders.
Kruse, Lauren C.; Schindler, Abigail G.; Williams, Rapheal G.; Weber, Sophia J.; Clark, Jeremy J.
2017-01-01
According to recent WHO reports, alcohol remains the number one substance used and abused by adolescents, despite public health efforts to curb its use. Adolescence is a critical period of biological maturation where brain development, particularly the mesocorticolimbic dopamine system, undergoes substantial remodeling. These circuits are implicated in complex decision making, incentive learning and reinforcement during substance use and abuse. An appealing theoretical approach has been to suggest that alcohol alters the normal development of these processes to promote deficits in reinforcement learning and decision making, which together make individuals vulnerable to developing substance use disorders in adulthood. Previously we have used a preclinical model of voluntary alcohol intake in rats to show that use in adolescence promotes risky decision making in adulthood that is mirrored by selective perturbations in dopamine network dynamics. Further, we have demonstrated that incentive learning processes in adulthood are also altered by adolescent alcohol use, again mirrored by changes in cue-evoked dopamine signaling. Indeed, we have proposed that these two processes, risk-based decision making and incentive learning, are fundamentally linked through dysfunction of midbrain circuitry where inputs to the dopamine system are disrupted by adolescent alcohol use. Here, we test the behavioral predictions of this model in rats and present the findings in the context of the prevailing literature with reference to the long-term consequences of early-life substance use on the vulnerability to develop substance use disorders. We utilize an impulsive choice task to assess the selectivity of alcohol’s effect on decision-making profiles and conditioned reinforcement to parse out the effect of incentive value attribution, one mechanism of incentive learning. Finally, we use the differential reinforcement of low rates of responding (DRL) task to examine the degree to which behavioral disinhibition may contribute to an overall decision-making profile. The findings presented here support the proposition that early life alcohol use selectively alters risk-based choice behavior through modulation of incentive learning processes, both of which may be inexorably linked through perturbations in mesolimbic circuitry and may serve as fundamental vulnerabilities to the development of substance use disorders. PMID:28790900
Liu, Chunming; Xu, Xin; Hu, Dewen
2013-04-29
Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.
Avoidance-based human Pavlovian-to-instrumental transfer
Lewis, Andrea H.; Niznikiewicz, Michael A.; Delamater, Andrew R.; Delgado, Mauricio R.
2013-01-01
The Pavlovian-to-instrumental transfer (PIT) paradigm probes the influence of Pavlovian cues over instrumentally learned behavior. The paradigm has been used extensively to probe basic cognitive and motivational processes in studies of animal learning but, more recently, PIT and its underlying neural basis have been extended to investigations in humans. These initial neuroimaging studies of PIT have focused on the influence of appetitively conditioned stimuli on instrumental responses maintained by positive reinforcement, and highlight the involvement of the striatum. In the current study, we sought to understand the neural correlates of PIT in an aversive Pavlovian learning situation when instrumental responding was maintained through negative reinforcement. Participants exhibited specific PIT, wherein selective increases in instrumental responding to conditioned stimuli occurred when the stimulus signaled a specific aversive outcome whose omission negatively reinforced the instrumental response. Additionally, a general PIT effect was observed such that when a stimulus was associated with a different aversive outcome than was used to negatively reinforce instrumental behavior, the presence of that stimulus caused a non-selective increase in overall instrumental responding. Both specific and general PIT behavioral effects correlated with increased activation in corticostriatal circuitry, particularly in the striatum, a region involved in cognitive and motivational processes. These results suggest that avoidance-based PIT utilizes a similar neural mechanism to that seen with PIT in an appetitive context, which has implications for understanding mechanisms of drug-seeking behavior during addiction and relapse. PMID:24118624
Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective
Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.
2009-01-01
Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527
Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.
Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp
2017-04-01
According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Beyond Stimulus Cues and Reinforcement Signals: A New Approach to Animal Metacognition
Couchman, Justin J.; Coutinho, Mariana V. C.; Beran, Michael J.; Smith, J. David
2010-01-01
Some metacognition paradigms for nonhuman animals encourage the alternative explanation that animals avoid difficult trials based only on reinforcement history and stimulus aversion. To explore this possibility, we placed humans and monkeys in successive uncertainty-monitoring tasks that were qualitatively different, eliminating many associative cues that might support transfer across tasks. In addition, task transfer occurred under conditions of deferred and rearranged feedback—both species completed blocks of trials followed by summary feedback. This ensured that animals received no trial-by-trial reinforcement. Despite distancing performance from associative cues, humans and monkeys still made adaptive uncertainty responses by declining the most difficult trials. These findings suggest that monkeys’ uncertainty responses could represent a higher-level, decisional process of cognitive monitoring, though that process need not involve full self-awareness or consciousness. The dissociation of performance from reinforcement has theoretical implications concerning the status of reinforcement as the critical binding force in animal learning. PMID:20836592
Ethanol Seeking by Long Evans Rats Is Not Always a Goal-Directed Behavior
Mangieri, Regina A.; Cofresí, Roberto U.; Gonzales, Rueben A.
2012-01-01
Background Two parallel and interacting processes are said to underlie animal behavior, whereby learning and performance of a behavior is at first via conscious and deliberate (goal-directed) processes, but after initial acquisition, the behavior can become automatic and stimulus-elicited (habitual). With respect to instrumental behaviors, animal learning studies suggest that the duration of training and the action-outcome contingency are two factors involved in the emergence of habitual seeking of “natural” reinforcers (e.g., sweet solutions, food or sucrose pellets). To rigorously test whether behaviors reinforced by abused substances such as ethanol, in particular, similarly become habitual was the primary aim of this study. Methodology/Principal Findings Male Long Evans rats underwent extended or limited operant lever press training with 10% sucrose/10% ethanol (10S10E) reinforcement (variable interval (VI) or (VR) ratio schedule of reinforcement), or with 10% sucrose (10S) reinforcement (VI schedule only). Once training and pretesting were complete, the impact of outcome devaluation on operant behavior was evaluated after lithium chloride injections were paired with the reinforcer, or unpaired 24 hours later. After limited, but not extended instrumental training, lever pressing by groups trained under VR with 10S10E and under VI with 10S was sensitive to outcome devaluation. In contrast, responding by both the extended and limited training 10S10E VI groups was not sensitive to ethanol devaluation during the test for habitual behavior. Conclusions/Significance Operant behavior by rats trained to self-administer an ethanol-sucrose solution showed variable sensitivity to a change in the value of ethanol, with relative insensitivity developing sooner in animals that received time-variable ethanol reinforcement during training sessions. One important implication, with respect to substance abuse in humans, is that initial learning about the relationship between instrumental actions and the opportunity to consume ethanol-containing drinks can influence the time course for the development or expression of habitual ethanol seeking behavior. PMID:22870342
Somatosensory Contribution to the Initial Stages of Human Motor Learning
Bernardi, Nicolò F.; Darainy, Mohammad
2015-01-01
The early stages of motor skill acquisition are often marked by uncertainty about the sensory and motor goals of the task, as is the case in learning to speak or learning the feel of a good tennis serve. Here we present an experimental model of this early learning process, in which targets are acquired by exploration and reinforcement rather than sensory error. We use this model to investigate the relative contribution of motor and sensory factors to human motor learning. Participants make active reaching movements or matched passive movements to an unseen target using a robot arm. We find that learning through passive movements paired with reinforcement is comparable with learning associated with active movement, both in terms of magnitude and durability, with improvements due to training still observable at a 1 week retest. Motor learning is also accompanied by changes in somatosensory perceptual acuity. No stable changes in motor performance are observed for participants that train, actively or passively, in the absence of reinforcement, or for participants who are given explicit information about target position in the absence of somatosensory experience. These findings indicate that the somatosensory system dominates learning in the early stages of motor skill acquisition. SIGNIFICANCE STATEMENT The research focuses on the initial stages of human motor learning, introducing a new experimental model that closely approximates the key features of motor learning outside of the laboratory. The finding indicates that it is the somatosensory system rather than the motor system that dominates learning in the early stages of motor skill acquisition. This is important given that most of our computational models of motor learning are based on the idea that learning is motoric in origin. This is also a valuable finding for rehabilitation of patients with limited mobility as it shows that reinforcement in conjunction with passive movement results in benefits to motor learning that are as great as those observed for active movement training. PMID:26490869
Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents
ERIC Educational Resources Information Center
Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan
2009-01-01
Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…
Konovalov, Arkady; Krajbich, Ian
2016-01-01
Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process between two differentially valued items, consistent with previous work on sequential-sampling models of decision making. These findings illustrate a problem with assuming that experimental subjects make their decisions at the same prescribed time. PMID:27511383
Negative reinforcement impairs overnight memory consolidation.
Stamm, Andrew W; Nguyen, Nam D; Seicol, Benjamin J; Fagan, Abigail; Oh, Angela; Drumm, Michael; Lundt, Maureen; Stickgold, Robert; Wamsley, Erin J
2014-11-01
Post-learning sleep is beneficial for human memory. However, it may be that not all memories benefit equally from sleep. Here, we manipulated a spatial learning task using monetary reward and performance feedback, asking whether enhancing the salience of the task would augment overnight memory consolidation and alter its incorporation into dreaming. Contrary to our hypothesis, we found that the addition of reward impaired overnight consolidation of spatial memory. Our findings seemingly contradict prior reports that enhancing the reward value of learned information augments sleep-dependent memory processing. Given that the reward followed a negative reinforcement paradigm, consolidation may have been impaired via a stress-related mechanism. © 2014 Stamm et al.; Published by Cold Spring Harbor Laboratory Press.
GA-based fuzzy reinforcement learning for control of a magnetic bearing system.
Lin, C T; Jou, C P
2000-01-01
This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
Theoretical assumptions of Maffesoli's sensitivity and Problem-Based Learning in Nursing Education1
Rodríguez-Borrego, María-Aurora; Nitschke, Rosane Gonçalves; do Prado, Marta Lenise; Martini, Jussara Gue; Guerra-Martín, María-Dolores; González-Galán, Carmen
2014-01-01
Objective understand the everyday and the imaginary of Nursing students in their knowledge socialization process through the Problem-Based Learning (PBL) strategy. Method Action Research, involving 86 students from the second year of an undergraduate Nursing program in Spain. A Critical Incident Questionnaire and Group interview were used. Thematic/categorical analysis, triangulation of researchers, subjects and techniques. Results the students signal the need to have a view from within, reinforcing the criticism against the schematic dualism; PBL allows one to learn how to be with the other, with his mechanical and organic solidarity; the feeling together, with its emphasis on learning to work in group and wanting to be close to the person taking care. Conclusions The great contradictions the protagonists of the process, that is, the students experience seem to express that group learning is not a form of gaining knowledge, as it makes them lose time to study. The daily, the execution time and the imaginary of how learning should be do not seem to have an intersection point in the use of Problem-Based Learning. The importance of focusing on the daily and the imaginary should be reinforced when we consider nursing education. PMID:25029064
ERIC Educational Resources Information Center
Hartjen, Raymond H.
Albert Bandura of Stanford University has proposed four component processes to his theory of observational learning: a) attention, b) retention, c) motor reproduction, and d) reinforcement and motivation. This study represents one phase of an effort to relate modeling and observational learning theory to teacher training. The problem of this study…
Bouton, Mark E.; Winterbauer, Neil E.; Todd, Travis P.
2012-01-01
It is widely recognized that extinction (the procedure in which a Pavlovian conditioned stimulus or an instrumental action is repeatedly presented without its reinforcer) weakens behavior without erasing the original learning. Most of the experiments that support this claim have focused on several “relapse” effects that occur after Pavlovian extinction, which collectively suggest that the original learning is saved through extinction. However, although such effects do occur after instrumental extinction, they have not been explored there in as much detail. This article reviews recent research in our laboratory that has investigated three relapse effects that occur after the extinction of instrumental (operant) learning. In renewal, responding returns after extinction when the behavior is tested in a different context; in resurgence, responding recovers when a second response that has been reinforced during extinction of the first is itself put on extinction; and in rapid reacquisition, extinguished responding returns rapidly when the response is reinforced again. The results provide new insights into extinction and relapse, and are consistent with principles that have been developed to explain extinction and relapse as they occur after Pavlovian conditioning. Extinction of instrumental learning, like Pavlovian learning, involves new learning that is relatively dependent on the context for expression. PMID:22450305
Complexity and Competition in Appetitive and Aversive Neural Circuits
Barberini, Crista L.; Morrison, Sara E.; Saez, Alex; Lau, Brian; Salzman, C. Daniel
2012-01-01
Decision-making often involves using sensory cues to predict possible rewarding or punishing reinforcement outcomes before selecting a course of action. Recent work has revealed complexity in how the brain learns to predict rewards and punishments. Analysis of neural signaling during and after learning in the amygdala and orbitofrontal cortex, two brain areas that process appetitive and aversive stimuli, reveals a dynamic relationship between appetitive and aversive circuits. Specifically, the relationship between signaling in appetitive and aversive circuits in these areas shifts as a function of learning. Furthermore, although appetitive and aversive circuits may often drive opposite behaviors – approaching or avoiding reinforcement depending upon its valence – these circuits can also drive similar behaviors, such as enhanced arousal or attention; these processes also may influence choice behavior. These data highlight the formidable challenges ahead in dissecting how appetitive and aversive neural circuits interact to produce a complex and nuanced range of behaviors. PMID:23189037
Dissociable Learning Processes Underlie Human Pain Conditioning.
Zhang, Suyi; Mano, Hiroaki; Ganesh, Gowrishankar; Robbins, Trevor; Seymour, Ben
2016-01-11
Pavlovian conditioning underlies many aspects of pain behavior, including fear and threat detection [1], escape and avoidance learning [2], and endogenous analgesia [3]. Although a central role for the amygdala is well established [4], both human and animal studies implicate other brain regions in learning, notably ventral striatum and cerebellum [5]. It remains unclear whether these regions make different contributions to a single aversive learning process or represent independent learning mechanisms that interact to generate the expression of pain-related behavior. We designed a human parallel aversive conditioning paradigm in which different Pavlovian visual cues probabilistically predicted thermal pain primarily to either the left or right arm and studied the acquisition of conditioned Pavlovian responses using combined physiological recordings and fMRI. Using computational modeling based on reinforcement learning theory, we found that conditioning involves two distinct types of learning process. First, a non-specific "preparatory" system learns aversive facial expressions and autonomic responses such as skin conductance. The associated learning signals-the learned associability and prediction error-were correlated with fMRI brain responses in amygdala-striatal regions, corresponding to the classic aversive (fear) learning circuit. Second, a specific lateralized system learns "consummatory" limb-withdrawal responses, detectable with electromyography of the arm to which pain is predicted. Its related learned associability was correlated with responses in ipsilateral cerebellar cortex, suggesting a novel computational role for the cerebellum in pain. In conclusion, our results show that the overall phenotype of conditioned pain behavior depends on two dissociable reinforcement learning circuits. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Motivation of extended behaviors by anterior cingulate cortex.
Holroyd, Clay B; Yeung, Nick
2012-02-01
Intense research interest over the past decade has yielded diverse and often discrepant theories about the function of anterior cingulate cortex (ACC). In particular, a dichotomy has emerged between neuropsychological theories suggesting a primary role for ACC in motivating or 'energizing' behavior, and neuroimaging-inspired theories emphasizing its contribution to cognitive control and reinforcement learning. To reconcile these views, we propose that ACC supports the selection and maintenance of 'options' - extended, context-specific sequences of behavior directed toward particular goals - that are learned through a process of hierarchical reinforcement learning. This theory accounts for ACC activity in relation to learning and control while simultaneously explaining the effects of ACC damage as disrupting the motivational context supporting the production of goal-directed action sequences. Copyright © 2011 Elsevier Ltd. All rights reserved.
11.2 YIP Human In the Loop Statistical RelationalLearners
2017-10-23
learning formalisms including inverse reinforcement learning [4] and statistical relational learning [7, 5, 8]. We have also applied our algorithms in...one introduced for label preferences. 4 Figure 2: Active Advice Seeking for Inverse Reinforcement Learning. active advice seeking is in selecting the...learning tasks. 1.2.1 Sequential Decision-Making Our previous work on advice for inverse reinforcement learning (IRL) defined advice as action
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model
Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P.; Terrace, Herbert S.
2015-01-01
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort’s success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggests that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models. PMID:26407227
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.
Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P; Terrace, Herbert S
2015-01-01
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggests that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
NASA Astrophysics Data System (ADS)
Radac, Mircea-Bogdan; Precup, Radu-Emil; Roman, Raul-Cristian
2017-04-01
This paper proposes the combination of two model-free controller tuning techniques, namely linear virtual reference feedback tuning (VRFT) and nonlinear state-feedback Q-learning, referred to as a new mixed VRFT-Q learning approach. VRFT is first used to find stabilising feedback controller using input-output experimental data from the process in a model reference tracking setting. Reinforcement Q-learning is next applied in the same setting using input-state experimental data collected under perturbed VRFT to ensure good exploration. The Q-learning controller learned with a batch fitted Q iteration algorithm uses two neural networks, one for the Q-function estimator and one for the controller, respectively. The VRFT-Q learning approach is validated on position control of a two-degrees-of-motion open-loop stable multi input-multi output (MIMO) aerodynamic system (AS). Extensive simulations for the two independent control channels of the MIMO AS show that the Q-learning controllers clearly improve performance over the VRFT controllers.
Social learning through prediction error in the brain
NASA Astrophysics Data System (ADS)
Joiner, Jessica; Piva, Matthew; Turrin, Courtney; Chang, Steve W. C.
2017-06-01
Learning about the world is critical to survival and success. In social animals, learning about others is a necessary component of navigating the social world, ultimately contributing to increasing evolutionary fitness. How humans and nonhuman animals represent the internal states and experiences of others has long been a subject of intense interest in the developmental psychology tradition, and, more recently, in studies of learning and decision making involving self and other. In this review, we explore how psychology conceptualizes the process of representing others, and how neuroscience has uncovered correlates of reinforcement learning signals to explore the neural mechanisms underlying social learning from the perspective of representing reward-related information about self and other. In particular, we discuss self-referenced and other-referenced types of reward prediction errors across multiple brain structures that effectively allow reinforcement learning algorithms to mediate social learning. Prediction-based computational principles in the brain may be strikingly conserved between self-referenced and other-referenced information.
Real-Time Teaching: Lessons from Katrina
ERIC Educational Resources Information Center
Phillips, Antoinette S.; Phillips, Carl R.
2008-01-01
Professors strive constantly to find ways for students to apply what they are learning in the classroom, thereby reinforcing principles being taught and increasing student interest and involvement in the learning process. Hurricane Katrina's devastating impact on the Gulf Coast had wide-ranging consequences. As a result, many individuals…
Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A
2016-06-01
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. © The Author(s) 2016.
Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-07-10
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.
Enhanced Experience Replay for Deep Reinforcement Learning
2015-11-01
ARL-TR-7538 ● NOV 2015 US Army Research Laboratory Enhanced Experience Replay for Deep Reinforcement Learning by David Doria...Experience Replay for Deep Reinforcement Learning by David Doria, Bryan Dawson, and Manuel Vindiola Computational and Information Sciences Directorate...
Towards a genetics-based adaptive agent to support flight testing
NASA Astrophysics Data System (ADS)
Cribbs, Henry Brown, III
Although the benefits of aircraft simulation have been known since the late 1960s, simulation almost always entails interaction with a human test pilot. This "pilot-in-the-loop" simulation process provides useful evaluative information to the aircraft designer and provides a training tool to the pilot. Emulation of a pilot during the early phases of the aircraft design process might provide designers a useful evaluative tool. Machine learning might emulate a pilot in a simulated aircraft/cockpit setting. Preliminary work in the application of machine learning techniques, such as reinforcement learning, to aircraft maneuvering have shown promise. These studies used simplified interfaces between machine learning agent and the aircraft simulation. The simulations employed low order equivalent system models. High-fidelity aircraft simulations exist, such as the simulations developed by NASA at its Dryden Flight Research Center. To expand the applicational domain of reinforcement learning to aircraft designs, this study presents a series of experiments that examine a reinforcement learning agent in the role of test pilot. The NASA X-31 and F-106 high-fidelity simulations provide realistic aircraft for the agent to maneuver. The approach of the study is to examine an agent possessing a genetic-based, artificial neural network to approximate long-term, expected cost (Bellman value) in a basic maneuvering task. The experiments evaluate different learning methods based on a common feedback function and an identical task. The learning methods evaluated are: Q-learning, Q(lambda)-learning, SARSA learning, and SARSA(lambda) learning. Experimental results indicate that, while prediction error remain quite high, similar, repeatable behaviors occur in both aircraft. Similar behavior exhibits portability of the agent between aircraft with different handling qualities (dynamics). Besides the adaptive behavior aspects of the study, the genetic algorithm used in the agent is shown to play an additive role in the shaping of the artificial neural network to the prediction task.
Working Memory Load Strengthens Reward Prediction Errors.
Collins, Anne G E; Ciullo, Brittany; Frank, Michael J; Badre, David
2017-04-19
Reinforcement learning (RL) in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors (RPEs) are used to update expected values of choice options. This modeling ignores the different contributions of different memory and decision-making systems thought to contribute even to simple learning. In an fMRI experiment, we investigated how working memory (WM) and incremental RL processes interact to guide human learning. WM load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive WM process together with slower RL. Model-based analysis of fMRI data showed that striatum and lateral prefrontal cortex were sensitive to RPE, as shown previously, but, critically, these signals were reduced when the learning problem was within capacity of WM. The degree of this neural interaction related to individual differences in the use of WM to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning. SIGNIFICANCE STATEMENT Reinforcement learning (RL) theory has been remarkably productive at improving our understanding of instrumental learning as well as dopaminergic and striatal network function across many mammalian species. However, this neural network is only one contributor to human learning and other mechanisms such as prefrontal cortex working memory also play a key role. Our results also show that these other players interact with the dopaminergic RL system, interfering with its key computation of reward prediction errors. Copyright © 2017 the authors 0270-6474/17/374332-11$15.00/0.
2016-01-01
Laboratory rats can exhibit marked, qualitative individual differences in the form of acquired behaviors. For example, when exposed to a signal-reinforcer relationship some rats show marked and consistent changes in sign-tracking (interacting with the signal; e.g., a lever) and others show marked and consistent changes in goal-tracking (interacting with the location of the predicted reinforcer; e.g., the food well). Here, stable individual differences in rats’ sign-tracking and goal-tracking emerged over the course of training, but these differences did not generalize across different signal-reinforcer relationships (Experiment 1). This selectivity suggests that individual differences in sign- and goal-tracking reflect differences in the value placed on individual reinforcers. Two findings provide direct support for this interpretation: the palatability of a reinforcer (as measured by an analysis of lick-cluster size) was positively correlated with goal-tracking (and negatively correlated with sign-tracking); and sating rats with a reinforcer affected goal-tracking but not sign-tracking (Experiment 2). These results indicate that the observed individual differences in sign- and goal-tracking behavior arise from the interaction between the palatability or value of the reinforcer and processes of association as opposed to dispositional differences (e.g., in sensory processes, “temperament,” or response repertoire). PMID:27732045
Training Program Development for Criminal Justice Agencies.
ERIC Educational Resources Information Center
Cheesebro, Deborah; Skinner, Gilbert H.
This manual is designed to assist in the development of a criminal justice agency training program. The first chapter is a discussion of various learning principles (motivation, practice, reinforcement, and learning transfer) and how they may help the trainer select instructional strategies later in the process. Administration, trainer, and…
Use of Inverse Reinforcement Learning for Identity Prediction
NASA Technical Reports Server (NTRS)
Hayes, Roy; Bao, Jonathan; Beling, Peter; Horowitz, Barry
2011-01-01
We adopt Markov Decision Processes (MDP) to model sequential decision problems, which have the characteristic that the current decision made by a human decision maker has an uncertain impact on future opportunity. We hypothesize that the individuality of decision makers can be modeled as differences in the reward function under a common MDP model. A machine learning technique, Inverse Reinforcement Learning (IRL), was used to learn an individual's reward function based on limited observation of his or her decision choices. This work serves as an initial investigation for using IRL to analyze decision making, conducted through a human experiment in a cyber shopping environment. Specifically, the ability to determine the demographic identity of users is conducted through prediction analysis and supervised learning. The results show that IRL can be used to correctly identify participants, at a rate of 68% for gender and 66% for one of three college major categories.
Behavioral and neural properties of social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ
2011-01-01
Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis - social preferences, response latencies and modeling neural responses – are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787
Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.
Fernandez-Gauna, Borja; Etxeberria-Agiriano, Ismael; Graña, Manuel
2015-01-01
Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non sta tionarity by carrying a round-robin scheduling of the action selection and execution. That this learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
Ilango, A; Wetzel, W; Scheich, H; Ohl, F W
2010-03-31
Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation, (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.
Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers
ERIC Educational Resources Information Center
Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.
2014-01-01
Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…
The cerebellum: a neural system for the study of reinforcement learning.
Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F
2011-01-01
In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Given that Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed); it is the case that any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.
Martínez-Velázquez, Eduardo S; Ramos-Loyo, Julieta; González-Garrido, Andrés A; Sequeira, Henrique
2015-01-21
Feedback-related negativity (FRN) is a negative deflection that appears around 250 ms after the gain or loss of feedback to chosen alternatives in a gambling task in frontocentral regions following outcomes. Few studies have reported FRN enhancement in adolescents compared with adults in a gambling task without probabilistic reinforcement learning, despite the fact that learning from positive or negative consequences is crucial for decision-making during adolescence. Therefore, the aim of the present research was to identify differences in FRN amplitude and latency between adolescents and adults on a gambling task with favorable and unfavorable probabilistic reinforcement learning conditions, in addition to a nonlearning condition with monetary gains and losses. Higher rate scores of high-magnitude choices during the final 30 trials compared with the first 30 trials were observed during the favorable condition, whereas lower rates were observed during the unfavorable condition in both groups. Higher FRN amplitude in all conditions and longer latency in the nonlearning condition were observed in adolescents compared with adults and in relation to losses. Results indicate that both the adolescents and the adults improved their performance in relation to positive and negative feedback. However, the FRN findings suggest an increased sensitivity to external feedback to losses in adolescents compared with adults, irrespective of the presence or absence of probabilistic reinforcement learning. These results reflect processing differences on the neural monitoring system and provide new perspectives on the dynamic development of an adolescent's brain.
Silvetti, Massimo; Wiersema, Jan R; Sonuga-Barke, Edmund; Verguts, Tom
2013-10-01
Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts; probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood. Copyright © 2013 Elsevier Ltd. All rights reserved.
The role of GABAB receptors in human reinforcement learning.
Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X
2014-10-01
Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and furthermore, test this effect in patients with addiction spectrum disorder. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.
Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-01-01
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson’s disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905
Fear of losing money? Aversive conditioning with secondary reinforcers.
Delgado, M R; Labouliere, C D; Phelps, E A
2006-12-01
Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.
van den Bos, Wouter; Cohen, Michael X; Kahnt, Thorsten; Crone, Eveline A
2012-06-01
During development, children improve in learning from feedback to adapt their behavior. However, it is still unclear which neural mechanisms might underlie these developmental changes. In the current study, we used a reinforcement learning model to investigate neurodevelopmental changes in the representation and processing of learning signals. Sixty-seven healthy volunteers between ages 8 and 22 (children: 8-11 years, adolescents: 13-16 years, and adults: 18-22 years) performed a probabilistic learning task while in a magnetic resonance imaging scanner. The behavioral data demonstrated age differences in learning parameters with a stronger impact of negative feedback on expected value in children. Imaging data revealed that the neural representation of prediction errors was similar across age groups, but functional connectivity between the ventral striatum and the medial prefrontal cortex changed as a function of age. Furthermore, the connectivity strength predicted the tendency to alter expectations after receiving negative feedback. These findings suggest that the underlying mechanisms of developmental changes in learning are not related to differences in the neural representation of learning signals per se but rather in how learning signals are used to guide behavior and expectations.
On the integration of reinforcement learning and approximate reasoning for control
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.
Pavlovian to instrumental transfer of control in a human learning task.
Nadler, Natasha; Delgado, Mauricio R; Delamater, Andrew R
2011-10-01
Pavlovian learning tasks have been widely used as tools to understand basic cognitive and emotional processes in humans. The present studies investigated one particular task, Pavlovian-to-instrumental transfer (PIT), with human participants in an effort to examine potential cognitive and emotional effects of Pavlovian cues upon instrumentally trained performance. In two experiments, subjects first learned two separate instrumental response-outcome relationships (i.e., R1-O1 and R2-O2) and then were exposed to various stimulus-outcome relationships (i.e., S1-O1, S2-O2, S3-O3, and S4-) before the effects of the Pavlovian stimuli on instrumental responding were assessed during a non-reinforced test. In Experiment 1, instrumental responding was established using a positive-reinforcement procedure, whereas in Experiment 2, a quasi-avoidance learning task was used. In both cases, the Pavlovian stimuli exerted selective control over instrumental responding, whereby S1 and S2 selectively elevated the instrumental response with which it shared an outcome. In addition, in Experiment 2, S3 exerted a nonselective transfer of control effect, whereby both responses were elevated over baseline levels. These data identify two ways, one specific and one general, in which Pavlovian processes can exert control over instrumental responding in human learning paradigms, suggesting that this method may serve as a useful tool in the study of basic cognitive and emotional processes in human learning.
Intelligence moderates reinforcement learning: a mini-review of the neural evidence
2014-01-01
Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. PMID:25185818
Intelligence moderates reinforcement learning: a mini-review of the neural evidence.
Chen, Chong
2015-06-01
Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.
The Effect of Subliminal HELP Presentations on Learning a Text Editor.
ERIC Educational Resources Information Center
Wallace, F. Layne; And Others
1991-01-01
Discussion of subliminal stimuli focuses on a study of undergraduates that was conducted to determine the feasibility of presenting subliminal information in a passive manner to reinforce the learning process involved in computer-based text editing. The Texas Instruments microcomputers used in the study are described, and further research is…
The Art of Learning: A Guide to Outstanding North Carolina Arts in Education Programs.
ERIC Educational Resources Information Center
Herman, Miriam L.
The Arts in Education programs delineated in this guide complement the rigorous arts curriculum taught by arts specialists in North Carolina schools and enable students to experience the joy of the creative process while reinforcing learning in other curricula: language arts, mathematics, social studies, science, and physical education. Programs…
Reinforcement Learning Based Web Service Compositions for Mobile Business
NASA Astrophysics Data System (ADS)
Zhou, Juan; Chen, Shouming
In this paper, we propose a new solution to Reactive Web Service Composition, via molding with Reinforcement Learning, and introducing modified (alterable) QoS variables into the model as elements in the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking, to demonstrate the intrinsic capability of the solution in question of obtaining the optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we come to the conclusion that the solution has decent potentials in boosting customer experiences and qualities of services in Web Services, and those in applications in the whole electronic commerce and business sector.
Reinforcement learning in complementarity game and population dynamics
NASA Astrophysics Data System (ADS)
Jost, Jürgen; Li, Wei
2014-02-01
We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
Deep Reinforcement Learning of Cell Movement in the Early Stage of C. elegans Embryogenesis.
Wang, Zi; Wang, Dali; Li, Chengcheng; Xu, Yichi; Li, Husheng; Bao, Zhirong
2018-04-25
Cell movement in the early phase of C. elegans development is regulated by a highly complex process in which a set of rules and connections are formulated at distinct scales. Previous efforts have demonstrated that agent-based, multi-scale modeling systems can integrate physical and biological rules and provide new avenues to study developmental systems. However, the application of these systems to model cell movement is still challenging and requires a comprehensive understanding of regulatory networks at the right scales. Recent developments in deep learning and reinforcement learning provide an unprecedented opportunity to explore cell movement using 3D time-lapse microscopy images. We present a deep reinforcement learning approach within an agent-based modeling system to characterize cell movement in the embryonic development of C. elegans. Our modeling system captures the complexity of cell movement patterns in the embryo and overcomes the local optimization problem encountered by traditional rule-based, agent-based modeling that uses greedy algorithms. We tested our model with two real developmental processes: the anterior movement of the Cpaaa cell via intercalation and the rearrangement of the superficial left-right asymmetry. In the first case, the model results suggested that Cpaaa's intercalation is an active directional cell movement caused by the continuous effects from a longer distance (farther than the length of two adjacent cells), as opposed to a passive movement caused by neighbor cell movements. In the second case, a leader-follower mechanism well explained the collective cell movement pattern in the asymmetry rearrangement. These results showed that our approach to introduce deep reinforcement learning into agent-based modeling can test regulatory mechanisms by exploring cell migration paths in a reverse engineering perspective. This model opens new doors to explore the large datasets generated by live imaging. Source code is available at https://github.com/zwang84/drl4cellmovement. dwang7@utk.edu, baoz@mskcc.org. Supplementary data are available at Bioinformatics online.
Measuring the emergence of tobacco dependence: the contribution of negative reinforcement models.
Eissenberg, Thomas
2004-06-01
This review of negative reinforcement models of drug dependence is part of a series that takes the position that a complete understanding of current concepts of dependence will facilitate the development of reliable and valid measures of the emergence of tobacco dependence. Other reviews within the series consider models that emphasize positive reinforcement and social learning/cognitive models. This review summarizes negative reinforcement in general and then presents four current negative reinforcement models that emphasize withdrawal, classical conditioning, self-medication and opponent-processes. For each model, the paper outlines central aspects of dependence, conceptualization of dependence development and influences that the model might have on current and future measures of dependence. Understanding how drug dependence develops will be an important part of future successful tobacco dependence measurement, prevention and treatment strategies.
The primate amygdala represents the positive and negative value of visual stimuli during learning
Paton, Joseph J.; Belova, Marina A.; Morrison, Sara E.; Salzman, C. Daniel
2008-01-01
Visual stimuli can acquire positive or negative value through their association with rewards and punishments, a process called reinforcement learning. Although we now know a great deal about how the brain analyses visual information, we know little about how visual representations become linked with values. To study this process, we turned to the amygdala, a brain structure implicated in reinforcement learning1–5. We recorded the activity of individual amygdala neurons in monkeys while abstract images acquired either positive or negative value through conditioning. After monkeys had learned the initial associations, we reversed image value assignments. We examined neural responses in relation to these reversals in order to estimate the relative contribution to neural activity of the sensory properties of images and their conditioned values. Here we show that changes in the values of images modulate neural activity, and that this modulation occurs rapidly enough to account for, and correlates with, monkeys’ learning. Furthermore, distinct populations of neurons encode the positive and negative values of visual stimuli. Behavioural and physiological responses to visual stimuli may therefore be based in part on the plastic representation of value provided by the amygdala. PMID:16482160
The prefrontal cortex and hybrid learning during iterative competitive games.
Abe, Hiroshi; Seo, Hyojung; Lee, Daeyeol
2011-12-01
Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning. © 2011 New York Academy of Sciences.
Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian
2013-01-01
Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603
Gr-GDHP: A New Architecture for Globalized Dual Heuristic Dynamic Programming.
Zhong, Xiangnan; Ni, Zhen; He, Haibo
2017-10-01
Goal representation globalized dual heuristic dynamic programming (Gr-GDHP) method is proposed in this paper. A goal neural network is integrated into the traditional GDHP method providing an internal reinforcement signal and its derivatives to help the control and learning process. From the proposed architecture, it is shown that the obtained internal reinforcement signal and its derivatives can be able to adjust themselves online over time rather than a fixed or predefined function in literature. Furthermore, the obtained derivatives can directly contribute to the objective function of the critic network, whose learning process is thus simplified. Numerical simulation studies are applied to show the performance of the proposed Gr-GDHP method and compare the results with other existing adaptive dynamic programming designs. We also investigate this method on a ball-and-beam balancing system. The statistical simulation results are presented for both the Gr-GDHP and the GDHP methods to demonstrate the improved learning and controlling performance.
Neural computations underlying inverse reinforcement learning in the human brain
Pauli, Wolfgang M; Bossaerts, Peter; O'Doherty, John
2017-01-01
In inverse reinforcement learning an observer infers the reward distribution available for actions in the environment solely through observing the actions implemented by another agent. To address whether this computational process is implemented in the human brain, participants underwent fMRI while learning about slot machines yielding hidden preferred and non-preferred food outcomes with varying probabilities, through observing the repeated slot choices of agents with similar and dissimilar food preferences. Using formal model comparison, we found that participants implemented inverse RL as opposed to a simple imitation strategy, in which the actions of the other agent are copied instead of inferring the underlying reward structure of the decision problem. Our computational fMRI analysis revealed that anterior dorsomedial prefrontal cortex encoded inferences about action-values within the value space of the agent as opposed to that of the observer, demonstrating that inverse RL is an abstract cognitive process divorceable from the values and concerns of the observer him/herself. PMID:29083301
Switching Reinforcement Learning for Continuous Action Space
NASA Astrophysics Data System (ADS)
Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi
Reinforcement Learning (RL) attracts much attention as a technique of realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. This difficulty includes a problem of designing a suitable action space of an agent, i.e., satisfying two requirements in trade-off: (i) to keep the characteristics (or structure) of an original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose switching RL model to mimic a process of an infant's motor development in which gross motor skills develop before fine motor skills. Then, a method for switching controllers is constructed by introducing and referring to the “entropy”. Further, through computational experiments by using robot navigation problems with one and two-dimensional continuous action space, the validity of the proposed method has been confirmed.
Keleta, Yonas B; Martinez, Joe L
2012-03-01
The reinforcing effects of addictive drugs including methamphetamine (METH) involve the midbrain ventral tegmental area (VTA). VTA is primary source of dopamine (DA) to the nucleus accumbens (NAc) and the ventral hippocampus (VHC). These three brain regions are functionally connected through the hippocampal-VTA loop that includes two main neural pathways: the bottom-up pathway and the top-down pathway. In this paper, we take the view that addiction is a learning process. Therefore, we tested the involvement of the hippocampus in reinforcement learning by studying conditioned place preference (CPP) learning by sequentially conditioning each of the three nuclei in either the bottom-up order of conditioning; VTA, then VHC, finally NAc, or the top-down order; VHC, then VTA, finally NAc. Following habituation, the rats underwent experimental modules consisting of two conditioning trials each followed by immediate testing (test 1 and test 2) and two additional tests 24 h (test 3) and/or 1 week following conditioning (test 4). The module was repeated three times for each nucleus. The results showed that METH, but not Ringer's, produced positive CPP following conditioning each brain area in the bottom-up order. In the top-down order, METH, but not Ringer's, produced either an aversive CPP or no learning effect following conditioning each nucleus of interest. In addition, METH place aversion was antagonized by coadministration of the N-methyl-d-aspartate (NMDA) receptor antagonist MK801, suggesting that the aversion learning was an NMDA receptor activation-dependent process. We conclude that the hippocampus is a critical structure in the reward circuit and hence suggest that the development of target-specific therapeutics for the control of addiction emphasizes on the hippocampus-VTA top-down connection.
Generalization of value in reinforcement learning by humans.
Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna
2012-04-01
Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
Zhu, Lusha; Mathewson, Kyle E.; Hsu, Ming
2012-01-01
Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents’ beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs. PMID:22307594
Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming
2012-01-31
Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.
Schad, Daniel J.; Jünger, Elisabeth; Sebold, Miriam; Garbusow, Maria; Bernhardt, Nadine; Javadi, Amir-Homayoun; Zimmermann, Ulrich S.; Smolka, Michael N.; Heinz, Andreas; Rapp, Michael A.; Huys, Quentin J. M.
2014-01-01
Theories of decision-making and its neural substrates have long assumed the existence of two distinct and competing valuation systems, variously described as goal-directed vs. habitual, or, more recently and based on statistical arguments, as model-free vs. model-based reinforcement-learning. Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to model-based choice control in the presence of above-average working memory function. This suggests shared cognitive and neural processes; provides a bridge between literatures on intelligence and valuation; and may guide the development of process models of different valuation components. Furthermore, it provides a rationale for individual differences in the tendency to deploy valuation systems, which may be important for understanding the manifold neuropsychiatric diseases associated with malfunctions of valuation. PMID:25566131
Mitchell, D G V; Fine, C; Richell, R A; Newman, C; Lumsden, J; Blair, K S; Blair, R J R
2006-05-01
Previous work has shown that individuals with psychopathy are impaired on some forms of associative learning, particularly stimulus-reinforcement learning (Blair et al., 2004; Newman & Kosson, 1986). Animal work suggests that the acquisition of stimulus-reinforcement associations requires the amygdala (Baxter & Murray, 2002). Individuals with psychopathy also show impoverished reversal learning (Mitchell, Colledge, Leonard, & Blair, 2002). Reversal learning is supported by the ventrolateral and orbitofrontal cortex (Rolls, 2004). In this paper we present experiments investigating stimulus-reinforcement learning and relearning in patients with lesions of the orbitofrontal cortex or amygdala, and individuals with developmental psychopathy without known trauma. The results are interpreted with reference to current neurocognitive models of stimulus-reinforcement learning, relearning, and developmental psychopathy. Copyright (c) 2006 APA, all rights reserved.
Model-based reinforcement learning with dimension reduction.
Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi
2016-12-01
The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.
Reinforcement of Science Learning through Local Culture: A Delphi Study
ERIC Educational Resources Information Center
Nuangchalerm, Prasart
2008-01-01
This study aims to explore the ways to reinforce science learning through local culture by using Delphi technique. Twenty four participants in various fields of study were selected. The result of study provides a framework for reinforcement of science learning through local culture on the theme life and environment. (Contains 1 table.)
Ahead of the game: the use of gaming to enhance knowledge of psychopharmacology.
Beek, Terra S; Boone, Cheryl; Hubbard, Grace
2014-12-01
Experiential teaching strategies have the potential to more effectively help students with critical thinking than traditional lecture formats. Gaming is an experiential teaching-learning strategy that reinforces teamwork, interaction, and enjoyment and introduces the element of play. Two Bachelor of Science in Nursing students and a clinical instructor created a Jeopardy!(®)-style game to enhance understanding of psychopharmacology, foster student engagement in the learning process, and promote student enjoyment during clinical postconference. The current article evaluates the utility, relevance, and effectiveness of gaming using a Jeopardy!(®)-style format for the psychiatric clinical setting. Students identified the strengths of this learning activity as increased awareness of knowledge deficits, as well as the reinforcement of existing knowledge and the value of teamwork. Copyright 2014, SLACK Incorporated.
Murakoshi, Kazushi; Mizuno, Junya
2004-11-01
In order to rapidly follow unexpected environmental changes, we propose a parameter control method in reinforcement learning that changes each of learning parameters in appropriate directions. We determine each appropriate direction on the basis of relationships between behaviors and neuromodulators by considering an emergency as a key word. Computer experiments show that the agents using our proposed method could rapidly respond to unexpected environmental changes, not depending on either two reinforcement learning algorithms (Q-learning and actor-critic (AC) architecture) or two learning problems (discontinuous and continuous state-action problems).
Schiffino, Felipe L; Zhou, Vivian; Holland, Peter C
2014-02-01
Within most contemporary learning theories, reinforcement prediction error, the difference between the obtained and expected reinforcer value, critically influences associative learning. In some theories, this prediction error determines the momentary effectiveness of the reinforcer itself, such that the same physical event produces more learning when its presentation is surprising than when it is expected. In other theories, prediction error enhances attention to potential cues for that reinforcer by adjusting cue-specific associability parameters, biasing the processing of those stimuli so that they more readily enter into new associations in the future. A unique feature of these latter theories is that such alterations in stimulus associability must be represented in memory in an enduring fashion. Indeed, considerable data indicate that altered associability may be expressed days after its induction. Previous research from our laboratory identified brain circuit elements critical to the enhancement of stimulus associability by the omission of an expected event, and to the subsequent expression of that altered associability in more rapid learning. Here, for the first time, we identified a brain region, the posterior parietal cortex, as a potential site for a memorial representation of altered stimulus associability. In three experiments using rats and a serial prediction task, we found that intact posterior parietal cortex function was essential during the encoding, consolidation, and retrieval of an associability memory enhanced by surprising omissions. We discuss these new results in the context of our previous findings and additional plausible frontoparietal and subcortical networks. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Partial Planning Reinforcement Learning
2012-08-31
Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 15. SUBJECT TERMS Reinforcement Learning, Bayesian Optimization, Active ... Learning , Action Model Learning, Decision Theoretic Assistance Prasad Tadepalli, Alan Fern Oregon State University Office of Sponsored Programs Oregon State
Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing
Lefebvre, Germain; Blakemore, Sarah-Jayne
2017-01-01
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597
Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.
Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne
2017-08-01
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
Reinforcement learning in supply chains.
Valluri, Annapurna; North, Michael J; Macal, Charles M
2009-10-01
Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence take extremely long time periods to achieve stability which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real world decision makers are unlikely to be using strict reinforcement learning in practice.
Action-Driven Visual Object Tracking With Deep Reinforcement Learning.
Yun, Sangdoo; Choi, Jongwon; Yoo, Youngjoon; Yun, Kimin; Choi, Jin Young
2018-06-01
In this paper, we propose an efficient visual tracker, which directly captures a bounding box containing the target object in a video by means of sequential actions learned using deep neural networks. The proposed deep neural network to control tracking actions is pretrained using various training video sequences and fine-tuned during actual tracking for online adaptation to a change of target and background. The pretraining is done by utilizing deep reinforcement learning (RL) as well as supervised learning. The use of RL enables even partially labeled data to be successfully utilized for semisupervised learning. Through the evaluation of the object tracking benchmark data set, the proposed tracker is validated to achieve a competitive performance at three times the speed of existing deep network-based trackers. The fast version of the proposed method, which operates in real time on graphics processing unit, outperforms the state-of-the-art real-time trackers with an accuracy improvement of more than 8%.
Kernel Temporal Differences for Neural Decoding
Bae, Jihye; Sanchez Giraldo, Luis G.; Pohlmeyer, Eric A.; Francis, Joseph T.; Sanchez, Justin C.; Príncipe, José C.
2015-01-01
We study the feasibility and capability of the kernel temporal difference (KTD)(λ) algorithm for neural decoding. KTD(λ) is an online, kernel-based learning algorithm, which has been introduced to estimate value functions in reinforcement learning. This algorithm combines kernel-based representations with the temporal difference approach to learning. One of our key observations is that by using strictly positive definite kernels, algorithm's convergence can be guaranteed for policy evaluation. The algorithm's nonlinear functional approximation capabilities are shown in both simulations of policy evaluation and neural decoding problems (policy improvement). KTD can handle high-dimensional neural states containing spatial-temporal information at a reasonable computational complexity allowing real-time applications. When the algorithm seeks a proper mapping between a monkey's neural states and desired positions of a computer cursor or a robot arm, in both open-loop and closed-loop experiments, it can effectively learn the neural state to action mapping. Finally, a visualization of the coadaptation process between the decoder and the subject shows the algorithm's capabilities in reinforcement learning brain machine interfaces. PMID:25866504
Effect of reinforcement learning on coordination of multiangent systems
NASA Astrophysics Data System (ADS)
Bukkapatnam, Satish T. S.; Gao, Greg
2000-12-01
For effective coordination of distributed environments involving multiagent systems, learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach a best negotiable price within a shortest possible time. Our simulations of the application scenario under different learning strategies reveals the positive effects of reinforcement learning on an agent's as well as the system's performance.
Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D
2013-05-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior-and under what circumstances-are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.
2013-01-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545
Learning alternative movement coordination patterns using reinforcement feedback.
Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv
2018-05-01
One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.
An architecture for designing fuzzy logic controllers using neural networks
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
Described here is an architecture for designing fuzzy controllers through a hierarchical process of control rule acquisition and by using special classes of neural network learning techniques. A new method for learning to refine a fuzzy logic controller is introduced. A reinforcement learning technique is used in conjunction with a multi-layer neural network model of a fuzzy controller. The model learns by updating its prediction of the plant's behavior and is related to the Sutton's Temporal Difference (TD) method. The method proposed here has the advantage of using the control knowledge of an experienced operator and fine-tuning it through the process of learning. The approach is applied to a cart-pole balancing system.
Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
La Camera, Giancarlo; Richmond, Barry J.
2008-01-01
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys. PMID:18688266
Modeling the violation of reward maximization and invariance in reinforcement schedules.
La Camera, Giancarlo; Richmond, Barry J
2008-08-08
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as "schedule length effect"). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: "framing," wherein equivalent options are treated differently depending on the context in which they are presented, and the "sunk cost" effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.
Sclafani, Anthony; Ackroff, Karen
2015-01-01
Intragastric (IG) flavor conditioning studies in rodents indicate that isocaloric sugar infusions differ in their reinforcing actions, with glucose and sucrose more potent than fructose. Here we determined if the sugars also differ in their ability to maintain operant self-administration by licking an empty spout for IG infusions. Food-restricted C57BL/6J mice were trained 1 h/day to lick a food-baited spout, which triggered IG infusions of 16% sucrose. In testing, the mice licked an empty spout, which triggered IG infusions of different sugars. Mice shifted from sucrose to 16% glucose increased dry licking, whereas mice shifted to 16% fructose rapidly reduced licking to low levels. Other mice shifted from sucrose to IG water reduced licking more slowly but reached the same low levels. Thus IG fructose, like water, is not reinforcing to hungry mice. The more rapid decline in licking induced by fructose may be due to the sugar's satiating effects. Further tests revealed that the Glucose mice increased their dry licking when shifted from 16% to 8% glucose, and reduced their dry licking when shifted to 32% glucose. This may reflect caloric regulation and/or differences in satiation. The Glucose mice did not maintain caloric intake when tested with different sugars. They self-infused less sugar when shifted from 16% glucose to 16% sucrose, and even more so when shifted to 16% fructose. Reduced sucrose self-administration may occur because the fructose component of the disaccharide reduces its reinforcing potency. FVB mice also reduced operant licking when tested with 16% fructose, yet learned to prefer a flavor paired with IG fructose. These data indicate that sugars differ substantially in their ability to support IG self-administration and flavor preference learning. The same post-oral reinforcement process appears to mediate operant licking and flavor learning, although flavor learning provides a more sensitive measure of sugar reinforcement. PMID:26485294
Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.
Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin; Zhipeng Ren; Daoyi Dong; Huaxiong Li; Chunlin Chen; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin; Ren, Zhipeng
2018-06-01
In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes the most advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. With comparison to deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
Efficient model learning methods for actor-critic control.
Grondman, Ivo; Vaandrager, Maarten; Buşoniu, Lucian; Babuska, Robert; Schuitema, Erik
2012-06-01
We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
Games as an innovative teaching strategy for overactive bladder and BPH.
LeCroy, Cheryl
2006-10-01
A challenge for urologic nurses and nurse educators is how to present information to staff, students, and patients in a way that will capture their interest and engage them in the learning process. The use of adult-learning principles and innovative teaching strategies can make the learning experience dynamic, and encourage learners to take a more active role in their own learning. Games are a creative, fun, and interactive way to assist in the emphasis, review, reinforcement, and retention of information for urology nurses.
Fisher, Simon D.; Gray, Jason P.; Black, Melony J.; Davies, Jennifer R.; Bednark, Jeffery G.; Redgrave, Peter; Franz, Elizabeth A.; Abraham, Wickliffe C.; Reynolds, John N. J.
2014-01-01
Action discovery and selection are critical cognitive processes that are understudied at the cellular and systems neuroscience levels. Presented here is a new rodent joystick task suitable to test these processes due to the range of action possibilities that can be learnt while performing the task. Rats learned to manipulate a joystick while progressing through task milestones that required increasing degrees of movement accuracy. In a switching phase designed to measure action discovery, rats were repeatedly required to discover new target positions to meet changing task demands. Behavior was compared using both food and electrical brain stimulation reward (BSR) of the substantia nigra as reinforcement. Rats reinforced with food and those with BSR performed similarly overall, although BSR-treated rats exhibited greater vigor in responding. In the switching phase, rats learnt new actions to adapt to changing task demands, reflecting action discovery processes. Because subjects are required to learn different goal-directed actions, this task could be employed in further investigations of the cellular mechanisms of action discovery and selection. Additionally, this task could be used to assess the behavioral flexibility impairments seen in conditions such as Parkinson's disease and obsessive-compulsive disorder. The versatility of the task will enable cross-species investigations of these impairments. PMID:25477795
B-tree search reinforcement learning for model based intelligent agent
NASA Astrophysics Data System (ADS)
Bhuvaneswari, S.; Vignashwaran, R.
2013-03-01
Agents trained by learning techniques provide a powerful approximation of active solutions for naive approaches. In this study using B - Trees implying reinforced learning the data search for information retrieval is moderated to achieve accuracy with minimum search time. The impact of variables and tactics applied in training are determined using reinforcement learning. Agents based on these techniques perform satisfactory baseline and act as finite agents based on the predetermined model against competitors from the course.
Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap S.
1992-01-01
Current reinforcement learning algorithms require long training periods which generally limit their applicability to small size problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to control of dynamic systems and it is demonstrated that it is possible to start with an approximate prior knowledge and learn to refine it through experiments using reinforcement learning.
Changes in corticostriatal connectivity during reinforcement learning in humans.
Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S
2015-02-01
Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.
Watson, Richard A; Szathmáry, Eörs
2016-02-01
The theory of evolution links random variation and selection to incremental adaptation. In a different intellectual domain, learning theory links incremental adaptation (e.g., from positive and/or negative reinforcement) to intelligent behaviour. Specifically, learning theory explains how incremental adaptation can acquire knowledge from past experience and use it to direct future behaviours toward favourable outcomes. Until recently such cognitive learning seemed irrelevant to the 'uninformed' process of evolution. In our opinion, however, new results formally linking evolutionary processes to the principles of learning might provide solutions to several evolutionary puzzles - the evolution of evolvability, the evolution of ecological organisation, and evolutionary transitions in individuality. If so, the ability for evolution to learn might explain how it produces such apparently intelligent designs. Copyright © 2015 Elsevier Ltd. All rights reserved.
Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto
2015-11-02
Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit-flies concluded that dopamine neurons mediates both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in neurotransmitters mediating appetitive reinforcement between crickets and fruit-flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory.
Inglis, W L; Olmstead, M C; Robbins, T W
2000-04-01
The role of the pedunculopontine tegmental nucleus (PPTg) in stimulus-reward learning was assessed by testing the effects of PPTg lesions on performance in visual autoshaping and conditioned reinforcement (CRf) paradigms. Rats with PPTg lesions were unable to learn an association between a conditioned stimulus (CS) and a primary reward in either paradigm. In the autoshaping experiment, PPTg-lesioned rats approached the CS+ and CS- with equal frequency, and the latencies to respond to the two stimuli did not differ. PPTg lesions also disrupted discriminated approaches to an appetitive CS in the CRf paradigm and completely abolished the acquisition of responding with CRf. These data are discussed in the context of a possible cognitive function of the PPTg, particularly in terms of lesion-induced disruptions of attentional processes that are mediated by the thalamus.
Le Merrer, Julie; Plaza-Zabala, Ainhoa; Del Boca, Carolina; Matifas, Audrey; Maldonado, Rafael; Kieffer, Brigitte L
2011-04-01
Converging experimental data indicate that δ opioid receptors contribute to mediate drug reinforcement processes. Whether their contribution reflects a role in the modulation of drug reward or an implication in conditioned learning, however, has not been explored. In the present study, we investigated the impact of δ receptor gene knockout on reinforced conditioned learning under several experimental paradigms. We assessed the ability of δ receptor knockout mice to form drug-context associations with either morphine (appetitive)- or lithium (aversive)-induced Pavlovian place conditioning. We also examined the efficiency of morphine to serve as a positive reinforcer in these mice and their motivation to gain drug injections, with operant intravenous self-administration under fixed and progressive ratio schedules and at two different doses. Mutant mice showed impaired place conditioning in both appetitive and aversive conditions, indicating disrupted context-drug association. In contrast, mutant animals displayed intact acquisition of morphine self-administration and reached breaking-points comparable to control subjects. Thus, reinforcing effects of morphine and motivation to obtain the drug were maintained. Collectively, the data suggest that δ receptor activity is not involved in morphine reinforcement but facilitates place conditioning. This study reveals a novel aspect of δ opioid receptor function in addiction-related behaviors. Copyright © 2011 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders
Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.
2017-01-01
Background Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning, however there have been few experimental studies taking this perspective. Methods We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state–space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results Both groups learned the task after training. However, there were group differences in early learning in the first task block where individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state–space learning curves; and outperformed typically developing individuals on the near chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243
Learning processes underlying avoidance of negative outcomes.
Andreatta, Marta; Michelmann, Sebastian; Pauli, Paul; Hewig, Johannes
2017-04-01
Successful avoidance of a threatening event may negatively reinforce the behavior due to activation of brain structures involved in reward processing. Here, we further investigated the learning-related properties of avoidance using feedback-related negativity (FRN). The FRN is modulated by violations of an intended outcome (prediction error, PE), that is, the bigger the difference between intended and actual outcome, the larger the FRN amplitude is. Twenty-eight participants underwent an operant conditioning paradigm, in which a behavior (button press) allowed them to avoid a painful electric shock. During two learning blocks, participants could avoid an electric shock in 80% of the trials by pressing one button (avoidance button), or by not pressing another button (punishment button). After learning, participants underwent two test blocks, which were identical to the learning ones except that no shocks were delivered. Participants pressed the avoidance button more often than the punishment button. Importantly, response frequency increased throughout the learning blocks but it did not decrease during the test blocks, indicating impaired extinction and/or habit formation. In line with a PE account, FRN amplitude to negative feedback after correct responses (i.e., unexpected punishment) was significantly larger than to positive feedback (i.e., expected omission of punishment), and it increased throughout the blocks. Highly anxious individuals showed equal FRN amplitudes to negative and positive feedback, suggesting impaired discrimination. These results confirm the role of negative reinforcement in motivating behavior and learning, and reveal important differences between high and low anxious individuals in the processing of prediction errors. © 2017 Society for Psychophysiological Research.
NASA Technical Reports Server (NTRS)
Yates, I. C.; Yost, V. H.
1973-01-01
The results of the first two of a series of research rocket flights are presented. The objectives of these flights were (1) to learn about the capabilities of these rockets, (2) to learn how to interface the payloads and rockets, and (3) to process some of the composite casting demonstration capsules intended originally for Apollo 15. The capsules contained experiments for investigating the stability of gas bubbles in plain and fiber-reinforced metal melted and solidified in a near-zero-g (0.0119g) environment. The characteristics of the two research rockets, an Aerobee 170A and a Black Brant VC, used to obtain the periods of near-zero-g and the temperature control unit used for processing the contents of the two experiment capsules are discussed.
Human Aided Reinforcement Learning in Complex Environments
learn to solve tasks through a trial -and- error process. As an agent takes ...task faster andmore accurately, a human expert can be added to the system to guide an agent in solving the task. This project seeks to expand on current...theenvironment, which works particularly well for reactive tasks . In more complex tasks , these systems do not work as intended. The manipulation
ERIC Educational Resources Information Center
Steingroever, Helen; Wetzels, Ruud; Wagenmakers, Eric-Jan
2013-01-01
The Iowa gambling task (IGT) is one of the most popular tasks used to study decision-making deficits in clinical populations. In order to decompose performance on the IGT in its constituent psychological processes, several cognitive models have been proposed (e.g., the Expectancy Valence (EV) and Prospect Valence Learning (PVL) models). Here we…
ERIC Educational Resources Information Center
Carnoy, Martin; Ngware, Moses; Oketch, Moses
2015-01-01
We take an innovative approach to estimating student mathematics learning in the sixth grade of three African countries. The study reinforces the notion that beyond the quality of the teaching process in classrooms, national contextual factors are important in understanding the contribution that schooling makes to student performance. Our approach…
Context-dependent decision-making: a simple Bayesian model
Lloyd, Kevin; Leslie, David S.
2013-01-01
Many phenomena in animal learning can be explained by a context-learning process whereby an animal learns about different patterns of relationship between environmental variables. Differentiating between such environmental regimes or ‘contexts’ allows an animal to rapidly adapt its behaviour when context changes occur. The current work views animals as making sequential inferences about current context identity in a world assumed to be relatively stable but also capable of rapid switches to previously observed or entirely new contexts. We describe a novel decision-making model in which contexts are assumed to follow a Chinese restaurant process with inertia and full Bayesian inference is approximated by a sequential-sampling scheme in which only a single hypothesis about current context is maintained. Actions are selected via Thompson sampling, allowing uncertainty in parameters to drive exploration in a straightforward manner. The model is tested on simple two-alternative choice problems with switching reinforcement schedules and the results compared with rat behavioural data from a number of T-maze studies. The model successfully replicates a number of important behavioural effects: spontaneous recovery, the effect of partial reinforcement on extinction and reversal, the overtraining reversal effect, and serial reversal-learning effects. PMID:23427101
Context-dependent decision-making: a simple Bayesian model.
Lloyd, Kevin; Leslie, David S
2013-05-06
Many phenomena in animal learning can be explained by a context-learning process whereby an animal learns about different patterns of relationship between environmental variables. Differentiating between such environmental regimes or 'contexts' allows an animal to rapidly adapt its behaviour when context changes occur. The current work views animals as making sequential inferences about current context identity in a world assumed to be relatively stable but also capable of rapid switches to previously observed or entirely new contexts. We describe a novel decision-making model in which contexts are assumed to follow a Chinese restaurant process with inertia and full Bayesian inference is approximated by a sequential-sampling scheme in which only a single hypothesis about current context is maintained. Actions are selected via Thompson sampling, allowing uncertainty in parameters to drive exploration in a straightforward manner. The model is tested on simple two-alternative choice problems with switching reinforcement schedules and the results compared with rat behavioural data from a number of T-maze studies. The model successfully replicates a number of important behavioural effects: spontaneous recovery, the effect of partial reinforcement on extinction and reversal, the overtraining reversal effect, and serial reversal-learning effects.
Effects of D-cycloserine on the extinction of appetitive operant learning
Vurbic, Drina; Gold, Benjamin; Bouton, Mark E.
2011-01-01
Four experiments with rat subjects examined whether D-cycloserine (DCS), a partial NMDA agonist, facilitates the extinction of operant lever-pressing reinforced by food. Previous research has demonstrated that DCS facilitates extinction learning with methods that involve Pavlovian extinction. In the current experiments, operant conditioning occurred in Context A, extinction in Context B, and then testing occurred in both the extinction and conditioning contexts. Experiments 1a and 1b tested the effects of three doses of DCS (5, 15, and 30 mg/kg) on the extinction of lever pressing trained as a free operant. Experiment 2 examined their effects when extinction of the free operant was conducted in the presence of non-response-contingent deliveries of the reinforcer (which theoretically reduced the role of generalization decrement in suppressing responding). Experiment 3 examined their effects on extinction of a discriminated operant, i.e., one that had been reinforced in the presence of a discriminative stimulus, but not in its absence. A strong ABA renewal effect was observed in all four experiments during testing. However, despite the use of DCS doses and a drug administration procedure that facilitates the extinction of Pavlovian learning, there was no evidence in any experiment that DCS facilitated operant extinction learning assessed in either the extinction or the conditioning context. DCS may primarily facilitate learning processes that underlie Pavlovian, rather than purely operant, extinction. PMID:21688894
Role of dopamine D2 receptors in human reinforcement learning.
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-09-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.
Role of Dopamine D2 Receptors in Human Reinforcement Learning
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-01-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well. PMID:24713613
Effect of Reinforcement History on Hand Choice in an Unconstrained Reaching Task
Stoloff, Rebecca H.; Taylor, Jordan A.; Xu, Jing; Ridderikhoff, Arne; Ivry, Richard B.
2011-01-01
Choosing which hand to use for an action is one of the most frequent decisions people make in everyday behavior. We developed a simple reaching task in which we vary the lateral position of a target and the participant is free to reach to it with either the right or left hand. While people exhibit a strong preference to use the hand ipsilateral to the target, there is a region of uncertainty within which hand choice varies across trials. We manipulated the reinforcement rates for the two hands, either by increasing the likelihood that a reach with the non-dominant hand would successfully intersect the target or decreasing the likelihood that a reach with the dominant hand would be successful. While participants had minimal awareness of these manipulations, we observed an increase in the use of the non-dominant hand for targets presented in the region of uncertainty. We modeled the shift in hand use using a Q-learning model of reinforcement learning. The results provided a good fit of the data and indicate that the effects of increasing and decreasing the rate of positive reinforcement are additive. These experiments emphasize the role of decision processes for effector selection, and may point to a novel approach for physical rehabilitation based on intrinsic reinforcement. PMID:21472031
[Functional neuroanatomy of implicit learning: associative, motor and habit].
Correa, M
The present review focuses on the neuroanatomy of aspects of implicit learning that involve stimulus-response associations, such as classical and instrumental conditioning, motor learning and habit formation. These types of learning all require a progression in the acquisition of procedural information about 'how to do things' instead of 'what things are'. These forms of implicit learning share the neural substrate formed mainly by brain circuits involving basal ganglia, prefrontal cortex and amygdala. The relationship between pavlovian and instrumental learning is shown in the transference and autoshaping studies. There has been a resurgence of interest in habit learning because of the suggestion that addiction is a process that progresses from a reinforced response to a habit in which the stimulus-response association is supraselected and becomes independent of voluntary cognitive control. Dopamine has demonstrated to be involved in the acquisition of these procedures. The different forms of procedural learning studied here all are characterized by stimulus-response-reinforcement associations, but there are differences between them in terms of the degree to which some of these associations or components are strengthened. These different patterns of association are partially regulated by the degree of involvement of the frontal-striatal-amygdala circuits.
Network congestion control algorithm based on Actor-Critic reinforcement learning model
NASA Astrophysics Data System (ADS)
Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2018-04-01
Aiming at the network congestion control problem, a congestion control algorithm based on Actor-Critic reinforcement learning model is designed. Through the genetic algorithm in the congestion control strategy, the network congestion problems can be better found and prevented. According to Actor-Critic reinforcement learning, the simulation experiment of network congestion control algorithm is designed. The simulation experiments verify that the AQM controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is adopted to optimize the network performance, and the dropping probability of packets is adaptively adjusted so as to improve the network performance and avoid congestion. Based on the above finding, it is concluded that the network congestion control algorithm based on Actor-Critic reinforcement learning model can effectively avoid the occurrence of TCP network congestion.
From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model
ERIC Educational Resources Information Center
Fu, Wai-Tat; Anderson, John R.
2006-01-01
The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…
Ren, Xi; Valle-Inclán, Fernando; Tukaiev, Sergii; Hackley, Steven A
2017-07-01
According to reinforcement learning theory, dopamine-dependent anticipatory processes play a critical role in learning from action outcomes such as feedback or reward. To better understand outcome anticipation, we examined variation in slow cortical potentials and assessed their changes over the course of motor-skill acquisition. Healthy young adults learned a series of precisely timed, key press sequences. Feedback was delivered at a delay of either 2.5 or 8 s, to encourage use of either the striatally mediated, habit learning system or the hippocampus-dependent, episodic memory system, respectively. During the 2.5-s delay, the stimulus-preceding negativity (SPN) was shown to decline in amplitude across trials, confirming previous results from a perceptual categorization task (Morís, Luque, & Rodríguez-Fornells, 2013). This falsifies the hypothesis that SPN reflects specific outcome predictions, on the assumption that the ability to make such predictions should improve as a task is mastered. An SPN was also evident during the 8-s delay, but it increased in amplitude across trials. At the conclusion of the 8-s but not the 2.5-s prefeedback interval, a reversed-polarity lateralized readiness potential (LRP) was noted. It was suggested that this might indicate maintenance of an action representation for comparison with the feedback display. If so, this would constitute the first direct psychophysiological evidence for a popular hypothetical construct in quantitative models of reinforcement learning, the so-called eligibility trace. © 2017 Society for Psychophysiological Research.
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ
2014-01-01
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.
Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J
2014-06-01
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
Stochastic Reinforcement Benefits Skill Acquisition
ERIC Educational Resources Information Center
Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.
2014-01-01
Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…
A reward optimization method based on action subrewards in hierarchical reinforcement learning.
Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming
2014-01-01
Reinforcement learning (RL) is one kind of interactive learning methods. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to solve the problem of "curse of dimensionality," which means that the states space will grow exponentially in the number of features and low convergence speed. The method can reduce state spaces greatly and choose actions with favorable purpose and efficiency so as to optimize reward function and enhance convergence speed. Apply it to the online learning in Tetris game, and the experiment result shows that the convergence speed of this algorithm can be enhanced evidently based on the new method which combines hierarchical reinforcement learning algorithm and action subrewards. The "curse of dimensionality" problem is also solved to a certain extent with hierarchical method. All the performance with different parameters is compared and analyzed as well.
Multi-agent Reinforcement Learning Model for Effective Action Selection
NASA Astrophysics Data System (ADS)
Youk, Sang Jo; Lee, Bong Keun
Reinforcement learning is a sub area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the case of multi-agent, especially, which state space and action space gets very enormous in compared to single agent, so it needs to take most effective measure available select the action strategy for effective reinforcement learning. This paper proposes a multi-agent reinforcement learning model based on fuzzy inference system in order to improve learning collect speed and select an effective action in multi-agent. This paper verifies an effective action select strategy through evaluation tests based on Robocop Keep away which is one of useful test-beds for multi-agent. Our proposed model can apply to evaluate efficiency of the various intelligent multi-agents and also can apply to strategy and tactics of robot soccer system.
Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models
Najnin, Shamima; Banerjee, Bonny
2018-01-01
Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027
Using Reinforcement Learning to Examine Dynamic Attention Allocation during Reading
ERIC Educational Resources Information Center
Liu, Yanping; Reichle, Erik D.; Gao, Ding-Guo
2013-01-01
A fundamental question in reading research concerns whether attention is allocated strictly serially, supporting lexical processing of one word at a time, or in parallel, supporting concurrent lexical processing of two or more words (Reichle, Liversedge, Pollatsek, & Rayner, 2009). The origins of this debate are reviewed. We then report three…
Reinforcement learning improves behaviour from evaluative feedback
NASA Astrophysics Data System (ADS)
Littman, Michael L.
2015-05-01
Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
Reinforcement learning improves behaviour from evaluative feedback.
Littman, Michael L
2015-05-28
Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
Bai, John Y H; Podlesnik, Christopher A
2017-05-01
Greater rates of intermittent reinforcement in the presence of discriminative stimuli generally produce greater resistance to extinction, consistent with predictions of behavioral momentum theory. Other studies reveal more rapid extinction with higher rates of reinforcers - the partial reinforcement extinction effect. Further, repeated extinction often produces more rapid decreases in operant responding due to learning a discrimination between training and extinction contingencies. The present study examined extinction repeatedly with training with different rates of intermittent reinforcement in a multiple schedule. We assessed whether repeated extinction would reverse the pattern of greater resistance to extinction with greater reinforcer rates. Counter to this prediction, resistance to extinction was consistently greater across twelve assessments of training followed by six successive sessions of extinction. Moreover, patterns of responding during extinction resembled those observed during satiation tests, which should not alter discrimination processes with repeated testing. These findings join others suggesting operant responding in extinction can be durable across repeated tests. Copyright © 2017 Elsevier B.V. All rights reserved.
Stochastic evolution in populations of ideas
Nicole, Robin; Sollich, Peter; Galla, Tobias
2017-01-01
It is known that learning of players who interact in a repeated game can be interpreted as an evolutionary process in a population of ideas. These analogies have so far mostly been established in deterministic models, and memory loss in learning has been seen to act similarly to mutation in evolution. We here propose a representation of reinforcement learning as a stochastic process in finite ‘populations of ideas’. The resulting birth-death dynamics has absorbing states and allows for the extinction or fixation of ideas, marking a key difference to mutation-selection processes in finite populations. We characterize the outcome of evolution in populations of ideas for several classes of symmetric and asymmetric games. PMID:28098244
Stochastic evolution in populations of ideas
NASA Astrophysics Data System (ADS)
Nicole, Robin; Sollich, Peter; Galla, Tobias
2017-01-01
It is known that learning of players who interact in a repeated game can be interpreted as an evolutionary process in a population of ideas. These analogies have so far mostly been established in deterministic models, and memory loss in learning has been seen to act similarly to mutation in evolution. We here propose a representation of reinforcement learning as a stochastic process in finite ‘populations of ideas’. The resulting birth-death dynamics has absorbing states and allows for the extinction or fixation of ideas, marking a key difference to mutation-selection processes in finite populations. We characterize the outcome of evolution in populations of ideas for several classes of symmetric and asymmetric games.
The power of associative learning and the ontogeny of optimal behaviour.
Enquist, Magnus; Lind, Johan; Ghirlanda, Stefano
2016-11-01
Behaving efficiently (optimally or near-optimally) is central to animals' adaptation to their environment. Much evolutionary biology assumes, implicitly or explicitly, that optimal behavioural strategies are genetically inherited, yet the behaviour of many animals depends crucially on learning. The question of how learning contributes to optimal behaviour is largely open. Here we propose an associative learning model that can learn optimal behaviour in a wide variety of ecologically relevant circumstances. The model learns through chaining, a term introduced by Skinner to indicate learning of behaviour sequences by linking together shorter sequences or single behaviours. Our model formalizes the concept of conditioned reinforcement (the learning process that underlies chaining) and is closely related to optimization algorithms from machine learning. Our analysis dispels the common belief that associative learning is too limited to produce 'intelligent' behaviour such as tool use, social learning, self-control or expectations of the future. Furthermore, the model readily accounts for both instinctual and learned aspects of behaviour, clarifying how genetic evolution and individual learning complement each other, and bridging a long-standing divide between ethology and psychology. We conclude that associative learning, supported by genetic predispositions and including the oft-neglected phenomenon of conditioned reinforcement, may suffice to explain the ontogeny of optimal behaviour in most, if not all, non-human animals. Our results establish associative learning as a more powerful optimizing mechanism than acknowledged by current opinion.
The power of associative learning and the ontogeny of optimal behaviour
Enquist, Magnus; Lind, Johan
2016-01-01
Behaving efficiently (optimally or near-optimally) is central to animals' adaptation to their environment. Much evolutionary biology assumes, implicitly or explicitly, that optimal behavioural strategies are genetically inherited, yet the behaviour of many animals depends crucially on learning. The question of how learning contributes to optimal behaviour is largely open. Here we propose an associative learning model that can learn optimal behaviour in a wide variety of ecologically relevant circumstances. The model learns through chaining, a term introduced by Skinner to indicate learning of behaviour sequences by linking together shorter sequences or single behaviours. Our model formalizes the concept of conditioned reinforcement (the learning process that underlies chaining) and is closely related to optimization algorithms from machine learning. Our analysis dispels the common belief that associative learning is too limited to produce ‘intelligent’ behaviour such as tool use, social learning, self-control or expectations of the future. Furthermore, the model readily accounts for both instinctual and learned aspects of behaviour, clarifying how genetic evolution and individual learning complement each other, and bridging a long-standing divide between ethology and psychology. We conclude that associative learning, supported by genetic predispositions and including the oft-neglected phenomenon of conditioned reinforcement, may suffice to explain the ontogeny of optimal behaviour in most, if not all, non-human animals. Our results establish associative learning as a more powerful optimizing mechanism than acknowledged by current opinion. PMID:28018662
Quantum-Enhanced Machine Learning
NASA Astrophysics Data System (ADS)
Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.
2016-09-01
The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.
Can model-free reinforcement learning explain deontological moral judgments?
Ayars, Alisabeth
2016-05-01
Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account-e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.
General functioning predicts reward and punishment learning in schizophrenia.
Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A
2011-04-01
Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performances. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.
Agent-based traffic management and reinforcement learning in congested intersection network.
DOT National Transportation Integrated Search
2012-08-01
This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programmi...
Learning Based Bidding Strategy for HVAC Systems in Double Auction Retail Energy Markets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun, Yannan; Somani, Abhishek; Carroll, Thomas E.
In this paper, a bidding strategy is proposed using reinforcement learning for HVAC systems in a double auction market. The bidding strategy does not require a specific model-based representation of behavior, i.e., a functional form to translate indoor house temperatures into bid prices. The results from reinforcement learning based approach are compared with the HVAC bidding approach used in the AEP gridSMART® smart grid demonstration project and it is shown that the model-free (learning based) approach tracks well the results from the model-based behavior. Successful use of model-free approaches to represent device-level economic behavior may help develop similar approaches tomore » represent behavior of more complex devices or groups of diverse devices, such as in a building. Distributed control requires an understanding of decision making processes of intelligent agents so that appropriate mechanisms may be developed to control and coordinate their responses, and model-free approaches to represent behavior will be extremely useful in that quest.« less
Operant conditioning of enhanced pain sensitivity by heat-pain titration.
Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert
2008-11-15
Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior and not verbal pain report was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.
ERIC Educational Resources Information Center
Segal, Bertha E.
Materials from a teacher workshop on the Total Physical Response method for teaching English as a second language are presented. The technique describes the process of first language acquisition, uses physical activities in the classroom to reinforce learning, and allows a long period of receptive language learning before requiring production. The…
Zendehrouh, Sareh
2015-11-01
Recent work on decision-making field offers an account of dual-system theory for decision-making process. This theory holds that this process is conducted by two main controllers: a goal-directed system and a habitual system. In the reinforcement learning (RL) domain, the habitual behaviors are connected with model-free methods, in which appropriate actions are learned through trial-and-error experiences. However, goal-directed behaviors are associated with model-based methods of RL, in which actions are selected using a model of the environment. Studies on cognitive control also suggest that during processes like decision-making, some cortical and subcortical structures work in concert to monitor the consequences of decisions and to adjust control according to current task demands. Here a computational model is presented based on dual system theory and cognitive control perspective of decision-making. The proposed model is used to simulate human performance on a variant of probabilistic learning task. The basic proposal is that the brain implements a dual controller, while an accompanying monitoring system detects some kinds of conflict including a hypothetical cost-conflict one. The simulation results address existing theories about two event-related potentials, namely error related negativity (ERN) and feedback related negativity (FRN), and explore the best account of them. Based on the results, some testable predictions are also presented. Copyright © 2015 Elsevier Ltd. All rights reserved.
Place preference and vocal learning rely on distinct reinforcers in songbirds.
Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H
2018-04-30
In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.
ERIC Educational Resources Information Center
Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias
2011-01-01
Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…
Machine Learning Control For Highly Reconfigurable High-Order Systems
2015-01-02
develop and flight test a Reinforcement Learning based approach for autonomous tracking of ground targets using a fixed wing Unmanned...Reinforcement Learning - based algorithms are developed for learning agents’ time dependent dynamics while also learning to control them. Three algorithms...to a wide range of engineering- based problems . Implementation of these solutions, however, is often complicated by the hysteretic, non-linear,
Evolving fuzzy rules in a learning classifier system
NASA Technical Reports Server (NTRS)
Valenzuela-Rendon, Manuel
1993-01-01
The fuzzy classifier system (FCS) combines the ideas of fuzzy logic controllers (FLC's) and learning classifier systems (LCS's). It brings together the expressive powers of fuzzy logic as it has been applied in fuzzy controllers to express relations between continuous variables, and the ability of LCS's to evolve co-adapted sets of rules. The goal of the FCS is to develop a rule-based system capable of learning in a reinforcement regime, and that can potentially be used for process control.
Reinforcement and inference in cross-situational word learning.
Tilles, Paulo F C; Fontanari, José F
2013-01-01
Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
Neural and neurochemical basis of reinforcement-guided decision making.
Khani, Abbas; Rainer, Gregor
2016-08-01
Decision making is an adaptive behavior that takes into account several internal and external input variables and leads to the choice of a course of action over other available and often competing alternatives. While it has been studied in diverse fields ranging from mathematics, economics, ecology, and ethology to psychology and neuroscience, recent cross talk among perspectives from different fields has yielded novel descriptions of decision processes. Reinforcement-guided decision making models are based on economic and reinforcement learning theories, and their focus is on the maximization of acquired benefit over a defined period of time. Studies based on reinforcement-guided decision making have implicated a large network of neural circuits across the brain. This network includes a wide range of cortical (e.g., orbitofrontal cortex and anterior cingulate cortex) and subcortical (e.g., nucleus accumbens and subthalamic nucleus) brain areas and uses several neurotransmitter systems (e.g., dopaminergic and serotonergic systems) to communicate and process decision-related information. This review discusses distinct as well as overlapping contributions of these networks and neurotransmitter systems to the processing of decision making. We end the review by touching on neural circuitry and neuromodulatory regulation of exploratory decision making. Copyright © 2016 the American Physiological Society.
Effects of D-cycloserine on the extinction of appetitive operant learning.
Vurbic, Drina; Gold, Benjamin; Bouton, Mark E
2011-08-01
Four experiments with rat subjects examined whether D-cycloserine (DCS), a partial NMDA agonist, facilitates the extinction of operant lever-pressing reinforced by food. Previous research has demonstrated that DCS facilitates extinction learning with methods that involve Pavlovian extinction. In the current experiments, operant conditioning occurred in Context A, extinction in Context B, and then testing occurred in both the extinction and conditioning contexts. Experiments 1A and 1B tested the effects of three doses of DCS (5, 15, and 30 mg/kg) on the extinction of lever pressing trained as a free operant. Experiment 2 examined their effects when extinction of the free operant was conducted in the presence of nonresponse-contingent deliveries of the reinforcer (that theoretically reduced the role of generalization decrement in suppressing responding). Experiment 3 examined their effects on extinction of a discriminated operant, that is, one that had been reinforced in the presence of a discriminative stimulus, but not in its absence. A strong ABA renewal effect was observed in all four experiments during testing. However, despite the use of DCS doses and a drug administration procedure that facilitates the extinction of Pavlovian learning, there was no evidence in any experiment that DCS facilitated operant extinction learning assessed in either the extinction or the conditioning context. DCS may primarily facilitate learning processes that underlie Pavlovian, rather than purely operant, extinction. (PsycINFO Database Record (c) 2011 APA, all rights reserved).
Overcoming Learned Helplessness in Community College Students.
ERIC Educational Resources Information Center
Roueche, John E.; Mink, Oscar G.
1982-01-01
Reviews research on the effects of repeated experiences of helplessness and on locus of control. Identifies conditions necessary for overcoming learned helplessness; i.e., the potential for learning to occur; consistent reinforcement; relevant, valued reinforcers; and favorable psychological situation. Recommends eight ways for teachers to…
Michaelides, Michael; Miller, Michael L; DiNieri, Jennifer A; Gomez, Juan L; Schwartz, Elizabeth; Egervari, Gabor; Wang, Gene Jack; Mobbs, Charles V; Volkow, Nora D; Hurd, Yasmin L
2017-11-01
Appetitive drive is influenced by coordinated interactions between brain circuits that regulate reinforcement and homeostatic signals that control metabolism. Glucose modulates striatal dopamine (DA) and regulates appetitive drive and reinforcement learning. Striatal DA D2 receptors (D2Rs) also regulate reinforcement learning and are implicated in glucose-related metabolic disorders. Nevertheless, interactions between striatal D2R and peripheral glucose have not been previously described. Here we show that manipulations involving striatal D2R signaling coincide with perseverative and impulsive-like responding for sucrose, a disaccharide consisting of fructose and glucose. Fructose conveys orosensory (ie, taste) reinforcement but does not convey metabolic (ie, nutrient-derived) reinforcement. Glucose however conveys orosensory reinforcement but unlike fructose, it is a major metabolic energy source, underlies sustained reinforcement, and activates striatal circuitry. We found that mice with deletion of dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) exclusively in D2R-expressing cells exhibited preferential D2R changes in the nucleus accumbens (NAc), a striatal region that critically regulates sucrose reinforcement. These changes coincided with perseverative and impulsive-like responding for sucrose pellets and sustained reinforcement learning of glucose-paired flavors. These mice were also characterized by significant glucose intolerance (ie, impaired glucose utilization). Systemic glucose administration significantly attenuated sucrose operant responding and D2R activation or blockade in the NAc bidirectionally modulated blood glucose levels and glucose tolerance. Collectively, these results implicate NAc D2R in regulating both peripheral glucose levels and glucose-dependent reinforcement learning behaviors and highlight the notion that glucose metabolic impairments arising from disrupted NAc D2R signaling are involved in compulsive and perseverative feeding behaviors.
Hassett, Thomas C; Hampton, Robert R
2017-05-01
Functionally distinct memory systems likely evolved in response to incompatible demands placed on learning by distinct environmental conditions. Working memory appears adapted, in part, for conditions that change frequently, making rapid acquisition and brief retention of information appropriate. In contrast, habits form gradually over many experiences, adapting organisms to contingencies of reinforcement that are stable over relatively long intervals. Serial reversal learning provides an opportunity to simultaneously examine the processes involved in adapting to rapidly changing and relatively stable contingencies. In serial reversal learning, selecting one of the two simultaneously presented stimuli is positively reinforced, while selection of the other is not. After a preference for the positive stimulus develops, the contingencies of reinforcement reverse. Naïve subjects adapt to such reversals gradually, perseverating in selection of the previously rewarded stimulus. Experts reverse rapidly according to a win-stay, lose-shift response pattern. We assessed whether a change in the relative control of choice by habit and working memory accounts for the development of serial reversal learning expertise. Across three experiments, we applied manipulations intended to attenuate the contribution of working memory but leave the contribution of habit intact. We contrasted performance following long and short intervals in Experiments 1 and 2, and we interposed a competing cognitive load between trials in Experiment 3. These manipulations slowed the acquisition of reversals in expert subjects, but not naïve subjects, indicating that serial reversal learning expertise is facilitated by a shift in the control of choice from passively acquired habit to actively maintained working memory.
Opponent appetitive-aversive neural processes underlie predictive learning of pain relief.
Seymour, Ben; O'Doherty, John P; Koltzenburg, Martin; Wiech, Katja; Frackowiak, Richard; Friston, Karl; Dolan, Raymond
2005-09-01
Termination of a painful or unpleasant event can be rewarding. However, whether the brain treats relief in a similar way as it treats natural reward is unclear, and the neural processes that underlie its representation as a motivational goal remain poorly understood. We used fMRI (functional magnetic resonance imaging) to investigate how humans learn to generate expectations of pain relief. Using a pavlovian conditioning procedure, we show that subjects experiencing prolonged experimentally induced pain can be conditioned to predict pain relief. This proceeds in a manner consistent with contemporary reward-learning theory (average reward/loss reinforcement learning), reflected by neural activity in the amygdala and midbrain. Furthermore, these reward-like learning signals are mirrored by opposite aversion-like signals in lateral orbitofrontal cortex and anterior cingulate cortex. This dual coding has parallels to 'opponent process' theories in psychology and promotes a formal account of prediction and expectation during pain.
Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...
2015-01-31
Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plansmore » in terms of average delay, number of stops, and vehicular emissions at the network level.« less
Lewis, F L; Vamvoudakis, Kyriakos G
2011-02-01
Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have shown their importance in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the system internal states, which is usually not available in practical situations. In this paper, we show how to implement ADP methods using only measured input/output data from the system. Linear dynamical systems with deterministic behavior are considered herein, which are systems of great interest in the control system community. In control system theory, these types of methods are referred to as output feedback (OPFB). The stochastic equivalent of the systems dealt with in this paper is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller that requires only OPFB. It is shown that, similar to Q -learning, the new methods have the important advantage that knowledge of the system dynamics is not needed for the implementation of these learning algorithms or for the OPFB control. Only the order of the system, as well as an upper bound on its "observability index," must be known. The learned OPFB controller is in the form of a polynomial autoregressive moving-average controller that has equivalent performance with the optimal state variable feedback gain.
A learning theory account of depression.
Ramnerö, Jonas; Folke, Fredrik; Kanter, Jonathan W
2015-06-11
Learning theory provides a foundation for understanding and deriving treatment principles for impacting a spectrum of functional processes relevant to the construct of depression. While behavioral interventions have been commonplace in the cognitive behavioral tradition, most often conceptualized within a cognitive theoretical framework, recent years have seen renewed interest in more purely behavioral models. These modern learning theory accounts of depression focus on the interchange between behavior and the environment, mainly in terms of lack of reinforcement, extinction of instrumental behavior, and excesses of aversive control, and include a conceptualization of relevant cognitive and emotional variables. These positions, drawn from extensive basic and applied research, cohere with biological theories on reduced reward learning and reward responsiveness and views of depression as a heterogeneous, complex set of disorders. Treatment techniques based on learning theory, often labeled Behavioral Activation (BA) focus on activating the individual in directions that increase contact with potential reinforcers, as defined ideographically with the client. BA is considered an empirically well-established treatment that generalizes well across diverse contexts and populations. The learning theory account is discussed in terms of being a parsimonious model and ground for treatments highly suitable for large scale dissemination. © 2015 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.
ERIC Educational Resources Information Center
Dockstader, Steven L.
The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…
Automatic spin-chain learning to explore the quantum speed limit
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Ming; Cui, Zi-Wei; Wang, Xin; Yung, Man-Hong
2018-05-01
One of the ambitious goals of artificial intelligence is to build a machine that outperforms human intelligence, even if limited knowledge and data are provided. Reinforcement learning (RL) provides one such possibility to reach this goal. In this work, we consider a specific task from quantum physics, i.e., quantum state transfer in a one-dimensional spin chain. The mission for the machine is to find transfer schemes with the fastest speeds while maintaining high transfer fidelities. The first scenario we consider is when the Hamiltonian is time independent. We update the coupling strength by minimizing a loss function dependent on both the fidelity and the speed. Compared with a scheme proven to be at the quantum speed limit for the perfect state transfer, the scheme provided by RL is faster while maintaining the infidelity below 5 ×10-4 . In the second scenario where a time-dependent external field is introduced, we convert the state transfer process into a Markov decision process that can be understood by the machine. We solve it with the deep Q-learning algorithm. After training, the machine successfully finds transfer schemes with high fidelities and speeds, which are faster than previously known ones. These results show that reinforcement learning can be a powerful tool for quantum control problems.
Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats
Lloyd, Kevin; Becker, Nadine; Jones, Matthew W.; Bogacz, Rafal
2012-01-01
Learning to form appropriate, task-relevant working memory representations is a complex process central to cognition. Gating models frame working memory as a collection of past observations and use reinforcement learning (RL) to solve the problem of when to update these observations. Investigation of how gating models relate to brain and behavior remains, however, at an early stage. The current study sought to explore the ability of simple RL gating models to replicate rule learning behavior in rats. Rats were trained in a maze-based spatial learning task that required animals to make trial-by-trial choices contingent upon their previous experience. Using an abstract version of this task, we tested the ability of two gating algorithms, one based on the Actor-Critic and the other on the State-Action-Reward-State-Action (SARSA) algorithm, to generate behavior consistent with the rats'. Both models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal. We also found that both gating models learned multiple strategies in solving the initial task, a property which highlights the multi-agent nature of such models and which is of importance in considering the neural basis of individual differences in behavior. PMID:23115551
Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning
Ramayya, Ashwin G.; Misra, Amrit
2014-01-01
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643
1987-09-01
Luthans (28) expanded the concept of learning as follows: 1. Learning involves a change, though not necessarily an improvement, in behaviour. Learning...that results in an unpleasant outcome is not likely to be repeated (36:244). Luthans and Kreitner (27) described the various forms of reinforcement as...four 33 alternatives (defined previously on page 24 and taken from Luthans ) of positive reinforcement, negative reinforcement, extinction and punishment
Operant conditioning of synaptic and spiking activity patterns in single hippocampal neurons.
Ishikawa, Daisuke; Matsumoto, Nobuyoshi; Sakaguchi, Tetsuya; Matsuki, Norio; Ikegaya, Yuji
2014-04-02
Learning is a process of plastic adaptation through which a neural circuit generates a more preferable outcome; however, at a microscopic level, little is known about how synaptic activity is patterned into a desired configuration. Here, we report that animals can generate a specific form of synaptic activity in a given neuron in the hippocampus. In awake, head-restricted mice, we applied electrical stimulation to the lateral hypothalamus, a reward-associated brain region, when whole-cell patch-clamped CA1 neurons exhibited spontaneous synaptic activity that met preset criteria. Within 15 min, the mice learned to generate frequently the excitatory synaptic input pattern that satisfied the criteria. This reinforcement learning of synaptic activity was not observed for inhibitory input patterns. When a burst unit activity pattern was conditioned in paired and nonpaired paradigms, the frequency of burst-spiking events increased and decreased, respectively. The burst reinforcement occurred in the conditioned neuron but not in other adjacent neurons; however, ripple field oscillations were concomitantly reinforced. Neural conditioning depended on activation of NMDA receptors and dopamine D1 receptors. Acutely stressed mice and depression model mice that were subjected to forced swimming failed to exhibit the neural conditioning. This learning deficit was rescued by repetitive treatment with fluoxetine, an antidepressant. Therefore, internally motivated animals are capable of routing an ongoing action potential series into a specific neural pathway of the hippocampal network.
Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia
2018-01-01
Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822
Credit assignment in movement-dependent reinforcement learning
Boggess, Matthew J.; Crossley, Matthew J.; Parvin, Darius; Ivry, Richard B.; Taylor, Jordan A.
2016-01-01
When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants’ explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem. PMID:27247404
Credit assignment in movement-dependent reinforcement learning.
McDougle, Samuel D; Boggess, Matthew J; Crossley, Matthew J; Parvin, Darius; Ivry, Richard B; Taylor, Jordan A
2016-06-14
When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants' explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.
The time course of explicit and implicit categorization.
Smith, J David; Zakrzewski, Alexandria C; Herberger, Eric R; Boomer, Joseph; Roeder, Jessica L; Ashby, F Gregory; Church, Barbara A
2015-10-01
Contemporary theory in cognitive neuroscience distinguishes, among the processes and utilities that serve categorization, explicit and implicit systems of category learning that learn, respectively, category rules by active hypothesis testing or adaptive behaviors by association and reinforcement. Little is known about the time course of categorization within these systems. Accordingly, the present experiments contrasted tasks that fostered explicit categorization (because they had a one-dimensional, rule-based solution) or implicit categorization (because they had a two-dimensional, information-integration solution). In Experiment 1, participants learned categories under unspeeded or speeded conditions. In Experiment 2, they applied previously trained category knowledge under unspeeded or speeded conditions. Speeded conditions selectively impaired implicit category learning and implicit mature categorization. These results illuminate the processing dynamics of explicit/implicit categorization.
Instructional control of reinforcement learning: A behavioral and neurocomputational investigation
Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.
2011-01-01
Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
van den Akker, Karolien; Havermans, Remco C; Bouton, Mark E; Jansen, Anita
2014-10-01
Animals and humans can easily learn to associate an initially neutral cue with food intake through classical conditioning, but extinction of learned appetitive responses can be more difficult. Intermittent or partial reinforcement of food cues causes especially persistent behaviour in animals: after exposure to such learning schedules, the decline in responding that occurs during extinction is slow. After extinction, increases in responding with renewed reinforcement of food cues (reacquisition) might be less rapid after acquisition with partial reinforcement. In humans, it may be that the eating behaviour of some individuals resembles partial reinforcement schedules to a greater extent, possibly affecting dieting success by interacting with extinction and reacquisition. Furthermore, impulsivity has been associated with less successful dieting, and this association might be explained by impulsivity affecting the learning and extinction of appetitive responses. In the present two studies, the effects of different reinforcement schedules and impulsivity on the acquisition, extinction, and reacquisition of appetitive responses were investigated in a conditioning paradigm involving food rewards in healthy humans. Overall, the results indicate both partial reinforcement schedules and, possibly, impulsivity to be associated with worse extinction performance. A new model of dieting success is proposed: learning histories and, perhaps, certain personality traits (impulsivity) can interfere with the extinction and reacquisition of appetitive responses to food cues and they may be causally related to unsuccessful dieting. Copyright © 2014 Elsevier Ltd. All rights reserved.
Towards understanding and managing the learning process in mail sorting.
Berglund, M; Karltun, A
2012-01-01
This paper was based on case study research at the Swedish Mail Service Division and it addresses learning time to sort mail at new districts and means to support the learning process on an individual as well as organizational level. The study population consisted of 46 postmen and one team leader in the Swedish Mail Service Division. Data were collected through measurements of time for mail sorting, interviews and a focus group. The study showed that learning to sort mail was a much more complex process and took more time than expected by management. Means to support the learning process included clarification of the relationship between sorting and the topology of the district, a good work environment, increased support from colleagues and management, and a thorough introduction for new postmen. The identified means to support the learning process require an integration of human, technological and organizational aspects. The study further showed that increased operations flexibility cannot be reinforced without a systems perspective and thorough knowledge about real work activities and that ergonomists can aid businesses to acquire this knowledge.
Phillips, Benjamin U; Dewan, Sigma; Nilsson, Simon R O; Robbins, Trevor W; Heath, Christopher J; Saksida, Lisa M; Bussey, Timothy J; Alsiö, Johan
2018-04-22
Dysregulation of the serotonin (5-HT) system is a pathophysiological component in major depressive disorder (MDD), a condition closely associated with abnormal emotional responsivity to positive and negative feedback. However, the precise mechanism through which 5-HT tone biases feedback responsivity remains unclear. 5-HT2C receptors (5-HT2CRs) are closely linked with aspects of depressive symptomatology, including abnormalities in reinforcement processes and response to stress. Thus, we aimed to determine the impact of 5-HT2CR function on response to feedback in biased reinforcement learning. We used two touchscreen assays designed to assess the impact of positive and negative feedback on probabilistic reinforcement in mice, including a novel valence-probe visual discrimination (VPVD) and a probabilistic reversal learning procedure (PRL). Systemic administration of a 5-HT2CR agonist and antagonist resulted in selective changes in the balance of feedback sensitivity bias on these tasks. Specifically, on VPVD, SB 242084, the 5-HT2CR antagonist, impaired acquisition of a discrimination dependent on appropriate integration of positive and negative feedback. On PRL, SB 242084 at 1 mg/kg resulted in changes in behaviour consistent with reduced sensitivity to positive feedback. In contrast, WAY 163909, the 5-HT2CR agonist, resulted in changes associated with increased sensitivity to positive feedback and decreased sensitivity to negative feedback. These results suggest that 5-HT2CRs tightly regulate feedback sensitivity bias in mice with consequent effects on learning and cognitive flexibility and specify a framework for the influence of 5-HT2CRs on sensitivity to reinforcement.
The neuroscience of learning: beyond the Hebbian synapse.
Gallistel, C R; Matzel, Louis D
2013-01-01
From the traditional perspective of associative learning theory, the hypothesis linking modifications of synaptic transmission to learning and memory is plausible. It is less so from an information-processing perspective, in which learning is mediated by computations that make implicit commitments to physical and mathematical principles governing the domains where domain-specific cognitive mechanisms operate. We compare the properties of associative learning and memory to the properties of long-term potentiation, concluding that the properties of the latter do not explain the fundamental properties of the former. We briefly review the neuroscience of reinforcement learning, emphasizing the representational implications of the neuroscientific findings. We then review more extensively findings that confirm the existence of complex computations in three information-processing domains: probabilistic inference, the representation of uncertainty, and the representation of space. We argue for a change in the conceptual framework within which neuroscientists approach the study of learning mechanisms in the brain.
Organizational Culture in a Mexican School: Lessons for Reform.
ERIC Educational Resources Information Center
Davila, Anabella; Willower, donald J.
1996-01-01
Discusses a study of a Mexican Roman Catholic high school's organizational culture, highlighting findings concerning school reform and improvement processes. The school stressed community and featured activities and values promoting student commitment to learning. Religion reinforced the school's "hidden curriculum" of good conduct and…
Research on Teaching in Physical Education: Questions and Comments.
ERIC Educational Resources Information Center
Lee, Amelia M.
1991-01-01
Reinforces some of the points made in Stephen Silverman's research review on teaching in physical education, examining the process-product paradigm, the measurement of learning and teaching, and the significance of student mediation. The article identifies issues that merit further discussion and analysis. (SM)
Kheifets, Aaron; Gallistel, C R
2012-05-29
Animals successfully navigate the world despite having only incomplete information about behaviorally important contingencies. It is an open question to what degree this behavior is driven by estimates of stochastic parameters (brain-constructed models of the experienced world) and to what degree it is directed by reinforcement-driven processes that optimize behavior in the limit without estimating stochastic parameters (model-free adaptation processes, such as associative learning). We find that mice adjust their behavior in response to a change in probability more quickly and abruptly than can be explained by differential reinforcement. Our results imply that mice represent probabilities and perform calculations over them to optimize their behavior, even when the optimization produces negligible material gain.
Kheifets, Aaron; Gallistel, C. R.
2012-01-01
Animals successfully navigate the world despite having only incomplete information about behaviorally important contingencies. It is an open question to what degree this behavior is driven by estimates of stochastic parameters (brain-constructed models of the experienced world) and to what degree it is directed by reinforcement-driven processes that optimize behavior in the limit without estimating stochastic parameters (model-free adaptation processes, such as associative learning). We find that mice adjust their behavior in response to a change in probability more quickly and abruptly than can be explained by differential reinforcement. Our results imply that mice represent probabilities and perform calculations over them to optimize their behavior, even when the optimization produces negligible material gain. PMID:22592792
Regulating recognition decisions through incremental reinforcement learning.
Han, Sanghoon; Dobbins, Ian G
2009-06-01
Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback had been altogether removed. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.
Multiple sites of extinction for a single learned response
Mauk, Michael D.
2012-01-01
Most learned responses can be diminished by extinction, a process that can be engaged when a conditioned stimulus (CS) is presented but not reinforced. We present evidence that plasticity in at least two brain regions can mediate extinction of responses produced by trace eyelid conditioning, where the CS and the reinforcing stimulus are separated by a stimulus-free interval. We observed individual differences in the effects of blocking extinction mechanisms in the cerebellum, the structure that, along with several forebrain structures, mediates acquisition of trace eyelid responses; in some rabbits extinction was prevented, whereas in others it was largely unaffected. We also show that cerebellar mechanisms can mediate extinction when noncerebellar mechanisms are bypassed. Together, these observations indicate that trace eyelid responses can be extinguished via processes operating at more than one site, one in the cerebellum and one upstream in forebrain. The relative contributions of these sites may vary from animal to animal and situation to situation. PMID:21940608
Infant Contingency Learning in Different Cultural Contexts
ERIC Educational Resources Information Center
Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika
2012-01-01
Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…
A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control
NASA Astrophysics Data System (ADS)
Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu
This study proposes a robust cooperated control method combining reinforcement learning with robust control to control the system. A remarkable characteristic of the reinforcement learning is that it doesn't require model formula, however, it doesn't guarantee the stability of the system. On the other hand, robust control system guarantees stability and robustness, however, it requires model formula. We employ both the actor-critic method which is a kind of reinforcement learning with minimal amount of computation to control continuous valued actions and the traditional robust control, that is, H∞ control. The proposed system was compared method with the conventional control method, that is, the actor-critic only used, through the computer simulation of controlling the angle and the position of a crane system, and the simulation result showed the effectiveness of the proposed method.
Unified-theory-of-reinforcement neural networks do not simulate the blocking effect.
Calvin, Nicholas T; J McDowell, J
2015-11-01
For the last 20 years the unified theory of reinforcement (Donahoe et al., 1993) has been used to develop computer simulations to evaluate its plausibility as an account for behavior. The unified theory of reinforcement states that operant and respondent learning occurs via the same neural mechanisms. As part of a larger project to evaluate the operant behavior predicted by the theory, this project was the first replication of neural network models based on the unified theory of reinforcement. In the process of replicating these neural network models it became apparent that a previously published finding, namely, that the networks simulate the blocking phenomenon (Donahoe et al., 1993), was a misinterpretation of the data. We show that the apparent blocking produced by these networks is an artifact of the inability of these networks to generate the same conditioned response to multiple stimuli. The piecemeal approach to evaluate the unified theory of reinforcement via simulation is critiqued and alternatives are discussed. Copyright © 2015 Elsevier B.V. All rights reserved.
Punishment insensitivity and impaired reinforcement learning in preschoolers.
Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S
2014-01-01
Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.
Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P
2016-08-04
Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
Artificial neural networks and approximate reasoning for intelligent control in space
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
A method is introduced for learning to refine the control rules of approximate reasoning-based controllers. A reinforcement-learning technique is used in conjunction with a multi-layer neural network model of an approximate reasoning-based controller. The model learns by updating its prediction of the physical system's behavior. The model can use the control knowledge of an experienced operator and fine-tune it through the process of learning. Some of the space domains suitable for applications of the model such as rendezvous and docking, camera tracking, and tethered systems control are discussed.
Canopoli, Alessandro; Herbst, Joshua A; Hahnloser, Richard H R
2014-05-14
Many animals exhibit flexible behaviors that they can adjust to increase reward or avoid harm (learning by positive or aversive reinforcement). But what neural mechanisms allow them to restore their original behavior (motor program) after reinforcement is withdrawn? One possibility is that motor restoration relies on brain areas that have a role in memorization but no role in either motor production or in sensory processing relevant for expressing the behavior and its refinement. We investigated the role of a higher auditory brain area in the songbird for modifying and restoring the stereotyped adult song. We exposed zebra finches to aversively reinforcing white noise stimuli contingent on the pitch of one of their stereotyped song syllables. In response, birds significantly changed the pitch of that syllable to avoid the aversive reinforcer. After we withdrew reinforcement, birds recovered their original song within a few days. However, we found that large bilateral lesions in the caudal medial nidopallium (NCM, a high auditory area) impaired recovery of the original pitch even several weeks after withdrawal of the reinforcing stimuli. Because NCM lesions spared both successful noise-avoidance behavior and birds' auditory discrimination ability, our results show that NCM is not needed for directed motor changes or for auditory discriminative processing, but is implied in memorizing or recalling the memory of the recent song target. Copyright © 2014 the authors 0270-6474/14/347018-09$15.00/0.
Microstimulation of the human substantia nigra alters reinforcement learning.
Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J
2014-05-14
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors 0270-6474/14/346887-09$15.00/0.
Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories
Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien
2013-01-01
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244
Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.
2009-01-01
Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and BOLD activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces. PMID:19864565
Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D
2009-10-28
Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
Extinction reveals that primary sensory cortex predicts reinforcement outcome
Bieszczad, Kasia M.; Weinberger, Norman M.
2011-01-01
Primary sensory cortices are traditionally regarded as stimulus analyzers. However, studies of associative learning-induced plasticity in the primary auditory cortex (A1) indicate involvement in learning, memory and other cognitive processes. For example, the area of representation of a tone becomes larger for stronger auditory memories and the magnitude of area gain is proportional to the degree that a tone becomes behaviorally important. Here, we used extinction to investigate whether “behavioral importance” specifically reflects a sound’s ability to predict reinforcement (reward or punishment) vs. to predict any significant change in the meaning of a sound. If the former, then extinction should reverse area gains as the signal no longer predicts reinforcement. Rats (n = 11) were trained to bar-press to a signal tone (5.0 kHz) for water-rewards, to induce signal-specific area gains in A1. After subsequent withdrawal of reward, A1 was mapped to determine representational areas. Signal-specific area gains — estimated from a previously established brain–behavior quantitative function — were reversed, supporting the “reinforcement prediction” hypothesis. Area loss was specific to the signal tone vs. test tones, further indicating that withdrawal of reinforcement, rather than unreinforced tone presentation per se, was responsible for area loss. Importantly, the amount of area loss was correlated with the amount of extinction (r = 0.82, p < 0.01). These findings show that primary sensory cortical representation can encode behavioral importance as a signal’s value to predict reinforcement, and that the number of cells tuned to a stimulus can dictate its ability to command behavior. PMID:22304434
Combining Offline and Online Computation for Solving Partially Observable Markov Decision Process
2015-03-06
David Hsu and Wee Sun Lee, Monte Carlo Bayesian Reinforcement Learning, International Conference on Machine Learning (ICML), 2012. • Haoyu Bai, David...and Automation (ICRA), 2015. • Zhan Wei Lim, David Hsu, and Wee Sun Lee, Adaptive Informative Path Planning in Metric Spaces. Submitted to Int. J... Automation (ICRA), 2015. 2. Bai, H., Hsu, D., Kochenderfer, M. J., and Lee, W. S., Unmanned aircraft collision avoidance using continuous state POMDPs
What learning theories can teach us in designing neurofeedback treatments
Strehl, Ute
2014-01-01
Popular definitions of neurofeedback point out that neurofeedback is a process of operant conditioning which leads to self-regulation of brain activity. Self-regulation of brain activity is considered to be a skill. The aim of this paper is to clarify that not only operant conditioning plays a role in the acquisition of this skill. In order to design the learning process additional references have to be derived from classical conditioning, two-process-theory and in particular from skill learning and research into motivational aspects. The impact of learning by trial and error, cueing of behavior, feedback, reinforcement, and knowledge of results as well as transfer of self-regulation skills into everyday life will be analyzed in this paper. In addition to these learning theory basics this paper tries to summarize the knowledge about acquisition of self-regulation from neurofeedback studies with a main emphasis on clinical populations. As a conclusion it is hypothesized that learning to self-regulate has to be offered in a psychotherapeutic, i.e., behavior therapy framework. PMID:25414659
Dual learning processes in interactive skill acquisition.
Fu, Wai-Tat; Anderson, John R
2008-06-01
Acquisition of interactive skills involves the use of internal and external cues. Experiment 1 showed that when actions were interdependent, learning was effective with and without external cues in the single-task condition but was effective only with the presence of external cues in the dual-task condition. In the dual-task condition, actions closer to the feedback were learned faster than actions farther away but this difference was reversed in the single-task condition. Experiment 2 tested how knowledge acquired in single and dual-task conditions would transfer to a new reward structure. Results confirmed the two forms of learning mediated by the secondary task: A declarative memory encoding process that simultaneously assigned credits to actions and a reinforcement-learning process that slowly propagated credits backward from the feedback. The results showed that both forms of learning were engaged during training, but only at the response selection stage, one form of knowledge may dominate over the other depending on the availability of attentional resources. (c) 2008 APA, all rights reserved
What learning theories can teach us in designing neurofeedback treatments.
Strehl, Ute
2014-01-01
Popular definitions of neurofeedback point out that neurofeedback is a process of operant conditioning which leads to self-regulation of brain activity. Self-regulation of brain activity is considered to be a skill. The aim of this paper is to clarify that not only operant conditioning plays a role in the acquisition of this skill. In order to design the learning process additional references have to be derived from classical conditioning, two-process-theory and in particular from skill learning and research into motivational aspects. The impact of learning by trial and error, cueing of behavior, feedback, reinforcement, and knowledge of results as well as transfer of self-regulation skills into everyday life will be analyzed in this paper. In addition to these learning theory basics this paper tries to summarize the knowledge about acquisition of self-regulation from neurofeedback studies with a main emphasis on clinical populations. As a conclusion it is hypothesized that learning to self-regulate has to be offered in a psychotherapeutic, i.e., behavior therapy framework.
Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.
Bouton, Mark E; Woods, Amanda M; Todd, Travis P
2014-01-01
Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.
Mabu, Shingo; Hirasawa, Kotaro; Hu, Jinglu
2007-01-01
This paper proposes a graph-based evolutionary algorithm called Genetic Network Programming (GNP). Our goal is to develop GNP, which can deal with dynamic environments efficiently and effectively, based on the distinguished expression ability of the graph (network) structure. The characteristics of GNP are as follows. 1) GNP programs are composed of a number of nodes which execute simple judgment/processing, and these nodes are connected by directed links to each other. 2) The graph structure enables GNP to re-use nodes, thus the structure can be very compact. 3) The node transition of GNP is executed according to its node connections without any terminal nodes, thus the past history of the node transition affects the current node to be used and this characteristic works as an implicit memory function. These structural characteristics are useful for dealing with dynamic environments. Furthermore, we propose an extended algorithm, "GNP with Reinforcement Learning (GNPRL)" which combines evolution and reinforcement learning in order to create effective graph structures and obtain better results in dynamic environments. In this paper, we applied GNP to the problem of determining agents' behavior to evaluate its effectiveness. Tileworld was used as the simulation environment. The results show some advantages for GNP over conventional methods.
Social reinforcement can regulate localized brain activity.
Mathiak, Krystyna A; Koush, Yury; Dyck, Miriam; Gaber, Tilman J; Alawi, Eliza; Zepf, Florian D; Zvyagintsev, Mikhail; Mathiak, Klaus
2010-11-01
Social learning is essential for adaptive behavior in humans. Neurofeedback based on functional magnetic resonance imaging (fMRI) trains control over localized brain activity. It can disentangle learning processes at the neural level and thus investigate the mechanisms of operant conditioning with explicit social reinforcers. In a pilot study, a computer-generated face provided a positive feedback (smiling) when activity in the anterior cingulate cortex (ACC) increased and gradually returned to a neutral expression when the activity dropped. One female volunteer without previous experience in fMRI underwent training based on a social reinforcer. Directly before and after the neurofeedback runs, neural responses to a cognitive interference task (Simon task) were recorded. We observed a significant increase in activity within ACC during the neurofeedback blocks, correspondent with the a-priori defined anatomical region of interest. In the course of the neurofeedback training, the subject learned to regulate ACC activity and could maintain the control even without direct feedback. Moreover, ACC was activated significantly stronger during Simon task after the neurofeedback training when compared to before. Localized brain activity can be controlled by social reward. The increased ACC activity transferred to a cognitive task with the potential to reduce cognitive interference. Systematic studies are required to explore long-term effects on social behavior and clinical applications.
Reinforcement learning in depression: A review of computational research.
Chen, Chong; Takahashi, Taiki; Nakagawa, Shin; Inoue, Takeshi; Kusumi, Ichiro
2015-08-01
Despite being considered primarily a mood disorder, major depressive disorder (MDD) is characterized by cognitive and decision making deficits. Recent research has employed computational models of reinforcement learning (RL) to address these deficits. The computational approach has the advantage in making explicit predictions about learning and behavior, specifying the process parameters of RL, differentiating between model-free and model-based RL, and the computational model-based functional magnetic resonance imaging and electroencephalography. With these merits there has been an emerging field of computational psychiatry and here we review specific studies that focused on MDD. Considerable evidence suggests that MDD is associated with impaired brain signals of reward prediction error and expected value ('wanting'), decreased reward sensitivity ('liking') and/or learning (be it model-free or model-based), etc., although the causality remains unclear. These parameters may serve as valuable intermediate phenotypes of MDD, linking general clinical symptoms to underlying molecular dysfunctions. We believe future computational research at clinical, systems, and cellular/molecular/genetic levels will propel us toward a better understanding of the disease. Copyright © 2015 Elsevier Ltd. All rights reserved.
DIY Activists: Communities of Practice, Cultural Dialogism, and Radical Knowledge Sharing
ERIC Educational Resources Information Center
Hemphill, David; Leskowitz, Shari
2013-01-01
This study explored innovative alternative processes of living, learning, and knowledge sharing of a loosely knit community of anarchist, anticapitalist "Do-It-Yourself" (DIY) activists. Generated through participant observation and interviews, findings reinforced adult education theories--that adults can diagnose their own learning…
Stimulus Processing and Associative Learning in Wistar and WKHA Rats
Chess, Amy C.; Keene, Christopher S.; Wyzik, Elizabeth C.; Bucci, David J.
2007-01-01
This study assessed basic learning and attention abilities in WKHA (Wistar-Kyoto Hyperactive) rats using appetitive conditioning preparations. Two measures of conditioned responding to a visual stimulus, orienting behavior (rearing on the hindlegs) and food cup behavior (placing the head inside the recessed food cup) were measured. In Experiment 1, simple conditioning but not extinction was impaired in WKHA rats compared to Wistar rats. In Experiment 2, non-reinforced presentations of the visual cue preceded the conditioning sessions. WKHA rats displayed less orienting behavior than Wistar rats, but comparable levels of food cup behavior. These data suggest that WKHA rats exhibit specific abnormalities in attentional processing as well as learning stimulus-reward relationships. PMID:15998198
Autonomous reinforcement learning with experience replay.
Wawrzyński, Paweł; Tanwani, Ajay Kumar
2013-05-01
This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J
2016-06-01
Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (N=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.
Embedded Incremental Feature Selection for Reinforcement Learning
2012-05-01
Prior to this work, feature selection for reinforce- ment learning has focused on linear value function ap- proximation ( Kolter and Ng, 2009; Parr et al...InProceed- ings of the the 23rd International Conference on Ma- chine Learning, pages 449–456. Kolter , J. Z. and Ng, A. Y. (2009). Regularization and feature
Social Learning, Reinforcement and Crime: Evidence from Three European Cities
ERIC Educational Resources Information Center
Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina
2012-01-01
This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…
Novelty and Inductive Generalization in Human Reinforcement Learning
Gershman, Samuel J.; Niv, Yael
2015-01-01
In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: what is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: how can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. PMID:25808176
Learning with incomplete information and the mathematical structure behind it.
Kühn, Reimer; Stamatescu, Ion-Olimpiu
2007-07-01
We investigate the problem of learning with incomplete information as exemplified by learning with delayed reinforcement. We study a two phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase depending on a graded reinforcement signal. The reinforcement signal quantifies the success-rate globally for a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to associations learnt in that phase. Learning according to this model is studied via simulations and analytically within a student-teacher scenario for both single layer networks and, for a committee machine. Success and speed of learning depend on the ratio lambda of the learning rates used for the associative Hebbian learning phase and for the unlearning-correction in response to the reinforcement signal, respectively. Asymptotically perfect generalization is possible only, if this ratio exceeds a critical value lambda( c ), in which case the generalization error exhibits a power law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter lambda. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications-one of a robot learning to navigate a field containing obstacles, and the problem of identifying a specific component in a collection of stimuli-are also provided.
Behavioral research in pigeons with ARENA: An automated remote environmental navigation apparatus
Leising, Kenneth J.; Garlick, Dennis; Parenteau, Michael; Blaisdell, Aaron P.
2009-01-01
Three experiments established the effectiveness of an Automated Remote Environmental Navigation Apparatus (ARENA) developed in our lab to study behavioral processes in pigeons. The technology utilizes one or more wireless modules, each capable of presenting colored lights as visual stimuli to signal reward and of detecting subject peck responses. In Experiment 1, subjects were instrumentally shaped to peck at a single ARENA module following an unsuccessful autoshaping procedure. In Experiment 2, pigeons were trained with a simultaneous discrimination procedure during which two modules were illuminated different colors; pecks to one color (S+) were reinforced while pecks to the other color (S−) were not. Pigeons learned to preferentially peck the module displaying the S+. In Experiment 3, two modules were lit the same color concurrently from a set of six colors in a conditional discrimination task. For three of the colors pecks to the module in one location (e.g., upper quadrant) were reinforced while for the remaining colors pecks at the other module (e.g., lower quadrant) were reinforced. After learning this discrimination, the color-reinforced location assignments were reversed. Pigeons successfully acquired the reversal. ARENA is an automated system for open-field studies and a more ecologically valid alternative to the touchscreen. PMID:19429204
A Novel Task for the Investigation of Action Acquisition
Stafford, Tom; Thirkettle, Martin; Walton, Tom; Vautrelle, Nicolas; Hetherington, Len; Port, Michael; Gurney, Kevin; Redgrave, Pete
2012-01-01
We present a behavioural task designed for the investigation of how novel instrumental actions are discovered and learnt. The task consists of free movement with a manipulandum, during which the full range of possible movements can be explored by the participant and recorded. A subset of these movements, the ‘target’, is set to trigger a reinforcing signal. The task is to discover what movements of the manipulandum evoke the reinforcement signal. Targets can be defined in spatial, temporal, or kinematic terms, can be a combination of these aspects, or can represent the concatenation of actions into a larger gesture. The task allows the study of how the specific elements of behaviour which cause the reinforcing signal are identified, refined and stored by the participant. The task provides a paradigm where the exploratory motive drives learning and as such we view it as in the tradition of Thorndike [1]. Most importantly it allows for repeated measures, since when a novel action is acquired the criterion for triggering reinforcement can be changed requiring a new action to be discovered. Here, we present data using both humans and rats as subjects, showing that our task is easily scalable in difficulty, adaptable across species, and produces a rich set of behavioural measures offering new and valuable insight into the action learning process. PMID:22675490
Cardinal, R. N.; Rygula, R.; Hong, Y. T.; Fryer, T. D.; Sawiak, S. J.; Ferrari, V.; Cockcroft, G.; Aigbirhio, F. I.; Robbins, T. W.; Roberts, A. C.
2014-01-01
Schizophrenia is associated with upregulation of dopamine (DA) release in the caudate nucleus. The caudate has dense connections with the orbitofrontal cortex (OFC) via the frontostriatal loops, and both areas exhibit pathophysiological change in schizophrenia. Despite evidence that abnormalities in dopaminergic neurotransmission and prefrontal cortex function co-occur in schizophrenia, the influence of OFC DA on caudate DA and reinforcement processing is poorly understood. To test the hypothesis that OFC dopaminergic dysfunction disrupts caudate dopamine function, we selectively depleted dopamine from the OFC of marmoset monkeys and measured striatal extracellular dopamine levels (using microdialysis) and dopamine D2/D3 receptor binding (using positron emission tomography), while modeling reinforcement-related behavior in a discrimination learning paradigm. OFC dopamine depletion caused an increase in tonic dopamine levels in the caudate nucleus and a corresponding reduction in D2/D3 receptor binding. Computational modeling of behavior showed that the lesion increased response exploration, reducing the tendency to persist with a recently chosen response side. This effect is akin to increased response switching previously seen in schizophrenia and was correlated with striatal but not OFC D2/D3 receptor binding. These results demonstrate that OFC dopamine depletion is sufficient to induce striatal hyperdopaminergia and changes in reinforcement learning relevant to schizophrenia. PMID:24872570
Social Learning Theory: its application in the context of nurse education.
Bahn, D
2001-02-01
Cognitive theories are fundamental to enable problem solving and the ability to understand and apply principles in a variety of situations. This article looks at Social Learning Theory, critically analysing its principles, which are based on observational learning and modelling, and considering its value and application in the context of nurse education. It also considers the component processes that will determine the outcome of observed behaviour, other than reinforcement, as identified by Bandura, namely: attention, retention, motor reproduction, and motivation. Copyright 2001 Harcourt Publishers Ltd.
2014-09-29
Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity , and Performance W. Bradley Knox...positive a trainer’s reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity , whether task...learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent’s performance on the task the trainer
De la Fuente, Jesus; Zapata, Lucía; Martínez-Vicente, Jose M.; Sander, Paul; Cardelle-Elawar, María
2014-01-01
The present investigation examines how personal self-regulation (presage variable) and regulatory teaching (process variable of teaching) relate to learning approaches, strategies for coping with stress, and self-regulated learning (process variables of learning) and, finally, how they relate to performance and satisfaction with the learning process (product variables). The objective was to clarify the associative and predictive relations between these variables, as contextualized in two different models that use the presage-process-product paradigm (the Biggs and DEDEPRO models). A total of 1101 university students participated in the study. The design was cross-sectional and retrospective with attributional (or selection) variables, using correlations and structural analysis. The results provide consistent and significant empirical evidence for the relationships hypothesized, incorporating variables that are part of and influence the teaching–learning process in Higher Education. Findings confirm the importance of interactive relationships within the teaching–learning process, where personal self-regulation is assumed to take place in connection with regulatory teaching. Variables that are involved in the relationships validated here reinforce the idea that both personal factors and teaching and learning factors should be taken into consideration when dealing with a formal teaching–learning context at university. PMID:25964764
Weidemann, Gabrielle; Tangen, Jason M; Lovibond, Peter F; Mitchell, Christopher J
2009-04-01
P. Perruchet (1985b) showed a double dissociation of conditioned responses (CRs) and expectancy for an airpuff unconditioned stimulus (US) in a 50% partial reinforcement schedule in human eyeblink conditioning. In the Perruchet effect, participants show an increase in CRs and a concurrent decrease in expectancy for the airpuff across runs of reinforced trials; conversely, participants show a decrease in CRs and a concurrent increase in expectancy for the airpuff across runs of nonreinforced trials. Three eyeblink conditioning experiments investigated whether the linear trend in eyeblink CRs in the Perruchet effect is a result of changes in associative strength of the conditioned stimulus (CS), US sensitization, or learning the precise timing of the US. Experiments 1 and 2 demonstrated that the linear trend in eyeblink CRs is not the result of US sensitization. Experiment 3 showed that the linear trend in eyeblink CRs is present with both a fixed and a variable CS-US interval and so is not the result of learning the precise timing of the US. The results are difficult to reconcile with a single learning process model of associative learning in which expectancy mediates CRs. Copyright (c) 2009 APA, all rights reserved.
Hauser, Tobias U; Iannaccone, Reto; Ball, Juliane; Mathys, Christoph; Brandeis, Daniel; Walitza, Susanne; Brem, Silvia
2014-10-01
Attention-deficit/hyperactivity disorder (ADHD) has been associated with deficient decision making and learning. Models of ADHD have suggested that these deficits could be caused by impaired reward prediction errors (RPEs). Reward prediction errors are signals that indicate violations of expectations and are known to be encoded by the dopaminergic system. However, the precise learning and decision-making deficits and their neurobiological correlates in ADHD are not well known. To determine the impaired decision-making and learning mechanisms in juvenile ADHD using advanced computational models, as well as the related neural RPE processes using multimodal neuroimaging. Twenty adolescents with ADHD and 20 healthy adolescents serving as controls (aged 12-16 years) were examined using a probabilistic reversal learning task while simultaneous functional magnetic resonance imaging and electroencephalogram were recorded. Learning and decision making were investigated by contrasting a hierarchical Bayesian model with an advanced reinforcement learning model and by comparing the model parameters. The neural correlates of RPEs were studied in functional magnetic resonance imaging and electroencephalogram. Adolescents with ADHD showed more simplistic learning as reflected by the reinforcement learning model (exceedance probability, Px = .92) and had increased exploratory behavior compared with healthy controls (mean [SD] decision steepness parameter β: ADHD, 4.83 [2.97]; controls, 6.04 [2.53]; P = .02). The functional magnetic resonance imaging analysis revealed impaired RPE processing in the medial prefrontal cortex during cue as well as during outcome presentation (P < .05, family-wise error correction). The outcome-related impairment in the medial prefrontal cortex could be attributed to deficient processing at 200 to 400 milliseconds after feedback presentation as reflected by reduced feedback-related negativity (ADHD, 0.61 [3.90] μV; controls, -1.68 [2.52] μV; P = .04). The combination of computational modeling of behavior and multimodal neuroimaging revealed that impaired decision making and learning mechanisms in adolescents with ADHD are driven by impaired RPE processing in the medial prefrontal cortex. This novel, combined approach furthers the understanding of the pathomechanisms in ADHD and may advance treatment strategies.
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
Strauss, Gregory P.; Frank, Michael J.; Waltz, James A.; Kasanova, Zuzana; Herbener, Ellen S.; Gold, James M.
2011-01-01
Background Negative symptoms are core features of schizophrenia; however, the cognitive and neural basis for individual negative symptom domains remains unclear. Converging evidence suggests a role for striatal and prefrontal dopamine in reward learning and the exploration of actions that might produce outcomes that are better than the status quo. The current study examines whether deficits in reinforcement learning and uncertainty-driven exploration predict specific negative symptoms domains. Methods We administered a temporal decision making task, which required trial-by-trial adjustment of reaction time (RT) to maximize reward receipt, to 51 patients with schizophrenia and 39 age-matched healthy controls. Task conditions were designed such that expected value (probability * magnitude) increased (IEV), decreased (DEV), or remained constant (CEV) with increasing response times. Computational analyses were applied to estimate the degree to which trial-by-trial responses are influenced by reinforcement history. Results Individuals with schizophrenia showed impaired Go learning, but intact NoGo learning relative to controls. These effects were pronounced as a function of global measures of negative symptom. Uncertainty-based exploration was substantially reduced in individuals with schizophrenia, and selectively correlated with clinical ratings of anhedonia. Conclusions Schizophrenia patients, particularly those with high negative symptoms, failed to speed RT's to increase positive outcomes and showed reduced tendency to explore when alternative actions could lead to better outcomes than the status quo. Results are interpreted in the context of current computational, genetic, and pharmacological data supporting the roles of striatal and prefrontal dopamine in these processes. PMID:21168124
Morita, Kenji; Morishima, Mieko; Sakai, Katsuyuki; Kawaguchi, Yasuo
2013-05-15
Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
Investigating implicit statistical learning mechanisms through contextual cueing.
Goujon, Annabelle; Didierjean, André; Thorpe, Simon
2015-09-01
Since its inception, the contextual cueing (CC) paradigm has generated considerable interest in various fields of cognitive sciences because it constitutes an elegant approach to understanding how statistical learning (SL) mechanisms can detect contextual regularities during a visual search. In this article we review and discuss five aspects of CC: (i) the implicit nature of learning, (ii) the mechanisms involved in CC, (iii) the mediating factors affecting CC, (iv) the generalization of CC phenomena, and (v) the dissociation between implicit and explicit CC phenomena. The findings suggest that implicit SL is an inherent component of ongoing processing which operates through clustering, associative, and reinforcement processes at various levels of sensory-motor processing, and might result from simple spike-timing-dependent plasticity. Copyright © 2015 Elsevier Ltd. All rights reserved.
Fuzzy Q-Learning for Generalization of Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1996-01-01
Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents which are controlled at the top level by Fuzzy Q-Learning and at the local level, each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of input space by using fuzzy rules and bridges the gap between Q-Learning and rule based intelligent systems.
The Time Course of Explicit and Implicit Categorization
Zakrzewski, Alexandria C.; Herberger, Eric; Boomer, Joseph; Roeder, Jessica; Ashby, F. Gregory; Church, Barbara A.
2015-01-01
Contemporary theory in cognitive neuroscience distinguishes, among the processes and utilities that serve categorization, explicit and implicit systems of category learning that learn, respectively, category rules by active hypothesis testing or adaptive behaviors by association and reinforcement. Little is known about the time course of categorization within these systems. Accordingly, the present experiments contrasted tasks that fostered explicit categorization (because they had a one-dimensional, rule-based solution) or implicit categorization (because they had a two-dimensional, information-integration solution). In Experiment 1, participants learned categories under unspeeded or speeded conditions. In Experiment 2, they applied previously trained category knowledge under unspeeded or speeded conditions. Speeded conditions selectively impaired implicit category learning and implicit mature categorization. These results illuminate the processing dynamics of explicit/implicit categorization. PMID:26025556
Alterations in choice behavior by manipulations of world model.
Green, C S; Benson, C; Kersten, D; Schrater, P
2010-09-14
How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching"-a consistent example of suboptimal choice behavior seen in humans-occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.
Alterations in choice behavior by manipulations of world model
Green, C. S.; Benson, C.; Kersten, D.; Schrater, P.
2010-01-01
How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) “probability matching”—a consistent example of suboptimal choice behavior seen in humans—occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning. PMID:20805507
Application of a model of instrumental conditioning to mobile robot control
NASA Astrophysics Data System (ADS)
Saksida, Lisa M.; Touretzky, D. S.
1997-09-01
Instrumental conditioning is a psychological process whereby an animal learns to associate its actions with their consequences. This type of learning is exploited in animal training techniques such as 'shaping by successive approximations,' which enables trainers to gradually adjust the animal's behavior by giving strategically timed reinforcements. While this is similar in principle to reinforcement learning, the real phenomenon includes many subtle effects not considered in the machine learning literature. In addition, a good deal of domain information is utilized by an animal learning a new task; it does not start from scratch every time it learns a new behavior. For these reasons, it is not surprising that mobile robot learning algorithms have yet to approach the sophistication and robustness of animal learning. A serious attempt to model instrumental learning could prove fruitful for improving machine learning techniques. In the present paper, we develop a computational theory of shaping at a level appropriate for controlling mobile robots. The theory is based on a series of mechanisms for 'behavior editing,' in which pre-existing behaviors, either innate or previously learned, can be dramatically changed in magnitude, shifted in direction, or otherwise manipulated so as to produce new behavioral routines. We have implemented our theory on Amelia, an RWI B21 mobile robot equipped with a gripper and color video camera. We provide results from training Amelia on several tasks, all of which were constructed as variations of one innate behavior, object-pursuit.
Creating a Culture for Learning. On Balance
ERIC Educational Resources Information Center
Trubowitz, Sidney
2005-01-01
Everywhere we read about efforts to revitalize schools. Such initiatives include restructuring governance by centralizing the power to make decisions, introducing a mandated curriculum for all teachers to follow, and reinforcing an accountability process with a strong focus on test scores. All these school-improvement proposals rely on a belief…
Contemporary Approaches to Conditioning and Learning.
ERIC Educational Resources Information Center
McGuigan, F. J., Ed.; Lumsden, D. Barry, Ed.
Chapters contained in this volume, each with a list of references appended, are: "Scientific Psychology in Transition" by Gregory A. Kimble; "Higher Mental Processes as the Bases for the Laws of Conditioning" by Eli Saltz; "Reification and Reality in Conditioning Paradigms: Implications of Results When Modes of Reinforcement are Changed" by David…
Gimme Shelter!: Doghouse Project Hones Construction Skills
ERIC Educational Resources Information Center
Shackelford, Ray; Griffis, Kurt
2007-01-01
In Rover's House project, students practice planning, measurement, layout, and processing skills in building a doghouse. The project is more than a doghouse--it is a learning activity that helps students develop and enhance their ability to work with reinforced concrete, steel and wood studs, trusses, roofing materials and a variety of…
The Programmed Instruction Era: When Effectiveness Mattered
ERIC Educational Resources Information Center
Molenda, Michael
2008-01-01
Programmed instruction (PI) was devised to make the teaching-learning process more humane by making it more effective and customized to individual differences. B.F. Skinner's original prescription was modified by later innovators to incorporate more human interaction, social reinforcers and other forms of feedback, larger and more flexible chunks…
The Writing Staff as Faculty Compost Pile.
ERIC Educational Resources Information Center
Dorenkamp, Angela G.
Misconceptions about the teaching of writing prevail on many college campuses, partially because writing teachers fail to communicate with their colleagues. It is especially important for writing teachers to let their colleagues know that learning to write is a long term developmental process that needs support and reinforcement from the entire…
Proactivity and Reinforcement: The Contingency of Social Behavior
ERIC Educational Resources Information Center
Williams, J. Sherwood; And Others
1976-01-01
This paper analyzes development of group structure in terms of the stimulus-sampling perspective. Learning is the continual sampling of possibilities, with those reinforced possibilities increasing in probability of occurance. This contingency learning approach is tested experimentally. (NG)
Dopamine-System Genes and Cultural Acquisition: The Norm Sensitivity Hypothesis
Kitayama, Shinobu; King, Anthony; Hsu, Ming; Liberzon, Israel; Yoon, Carolyn
2016-01-01
Previous research in cultural psychology shows that cultures vary in the social orientation of independence and interdependence. To date, however, little is known about how people may acquire such global patterns of cultural behavior or cultural norms. Nor is it clear what genetic mechanisms may underlie the acquisition of cultural norms. Here, we draw on recent evidence for certain genetic variability in the susceptibility to environmental influences and propose a norm sensitivity hypothesis, which holds that people acquire culture, and rules of cultural behaviors, through reinforcement-mediated social learning processes. One corollary of the hypothesis is that the degree of cultural acquisition should be influenced by polymorphic variants of genes involved in dopaminergic neural pathways, which have been widely implicated in reinforcement learning. We reviewed initial evidence for this prediction and discussed challenges and directions for future research. PMID:28491931
Mobile robots exploration through cnn-based reinforcement learning.
Tai, Lei; Liu, Ming
2016-01-01
Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outlined a reinforcement learning method aiming for solving the exploration problem in a corridor environment. The learning model took the depth image from an RGB-D sensor as the only input. The feature representation of the depth image was extracted through a pre-trained convolutional-neural-networks model. Based on the recent success of deep Q-network on artificial intelligence, the robot controller achieved the exploration and obstacle avoidance abilities in several different simulated environments. It is the first time that the reinforcement learning is used to build an exploration strategy for mobile robots through raw sensor information.
Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.
Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M
2018-05-12
Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside of the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. 29 adult females (15 PTSD with exposure to assaultive violence, 14 controls) underwent a neutral reinforcement-learning task (i.e., two arm bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify timecourses for large-scale a priori brain networks. We found that an anticorrelated and risk sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group demonstrated significantly less neural encoding of prediction errors in both a ventral striatum/mPFC and anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside of the context of fear stimuli, suggests the possibility of a broader conceptualization of learning differences in PTSD than currently proposed in current neurocircuitry models of PTSD. Copyright © 2018 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Hwang, Kuo-An; Yang, Chia-Hao
2009-01-01
Most courses based on distance learning focus on the cognitive domain of learning. Because students are sometimes inattentive or tired, they may neglect the attention goal of learning. This study proposes an auto-detection and reinforcement mechanism for the distance-education system based on the reinforcement teaching strategy. If a student is…
When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition
ERIC Educational Resources Information Center
Janssen, Christian P.; Gray, Wayne D.
2012-01-01
Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning
McGregor, Heather R.; Mohatarem, Ayman
2017-01-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.
Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L
2017-07-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
Differential associative training enhances olfactory acuity in Drosophila melanogaster.
Barth, Jonas; Dipt, Shubham; Pech, Ulrike; Hermann, Moritz; Riemensperger, Thomas; Fiala, André
2014-01-29
Training can improve the ability to discriminate between similar, confusable stimuli, including odors. One possibility of enhancing behaviorally expressed discrimination (i.e., sensory acuity) relies on differential associative learning, during which animals are forced to detect the differences between similar stimuli. Drosophila represents a key model organism for analyzing neuronal mechanisms underlying both odor processing and olfactory learning. However, the ability of flies to enhance fine discrimination between similar odors through differential associative learning has not been analyzed in detail. We performed associative conditioning experiments using chemically similar odorants that we show to evoke overlapping neuronal activity in the fly's antennal lobes and highly correlated activity in mushroom body lobes. We compared the animals' performance in discriminating between these odors after subjecting them to one of two types of training: either absolute conditioning, in which only one odor is reinforced, or differential conditioning, in which one odor is reinforced and a second odor is explicitly not reinforced. First, we show that differential conditioning decreases behavioral generalization of similar odorants in a choice situation. Second, we demonstrate that this learned enhancement in olfactory acuity relies on both conditioned excitation and conditioned inhibition. Third, inhibitory local interneurons in the antennal lobes are shown to be required for behavioral fine discrimination between the two similar odors. Fourth, differential, but not absolute, training causes decorrelation of odor representations in the mushroom body. In conclusion, differential training with similar odors ultimately induces a behaviorally expressed contrast enhancement between the two similar stimuli that facilitates fine discrimination.
Role of amygdala central nucleus in feature negative discriminations
Holland, Peter C.
2012-01-01
Consistent with a popular theory of associative learning, the Pearce-Hall (1980) model, the surprising omission of expected events enhances cue associability (the ease with which a cue may enter into new associations), across a wide variety of behavioral training procedures. Furthermore, previous experiments from this laboratory showed that these enhancements are absent in rats with impaired function of the amygdala central nucleus (CeA). A notable exception to these assertions is found in feature negative (FN) discrimination learning, in which a “target” stimulus is reinforced when it is presented alone but nonreinforced when it is presented in compound with another, “feature” stimulus. According to the Pearce-Hall model, reinforcer omission on compound trials should enhance the associability of the feature relative to control training conditions. However, prior experiments have shown no evidence that CeA lesions affect FN discrimination learning. Here we explored this apparent contradiction by evaluating the hypothesis that the surprising omission of an event confers enhanced associability on a cue only if that cue itself generates the disconfirmed prediction. Thus, in a FN discrimination, the surprising omission of the reinforcer on compound trials would enhance the associability of the target stimulus but not that of the feature. Our data confirmed this hypothesis, and showed this enhancement to depend on intact CeA function, as in other procedures. The results are consistent with modern reformulations of both cue and reward processing theories that assign roles for both individual and aggregate error terms in associative learning. PMID:22889308
Interconnected growing self-organizing maps for auditory and semantic acquisition modeling.
Cao, Mengxue; Li, Aijun; Fang, Qiang; Kaufmann, Emily; Kröger, Bernd J
2014-01-01
Based on the incremental nature of knowledge acquisition, in this study we propose a growing self-organizing neural network approach for modeling the acquisition of auditory and semantic categories. We introduce an Interconnected Growing Self-Organizing Maps (I-GSOM) algorithm, which takes associations between auditory information and semantic information into consideration, in this paper. Direct phonetic-semantic association is simulated in order to model the language acquisition in early phases, such as the babbling and imitation stages, in which no phonological representations exist. Based on the I-GSOM algorithm, we conducted experiments using paired acoustic and semantic training data. We use a cyclical reinforcing and reviewing training procedure to model the teaching and learning process between children and their communication partners. A reinforcing-by-link training procedure and a link-forgetting procedure are introduced to model the acquisition of associative relations between auditory and semantic information. Experimental results indicate that (1) I-GSOM has good ability to learn auditory and semantic categories presented within the training data; (2) clear auditory and semantic boundaries can be found in the network representation; (3) cyclical reinforcing and reviewing training leads to a detailed categorization as well as to a detailed clustering, while keeping the clusters that have already been learned and the network structure that has already been developed stable; and (4) reinforcing-by-link training leads to well-perceived auditory-semantic associations. Our I-GSOM model suggests that it is important to associate auditory information with semantic information during language acquisition. Despite its high level of abstraction, our I-GSOM approach can be interpreted as a biologically-inspired neurocomputational model.
Predictive representations can link model-based reinforcement learning to model-free mechanisms.
Russek, Evan M; Momennejad, Ida; Botvinick, Matthew M; Gershman, Samuel J; Daw, Nathaniel D
2017-09-01
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
Predictive representations can link model-based reinforcement learning to model-free mechanisms
Botvinick, Matthew M.
2017-01-01
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation. PMID:28945743
Computational Dysfunctions in Anxiety: Failure to Differentiate Signal From Noise.
Huang, He; Thompson, Wesley; Paulus, Martin P
2017-09-15
Differentiating whether an action leads to an outcome by chance or by an underlying statistical regularity that signals environmental change profoundly affects adaptive behavior. Previous studies have shown that anxious individuals may not appropriately differentiate between these situations. This investigation aims to precisely quantify the process deficit in anxious individuals and determine the degree to which these process dysfunctions are specific to anxiety. One hundred twenty-two subjects recruited as part of an ongoing large clinical population study completed a change point detection task. Reinforcement learning models were used to explicate observed behavioral differences in low anxiety (Overall Anxiety Severity and Impairment Scale score ≤ 8) and high anxiety (Overall Anxiety Severity and Impairment Scale score ≥ 9) groups. High anxiety individuals used a suboptimal decision strategy characterized by a higher lose-shift rate. Computational models and simulations revealed that this difference was related to a higher base learning rate. These findings are better explained in a context-dependent reinforcement learning model. Anxious subjects' exaggerated response to uncertainty leads to a suboptimal decision strategy that makes it difficult for these individuals to determine whether an action is associated with an outcome by chance or by some statistical regularity. These findings have important implications for developing new behavioral intervention strategies using learning models. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI
NASA Astrophysics Data System (ADS)
Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro
Recently, Brain computer interface (BCI) which is a direct connecting pathway an external device such as a computer or a robot and a human brain have gotten a lot of attention. Since BCI can control the machines as robots by using the brain activity without using the voluntary muscle, the BCI may become a useful communication tool for handicapped persons, for instance, amyotrophic lateral sclerosis patients. However, in order to realize the BCI system which can perform precise tasks on various environments, it is necessary to design the control rules to adapt to the dynamic environments. Reinforcement learning is one approach of the design of the control rule. If this reinforcement leaning can be performed by the brain activity, it leads to the attainment of BCI that has general versatility. In this research, we paid attention to P300 of event-related potential as an alternative signal of the reward of reinforcement learning. We discriminated between the success and the failure trials from P300 of the EEG of the single trial by using the proposed discrimination algorithm based on Support vector machine. The possibility of reinforcement learning was examined from the viewpoint of the number of discriminated trials. It was shown that there was a possibility to be able to learn in most subjects.
Gold, James M.; Waltz, James A.; Matveeva, Tatyana M.; Kasanova, Zuzana; Strauss, Gregory P.; Herbener, Ellen S.; Collins, Anne G.E.; Frank, Michael J.
2015-01-01
Context Negative symptoms are a core feature of schizophrenia, but their pathophysiology remains unclear. Objective Negative symptoms are defined by the absence of normal function. However, there must be a productive mechanism that leads to this absence. Here, we test a reinforcement learning account suggesting that negative symptoms result from a failure to represent the expected value of rewards coupled with preserved loss avoidance learning. Design Subjects performed a probabilistic reinforcement learning paradigm involving stimulus pairs in which choices resulted in either reward or avoidance of loss. Following training, subjects indicated their valuation of the stimuli in a transfer task. Computational modeling was used to distinguish between alternative accounts of the data. Setting A tertiary care research outpatient clinic. Patients A total of 47 clinically stable patients with a diagnosis of schizophrenia or schizoaffective disorder and 28 healthy volunteers participated. Patients were divided into high and low negative symptom groups. Main Outcome measures 1) The number of choices leading to reward or loss avoidance and 2) performance in the transfer phase. Quantitative fits from three different models were examined. Results High negative symptom patients demonstrated impaired learning from rewards but intact loss avoidance learning, and failed to distinguish rewarding stimuli from loss-avoiding stimuli in the transfer phase. Model fits revealed that high negative symptom patients were better characterized by an “actor-critic” model, learning stimulus-response associations, whereas controls and low negative symptom patients incorporated expected value of their actions (“Q-learning”) into the selection process. Conclusions Negative symptoms are associated with a specific reinforcement learning abnormality: High negative symptoms patients do not represent the expected value of rewards when making decisions but learn to avoid punishments through the use of prediction errors. This computational framework offers the potential to understand negative symptoms at a mechanistic level. PMID:22310503
Gaussian Processes for Data-Efficient Learning in Robotics and Control.
Deisenroth, Marc Peter; Fox, Dieter; Rasmussen, Carl Edward
2015-02-01
Autonomous learning has been a promising direction in control and robotics for more than a decade since data-driven learning allows to reduce the amount of engineering knowledge, which is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the art RL our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
Higher incentives can impair performance: neural evidence on reinforcement and rationality
Achtziger, Anja; Hügelschäfer, Sabine; Steinhauser, Marco
2015-01-01
Standard economic thinking postulates that increased monetary incentives should increase performance. Human decision makers, however, frequently focus on past performance, a form of reinforcement learning occasionally at odds with rational decision making. We used an incentivized belief-updating task from economics to investigate this conflict through measurements of neural correlates of reward processing. We found that higher incentives fail to improve performance when immediate feedback on decision outcomes is provided. Subsequent analysis of the feedback-related negativity, an early event-related potential following feedback, revealed the mechanism behind this paradoxical effect. As incentives increase, the win/lose feedback becomes more prominent, leading to an increased reliance on reinforcement and more errors. This mechanism is relevant for economic decision making and the debate on performance-based payment. PMID:25816816
Isolating the incentive salience of reward-associated stimuli: value, choice, and persistence
Chow, Jonathan J.
2015-01-01
Sign- and goal-tracking are differentially associated with drug abuse-related behavior. Recently, it has been hypothesized that sign- and goal-tracking behavior are mediated by different neurobehavioral valuation systems, including differential incentive salience attribution. Herein, we used different conditioned stimuli to preferentially elicit different response types to study the different incentive valuation characteristics of stimuli associated with sign- and goal-tracking within individuals. The results demonstrate that all stimuli used were equally effective conditioned stimuli; however, only a lever stimulus associated with sign-tracking behavior served as a robust conditioned reinforcer and was preferred over a tone associated with goal-tracking. Moreover, the incentive value attributed to the lever stimulus was capable of promoting suboptimal choice, leading to a significant reduction in reinforcers (food) earned. Furthermore, sign-tracking to a lever was more persistent than goal-tracking to a tone under omission and extinction contingencies. Finally, a conditional discrimination procedure demonstrated that sign-tracking to a lever and goal-tracking to a tone were dependent on learned stimulus–reinforcer relations. Collectively, these results suggest that the different neurobehavioral valuation processes proposed to govern sign- and goal-tracking behavior are independent but parallel processes within individuals. Examining these systems within individuals will provide a better understanding of how one system comes to dominate stimulus–reward learning, thus leading to the differential role these systems play in abuse-related behavior. PMID:25593298
Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca
2013-03-01
An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
Collins, Anne G E; Frank, Michael J
2018-03-06
Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.
Learning and tuning fuzzy logic controllers through reinforcements.
Berenji, H R; Khedkar, P
1992-01-01
A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that: the generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Impairments in action-outcome learning in schizophrenia.
Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W
2018-03-03
Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.
ERIC Educational Resources Information Center
Heitzman, Andrew J.
The New York State Center for Migrant Studies conducted this 1968 study which investigated effects of token reinforcers on reading and arithmetic skills learnings of migrant primary school students during a 6-week summer school session. Students (Negro and Caucasian) received plastic tokens to reward skills learning responses. Tokens were traded…
ERIC Educational Resources Information Center
Neu, Jessica Adele
2013-01-01
I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…
ERIC Educational Resources Information Center
Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela
2011-01-01
Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…
The Identification and Establishment of Reinforcement for Collaboration in Elementary Students
ERIC Educational Resources Information Center
Darcy, Laura
2017-01-01
In Experiment 1, I conducted a functional analysis of student rate of learning with and without a peer-yoked contingency for 12 students in Kindergarten through 2nd grade in order to determine if they had conditioned reinforcement for collaboration. Using an ABAB reversal design, I compared rate of learning as measured by learn units to criterion…
Dillon, Laura; Collins, Meaghan; Conway, Maura; Cunningham, Kate
2013-01-01
Three experiments examined the implicit learning of sequences under conditions in which the elements comprising a sequence were equated in terms of reinforcement probability. In Experiment 1 cotton-top tamarins (Saguinus oedipus) experienced a five-element sequence displayed serially on a touch screen in which reinforcement probability was equated across elements at .16 per element. Tamarins demonstrated learning of this sequence with higher latencies during a random test as compared to baseline sequence training. In Experiments 2 and 3, manipulations of the procedure used in the first experiment were undertaken to rule out a confound owing to the fact that the elements in Experiment 1 bore different temporal relations to the intertrial interval (ITI), an inhibitory period. The results of Experiments 2 and 3 indicated that the implicit learning observed in Experiment 1 was not due to temporal proximity between some elements and the inhibitory ITI. The results taken together support two conclusion: First that tamarins engaged in sequence learning whether or not there was contingent reinforcement for learning the sequence, and second that this learning was not due to subtle differences in associative strength between the elements of the sequence. PMID:23344718
Deep learning of orthographic representations in baboons.
Hannagan, Thomas; Ziegler, Johannes C; Dufau, Stéphane; Fagot, Joël; Grainger, Jonathan
2014-01-01
What is the origin of our ability to learn orthographic knowledge? We use deep convolutional networks to emulate the primate's ventral visual stream and explore the recent finding that baboons can be trained to discriminate English words from nonwords. The networks were exposed to the exact same sequence of stimuli and reinforcement signals as the baboons in the experiment, and learned to map real visual inputs (pixels) of letter strings onto binary word/nonword responses. We show that the networks' highest levels of representations were indeed sensitive to letter combinations as postulated in our previous research. The model also captured the key empirical findings, such as generalization to novel words, along with some intriguing inter-individual differences. The present work shows the merits of deep learning networks that can simulate the whole processing chain all the way from the visual input to the response while allowing researchers to analyze the complex representations that emerge during the learning process.
Modification of saccadic gain by reinforcement
Paeye, Céline; Wallman, Josh
2011-01-01
Control of saccadic gain is often viewed as a simple compensatory process in which gain is adjusted over many trials by the postsaccadic retinal error, thereby maintaining saccadic accuracy. Here, we propose that gain might also be changed by a reinforcement process not requiring a visual error. To test this hypothesis, we used experimental paradigms in which retinal error was removed by extinguishing the target at the start of each saccade and either an auditory tone or the vision of the target on the fovea was provided as reinforcement after those saccades that met an amplitude criterion. These reinforcement procedures caused a progressive change in saccade amplitude in nearly all subjects, although the rate of adaptation differed greatly among subjects. When we reversed the contingencies and reinforced those saccades landing closer to the original target location, saccade gain changed back toward normal gain in most subjects. When subjects had saccades adapted first by reinforcement and a week later by conventional intrasaccadic step adaptation, both paradigms yielded similar degrees of gain changes and similar transfer to new amplitudes and to new starting positions of the target step as well as comparable rates of recovery. We interpret these changes in saccadic gain in the absence of postsaccadic retinal error as showing that saccade adaptation is not controlled by a single error signal. More generally, our findings suggest that normal saccade adaptation might involve general learning mechanisms rather than only specialized mechanisms for motor calibration. PMID:21525366
Goal-seeking neural net for recall and recognition
NASA Astrophysics Data System (ADS)
Omidvar, Omid M.
1990-07-01
Neural networks have been used to mimic cognitive processes which take place in animal brains. The learning capability inherent in neural networks makes them suitable candidates for adaptive tasks such as recall and recognition. The synaptic reinforcements create a proper condition for adaptation, which results in memorization, formation of perception, and higher order information processing activities. In this research a model of a goal seeking neural network is studied and the operation of the network with regard to recall and recognition is analyzed. In these analyses recall is defined as retrieval of stored information where little or no matching is involved. On the other hand recognition is recall with matching; therefore it involves memorizing a piece of information with complete presentation. This research takes the generalized view of reinforcement in which all the signals are potential reinforcers. The neuronal response is considered to be the source of the reinforcement. This local approach to adaptation leads to the goal seeking nature of the neurons as network components. In the proposed model all the synaptic strengths are reinforced in parallel while the reinforcement among the layers is done in a distributed fashion and pipeline mode from the last layer inward. A model of complex neuron with varying threshold is developed to account for inhibitory and excitatory behavior of real neuron. A goal seeking model of a neural network is presented. This network is utilized to perform recall and recognition tasks. The performance of the model with regard to the assigned tasks is presented.
Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation
NASA Astrophysics Data System (ADS)
Satoh, Hideki
An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with multi-dimensional continuous state space. First, a basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method for adapting an orthonormal basis can modify a basis while holding the orthonormality in accordance with changes in the environment to improve the performance of reinforcement learning and to eliminate the adverse effects of redundant noisy states.
Kangas, Brian D; Bergman, Jack; Coyle, Joseph T
2016-05-01
Recent developments in precision gene editing have led to the emergence of the marmoset as an experimental subject of considerable interest and translational value. A better understanding of behavioral phenotypes of the common marmoset will inform the extent to which forthcoming transgenic mutants are cognitively intact. Therefore, additional information regarding their learning, inhibitory control, and motivational abilities is needed. The present studies used touchscreen-based repeated acquisition and discrimination reversal tasks to examine basic dimensions of learning and response inhibition. Marmosets were trained daily to respond to one of the two simultaneously presented novel stimuli. Subjects learned to discriminate the two stimuli (acquisition) and, subsequently, with the contingencies switched (reversal). In addition, progressive ratio performance was used to measure the effort expended to obtain a highly palatable reinforcer varying in magnitude and, thereby, provide an index of relative motivational value. Results indicate that rates of both acquisition and reversal of novel discriminations increased across successive sessions, but that rate of reversal learning remained slower than acquisition learning, i.e., more trials were needed for mastery. A positive correlation was observed between progressive ratio break point and reinforcement magnitude. These results closely replicate previous findings with squirrel monkeys, thus providing evidence of similarity in learning processes across nonhuman primate species. Moreover, these data provide key information about the normative phenotype of wild-type marmosets using three relevant behavioral endpoints.
Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.
Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R
2015-01-01
Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and this generalization outperforms the current practice of using a library of policies. We achieve that contributing with a new algorithm, AbsProb-PI-multiple and a framework for transferring knowledge represented as a stochastic abstract policy in new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process and also assess the transfer ratio for different amounts of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.
Hybrid computing using a neural network with dynamic external memory.
Graves, Alex; Wayne, Greg; Reynolds, Malcolm; Harley, Tim; Danihelka, Ivo; Grabska-Barwińska, Agnieszka; Colmenarejo, Sergio Gómez; Grefenstette, Edward; Ramalho, Tiago; Agapiou, John; Badia, Adrià Puigdomènech; Hermann, Karl Moritz; Zwols, Yori; Ostrovski, Georg; Cain, Adam; King, Helen; Summerfield, Christopher; Blunsom, Phil; Kavukcuoglu, Koray; Hassabis, Demis
2016-10-27
Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Vengerov, David
1999-01-01
Successful operations of future multi-agent intelligent systems require efficient cooperation schemes between agents sharing learning experiences. We consider a pseudo-realistic world in which one or more opportunities appear and disappear in random locations. Agents use fuzzy reinforcement learning to learn which opportunities are most worthy of pursuing based on their promise rewards, expected lifetimes, path lengths and expected path costs. We show that this world is partially observable because the history of an agent influences the distribution of its future states. We consider a cooperation mechanism in which agents share experience by using and-updating one joint behavior policy. We also implement a coordination mechanism for allocating opportunities to different agents in the same world. Our results demonstrate that K cooperative agents each learning in a separate world over N time steps outperform K independent agents each learning in a separate world over K*N time steps, with this result becoming more pronounced as the degree of partial observability in the environment increases. We also show that cooperation between agents learning in the same world decreases performance with respect to independent agents. Since cooperation reduces diversity between agents, we conclude that diversity is a key parameter in the trade off between maximizing utility from cooperation when diversity is low and maximizing utility from competitive coordination when diversity is high.
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian
2017-11-01
We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
Neural correlates of reinforcement learning and social preferences in competitive bidding.
van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M
2013-01-30
In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
The Tablet Inscribed: Inclusive Writing Instruction with the iPad
ERIC Educational Resources Information Center
Sullivan, Rebecca M.
2013-01-01
Despite the author's initial skepticism, a classroom set of iPads has reinforced a student-directed approach to writing instruction, while also supporting an inclusive classroom. Using the iPads, students guide their writing process with access to the learning management system, electronic information resources, and an online text editor. Students…
Choosing Classroom-Led Activities Reinforce Active Learning
ERIC Educational Resources Information Center
Escribano, Begona M.; Aguera, Estrella L.; Tovar, Pura
2013-01-01
The authors of this article have had some doubts as to whether the new information and communication technologies presented through the internet may be altering the way in which the student's brain processes the information received. Access to the internet permits students to obtain abundant information quite rapidly, but the result is…
High School Health and Physical Education: Reinforcing the 3Rs
ERIC Educational Resources Information Center
Moore, John
2009-01-01
The ultimate goal of the education process should be to improve instruction and increase student learning. To effectively accomplish this would truly result in education reform. Therefore, the first step in bringing about education reform is to provide academic rigor, vocational relevance and curricula relationships in programs that students see…
Cognitive control predicts use of model-based reinforcement learning.
Otto, A Ross; Skatova, Anya; Madlon-Kay, Seth; Daw, Nathaniel D
2015-02-01
Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggest that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior is a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information--in the service of overcoming habitual, stimulus-driven responses--in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior.
Reinforcement learning and decision making in monkeys during a competitive game.
Lee, Daeyeol; Conroy, Michelle L; McGreevy, Benjamin P; Barraclough, Dominic J
2004-12-01
Animals living in a dynamic environment must adjust their decision-making strategies through experience. To gain insights into the neural basis of such adaptive decision-making processes, we trained monkeys to play a competitive game against a computer in an oculomotor free-choice task. The animal selected one of two visual targets in each trial and was rewarded only when it selected the same target as the computer opponent. To determine how the animal's decision-making strategy can be affected by the opponent's strategy, the computer opponent was programmed with three different algorithms that exploited different aspects of the animal's choice and reward history. When the computer selected its targets randomly with equal probabilities, animals selected one of the targets more often, violating the prediction of probability matching, and their choices were systematically influenced by the choice history of the two players. When the computer exploited only the animal's choice history but not its reward history, animal's choice became more independent of its own choice history but was still related to the choice history of the opponent. This bias was substantially reduced, but not completely eliminated, when the computer used the choice history of both players in making its predictions. These biases were consistent with the predictions of reinforcement learning, suggesting that the animals sought optimal decision-making strategies using reinforcement learning algorithms.
ERIC Educational Resources Information Center
Dunn-Kenney, Maylan
2010-01-01
Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…
Deep Gate Recurrent Neural Network
2016-11-22
Schmidhuber. A system for robotic heart surgery that learns to tie knots using recurrent neural networks. In IEEE International Conference on...tasks, such as Machine Translation (Bahdanau et al. (2015)) or Robot Reinforcement Learning (Bakker (2001)). The main idea behind these networks is to...and J. Peters. Reinforcement learning in robotics : A survey. The International Journal of Robotics Research, 32:1238–1274, 2013. ISSN 0278-3649. doi
Reinstated episodic context guides sampling-based decisions for reward.
Bornstein, Aaron M; Norman, Kenneth A
2017-07-01
How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event. This effect is mediated by fMRI measures of context retrieval on each trial, suggesting a mechanism whereby cues trigger retrieval of context, which then triggers retrieval of other decisions from that context. This result establishes a new avenue by which experience can guide choice and, as such, has broad implications for the study of decisions.
Goal-directed EEG activity evoked by discriminative stimuli in reinforcement learning.
Luque, David; Morís, Joaquín; Rushby, Jacqueline A; Le Pelley, Mike E
2015-02-01
In reinforcement learning (RL), discriminative stimuli (S) allow agents to anticipate the value of a future outcome, and the response that will produce that outcome. We examined this processing by recording EEG locked to S during RL. Incentive value of outcomes and predictive value of S were manipulated, allowing us to discriminate between outcome-related and response-related activity. S predicting the correct response differed from nonpredictive S in the P2. S paired with high-value outcomes differed from those paired with low-value outcomes in a frontocentral positivity and in the P3b. A slow negativity then distinguished between predictive and nonpredictive S. These results suggest that, first, attention prioritizes detection of informative S. Activation of mental representations of these informative S then retrieves representations of outcomes, which in turn retrieve representations of responses that previously produced those outcomes. © 2014 Society for Psychophysiological Research.
Morphological elucidation of basal ganglia circuits contributing reward prediction
Fujiyama, Fumino; Takahashi, Susumu; Karube, Fuyuki
2015-01-01
Electrophysiological studies in monkeys have shown that dopaminergic neurons respond to the reward prediction error. In addition, striatal neurons alter their responsiveness to cortical or thalamic inputs in response to the dopamine signal, via the mechanism of dopamine-regulated synaptic plasticity. These findings have led to the hypothesis that the striatum exhibits synaptic plasticity under the influence of the reward prediction error and conduct reinforcement learning throughout the basal ganglia circuits. The reinforcement learning model is useful; however, the mechanism by which such a process emerges in the basal ganglia needs to be anatomically explained. The actor–critic model has been previously proposed and extended by the existence of role sharing within the striatum, focusing on the striosome/matrix compartments. However, this hypothesis has been difficult to confirm morphologically, partly because of the complex structure of the striosome/matrix compartments. Here, we review recent morphological studies that elucidate the input/output organization of the striatal compartments. PMID:25698913
Rodríguez-Gironés, Miguel A.; Trillo, Alejandro; Corcobado, Guadalupe
2013-01-01
The results of behavioural experiments provide important information about the structure and information-processing abilities of the visual system. Nevertheless, if we want to infer from behavioural data how the visual system operates, it is important to know how different learning protocols affect performance and to devise protocols that minimise noise in the response of experimental subjects. The purpose of this work was to investigate how reinforcement schedule and individual variability affect the learning process in a colour discrimination task. Free-flying bumblebees were trained to discriminate between two perceptually similar colours. The target colour was associated with sucrose solution, and the distractor could be associated with water or quinine solution throughout the experiment, or with one substance during the first half of the experiment and the other during the second half. Both acquisition and final performance of the discrimination task (measured as proportion of correct choices) were determined by the choice of reinforcer during the first half of the experiment: regardless of whether bees were trained with water or quinine during the second half of the experiment, bees trained with quinine during the first half learned the task faster and performed better during the whole experiment. Our results confirm that the choice of stimuli used during training affects the rate at which colour discrimination tasks are acquired and show that early contact with a strongly aversive stimulus can be sufficient to maintain high levels of attention during several hours. On the other hand, bees which took more time to decide on which flower to alight were more likely to make correct choices than bees which made fast decisions. This result supports the existence of a trade-off between foraging speed and accuracy, and highlights the importance of measuring choice latencies during behavioural experiments focusing on cognitive abilities. PMID:23951186
Self-regulation of the anterior insula: Reinforcement learning using real-time fMRI neurofeedback.
Lawrence, Emma J; Su, Li; Barker, Gareth J; Medford, Nick; Dalton, Jeffrey; Williams, Steve C R; Birbaumer, Niels; Veit, Ralf; Ranganatha, Sitaram; Bodurka, Jerzy; Brammer, Michael; Giampietro, Vincent; David, Anthony S
2014-03-01
The anterior insula (AI) plays a key role in affective processing, and insular dysfunction has been noted in several clinical conditions. Real-time functional MRI neurofeedback (rtfMRI-NF) provides a means of helping people learn to self-regulate activation in this brain region. Using the Blood Oxygenated Level Dependant (BOLD) signal from the right AI (RAI) as neurofeedback, we trained participants to increase RAI activation. In contrast, another group of participants was shown 'control' feedback from another brain area. Pre- and post-training affective probes were shown, with subjective ratings and skin conductance response (SCR) measured. We also investigated a reward-related reinforcement learning model of rtfMRI-NF. In contrast to the controls, we hypothesised a positive linear increase in RAI activation in participants shown feedback from this region, alongside increases in valence ratings and SCR to affective probes. Hypothesis-driven analyses showed a significant interaction between the RAI/control neurofeedback groups and the effect of self-regulation. Whole-brain analyses revealed a significant linear increase in RAI activation across four training runs in the group who received feedback from RAI. Increased activation was also observed in the caudate body and thalamus, likely representing feedback-related learning. No positive linear trend was observed in the RAI in the group receiving control feedback, suggesting that these data are not a general effect of cognitive strategy or control feedback. The control group did, however, show diffuse activation across the putamen, caudate and posterior insula which may indicate the representation of false feedback. No significant training-related behavioural differences were observed for valence ratings, or SCR. In addition, correlational analyses based on a reinforcement learning model showed that the dorsal anterior cingulate cortex underpinned learning in both groups. In summary, these data demonstrate that it is possible to regulate the RAI using rtfMRI-NF within one scanning session, and that such reward-related learning is mediated by the dorsal anterior cingulate. Copyright © 2013 Elsevier Inc. All rights reserved.
Dissociating hippocampal and striatal contributions to sequential prediction learning
Bornstein, Aaron M.; Daw, Nathaniel D.
2011-01-01
Behavior may be generated on the basis of many different kinds of learned contingencies. For instance, responses could be guided by the direct association between a stimulus and response, or by sequential stimulus-stimulus relationships (as in model-based reinforcement learning or goal-directed actions). However, the neural architecture underlying sequential predictive learning is not well-understood, in part because it is difficult to isolate its effect on choice behavior. To track such learning more directly, we examined reaction times (RTs) in a probabilistic sequential picture identification task. We used computational learning models to isolate trial-by-trial effects of two distinct learning processes in behavior, and used these as signatures to analyze the separate neural substrates of each process. RTs were best explained via the combination of two delta rule learning processes with different learning rates. To examine neural manifestations of these learning processes, we used functional magnetic resonance imaging to seek correlates of timeseries related to expectancy or surprise. We observed such correlates in two regions, hippocampus and striatum. By estimating the learning rates best explaining each signal, we verified that they were uniquely associated with one of the two distinct processes identified behaviorally. These differential correlates suggest that complementary anticipatory functions drive each region's effect on behavior. Our results provide novel insights as to the quantitative computational distinctions between medial temporal and basal ganglia learning networks and enable experiments that exploit trial-by-trial measurement of the unique contributions of both hippocampus and striatum to response behavior. PMID:22487032
Reinforcement learning for resource allocation in LEO satellite networks.
Usaha, Wipawee; Barria, Javier A
2007-06-01
In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibited as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms enhance performance in terms of requirements in storage, computational complexity and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements.
Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer
Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R.
2010-01-01
Summary Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a “clinical reinforcement trial”) of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients. PMID:21385164
Goltz, Sonia M.
1992-01-01
Reinforcement process may underlie decisions frequently found in organizations to escalate investments of time, money and other resources in strategies (e.g., product development, capital investment, plant expansion) that do not result in immediate reinforces. Whereas cognitive biases have been proffered in previous explanations, the present analysis suggested that this persistence is a form of resistance to extinction arising from experiences with past investments that were variably reinforced. This explanation was examined in two experiments by varying the pattern of returns and losses subjects experienced for investment decisions prior to experiencing a series losses. Consistent with the proposed explanation, two conditions resulted in higher levels of recommitment during continuous losses: (a) training using a variable schedule of partial reinforcement, and (b) no training on the task. Results indicate that behavior analysis can be used to understand and control situations in organizations that are prone to escalation, such as investments in the research and development of new product lines and extensions of further loans to customers. PMID:16795785
Fidelity of the representation of value in decision-making
Dowding, Ben A.
2017-01-01
The ability to make optimal decisions depends on evaluating the expected rewards associated with different potential actions. This process is critically dependent on the fidelity with which reward value information can be maintained in the nervous system. Here we directly probe the fidelity of value representation following a standard reinforcement learning task. The results demonstrate a previously-unrecognized bias in the representation of value: extreme reward values, both low and high, are stored significantly more accurately and precisely than intermediate rewards. The symmetry between low and high rewards pertained despite substantially higher frequency of exposure to high rewards, resulting from preferential exploitation of more rewarding options. The observed variation in fidelity of value representation retrospectively predicted performance on the reinforcement learning task, demonstrating that the bias in representation has an impact on decision-making. A second experiment in which one or other extreme-valued option was omitted from the learning sequence showed that representational fidelity is primarily determined by the relative position of an encoded value on the scale of rewards experienced during learning. Both variability and guessing decreased with the reduction in the number of options, consistent with allocation of a limited representational resource. These findings have implications for existing models of reward-based learning, which typically assume defectless representation of reward value. PMID:28248958
Changing to Concept-Based Curricula: The Process for Nurse Educators.
Baron, Kristy A
2017-01-01
The complexity of health care today requires nursing graduates to use effective thinking skills. Many nursing programs are revising curricula to include concept-based learning that encourages problem-solving, effective thinking, and the ability to transfer knowledge to a variety of situations-requiring nurse educators to modify their teaching styles and methods to promote student-centered learning. Changing from teacher-centered learning to student-centered learning requires a major shift in thinking and application. The focus of this qualitative study was to understand the process of changing to concept-based curricula for nurse educators who previously taught in traditional curriculum designs. The sample included eight educators from two institutions in one Western state using a grounded theory design. The themes that emerged from participants' experiences consisted of the overarching concept, support for change, and central concept, finding meaning in the change. Finding meaning is supported by three main themes : preparing for the change, teaching in a concept-based curriculum, and understanding the teaching-learning process. Changing to a concept-based curriculum required a major shift in thinking and application. Through support, educators discovered meaning to make the change by constructing authentic learning opportunities that mirrored practice, refining the change process, and reinforcing benefits of teaching.
Interconnected growing self-organizing maps for auditory and semantic acquisition modeling
Cao, Mengxue; Li, Aijun; Fang, Qiang; Kaufmann, Emily; Kröger, Bernd J.
2014-01-01
Based on the incremental nature of knowledge acquisition, in this study we propose a growing self-organizing neural network approach for modeling the acquisition of auditory and semantic categories. We introduce an Interconnected Growing Self-Organizing Maps (I-GSOM) algorithm, which takes associations between auditory information and semantic information into consideration, in this paper. Direct phonetic–semantic association is simulated in order to model the language acquisition in early phases, such as the babbling and imitation stages, in which no phonological representations exist. Based on the I-GSOM algorithm, we conducted experiments using paired acoustic and semantic training data. We use a cyclical reinforcing and reviewing training procedure to model the teaching and learning process between children and their communication partners. A reinforcing-by-link training procedure and a link-forgetting procedure are introduced to model the acquisition of associative relations between auditory and semantic information. Experimental results indicate that (1) I-GSOM has good ability to learn auditory and semantic categories presented within the training data; (2) clear auditory and semantic boundaries can be found in the network representation; (3) cyclical reinforcing and reviewing training leads to a detailed categorization as well as to a detailed clustering, while keeping the clusters that have already been learned and the network structure that has already been developed stable; and (4) reinforcing-by-link training leads to well-perceived auditory–semantic associations. Our I-GSOM model suggests that it is important to associate auditory information with semantic information during language acquisition. Despite its high level of abstraction, our I-GSOM approach can be interpreted as a biologically-inspired neurocomputational model. PMID:24688478
Depression, Activity, and Evaluation of Reinforcement
ERIC Educational Resources Information Center
Hammen, Constance L.; Glass, David R., Jr.
1975-01-01
This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)
Kinesthetic Reinforcement-Is It a Boon to Learning?
ERIC Educational Resources Information Center
Bohrer, Roxilu K.
1970-01-01
Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…
Reinforcing Basic Skills Through Social Studies. Grades 4-7.
ERIC Educational Resources Information Center
Lewis, Teresa Marie
Arranged into seven parts, this document provides a variety of games and activities, bulletin board ideas, overhead transparencies, student handouts, and learning station ideas to help reinforce basic social studies skills in the intermediate grades. In part 1, students learn about timelines, first constructing their own life timeline, then a…
Effects of Reinforcement on Peer Imitation in a Small Group Play Context
ERIC Educational Resources Information Center
Barton, Erin E.; Ledford, Jennifer R.
2018-01-01
Children with disabilities often have deficits in imitation skills, particularly in imitating peers. Imitation is considered a behavioral cusp--which, once learned, allows a child to access additional and previously unavailable learning opportunities. In the current study, researchers examined the efficacy of contingent reinforcement delivered…
Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.
Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria
2016-03-01
Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.
Reinforcement learning agents providing advice in complex video games
NASA Astrophysics Data System (ADS)
Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa
2014-01-01
This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013)]. Teaching on a budget: Agents advising agents in reinforcement learning. {Proceedings of the international conference on autonomous agents and multiagent systems}] and in a non-archival workshop paper [Carboni, N., &Taylor, M. E. (2013, May)]. Preliminary results for 1 vs. 1 tactics in StarCraft. {Proceedings of the adaptive and learning agents workshop (at AAMAS-13)}]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
Harris, Justin A; Kwok, Dorothy W S
2018-01-01
During magazine approach conditioning, rats do not discriminate between a conditional stimulus (CS) that is consistently reinforced with food and a CS that is occasionally (partially) reinforced, as long as the CSs have the same overall reinforcement rate per second. This implies that rats are indifferent to the probability of reinforcement per trial. However, in the same rats, the per-trial reinforcement rate will affect subsequent extinction-responding extinguishes more rapidly for a CS that was consistently reinforced than for a partially reinforced CS. Here, we trained rats with consistently and partially reinforced CSs that were matched for overall reinforcement rate per second. We measured conditioned responding both during and immediately after the CSs. Differences in the per-trial probability of reinforcement did not affect the acquisition of responding during the CS but did affect subsequent extinction of that responding, and also affected the post-CS response rates during conditioning. Indeed, CSs with the same probability of reinforcement per trial evoked the same amount of post-CS responding even when they differed in overall reinforcement rate and thus evoked different amounts of responding during the CS. We conclude that reinforcement rate per second controls rats' acquisition of responding during the CS, but at the same time, rats also learn specifically about the probability of reinforcement per trial. The latter learning affects the rats' expectation of reinforcement as an outcome of the trial, which influences their ability to detect retrospectively that an opportunity for reinforcement was missed, and, in turn, drives extinction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
ERIC Educational Resources Information Center
Zrinzo, Michelle; Greer, R. Douglas
2013-01-01
Prior research has demonstrated the establishment of reinforcers for learning and maintenance with young children as a function of social learning where a peer and an adult experimenter were present. The presence of an adult experimenter was eliminated in the present study to test if the effect produced in the prior studies would occur with only…
ERIC Educational Resources Information Center
Punnett, Audrey F.; Steinhauer, Gene D.
1984-01-01
Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…
The evolution of continuous learning of the structure of the environment
Kolodny, Oren; Edelman, Shimon; Lotem, Arnon
2014-01-01
Continuous, ‘always on’, learning of structure from a stream of data is studied mainly in the fields of machine learning or language acquisition, but its evolutionary roots may go back to the first organisms that were internally motivated to learn and represent their environment. Here, we study under what conditions such continuous learning (CL) may be more adaptive than simple reinforcement learning and examine how it could have evolved from the same basic associative elements. We use agent-based computer simulations to compare three learning strategies: simple reinforcement learning; reinforcement learning with chaining (RL-chain) and CL that applies the same associative mechanisms used by the other strategies, but also seeks statistical regularities in the relations among all items in the environment, regardless of the initial association with food. We show that a sufficiently structured environment favours the evolution of both RL-chain and CL and that CL outperforms the other strategies when food is relatively rare and the time for learning is limited. This advantage of internally motivated CL stems from its ability to capture statistical patterns in the environment even before they are associated with food, at which point they immediately become useful for planning. PMID:24402920
The partial-reinforcement extinction effect and the contingent-sampling hypothesis.
Hochman, Guy; Erev, Ido
2013-12-01
The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
Goal-directed, habitual and Pavlovian prosocial behavior
Gęsiarz, Filip; Crockett, Molly J.
2015-01-01
Although prosocial behaviors have been widely studied across disciplines, the mechanisms underlying them are not fully understood. Evidence from psychology, biology and economics suggests that prosocial behaviors can be driven by a variety of seemingly opposing factors: altruism or egoism, intuition or deliberation, inborn instincts or learned dispositions, and utility derived from actions or their outcomes. Here we propose a framework inspired by research on reinforcement learning and decision making that links these processes and explains characteristics of prosocial behaviors in different contexts. More specifically, we suggest that prosocial behaviors inherit features of up to three decision-making systems employed to choose between self- and other- regarding acts: a goal-directed system that selects actions based on their predicted consequences, a habitual system that selects actions based on their reinforcement history, and a Pavlovian system that emits reflexive responses based on evolutionarily prescribed priors. This framework, initially described in the field of cognitive neuroscience and machine learning, provides insight into the potential neural circuits and computations shaping prosocial behaviors. Furthermore, it identifies specific conditions in which each of these three systems should dominate and promote other- or self- regarding behavior. PMID:26074797
Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.
Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne
2016-05-01
We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to stimuli foreground over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.
Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities
NASA Astrophysics Data System (ADS)
Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B.
2018-02-01
Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this work, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
Henderson, Amanda; Schoonbeek, Sue; Ossenberg, Christine; Caddick, Alison; Wing, Diane; Capell, Lorna; Gould, Karen
2014-06-01
To critically analyse the success of staff's behaviour changes in the practice setting. Facilitators were employed to initiate and facilitate a four-step process (optimism, overcoming obstacles, oversight and reinforcing outcomes) that fostered development of behaviours consistent with learning in everyday practice. Many studies seek to engage staff in workplace behaviour improvement. The success of such studies is highly variable. Little is known about the work of the facilitator in ensuring success. Understanding the contextual factors that contribute to effective facilitation of workplace improvement is essential to ensure best use of resources. Mixed methods Facilitators employed a four-step process - optimism, overcoming obstacles, oversight and reinforcing outcomes - to stage behaviour change implementation. The analysis of staff engagement in behaviour changes was assessed through weekly observation of workplaces, informal discussions with staff and facilitator diaries. The impact of behaviour change was informed through pre- and postsurveys on staff's perception across three midwifery sites. Surveys measured (1) midwives' perception of support for their role in facilitating learning (Support Instrument for Nurses Facilitating the Learning of Others) and (2) development of a learning culture in midwifery practice settings (Clinical Learning Organisational Culture Survey). Midwives across three sites completed the presurvey (n = 216) and postsurvey (n = 90). Impact varied according to the degree that facilitators were able to progress teams through four stages necessary for change (OOORO). Statistically significant results were apparent in two subscales important for supporting staff, namely teamwork and acknowledgement; in the two areas, facilitators worked through 'obstacles' and coached staff in performing the desired behaviours and rewarded them for their success. Elements of the learning culture also statistically improved in one site. Findings suggest behaviour change success is dependent on facilitators to systematically engage staff through all four stages of implementation. It is important that investment is made to commitment and resources to all four stages before embarking on change processes. © 2013 John Wiley & Sons Ltd.
Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.
Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla
2014-12-01
This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new models of coordination for intelligent multiagent systems, which integrates the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and the robot soccer simulation. The results obtained demonstrated that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.
Refining Linear Fuzzy Rules by Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil
1996-01-01
Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used in refining these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning which can be applied in domains where supervised input-output data is not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step in closing the gap between the application of reinforcement learning methods in the domains where only some limited input-output data is available.
The control of tonic pain by active relief learning
Mano, Hiroaki; Lee, Michael; Yoshida, Wako; Kawato, Mitsuo; Robbins, Trevor W
2018-01-01
Tonic pain after injury characterises a behavioural state that prioritises recovery. Although generally suppressing cognition and attention, tonic pain needs to allow effective relief learning to reduce the cause of the pain. Here, we describe a central learning circuit that supports learning of relief and concurrently suppresses the level of ongoing pain. We used computational modelling of behavioural, physiological and neuroimaging data in two experiments in which subjects learned to terminate tonic pain in static and dynamic escape-learning paradigms. In both studies, we show that active relief-seeking involves a reinforcement learning process manifest by error signals observed in the dorsal putamen. Critically, this system uses an uncertainty (‘associability’) signal detected in pregenual anterior cingulate cortex that both controls the relief learning rate, and endogenously and parametrically modulates the level of tonic pain. The results define a self-organising learning circuit that reduces ongoing pain when learning about potential relief. PMID:29482716
Atlas, Lauren Y; Doll, Bradley B; Li, Jian; Daw, Nathaniel D; Phelps, Elizabeth A
2016-01-01
Socially-conveyed rules and instructions strongly shape expectations and emotions. Yet most neuroscientific studies of learning consider reinforcement history alone, irrespective of knowledge acquired through other means. We examined fear conditioning and reversal in humans to test whether instructed knowledge modulates the neural mechanisms of feedback-driven learning. One group was informed about contingencies and reversals. A second group learned only from reinforcement. We combined quantitative models with functional magnetic resonance imaging and found that instructions induced dissociations in the neural systems of aversive learning. Responses in striatum and orbitofrontal cortex updated with instructions and correlated with prefrontal responses to instructions. Amygdala responses were influenced by reinforcement similarly in both groups and did not update with instructions. Results extend work on instructed reward learning and reveal novel dissociations that have not been observed with punishments or rewards. Findings support theories of specialized threat-detection and may have implications for fear maintenance in anxiety. DOI: http://dx.doi.org/10.7554/eLife.15192.001 PMID:27171199
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca
2017-04-01
Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.
Learning and tuning fuzzy logic controllers through reinforcements
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1992-01-01
A new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. In particular, our Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture: (1) learns and tunes a fuzzy logic controller even when only weak reinforcements, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto, Sutton, and Anderson to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and has demonstrated significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Opposite Actions of Dopamine on Aversive and Appetitive Memories in the Crab
ERIC Educational Resources Information Center
Klappenbach, Martin; Maldonado, Hector; Locatelli, Fernando; Kaczer, Laura
2012-01-01
The understanding of how the reinforcement is represented in the central nervous system during memory formation is a current issue in neurobiology. Several studies in insects provide evidence of the instructive role of biogenic amines during the learning and memory process. In insects it was widely accepted that dopamine (DA) mediates aversive…
ERIC Educational Resources Information Center
Waltereit, Robert; Mannhardt, Sonke; Nescholta, Sabine; Maser-Gluth, Christiane; Bartsch, Dusan
2008-01-01
Memory extinction, defined as a decrease of a conditioned response as a function of a non-reinforced conditioned stimulus presentation, has high biological and clinical relevance. Extinction is not a passive reversing or erasing of the plasticity associated with acquisition, but a novel, active learning process. Nifedipine blocks L-type voltage…
ERIC Educational Resources Information Center
Vidigal, Luis
1995-01-01
Examines education and childhood in Portugal. Uses oral history methods in an educational context, exploring oral statements pedagogically. Considers these statements especially suitable to maintaining aspects of collective memory and social identity, reinforcing students' national and regional identities. Suggests this is very important in…
Effects of Intrinsic Motivation on Feedback Processing During Learning
DePasque, Samantha; Tricomi, Elizabeth
2015-01-01
Learning commonly requires feedback about the consequences of one’s actions, which can drive learners to modify their behavior. Motivation may determine how sensitive an individual might be to such feedback, particularly in educational contexts where some students value academic achievement more than others. Thus, motivation for a task might influence the value placed on performance feedback and how effectively it is used to improve learning. To investigate the interplay between intrinsic motivation and feedback processing, we used functional magnetic resonance imaging (fMRI) during feedback-based learning before and after a novel manipulation based on motivational interviewing, a technique for enhancing treatment motivation in mental health settings. Because of its role in the reinforcement learning system, the striatum is situated to play a significant role in the modulation of learning based on motivation. Consistent with this idea, motivation levels during the task were associated with sensitivity to positive versus negative feedback in the striatum. Additionally, heightened motivation following a brief motivational interview was associated with increases in feedback sensitivity in the left medial temporal lobe. Our results suggest that motivation modulates neural responses to performance-related feedback, and furthermore that changes in motivation facilitates processing in areas that support learning and memory. PMID:26112370
Categorization = Decision Making + Generalization
Seger, Carol A; Peterson, Erik J.
2013-01-01
We rarely, if ever, repeatedly encounter exactly the same situation. This makes generalization crucial for real world decision making. We argue that categorization, the study of generalizable representations, is a type of decision making, and that categorization learning research would benefit from approaches developed to study the neuroscience of decision making. Similarly, methods developed to examine generalization and learning within the field of categorization may enhance decision making research. We first discuss perceptual information processing and integration, with an emphasis on accumulator models. We then examine learning the value of different decision making choices via experience, emphasizing reinforcement learning modeling approaches. Next we discuss how value is combined with other factors in decision making, emphasizing the effects of uncertainty. Finally, we describe how a final decision is selected via thresholding processes implemented by the basal ganglia and related regions. We also consider how memory related functions in the hippocampus may be integrated with decision making mechanisms and contribute to categorization. PMID:23548891
ERIC Educational Resources Information Center
Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman
2011-01-01
By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…
Reinforcement Learning in Young Adults with Developmental Language Impairment
ERIC Educational Resources Information Center
Lee, Joanna C.; Tomblin, J. Bruce
2012-01-01
The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…
Effective Reinforcement Techniques in Elementary Physical Education: The Key to Behavior Management
ERIC Educational Resources Information Center
Downing, John; Keating, Tedd; Bennett, Carl
2005-01-01
The ability to shape appropriate behavior while extinguishing misbehavior is critical to teaching and learning in physical education. The scientific principles that affect student learning in the gymnasium also apply to the methods teachers use to influence social behaviors. Research indicates that reinforcement strategies are more effective than…
Reinforcement learning state estimator.
Morimoto, Jun; Doya, Kenji
2007-03-01
In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning frame-work to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single pendulum dynamics and show that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.
Reciprocity Family Counseling: A Multi-Ethnic Model.
ERIC Educational Resources Information Center
Penrose, David M.
The Reciprocity Family Counseling Method involves learning principles of behavior modification including selective reinforcement, behavioral contracting, self-correction, and over-correction. Selective reinforcement refers to the recognition and modification of parent/child responses and reinforcers. Parents and children are asked to identify…
Reinforcement learning: Solving two case studies
NASA Astrophysics Data System (ADS)
Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto
2012-09-01
Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signalas opposed to the input-outputs pairsused in classic supervised learning. The reward signal indicates the success of failure of the actions executed by the agent in the environment. In this work, are described RL algorithmsapplied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, andapproach the balancing problem of biped locomotion using the inverted pendulum.
Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning
Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien
2015-01-01
Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in offline manners. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without basing on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for rule discovery task by generating new rules from the old ones. This increases the scalability of a RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018
Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.
Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien
2015-01-01
Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in offline manners. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without basing on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for rule discovery task by generating new rules from the old ones. This increases the scalability of a RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction.
A comparison of differential reinforcement procedures with children with autism.
Boudreau, Brittany A; Vladescu, Jason C; Kodak, Tiffany M; Argott, Paul J; Kisamore, April N
2015-12-01
The current evaluation compared the effects of 2 differential reinforcement arrangements and a nondifferential reinforcement arrangement on the acquisition of tacts for 3 children with autism. Participants learned in all reinforcement-based conditions, and we discuss areas for future research in light of these findings and potential limitations. © Society for the Experimental Analysis of Behavior.
Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis
2013-01-01
Background Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P
2007-11-21
The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.
Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki
2016-07-01
Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperator. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from the Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
Vicarious reinforcement learning signals when instructing others.
Apps, Matthew A J; Lesage, Elise; Ramnani, Narender
2015-02-18
Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.
Social stress reactivity alters reward and punishment learning
Frank, Michael J.; Allen, John J. B.
2011-01-01
To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems. PMID:20453038
Social stress reactivity alters reward and punishment learning.
Cavanagh, James F; Frank, Michael J; Allen, John J B
2011-06-01
To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.
Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N
2015-08-01
There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis. (c) 2015 APA, all rights reserved).
Error-Induced Learning as a Resource-Adaptive Process in Young and Elderly Individuals
NASA Astrophysics Data System (ADS)
Ferdinand, Nicola K.; Weiten, Anja; Mecklinger, Axel; Kray, Jutta
Thorndike described in his law of effect [44] that actions followed by positive events are more likely to be repeated in the future, whereas actions that are followed by negative outcomes are less likely to be repeated. This implies that behavior is evaluated in the light of its potential consequences, and non-reward events (i.e., errors) must be detected for reinforcement learning to take place. In short, humans have to monitor their performance in order to detect and correct errors, and this allows them to successfully adapt their behavior to changing environmental demands and acquire new behavior, i.e., to learn.
A review of reward processing and motivational impairment in schizophrenia.
Strauss, Gregory P; Waltz, James A; Gold, James M
2014-03-01
This article reviews and synthesizes research on reward processing in schizophrenia, which has begun to provide important insights into the cognitive and neural mechanisms associated with motivational impairments. Aberrant cortical-striatal interactions may be involved with multiple reward processing abnormalities, including: (1) dopamine-mediated basal ganglia systems that support reinforcement learning and the ability to predict cues that lead to rewarding outcomes; (2) orbitofrontal cortex-driven deficits in generating, updating, and maintaining value representations; (3) aberrant effort-value computations, which may be mediated by disrupted anterior cingulate cortex and midbrain dopamine functioning; and (4) altered activation of the prefrontal cortex, which is important for generating exploratory behaviors in environments where reward outcomes are uncertain. It will be important for psychosocial interventions targeting negative symptoms to account for abnormalities in each of these reward processes, which may also have important interactions; suggestions for novel behavioral intervention strategies that make use of external cues, reinforcers, and mobile technology are discussed.
Differential Training Facilitates Early Consolidation in Motor Learning
Henz, Diana; Schöllhorn, Wolfgang I.
2016-01-01
Current research demonstrates increased learning rates in differential learning (DL) compared to repetitive training. To date, little is known on the underlying neurophysiological processes in DL that contribute to superior performance over repetitive practice. In the present study, we measured electroencephalographic (EEG) brain activation patterns after DL and repetitive badminton serve training. Twenty-four semi-professional badminton players performed badminton serves in a DL and repetitive training schedule in a within-subjects design. EEG activity was recorded from 19 electrodes according to the 10–20 system before and immediately after each 20-min exercise. Increased theta activity was obtained in contralateral parieto-occipital regions after DL. Further, increased posterior alpha activity was obtained in DL compared to repetitive training. Results indicate different underlying neuronal processes in DL and repetitive training with a higher involvement of parieto-occipital areas in DL. We argue that DL facilitates early consolidation in motor learning indicated by post-training increases in theta and alpha activity. Further, brain activation patterns indicate somatosensory working memory processes where attentional resources are allocated in processing of somatosensory information in DL. Reinforcing a somatosensory memory trace might explain increased motor learning rates in DL. Finally, this memory trace is more stable against interference from internal and external disturbances that afford executively controlled processing such as attentional processes. PMID:27818627
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.
Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J
2016-11-16
As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. Copyright © 2016 the authors 0270-6474/16/3611682-11$15.00/0.
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning
Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F.
2016-01-01
As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. SIGNIFICANCE STATEMENT In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. PMID:27852776
Pérez, Omar D; Aitken, Michael R F; Zhukovsky, Peter; Soto, Fabián A; Urcelay, Gonzalo P; Dickinson, Anthony
2016-12-15
Associative learning theories regard the probability of reinforcement as the critical factor determining responding. However, the role of this factor in instrumental conditioning is not completely clear. In fact, free-operant experiments show that participants respond at a higher rate on variable ratio than on variable interval schedules even though the reinforcement probability is matched between the schedules. This difference has been attributed to the differential reinforcement of long inter-response times (IRTs) by interval schedules, which acts to slow responding. In the present study, we used a novel experimental design to investigate human responding under random ratio (RR) and regulated probability interval (RPI) schedules, a type of interval schedule that sets a reinforcement probability independently of the IRT duration. Participants responded on each type of schedule before a final choice test in which they distributed responding between two schedules similar to those experienced during training. Although response rates did not differ during training, the participants responded at a lower rate on the RPI schedule than on the matched RR schedule during the choice test. This preference cannot be attributed to a higher probability of reinforcement for long IRTs and questions the idea that similar associative processes underlie classical and instrumental conditioning.
Wood, Rodger Ll; Alderman, Nick
2011-01-01
For more than 3 decades, interventions derived from learning theory have been delivered within a neurobehavioral framework to manage challenging behavior after traumatic brain injury with the aim of promoting engagement in the rehabilitation process and ameliorating social handicap. Learning theory provides a conceptual structure that facilitates our ability to understand the relationship between challenging behavior and environmental contingencies, while accommodating the constraints upon learning imposed by impaired cognition. Interventions derived from operant learning theory have most frequently been described in the literature because this method of associational learning provides good evidence for the effectiveness of differential reinforcement methods. This article therefore examines the efficacy of applying operant learning theory to manage challenging behavior after TBI as well as some of the limitations of this approach. Future developments in the application of learning theory are also considered.
Higher incentives can impair performance: neural evidence on reinforcement and rationality.
Achtziger, Anja; Alós-Ferrer, Carlos; Hügelschäfer, Sabine; Steinhauser, Marco
2015-11-01
Standard economic thinking postulates that increased monetary incentives should increase performance. Human decision makers, however, frequently focus on past performance, a form of reinforcement learning occasionally at odds with rational decision making. We used an incentivized belief-updating task from economics to investigate this conflict through measurements of neural correlates of reward processing. We found that higher incentives fail to improve performance when immediate feedback on decision outcomes is provided. Subsequent analysis of the feedback-related negativity, an early event-related potential following feedback, revealed the mechanism behind this paradoxical effect. As incentives increase, the win/lose feedback becomes more prominent, leading to an increased reliance on reinforcement and more errors. This mechanism is relevant for economic decision making and the debate on performance-based payment. © The Author (2015). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Prefrontal cortical BDNF: A regulatory key in cocaine- and food-reinforced behaviors
Pitts, Elizabeth G.; Taylor, Jane R.; Gourley, Shannon L.
2016-01-01
Brain-derived neurotrophic factor (BDNF) affects synaptic plasticity and neural structure and plays key roles in learning and memory processes. Recent evidence also points to important, yet complex, roles for BDNF in rodent models of cocaine abuse and addiction. Here we examine the role of prefrontal cortical (PFC) BDNF in reward-related decision making and behavioral sensitivity to, and responding for, cocaine. We focus on BDNF within the medial and orbital PFC, its regulation by cocaine during early postnatal development and in adulthood, and how BDNF in turn influences responding for drug reinforcement, including in reinstatement models. When relevant, we draw comparisons and contrasts with experiments using natural (food) reinforcers. We also summarize findings supporting, or refuting, the possibility that BDNF in the medial and orbital PFC regulate the development and maintenance of stimulus-response habits. Further investigation could assist in the development of novel treatment approaches for cocaine use disorders. PMID:26923993
Deep Learning of Orthographic Representations in Baboons
Hannagan, Thomas; Ziegler, Johannes C.; Dufau, Stéphane; Fagot, Joël; Grainger, Jonathan
2014-01-01
What is the origin of our ability to learn orthographic knowledge? We use deep convolutional networks to emulate the primate's ventral visual stream and explore the recent finding that baboons can be trained to discriminate English words from nonwords [1]. The networks were exposed to the exact same sequence of stimuli and reinforcement signals as the baboons in the experiment, and learned to map real visual inputs (pixels) of letter strings onto binary word/nonword responses. We show that the networks' highest levels of representations were indeed sensitive to letter combinations as postulated in our previous research. The model also captured the key empirical findings, such as generalization to novel words, along with some intriguing inter-individual differences. The present work shows the merits of deep learning networks that can simulate the whole processing chain all the way from the visual input to the response while allowing researchers to analyze the complex representations that emerge during the learning process. PMID:24416300
What is the optimal task difficulty for reinforcement learning of brain self-regulation?
Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza
2016-09-01
The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent to β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Robust reinforcement learning.
Morimoto, Jun; Doya, Kenji
2005-02-01
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H(infinity) control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H(infinity) control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
Fee, Michale S.
2012-01-01
In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources. PMID:22754501
Fee, Michale S
2012-01-01
In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current "time" in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.
ERIC Educational Resources Information Center
Kahnt, Thorsten; Park, Soyoung Q.; Cohen, Michael X.; Beck, Anne; Heinz, Andreas; Wrase, Jana
2009-01-01
It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to…
Autonomous Inter-Task Transfer in Reinforcement Learning Domains
2008-08-01
Twentieth International Joint Conference on Artificial Intelli - gence, 2007. 304 Fumihide Tanaka and Masayuki Yamamura. Multitask reinforcement learning...Functions . . . . . . . . . . . . . . . . . . . . . . 17 2.2.3 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . 18 2.2.4 Instance-based...tures [Laird et al., 1986, Choi et al., 2007]. However, TL for RL tasks has only recently been gaining attention in the artificial intelligence
A look at Behaviourism and Perceptual Control Theory in Interface Design
1998-02-01
behaviours such as response variability, instinctive drift, autoshaping , etc. Perceptual Control Theory (PCT) postulates that behaviours result from the...internal variables. Behaviourism, on the other hand, can not account for variability in responses, instinctive drift, autoshaping , etc. Researchers... Autoshaping . Animals appear to learn without reinforcement. However, conditioning theory speculates that learning results only when reinforcement
BEHAVIORAL MECHANISMS UNDERLYING NICOTINE REINFORCEMENT
Rupprecht, Laura E.; Smith, Tracy T.; Schassburger, Rachel L.; Buffalari, Deanne M.; Sved, Alan F.; Donny, Eric C.
2015-01-01
Cigarette smoking is the leading cause of preventable deaths worldwide and nicotine, the primary psychoactive constituent in tobacco, drives sustained use. The behavioral actions of nicotine are complex and extend well beyond the actions of the drug as a primary reinforcer. Stimuli that are consistently paired with nicotine can, through associative learning, take on reinforcing properties as conditioned stimuli. These conditioned stimuli can then impact the rate and probability of behavior and even function as conditioning reinforcers that maintain behavior in the absence of nicotine. Nicotine can also act as a conditioned stimulus, predicting the delivery of other reinforcers, which may allow nicotine to acquire value as a conditioned reinforcer. These associative effects, establishing non-nicotine stimuli as conditioned stimuli with discriminative stimulus and conditioned reinforcing properties as well as establishing nicotine as a conditioned stimulus, are predicted by basic conditioning principles. However, nicotine can also act non-associatively. Nicotine directly enhances the reinforcing efficacy of other reinforcing stimuli in the environment, an effect that does not require a temporal or predictive relationship between nicotine and either the stimulus or the behavior. Hence, the reinforcing actions of nicotine stem both from the primary reinforcing actions of the drug (and the subsequent associative learning effects) as well as the reinforcement enhancement action of nicotine which is non-associative in nature. Gaining a better understanding of how nicotine impacts behavior will allow for maximally effective tobacco control efforts aimed at reducing the harm associated with tobacco use by reducing and/or treating its addictiveness. PMID:25638333
Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter
2013-11-01
The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier Ltd. All rights reserved.
Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter
2013-01-01
The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273
The role of within-compound associations in learning about absent cues.
Witnauer, James E; Miller, Ralph R
2011-05-01
When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue-outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue-outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127-151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association.
The role of within-compound associations in learning about absent cues
Witnauer, James E.
2011-01-01
When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue–outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue–outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127–151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association. PMID:21264569
A simple computational algorithm of model-based choice preference.
Toyama, Asako; Katahira, Kentaro; Ohira, Hideki
2017-08-01
A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences-namely, the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, can explain choices better under both model-free and model-based controls and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variation, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters used in this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.
Cognitive components underpinning the development of model-based learning.
Potter, Tracey C S; Bryce, Nessa V; Hartley, Catherine A
2017-06-01
Reinforcement learning theory distinguishes "model-free" learning, which fosters reflexive repetition of previously rewarded actions, from "model-based" learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice. In a cohort of participants aged 9-25, we examined whether the abilities to infer sequential regularities in the environment ("statistical learning"), maintain information in an active state ("working memory") and integrate distant concepts to solve problems ("fluid reasoning") predicted age-related improvements in model-based choice. We found that age-related improvements in statistical learning performance did not mediate the relationship between age and model-based choice. Ceiling performance on our working memory assay prevented examination of its contribution to model-based learning. However, age-related improvements in fluid reasoning statistically mediated the developmental increase in the recruitment of a model-based strategy. These findings suggest that gradual development of fluid reasoning may be a critical component process underlying the emergence of model-based learning. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Pleasurable music affects reinforcement learning according to the listener
Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira
2013-01-01
Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875
Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.
Blake, David T
2017-06-18
The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, by the strength of response within that activated area, and by the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests other factors also initiate changes. Studies that control learning find support of reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to a strengthening of cortical response strength and enlarging of response area with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning. © 2017 American Physiological Society. Compr Physiol 7:977-1008, 2017. Copyright © 2017 John Wiley & Sons, Inc.
Starosta, Sarah; Stüttgen, Maik C; Güntürkün, Onur
2014-06-02
While the subject of learning has attracted immense interest from both behavioral and neural scientists, only relatively few investigators have observed single-neuron activity while animals are acquiring an operantly conditioned response, or when that response is extinguished. But even in these cases, observation periods usually encompass only a single stage of learning, i.e. acquisition or extinction, but not both (exceptions include protocols employing reversal learning; see Bingman et al.(1) for an example). However, acquisition and extinction entail different learning mechanisms and are therefore expected to be accompanied by different types and/or loci of neural plasticity. Accordingly, we developed a behavioral paradigm which institutes three stages of learning in a single behavioral session and which is well suited for the simultaneous recording of single neurons' action potentials. Animals are trained on a single-interval forced choice task which requires mapping each of two possible choice responses to the presentation of different novel visual stimuli (acquisition). After having reached a predefined performance criterion, one of the two choice responses is no longer reinforced (extinction). Following a certain decrement in performance level, correct responses are reinforced again (reacquisition). By using a new set of stimuli in every session, animals can undergo the acquisition-extinction-reacquisition process repeatedly. Because all three stages of learning occur in a single behavioral session, the paradigm is ideal for the simultaneous observation of the spiking output of multiple single neurons. We use pigeons as model systems, but the task can easily be adapted to any other species capable of conditioned discrimination learning.
A universal role of the ventral striatum in reward-based learning: Evidence from human studies
Daniel, Reka; Pollmann, Stefan
2014-01-01
Reinforcement learning enables organisms to adjust their behavior in order to maximize rewards. Electrophysiological recordings of dopaminergic midbrain neurons have shown that they code the difference between actual and predicted rewards, i.e., the reward prediction error, in many species. This error signal is conveyed to both the striatum and cortical areas and is thought to play a central role in learning to optimize behavior. However, in human daily life rewards are diverse and often only indirect feedback is available. Here we explore the range of rewards that are processed by the dopaminergic system in human participants, and examine whether it is also involved in learning in the absence of explicit rewards. While results from electrophysiological recordings in humans are sparse, evidence linking dopaminergic activity to the metabolic signal recorded from the midbrain and striatum with functional magnetic resonance imaging (fMRI) is available. Results from fMRI studies suggest that the human ventral striatum (VS) receives valuation information for a diverse set of rewarding stimuli. These range from simple primary reinforcers such as juice rewards over abstract social rewards to internally generated signals on perceived correctness, suggesting that the VS is involved in learning from trial-and-error irrespective of the specific nature of provided rewards. In addition, we summarize evidence that the VS can also be implicated when learning from observing others, and in tasks that go beyond simple stimulus-action-outcome learning, indicating that the reward system is also recruited in more complex learning tasks. PMID:24825620
ERIC Educational Resources Information Center
Kelly, Resa M.; Jones, Loretta L.
2008-01-01
Animations of the particulate level of matter are widely available for use in chemistry classes and are often the primary means of representing molecular behavior. These animations may also be viewed by individual students using textbook Web sites, although without reinforcement or feedback. It is not known to what extent the material in these…
Cognitive Control Predicts Use of Model-Based Reinforcement-Learning
Otto, A. Ross; Skatova, Anya; Madlon-Kay, Seth; Daw, Nathaniel D.
2015-01-01
Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggest that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior is a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information—in the service of overcoming habitual, stimulus-driven responses—in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior. PMID:25170791
Analysis of the Pricing Process in Electricity Market using Multi-Agent Model
NASA Astrophysics Data System (ADS)
Shimomura, Takahiro; Saisho, Yuichi; Fujii, Yasumasa; Yamaji, Kenji
Many electric utilities world-wide have been forced to change their ways of doing business, from vertically integrated mechanisms to open market systems. We are facing urgent issues about how we design the structures of power market systems. In order to settle down these issues, many studies have been made with market models of various characteristics and regulations. The goal of modeling analysis is to enrich our understanding of fundamental process that may appear. However, there are many kinds of modeling methods. Each has drawback and advantage about validity and versatility. This paper presents two kinds of methods to construct multi-agent market models. One is based on game theory and another is based on reinforcement learning. By comparing the results of the two methods, they can advance in validity and help us figure out potential problems in electricity markets which have oligopolistic generators, demand fluctuation and inelastic demand. Moreover, this model based on reinforcement learning enables us to consider characteristics peculiar to electricity markets which have plant unit characteristics, seasonable and hourly demand fluctuation, real-time regulation market and operating reserve market. This model figures out importance of the share of peak-load-plants and the way of designing operating reserve market.
Franklin, Nicholas T; Frank, Michael J
2015-12-25
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S
2011-01-01
As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first my-oelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
Representation of aversive prediction errors in the human periaqueductal gray
Roy, Mathieu; Shohamy, Daphna; Daw, Nathaniel; Jepma, Marieke; Wimmer, Elliott; Wager, Tor D.
2014-01-01
Pain is a primary driver of learning and motivated action. It is also a target of learning, as nociceptive brain responses are shaped by learning processes. We combined an instrumental pain avoidance task with an axiomatic approach to assessing fMRI signals related to prediction errors (PEs), which drive reinforcement-based learning. We found that pain PEs were encoded in the periaqueductal gray (PAG), an important structure for pain control and learning in animal models. Axiomatic tests combined with dynamic causal modeling suggested that ventromedial prefrontal cortex, supported by putamen, provides an expected value-related input to the PAG, which then conveys PE signals to prefrontal regions important for behavioral regulation, including orbitofrontal, anterior mid-cingulate, and dorsomedial prefrontal cortices. Thus, pain-related learning involves distinct neural circuitry, with implications for behavior and pain dynamics. PMID:25282614
The orbitofrontal cortex and beyond: from affect to decision-making.
Rolls, Edmund T; Grabenhorst, Fabian
2008-11-01
The orbitofrontal cortex represents the reward or affective value of primary reinforcers including taste, touch, texture, and face expression. It learns to associate other stimuli with these to produce representations of the expected reward value for visual, auditory, and abstract stimuli including monetary reward value. The orbitofrontal cortex thus plays a key role in emotion, by representing the goals for action. The learning process is stimulus-reinforcer association learning. Negative reward prediction error neurons are related to this affective learning. Activations in the orbitofrontal cortex correlate with the subjective emotional experience of affective stimuli, and damage to the orbitofrontal cortex impairs emotion-related learning, emotional behaviour, and subjective affective state. With an origin from beyond the orbitofrontal cortex, top-down attention to affect modulates orbitofrontal cortex representations, and attention to intensity modulates representations in earlier cortical areas of the physical properties of stimuli. Top-down word-level cognitive inputs can bias affective representations in the orbitofrontal cortex, providing a mechanism for cognition to influence emotion. Whereas the orbitofrontal cortex provides a representation of reward or affective value on a continuous scale, areas beyond the orbitofrontal cortex such as the medial prefrontal cortex area 10 are involved in binary decision-making when a choice must be made. For this decision-making, the orbitofrontal cortex provides a representation of each specific reward in a common currency.
Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho
2015-05-01
This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system that is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems with the policies generated by integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) method. Based on these, three online I-RL algorithms named explorized I-PI and integral Q -learning I, II are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model free, and can simultaneously explore the state space in a stable manner during the online learning processes. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and related with these, we show the design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented in this paper. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
Speed/accuracy trade-off between the habitual and the goal-directed processes.
Keramati, Mehdi; Dezfouli, Amir; Piray, Payam
2011-05-01
Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of the two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages that they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible, but slow in choice selection. The habitual system, in contrast, is fast in responding, but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that makes an approximately optimal balance between search-time and accuracy in decision making. Behaviourally, the model can explain experimental evidence on behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains that when two choices with equal incentive values are available concurrently, the behaviour remains outcome-sensitive, even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that as the number of choices increases, the reaction time also increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour through reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour through manipulating the competition between the habitual and the goal-directed systems and thus, affect reaction time.
NASA Astrophysics Data System (ADS)
Laird, John E.
2009-05-01
Our long-term goal is to develop autonomous robotic systems that have the cognitive abilities of humans, including communication, coordination, adapting to novel situations, and learning through experience. Our approach rests on the recent integration of the Soar cognitive architecture with both virtual and physical robotic systems. Soar has been used to develop a wide variety of knowledge-rich agents for complex virtual environments, including distributed training environments and interactive computer games. For development and testing in robotic virtual environments, Soar interfaces to a variety of robotic simulators and a simple mobile robot. We have recently made significant extensions to Soar that add new memories and new non-symbolic reasoning to Soar's original symbolic processing, which should significantly improve Soar abilities for control of robots. These extensions include episodic memory, semantic memory, reinforcement learning, and mental imagery. Episodic memory and semantic memory support the learning and recalling of prior events and situations as well as facts about the world. Reinforcement learning provides the ability of the system to tune its procedural knowledge - knowledge about how to do things. Mental imagery supports the use of diagrammatic and visual representations that are critical to support spatial reasoning. We speculate on the future of unmanned systems and the need for cognitive robotics to support dynamic instruction and taskability.
Neural signals of vicarious extinction learning
Haaker, Jan; Selbing, Ida; Olsson, Andreas
2016-01-01
Social transmission of both threat and safety is ubiquitous, but little is known about the neural circuitry underlying vicarious safety learning. This is surprising given that these processes are critical to flexibly adapt to a changeable environment. To address how the expression of previously learned fears can be modified by the transmission of social information, two conditioned stimuli (CS + s) were paired with shock and the third was not. During extinction, we held constant the amount of direct, non-reinforced, exposure to the CSs (i.e. direct extinction), and critically varied whether another individual—acting as a demonstrator—experienced safety (CS + vic safety) or aversive reinforcement (CS + vic reinf). During extinction, ventromedial prefrontal cortex (vmPFC) responses to the CS + vic reinf increased but decreased to the CS + vic safety. This pattern of vmPFC activity was reversed during a subsequent fear reinstatement test, suggesting a temporal shift in the involvement of the vmPFC. Moreover, only the CS + vic reinf association recovered. Our data suggest that vicarious extinction prevents the return of conditioned fear responses, and that this efficacy is reflected by diminished vmPFC involvement during extinction learning. The present findings may have important implications for understanding how social information influences the persistence of fear memories in individuals suffering from emotional disorders. PMID:27278792
Changing to Concept-Based Curricula: The Process for Nurse Educators
Baron, Kristy A.
2017-01-01
Background: The complexity of health care today requires nursing graduates to use effective thinking skills. Many nursing programs are revising curricula to include concept-based learning that encourages problem-solving, effective thinking, and the ability to transfer knowledge to a variety of situations—requiring nurse educators to modify their teaching styles and methods to promote student-centered learning. Changing from teacher-centered learning to student-centered learning requires a major shift in thinking and application. Objective: The focus of this qualitative study was to understand the process of changing to concept-based curricula for nurse educators who previously taught in traditional curriculum designs. Methods: The sample included eight educators from two institutions in one Western state using a grounded theory design. Results: The themes that emerged from participants’ experiences consisted of the overarching concept, support for change, and central concept, finding meaning in the change. Finding meaning is supported by three main themes: preparing for the change, teaching in a concept-based curriculum, and understanding the teaching-learning process. Conclusion: Changing to a concept-based curriculum required a major shift in thinking and application. Through support, educators discovered meaning to make the change by constructing authentic learning opportunities that mirrored practice, refining the change process, and reinforcing benefits of teaching. PMID:29399236
Reinforcement Learning of Two-Joint Virtual Arm Reaching in a Computer Model of Sensorimotor Cortex
Neymotin, Samuel A.; Chadderdon, George L.; Kerr, Cliff C.; Francis, Joseph T.; Lytton, William W.
2014-01-01
Neocortical mechanisms of learning sensorimotor control involve a complex series of interactions at multiple levels, from synaptic mechanisms to cellular dynamics to network connectomics. We developed a model of sensory and motor neocortex consisting of 704 spiking model neurons. Sensory and motor populations included excitatory cells and two types of interneurons. Neurons were interconnected with AMPA/NMDA and GABAA synapses. We trained our model using spike-timing-dependent reinforcement learning to control a two-joint virtual arm to reach to a fixed target. For each of 125 trained networks, we used 200 training sessions, each involving 15 s reaches to the target from 16 starting positions. Learning altered network dynamics, with enhancements to neuronal synchrony and behaviorally relevant information flow between neurons. After learning, networks demonstrated retention of behaviorally relevant memories by using proprioceptive information to perform reach-to-target from multiple starting positions. Networks dynamically controlled which joint rotations to use to reach a target, depending on current arm position. Learning-dependent network reorganization was evident in both sensory and motor populations: learned synaptic weights showed target-specific patterning optimized for particular reach movements. Our model embodies an integrative hypothesis of sensorimotor cortical learning that could be used to interpret future electrophysiological data recorded in vivo from sensorimotor learning experiments. We used our model to make the following predictions: learning enhances synchrony in neuronal populations and behaviorally relevant information flow across neuronal populations, enhanced sensory processing aids task-relevant motor performance and the relative ease of a particular movement in vivo depends on the amount of sensory information required to complete the movement. PMID:24047323
NASA Technical Reports Server (NTRS)
Jani, Yashvant
1992-01-01
The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use two terms interchangeable to imply the same. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is the deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. This report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The shuttle translational controller and its implementation in fuzzy learning architecture is described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and summary for the project.
Learning and tuning fuzzy logic controllers through reinforcements
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1992-01-01
This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Learning the specific quality of taste reinforcement in larval Drosophila
Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram
2015-01-01
The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing—in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533
Evidence for a neural law of effect.
Athalye, Vivek R; Santos, Fernando J; Carmena, Jose M; Costa, Rui M
2018-03-02
Thorndike's law of effect states that actions that lead to reinforcements tend to be repeated more often. Accordingly, neural activity patterns leading to reinforcement are also reentered more frequently. Reinforcement relies on dopaminergic activity in the ventral tegmental area (VTA), and animals shape their behavior to receive dopaminergic stimulation. Seeking evidence for a neural law of effect, we found that mice learn to reenter more frequently motor cortical activity patterns that trigger optogenetic VTA self-stimulation. Learning was accompanied by gradual shaping of these patterns, with participating neurons progressively increasing and aligning their covariance to that of the target pattern. Motor cortex patterns that lead to phasic dopaminergic VTA activity are progressively reinforced and shaped, suggesting a mechanism by which animals select and shape actions to reliably achieve reinforcement. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Boctor, Lisa
2013-03-01
The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale showed that they did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes, when using active-learning strategies, such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.
Neural Modularity Helps Organisms Evolve to Learn New Skills without Forgetting Old Skills
Ellefsen, Kai Olav; Mouret, Jean-Baptiste; Clune, Jeff
2015-01-01
A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills more and because they have a separate reinforcement learning module. Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting. PMID:25837826
Neural modularity helps organisms evolve to learn new skills without forgetting old skills.
Ellefsen, Kai Olav; Mouret, Jean-Baptiste; Clune, Jeff
2015-04-01
A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills more and because they have a separate reinforcement learning module. Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting.
Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf
2016-02-01
Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of VS, model-free and model based RL systems center on the VS that by integrating model-free signals (received as reward prediction error) and model-based reward related input computes value. Using the concept of reinforcement learning agent we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations we turned to the proactive brain concept, which offers an ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames enabling the formulation of analogy-based association that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferent of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved).
Impaired implicit learning and feedback processing after stroke.
Lam, J M; Globas, C; Hosp, J A; Karnath, H-O; Wächter, T; Luft, A R
2016-02-09
The ability to learn is assumed to support successful recovery and rehabilitation therapy after stroke. Hence, learning impairments may reduce the recovery potential. Here, the hypothesis is tested that stroke survivors have deficits in feedback-driven implicit learning. Stroke survivors (n=30) and healthy age-matched control subjects (n=21) learned a probabilistic classification task with brain activation measured using functional magnetic resonance imaging in a subset of these individuals (17 stroke and 10 controls). Stroke subjects learned slower than controls to classify cues. After being rewarded with a smiley face, they were less likely to give the same response when the cue was repeated. Stroke subjects showed reduced brain activation in putamen, pallidum, thalamus, frontal and prefrontal cortices and cerebellum when compared with controls. Lesion analysis identified those stroke survivors as learning-impaired who had lesions in frontal areas, putamen, thalamus, caudate and insula. Lesion laterality had no effect on learning efficacy or brain activation. These findings suggest that stroke survivors have deficits in reinforcement learning that may be related to dysfunctional processing of feedback-based decision-making, reward signals and working memory. Copyright © 2015 IBRO. Published by Elsevier Ltd. All rights reserved.
Using reinforcement learning to examine dynamic attention allocation during reading.
Liu, Yanping; Reichle, Erik D; Gao, Ding-Guo
2013-01-01
A fundamental question in reading research concerns whether attention is allocated strictly serially, supporting lexical processing of one word at a time, or in parallel, supporting concurrent lexical processing of two or more words (Reichle, Liversedge, Pollatsek, & Rayner, 2009). The origins of this debate are reviewed. We then report three simulations to address this question using artificial reading agents (Liu & Reichle, 2010; Reichle & Laurent, 2006) that learn to dynamically allocate attention to 1-4 words to "read" as efficiently as possible. These simulation results indicate that the agents strongly preferred serial word processing, although they occasionally attended to more than one word concurrently. The reason for this preference is discussed, along with implications for the debate about how humans allocate attention during reading. Copyright © 2013 Cognitive Science Society, Inc.
Incorporating Dispositional Traits into the Treatment of Anorexia Nervosa
Herzog, David; Moskovich, Ashley; Merwin, Rhonda; Lin, Tammy
2014-01-01
We provide a general framework to guide the development of interventions that aim to address persistent features in eating disorders that may preclude effective treatment. Using perfectionism as an exemplar, we draw from research in cognitive neuroscience regarding attention and reinforcement learning, from learning theory and social psychology regarding vicarious learning and implications for the role modeling of significant others, and from clinical psychology on the importance of verbal narratives as barriers that may influence expectations and shape reinforcement schedules. PMID:21243482
Hybrid learning in signalling games
NASA Astrophysics Data System (ADS)
Barrett, Jeffrey A.; Cochran, Calvin T.; Huttegger, Simon; Fujiwara, Naoki
2017-09-01
Lewis-Skyrms signalling games have been studied under a variety of low-rationality learning dynamics. Reinforcement dynamics are stable but slow and prone to evolving suboptimal signalling conventions. A low-inertia trial-and-error dynamical like win-stay/lose-randomise is fast and reliable at finding perfect signalling conventions but unstable in the context of noise or agent error. Here we consider a low-rationality hybrid of reinforcement and win-stay/lose-randomise learning that exhibits the virtues of both. This hybrid dynamics is reliable, stable and exceptionally fast.
Sitaraman, Divya; Kramer, Elizabeth F.; Kahsai, Lily; Ostrowski, Daniela; Zars, Troy
2017-01-01
Feedback mechanisms in operant learning are critical for animals to increase reward or reduce punishment. However, not all conditions have a behavior that can readily resolve an event. Animals must then try out different behaviors to better their situation through outcome learning. This form of learning allows for novel solutions and with positive experience can lead to unexpected behavioral routines. Learned helplessness, as a type of outcome learning, manifests in part as increases in escape latency in the face of repeated unpredicted shocks. Little is known about the mechanisms of outcome learning. When fruit fly Drosophila melanogaster are exposed to unpredicted high temperatures in a place learning paradigm, flies both increase escape latencies and have a higher memory when given control of a place/temperature contingency. Here we describe discrete serotonin neuronal circuits that mediate aversive reinforcement, escape latencies, and memory levels after place learning in the presence and absence of unexpected aversive events. The results show that two features of learned helplessness depend on the same modulatory system as aversive reinforcement. Moreover, changes in aversive reinforcement and escape latency depend on local neural circuit modulation, while memory enhancement requires larger modulation of multiple behavioral control circuits. PMID:29321732
Warren, Christopher M.; Holroyd, Clay B.
2012-01-01
We applied the event-related brain potential (ERP) technique to investigate the involvement of two neuromodulatory systems in learning and decision making: The locus coeruleus–norepinephrine system (NE system) and the mesencephalic dopamine system (DA system). We have previously presented evidence that the N2, a negative deflection in the ERP elicited by task-relevant events that begins approximately 200 ms after onset of the eliciting stimulus and that is sensitive to low-probability events, is a manifestation of cortex-wide noradrenergic modulation recruited to facilitate the processing of unexpected stimuli. Further, we hold that the impact of DA reinforcement learning signals on the anterior cingulate cortex (ACC) produces a component of the ERP called the feedback-related negativity (FRN). The N2 and the FRN share a similar time range, a similar topography, and similar antecedent conditions. We varied factors related to the degree of cognitive deliberation across a series of experiments to dissociate these two ERP components. Across four experiments we varied the demand for a deliberative strategy, from passively watching feedback, to more complex/challenging decision tasks. Consistent with our predictions, the FRN was largest in the experiment involving active learning and smallest in the experiment involving passive learning whereas the N2 exhibited the opposite effect. Within each experiment, when subjects attended to color, the N2 was maximal at frontal–central sites, and when they attended to gender it was maximal over lateral-occipital areas, whereas the topology of the FRN was frontal–central in both task conditions. We conclude that both the DA system and the NE system act in concert when learning from rewards that vary in expectedness, but that the DA system is relatively more exercised when subjects are relatively more engaged by the learning task. PMID:22493568
Ryu, Vin; Ha, Ra Yeon; Lee, Su Jin; Ha, Kyooseob; Cho, Hyun-Sang
2017-03-01
Bipolar disorder is characterized by behavioral changes such as risk-taking and increasing goal-directed activities, which may result from altered reward processing. Patients with bipolar disorder show impaired reward learning in situations that require the integration of reinforced feedback over time. In this study, we examined the behavioral and electrophysiological characteristics of reward learning in manic and euthymic patients with bipolar disorder using a probabilistic reward task. Twenty-four manic and 20 euthymic patients with bipolar I disorder and 24 healthy control subjects performed the probabilistic reward task. We assessed response bias (RB) as a preference for the stimulus paired with the more frequent reward and feedback-related negativity (FRN) to correct identification of the rich stimulus. Both manic and euthymic patients showed significantly lower RB scores in the early learning stage (block 1) in comparison with the late learning stage (block 2 or block 3) of the task, as well as significantly lower RB scores in the early stage compared to healthy subjects. Relatively more negative FRN amplitude is elicited by no presentation of an expected reward, compared to that elicited by presentation of expected feedback. The FRN became significantly more negative from the early (block 1) to the later stages (blocks 2 and 3) in both manic and euthymic patients, but not in healthy subjects. Changes in RB scores and FRN amplitudes between blocks 2 and 3 and block 1 correlated positively in healthy controls, but correlated negatively in manic and euthymic patients. The severity of manic symptoms correlated positively with reward learning scores and negatively with the FRN. These findings suggest that patients with bipolar disorder during euthymic or manic states have behavioral and electrophysiological alterations in reward learning compared to healthy subjects. This dysfunctional reward processing may be related to the abnormal decision-making or altered goal-directed activities frequently seen in patients with bipolar disorder. © 2017 John Wiley & Sons Ltd.
Autonomous learning based on cost assumptions: theoretical studies and experiments in robot control.
Ribeiro, C H; Hemerly, E M
2000-02-01
Autonomous learning techniques are based on experience acquisition. In most realistic applications, experience is time-consuming: it implies sensor reading, actuator control and algorithmic update, constrained by the learning system dynamics. The information crudeness upon which classical learning algorithms operate make such problems too difficult and unrealistic. Nonetheless, additional information for facilitating the learning process ideally should be embedded in such a way that the structural, well-studied characteristics of these fundamental algorithms are maintained. We investigate in this article a more general formulation of the Q-learning method that allows for a spreading of information derived from single updates towards a neighbourhood of the instantly visited state and converges to optimality. We show how this new formulation can be used as a mechanism to safely embed prior knowledge about the structure of the state space, and demonstrate it in a modified implementation of a reinforcement learning algorithm in a real robot navigation task.
Cognitive Components Underpinning the Development of Model-Based Learning
Potter, Tracey C.S.; Bryce, Nessa V.; Hartley, Catherine A.
2016-01-01
Reinforcement learning theory distinguishes “model-free” learning, which fosters reflexive repetition of previously rewarded actions, from “model-based” learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice. In a cohort of participants aged 9–25, we examined whether the abilities to infer sequential regularities in the environment (“statistical learning”), maintain information in an active state (“working memory”) and integrate distant concepts to solve problems (“fluid reasoning”) predicted age-related improvements in model-based choice. We found that age-related improvements in statistical learning performance did not mediate the relationship between age and model-based choice. Ceiling performance on our working memory assay prevented examination of its contribution to model-based learning. However, age-related improvements in fluid reasoning statistically mediated the developmental increase in the recruitment of a model-based strategy. These findings suggest that gradual development of fluid reasoning may be a critical component process underlying the emergence of model-based learning. PMID:27825732
Ellwood, Ian T.; Patel, Tosha; Wadia, Varun; Lee, Anthony T.; Liptak, Alayna T.
2017-01-01
Dopamine neurons in the ventral tegmental area (VTA) encode reward prediction errors and can drive reinforcement learning through their projections to striatum, but much less is known about their projections to prefrontal cortex (PFC). Here, we studied these projections and observed phasic VTA–PFC fiber photometry signals after the delivery of rewards. Next, we studied how optogenetic stimulation of these projections affects behavior using conditioned place preference and a task in which mice learn associations between cues and food rewards and then use those associations to make choices. Neither phasic nor tonic stimulation of dopaminergic VTA–PFC projections elicited place preference. Furthermore, substituting phasic VTA–PFC stimulation for food rewards was not sufficient to reinforce new cue–reward associations nor maintain previously learned ones. However, the same patterns of stimulation that failed to reinforce place preference or cue–reward associations were able to modify behavior in other ways. First, continuous tonic stimulation maintained previously learned cue–reward associations even after they ceased being valid. Second, delivering phasic stimulation either continuously or after choices not previously associated with reward induced mice to make choices that deviated from previously learned associations. In summary, despite the fact that dopaminergic VTA–PFC projections exhibit phasic increases in activity that are time locked to the delivery of rewards, phasic activation of these projections does not necessarily reinforce specific actions. Rather, dopaminergic VTA–PFC activity can control whether mice maintain or deviate from previously learned cue–reward associations. SIGNIFICANCE STATEMENT Dopaminergic inputs from ventral tegmental area (VTA) to striatum encode reward prediction errors and reinforce specific actions; however, it is currently unknown whether dopaminergic inputs to prefrontal cortex (PFC) play similar or distinct roles. Here, we used bulk Ca2+ imaging to show that unexpected rewards or reward-predicting cues elicit phasic increases in the activity of dopaminergic VTA–PFC fibers. However, in multiple behavioral paradigms, we failed to observe reinforcing effects after stimulation of these fibers. In these same experiments, we did find that tonic or phasic patterns of stimulation caused mice to maintain or deviate from previously learned cue–reward associations, respectively. Therefore, although they may exhibit similar patterns of activity, dopaminergic inputs to striatum and PFC can elicit divergent behavioral effects. PMID:28739583
ERIC Educational Resources Information Center
Galbreath, Joy; Feldman, David
The relationship of reading comprehension accuracy and a contingently administered token reinforcement program used with an elementary level learning disabled student in the classroom was examined. The S earned points for each correct answer made after oral reading sessions. At the conclusion of the class he could exchange his points for rewards.…
Orexin signaling via the orexin 1 receptor mediates operant responding for food reinforcement.
Sharf, Ruth; Sarhan, Maysa; Brayton, Catherine E; Guarnieri, Douglas J; Taylor, Jane R; DiLeone, Ralph J
2010-04-15
Orexin (hypocretin) signaling is implicated in drug addiction and reward, but its role in feeding and food-motivated behavior remains unclear. We investigated orexin's contribution to food-reinforced instrumental responding using an orexin 1 receptor (Ox1r) antagonist, orexin -/- (OKO) and littermate wildtype (WT) mice, and RNAi-mediated knockdown of orexin. C57BL/6J (n = 76) and OKO (n = 39) mice were trained to nose poke for food under a variable ratio schedule of reinforcement. After responding stabilized, a progressive ratio schedule was initiated to evaluate motivation to obtain food reinforcement. Blockade of Ox1r in C57BL/6J mice impaired performance under both the variable ratio and progressive ratio schedules of reinforcement, indicating impaired motivational processes. In contrast, OKO mice initially demonstrated a delay in acquisition but eventually achieved levels of responding similar to those observed in WT animals. Moreover, OKO mice did not differ from WT mice under a progressive ratio schedule, indicating delayed learning processes but no motivational impairments. Considering the differences between pharmacologic blockade of Ox1r and the OKO mice, animals with RNAi mediated knockdown of orexin were then generated and analyzed to eliminate possible developmental effects of missing orexin. Orexin gene knockdown in the lateral hypothalamus in C57BL/6J mice resulted in blunted performance under both the variable ratio and progressive ratio schedules, resembling data obtained following Ox1r antagonism. The behavior seen in OKO mice likely reflects developmental compensation often seen in mutant animals. These data suggest that activation of the Ox1r is a necessary component of food-reinforced responding, motivation, or both in normal mice. Copyright 2010 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Orexin signaling via the orexin 1 receptor mediates operant responding for food reinforcement
Sharf, Ruth; Sarhan, Maysa; Brayton, Catherine E.; Guarnieri, Douglas J.; Taylor, Jane R.; DiLeone, Ralph J.
2010-01-01
Background Orexin (hypocretin) signaling is implicated in drug addiction and reward, but its role in feeding and food-motivated behavior remains unclear. Methods We investigated orexin’s contribution to food-reinforced instrumental responding using an orexin 1 receptor (Ox1r) antagonist, orexin −/− (OKO) and littermate wild-type (WT) mice, and RNAi-mediated knockdown of orexin. C57BL/6J (n=76) and OKO (n=39) mice were trained to nose poke for food under a variable ratio (VR) schedule of reinforcement. Once responding stabilized, a progressive ratio (PR) schedule was initiated to evaluate motivation to obtain food reinforcement. Results Blockade of Ox1r in C57BL/6J mice impaired performance under both the VR and PR schedules of reinforcement, indicating impaired motivational processes. In contrast, OKO mice initially demonstrated a delay in acquisition, but eventually achieved levels of responding similar to those observed in WT animals. Moreover, OKO mice did not differ from WT mice under a PR schedule, indicating delayed learning processes but no motivational impairments. Considering the differences between pharmacological blockade of Ox1r and the OKO mice, animals with RNAi mediated knockdown of orexin were then generated and analyzed to eliminate possible developmental effects of missing orexin. Orexin gene knockdown in the lateral hypothalamus (LH) in C57BL/6J mice resulted in blunted performance under both the VR and PR schedules, resembling data obtained following Ox1r antagonism. Conclusions The behavior seen in OKO mice likely reflects developmental compensation often seen in mutant animals. These data suggest that activation of the Ox1r is a necessary component of food-reinforced responding and/or motivation in normal mice. PMID:20189166