Navigating complex decision spaces: Problems and paradigms in sequential choice
Walsh, Matthew M.; Anderson, John R.
2015-01-01
To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192
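The review's contrast between model-free and model-based solutions to temporal credit assignment can be made concrete with a toy delayed-reward chain. This is a minimal sketch; the states, rewards, and parameters are invented for illustration and are not drawn from the paper.

```python
import random

# A 4-state chain: reward arrives only on reaching the terminal state,
# so credit must propagate back to the intermediate actions.
N, TERMINAL = 4, 3
ACTIONS = ("left", "right")

def step(s, a):
    s2 = min(s + 1, TERMINAL) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

# Model-free: tabular Q-learning propagates credit backward through
# bootstrapped temporal-difference updates, one episode at a time.
def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            best_next = 0.0 if s2 == TERMINAL else max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

# Model-based: given the transition model, value iteration assigns the
# same credit by planning, with no further interaction required.
def value_iteration(gamma=0.9, sweeps=50):
    V = [0.0] * N
    for _ in range(sweeps):
        for s in range(TERMINAL):
            V[s] = max(r + gamma * (0.0 if s2 == TERMINAL else V[s2])
                       for s2, r in (step(s, a) for a in ACTIONS))
    return V

random.seed(0)
Q, V = q_learning(), value_iteration()
```

Both routes agree on the answer (prefer "right" everywhere, with values decaying by gamma per step from the goal); they differ in how credit reaches the early actions: repeated experience in the first case, planning over a known model in the second.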
On the integration of reinforcement learning and approximate reasoning for control
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.
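The TD method named here can be illustrated in its simplest prediction form on a textbook five-state random walk. This sketch is not part of ARIC; the TD error (reward plus discounted next-state value minus current value) is simply the kind of scalar learning signal such hybrid controllers use for fine-tuning.

```python
import random

# TD(0) prediction on a five-state random walk: non-terminal states
# 0..4, start in the middle; falling off the right end pays 1, the
# left end pays 0. The true value of each state is the probability
# of exiting right: 1/6, 2/6, 3/6, 4/6, 5/6.
N_STATES = 5

def td0(episodes=10000, alpha=0.05, gamma=1.0):
    v = [0.5] * N_STATES              # initial guess for each state's value
    for _ in range(episodes):
        s = N_STATES // 2
        while True:
            s2 = s + random.choice((-1, 1))
            if s2 < 0:                        # left terminal: reward 0
                v[s] += alpha * (0.0 - v[s]); break
            if s2 >= N_STATES:                # right terminal: reward 1
                v[s] += alpha * (1.0 - v[s]); break
            v[s] += alpha * (gamma * v[s2] - v[s])   # TD error update
            s = s2
    return v

random.seed(1)
v = td0()
```

After training, `v` approximates the true exit probabilities, ordered from left to right.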
B-tree search reinforcement learning for model based intelligent agent
NASA Astrophysics Data System (ADS)
Bhuvaneswari, S.; Vignashwaran, R.
2013-03-01
Agents trained by learning techniques provide a powerful approximation of active solutions for naive approaches. In this study, the data search for information retrieval is moderated using B-trees with reinforcement learning to achieve accuracy with minimum search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques perform at a satisfactory baseline and act as finite agents based on the predetermined model against competitors from the course.
Adaptive Educational Software by Applying Reinforcement Learning
ERIC Educational Resources Information Center
Bennane, Abdellah
2013-01-01
The introduction of intelligence into teaching software is the object of this paper. In the software elaboration process, learning techniques are used in order to adapt the teaching software to the characteristics of the student. Generally, artificial intelligence techniques such as reinforcement learning and Bayesian networks are used in order to adapt…
Reinforcement of Science Learning through Local Culture: A Delphi Study
ERIC Educational Resources Information Center
Nuangchalerm, Prasart
2008-01-01
This study aims to explore ways to reinforce science learning through local culture by using the Delphi technique. Twenty-four participants in various fields of study were selected. The result of the study provides a framework for the reinforcement of science learning through local culture on the theme of life and environment. (Contains 1 table.)
Liu, Chunming; Xu, Xin; Hu, Dewen
2013-04-29
Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, there has recently been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are first introduced. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research, including multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning, are also discussed. Finally, research challenges and open problems of MORL techniques are highlighted.
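The simplest ("naive") MORL solution the overview begins with is scalarization: collapse the reward vector into one scalar with preference weights, reducing the problem to standard single-objective RL. The two-objective payoffs below (say, speed versus fuel) are invented for illustration.

```python
# Each action yields a reward vector over two conflicting objectives.
REWARDS = {"fast": (10.0, -4.0), "slow": (3.0, -1.0)}   # (objective 1, objective 2)

def scalarize(vec, w):
    """Weighted-sum scalarization of a reward vector."""
    return sum(wi * ri for wi, ri in zip(w, vec))

def best_action(w):
    return max(REWARDS, key=lambda a: scalarize(REWARDS[a], w))

# Different preference weights select different Pareto-optimal actions:
assert best_action((1.0, 0.5)) == "fast"   # scalarized: 8.0 vs 2.5
assert best_action((0.2, 1.0)) == "slow"   # scalarized: -2.0 vs -0.4
```

The conflict between objectives is exactly what makes a single weight vector insufficient in general, which motivates the richer MORL approaches the survey reviews.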
Effective Reinforcement Techniques in Elementary Physical Education: The Key to Behavior Management
ERIC Educational Resources Information Center
Downing, John; Keating, Tedd; Bennett, Carl
2005-01-01
The ability to shape appropriate behavior while extinguishing misbehavior is critical to teaching and learning in physical education. The scientific principles that affect student learning in the gymnasium also apply to the methods teachers use to influence social behaviors. Research indicates that reinforcement strategies are more effective than…
Applications of Deep Learning and Reinforcement Learning to Biological Data.
Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano
2018-06-01
Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.
NASA Technical Reports Server (NTRS)
Jani, Yashvant
1992-01-01
The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably to mean the same thing. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is the deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. This report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The Shuttle translational controller and its implementation in the fuzzy learning architecture are described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and a summary for the project.
Chang, Li-Chiu; Chen, Pin-An; Chang, Fi-John
2012-08-01
A reliable forecast of future events possesses great value. The main purpose of this paper is to propose an innovative learning technique for reinforcing the accuracy of two-step-ahead (2SA) forecasts. The real-time recurrent learning (RTRL) algorithm for recurrent neural networks (RNNs) can effectively model the dynamics of complex processes and has been used successfully in one-step-ahead forecasts for various time series. A reinforced RTRL algorithm for 2SA forecasts using RNNs is proposed in this paper, and its performance is investigated on two famous benchmark time series and streamflow data from flood events in Taiwan. Results demonstrate that the proposed reinforced 2SA RTRL algorithm for RNNs can adequately forecast the benchmark (theoretical) time series, significantly improve the accuracy of flood forecasts, and effectively reduce time-lag effects.
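The two-step-ahead setup can be sketched with a deliberately simple stand-in model. This is not the paper's reinforced RTRL/RNN; it is a least-squares linear forecaster on an invented series, used only to contrast the two routes to t+2: recursively feeding a one-step forecast back in, versus fitting a direct two-step model.

```python
import numpy as np

# Synthetic noisy sinusoid standing in for a real time series.
rng = np.random.default_rng(0)
t = np.arange(300)
y = np.sin(0.2 * t) + 0.02 * rng.standard_normal(300)

def fit_linear(y, h):
    """Least squares predicting y[i+1+h] from the pair (y[i], y[i+1])."""
    X = np.column_stack([y[: len(y) - 1 - h],
                         y[1: len(y) - h],
                         np.ones(len(y) - 1 - h)])
    return np.linalg.lstsq(X, y[1 + h:], rcond=None)[0]

def predict(c, x0, x1):
    return c[0] * x0 + c[1] * x1 + c[2]

one_step = fit_linear(y, 1)    # model aimed at t+1
direct = fit_linear(y, 2)      # model aimed straight at t+2

sq_rec, sq_dir = [], []
for i in range(len(y) - 3):
    p1 = predict(one_step, y[i], y[i + 1])      # forecast of y[i+2]
    p2_rec = predict(one_step, y[i + 1], p1)    # reuse it to reach y[i+3]
    p2_dir = predict(direct, y[i], y[i + 1])    # direct y[i+3] forecast
    sq_rec.append((p2_rec - y[i + 3]) ** 2)
    sq_dir.append((p2_dir - y[i + 3]) ** 2)
rmse_rec = float(np.sqrt(np.mean(sq_rec)))
rmse_dir = float(np.sqrt(np.mean(sq_dir)))
```

On this near-linear series both routes forecast well; the paper's contribution is a reinforced recurrent learner for cases, such as flood flows, where a fixed linear recursion would let errors compound.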
Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang
2015-05-01
Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions providing the trial initial timing. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention of when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings and resting) due to its efficient weight-updating. We apply AGREL on neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% on average with faster convergence and more stability to follow neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
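A minimal sketch in the spirit of AGREL's gated, prediction-error-scaled updates follows. Everything here is invented (synthetic clusters standing in for neural states, a plain linear layer, no expansive error function), so it is an illustration of the learning rule's flavor, not the authors' BMI decoder.

```python
import numpy as np

# A linear layer picks one of k actions by softmax sampling, receives
# only a scalar reward, and updates just the chosen (attended) action's
# weights, scaled by the reward prediction error delta = r - P(chosen).
rng = np.random.default_rng(0)

def make_data(n=300, d=4, k=3):
    labels = rng.integers(0, k, n)
    centers = rng.normal(0.0, 3.0, (k, d))        # one cluster per action
    return centers[labels] + rng.normal(0.0, 1.0, (n, d)), labels

def train(X, labels, k, epochs=30, lr=0.1):
    W = np.zeros((k, X.shape[1]))
    for _ in range(epochs):
        for x, lab in zip(X, labels):
            z = W @ x
            p = np.exp(z - z.max()); p /= p.sum()
            a = rng.choice(k, p=p)                # stochastic action choice
            r = 1.0 if a == lab else 0.0          # scalar feedback only
            delta = r - p[a]                      # reward prediction error
            W[a] += lr * delta * x                # only the attended unit learns
    return W

X, labels = make_data()
W = train(X, labels, 3)
accuracy = float(np.mean(np.argmax(X @ W.T, axis=1) == labels))
```

The appeal for BMI decoding, as the abstract argues, is that only one action's weights change per trial, which scales more gracefully than maintaining a full state-action value table.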
Stimulating Deep Learning Using Active Learning Techniques
ERIC Educational Resources Information Center
Yew, Tee Meng; Dawood, Fauziah K. P.; a/p S. Narayansany, Kannaki; a/p Palaniappa Manickam, M. Kamala; Jen, Leong Siok; Hoay, Kuan Chin
2016-01-01
When students and teachers behave in ways that reinforce learning as a spectator sport, the result can often be a classroom and overall learning environment that is mostly limited to transmission of information and rote learning rather than deep approaches towards meaningful construction and application of knowledge. A group of college instructors…
English and the Learning-Disabled Student: A Survey of Research.
ERIC Educational Resources Information Center
Siegel, Gerald
The author reviews literature on teaching the learning disabled (LD) in college English classrooms. He notes work by V. Davis which suggests the following methods and techniques: (1) reinforce coping techniques the students have already developed; (2) provide help with reading tasks through summaries of vocabulary; (3) allow taping of classes (to…
Figure Analysis: An Implementation Dialogue
ERIC Educational Resources Information Center
Wiles, Amy M.
2016-01-01
Figure analysis is a novel active learning teaching technique that reinforces visual literacy. Small groups of students discuss diagrams in class in order to learn content. The instructor then gives a brief introduction and later summarizes the content of the figure. This teaching technique can be used in place of lecture as a mechanism to deliver…
NASA Technical Reports Server (NTRS)
Jani, Yashvant
1992-01-01
As part of the Research Institute for Computing and Information Systems (RICIS) activity, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Max satellite simulation. This activity is carried out in the software technology laboratory utilizing the Orbital Operations Simulator (OOS). This interim report provides the status of the project and outlines the future plans.
A reinforcement learning-based architecture for fuzzy logic control
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1992-01-01
This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine-tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine-tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.
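The core idea of reinforcement-tuned fuzzy rules can be sketched very simply. The membership functions, task, and learning signal below are invented for illustration and are far simpler than the ARIC architecture itself: a one-input fuzzy controller whose rule consequents are nudged by a scalar reinforcement signal, in proportion to each rule's firing strength.

```python
def tri(x, a, b, c):
    """Triangular membership: rises from a, peaks at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

RULES = [  # (membership of the error input, tunable consequent)
    {"mf": (-2.0, -1.0, 0.0), "out": 0.0},   # "error is negative"
    {"mf": (-1.0, 0.0, 1.0), "out": 0.0},    # "error is zero"
    {"mf": (0.0, 1.0, 2.0), "out": 0.0},     # "error is positive"
]

def control(err):
    w = [tri(err, *r["mf"]) for r in RULES]
    return sum(wi * r["out"] for wi, r in zip(w, RULES)) / (sum(w) or 1.0)

def reinforce(err, signal, lr=0.5):
    """Stronger-firing rules adapt their consequents more."""
    for r in RULES:
        r["out"] += lr * signal * tri(err, *r["mf"])

# Learn to output -err (a proportional corrector): the reinforcement
# signal is the shortfall from the desired corrective action, standing
# in for a critic's evaluation.
for _ in range(200):
    for err in (-0.8, -0.3, 0.3, 0.8):
        reinforce(err, (-err) - control(err))
```

After training, `control(0.5)` is close to -0.5: the rule base starts empty here, but the same update would start from an operator's approximately correct consequents and merely fine-tune them, which is the advantage the abstract emphasizes.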
Deep imitation learning for 3D navigation tasks.
Hussein, Ahmed; Elyan, Eyad; Gaber, Mohamed Medhat; Jayne, Chrisina
2018-01-01
Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning is recently gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, while methods that learn from experience alone fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
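The imitation-learning idea reduces, in its most stripped-down form, to supervised learning on expert state-action pairs. The sketch below is tabular behavioral cloning on an invented 5x5 grid, not the paper's deep convolutional method: record an expert's demonstrations, reuse them as a policy, and fall back to the nearest demonstrated state when a situation was never shown.

```python
import numpy as np

GOAL = (4, 4)
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def expert(pos):
    """Demonstrator: close the larger coordinate gap first."""
    dx, dy = GOAL[0] - pos[0], GOAL[1] - pos[1]
    if abs(dx) >= abs(dy) and dx != 0:
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"

def move(pos, a):
    return (pos[0] + MOVES[a][0], pos[1] + MOVES[a][1])

# Collect state -> action demonstrations from random starting cells.
rng = np.random.default_rng(0)
demos = {}
for _ in range(200):
    pos = (int(rng.integers(0, 5)), int(rng.integers(0, 5)))
    while pos != GOAL:
        demos[pos] = expert(pos)
        pos = move(pos, demos[pos])

def policy(pos):
    if pos in demos:
        return demos[pos]
    # crude generalization: copy the nearest demonstrated state's action
    nearest = min(demos, key=lambda s: abs(s[0] - pos[0]) + abs(s[1] - pos[1]))
    return demos[nearest]

pos, steps = (0, 0), 0
while pos != GOAL and steps < 50:
    pos, steps = move(pos, policy(pos)), steps + 1
```

The fallback rule is the toy analogue of the generalization problem the paper attacks with deep networks and active learning: what to do in states the demonstrations never covered.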
Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.
Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen
2017-01-01
We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments. PMID:29228001
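A drastically scaled-down version of this setup can be run exhaustively. The paper searches rich strategy spaces with evolutionary and particle swarm methods against 170+ opponents; the sketch below instead scores all 16 deterministic memory-one strategies (a move for each last joint outcome CC, CD, DC, DD, opening with C) against three classic opponents chosen for illustration.

```python
from itertools import product

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
OUTCOMES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def play(strategy, opponent, rounds=100):
    """Total payoff of a memory-one strategy against one opponent."""
    rule = dict(zip(OUTCOMES, strategy))
    my, their, score = "C", opponent(None), 0
    for _ in range(rounds):
        score += PAYOFF[(my, their)]
        last = (my, their)
        my, their = rule[last], opponent(last)
    return score

OPPONENTS = [
    lambda last: "C",                               # always cooperate
    lambda last: "D",                               # always defect
    lambda last: "C" if last is None else last[0],  # tit for tat
]

scores = {s: sum(play(s, o) for o in OPPONENTS)
          for s in product("CD", repeat=4)}
best = max(scores, key=scores.get)
```

Against this tiny pool the winner is ("D", "D", "C", "D"): it exploits the unconditional cooperator by alternating defection, punishes the defector, and extracts an alternating 5/0 stream from tit for tat, beating the tit-for-tat-like strategy ("C", "D", "C", "D"). With the paper's much larger and noisier opponent pool, the trained strategies are far richer than memory-one rules.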
Studies Show Curricular Efficiency Can Be Attained.
ERIC Educational Resources Information Center
Walberg, Herbert J.
1987-01-01
Reviews the nine factors contributing to educational productivity, the effectiveness of instructional techniques (mastery learning ranks high and Skinnerian reinforcement has the largest overall effect), and the effects of psychological environments on learning. Includes references and a table. (MD)
A Cooperative Approach To Teaching Mineral Identification.
ERIC Educational Resources Information Center
Constantopoulos, Terri Lynn
1994-01-01
Describes Jigsaw Teaching, a cooperative learning approach, in relation to mineral identification. This technique may also be applied to rock identification. Students work in groups of four and learn to identify 20 minerals, becoming an "expert" on five of them. Helping to teach other students reinforces what each student has learned.…
Design issues for a reinforcement-based self-learning fuzzy controller
NASA Technical Reports Server (NTRS)
Yen, John; Wang, Haojin; Dauherity, Walter
1993-01-01
Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control: easy implementation, accommodation of natural language, the ability to cover a wider range of operating conditions, and others. One major obstacle that hinders their broader application is the lack of a systematic way to develop and modify their rules; as a result, the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is self-learning fuzzy logic controllers (SFLC), which use reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design, an issue that has not received sufficient attention. The issues related to the design of an SFLC for application to a chemical process are discussed, and its performance is compared with that of PID and self-tuning fuzzy logic controllers.
Salina, Loris; Ruffinengo, Carlo; Garrino, Lorenza; Massariello, Patrizia; Charrier, Lorena; Martin, Barbara; Favale, Maria Santina; Dimonte, Valerio
2012-05-01
The Undergraduate Nursing Course has been using videos for the past year or so. Videos are used for many different purposes such as during lessons, nurse refresher courses, reinforcement, and sharing and comparison of knowledge with the professional and scientific community. The purpose of this study was to estimate the efficacy of the video (moving an uncooperative patient from the supine to the lateral position) as an instrument to refresh and reinforce nursing techniques. A two-arm randomized controlled trial (RCT) design was chosen: both groups attended lessons in the classroom as well as in the laboratory; a month later, while one group received written information as a refresher, the other group watched the video. Both groups were evaluated in a blinded fashion. A total of 223 students agreed to take part in the study. The difference observed between those who had seen the video and those who had read up on the technique turned out to be an average of 6.19 points in favour of the former (P < 0.05). The results of the RCT demonstrated that students who had seen the video were better able to apply the technique, resulting in a better performance. The video, therefore, represents an important tool to refresh and reinforce previous learning.
Coker, Joshua; Castiglioni, Analia; Kraemer, Ryan R; Massie, F Stanford; Morris, Jason L; Rodriguez, Martin; Russell, Stephen W; Shaneyfelt, Terrance; Willett, Lisa L; Estrada, Carlos A
2014-03-01
Current evaluation tools of medical school courses are limited by the scope of questions asked and may not fully engage students in thinking about areas to improve. The authors sought to explore whether a technique used to study consumer preferences would elicit specific and prioritized information for course evaluation from medical students. Using the nominal group technique (4 sessions), 12 senior medical students prioritized and weighted expectations and topics learned in a 100-hour advanced physical diagnosis course (4-week course; February 2012). Students weighted their top 3 responses (top = 3, middle = 2 and bottom = 1). Before the course, the 12 students identified 23 topics they expected to learn; the top 3 were review sensitivity/specificity and high-yield techniques (percentage of total weight, 18.5%), improving diagnosis (13.8%) and reinforce usual and less well-known techniques (13.8%). After the course, students generated 22 topics learned; the top 3 were practice and reinforce advanced maneuvers (25.4%), gaining confidence (22.5%) and learn the evidence (16.9%). The authors observed no differences in the priority of responses before and after the course (P = 0.07). In a physical diagnosis course, medical students elicited specific and prioritized information using the nominal group technique. The course met student expectations regarding education on the evidence-based physical examination, building skills and confidence in the proper techniques and maneuvers, and experiential learning. This novel use of the technique for curriculum evaluation may be applied to other courses, especially comprehensive and multicomponent courses.
ERIC Educational Resources Information Center
Munoz-Organero, M.; Munoz-Merino, P. J.; Kloos, Carlos Delgado
2011-01-01
The use of technology in learning environments should be targeted at improving the learning outcome of the process. Several technology-enhanced techniques can be used to maximize the learning gain of particular students when they have access to learning resources. One of them is content adaptation. Adapting content is especially important when…
Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning
Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien
2015-01-01
Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in offline manners. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without basing on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for rule discovery task by generating new rules from the old ones. This increases the scalability of a RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018
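The genetic-based idea can be illustrated with a toy that is much simpler than the authors' GBML system: evolve a rule set (here, just one action per state) for a small episodic task, scoring whole episodes as fitness rather than running step-by-step value updates, which is what lets genetic methods sidestep large value tables.

```python
import random

N = 4                         # states 0..3; reward only on reaching state 3

def episode_return(policy):
    """Run the deterministic policy; reward is delayed until the goal."""
    s = 0
    for _ in range(20):
        s = min(s + 1, N - 1) if policy[s] == 1 else max(s - 1, 0)
        if s == N - 1:
            return 1.0
    return 0.0

def evolve(pop_size=60, gens=40):
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=episode_return, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N)        # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N)             # point mutation
            child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return max(pop, key=episode_return)

random.seed(3)
best = evolve()
```

The evolved rule set moves "right" (action 1) from every state on the path to the goal. The GBML approach in the paper applies the same generate-and-select loop to condition-action tutoring rules learned online.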
Collaborating Fuzzy Reinforcement Learning Agents
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1997-01-01
Earlier, we introduced GARIC-Q, a new method for doing incremental dynamic programming using a society of intelligent agents controlled at the top level by Fuzzy Relearning; at the local level, each agent learns and operates based on ANTARCTIC, a technique for fuzzy reinforcement learning. In this paper, we show that it is possible for these agents to compete in order to affect the selected control policy while at the same time collaborating in exploring the state space. In this model, the evaluator, or critic, learns by observing all the agents' behaviors, but the control policy changes only based on the behavior of the winning agent, also known as the super agent.
[Learning experience of acupuncture technique from professor ZHANG Jin].
Xue, Hongsheng; Zhang, Jin
2017-08-12
As a famous acupuncturist in the world, professor ZHANG Jin believes the key of acupuncture technique is the use of force, and an understanding of "concentrating the force into the needle body" is essential to understanding the essence of acupuncture technique. Through deep study of Huangdi Neijing ( The Inner Canon of Huangdi ) and Zhenjiu Dacheng ( Compendium of Acupuncture and Moxibustion ), the author further learned professor ZHANG Jin's theory and operation specification of "concentrating force into the needle body, so that the force arrives before and together with the needle". The whole-body force should be subtly focused on the tip of the needle; a gentle force at the tip of the needle can produce a significant reinforcing and reducing effect. In addition, proper timing at the tip of the needle can initiate the reinforcing and reducing effect, lead qi to the disease location, and achieve superior clinical efficacy.
ERIC Educational Resources Information Center
Holburn, C. Steven; Dougher, Michael J.
1985-01-01
Techniques for training a severely retarded blind client to exit his living unit during a fire drill used a combination of negative and positive reinforcement. Following a shaping procedure, the client learned to leave his living unit from any internal point through generalization training and subsequent test probes. (Author/CL)
2010-02-01
multi-agent reputation management. State abstraction is a technique used to allow machine learning technologies to cope with problems that have large... state abstraction process to enable reinforcement learning in domains with large state spaces. State abstraction is vital to machine learning... across a collective of independent platforms. These individual elements, often referred to as agents in the machine learning community, should exhibit both...
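The state-abstraction idea this fragment describes is simple to demonstrate; the mapping below is invented for illustration. Many raw states collapse onto one abstract state, so a learner's value table stays tractable even when the raw state space is large.

```python
def abstract(raw):
    """Map a raw (x, y, noise) observation to a coarse grid cell,
    discarding the component irrelevant to the task."""
    x, y, _noise = raw
    return (x // 10, y // 10)

# 10,800 raw states collapse to just 9 abstract ones:
raw_states = [(x, y, n) for x in range(30) for y in range(30) for n in range(12)]
abstract_states = {abstract(s) for s in raw_states}
assert len(raw_states) == 10800
assert len(abstract_states) == 9      # a 3 x 3 grid of cells
```

A reinforcement learner indexed by `abstract(s)` rather than `s` needs a table four orders of magnitude smaller here, at the cost of whatever distinctions the abstraction throws away.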
Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M; Lara, Juan A; Lizcano, David
2017-01-19
Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and the model outperformed traditional methods in the literature with respect to learning reliability and efficiency. PMID:28106849
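The perception-pattern mechanism can be sketched under assumed details (the sensor, actions, and task below are invented, not the article's ADS-B system): similar raw perceptions map onto one pattern, and each (pattern, action) pair accumulates reward-punishment statistics during learning.

```python
from collections import defaultdict
import random

def pattern(perception):
    """Quantize a raw reading so nearby perceptions share one pattern."""
    return round(perception)          # e.g. 4.1 and 3.9 both map to pattern 4

stats = defaultdict(lambda: [0, 0])   # (pattern, action) -> [successes, trials]

def update(perception, action, success):
    cell = stats[(pattern(perception), action)]
    cell[0] += int(success)
    cell[1] += 1

def best_action(perception, actions):
    """Pick the action with the highest Laplace-smoothed success rate."""
    p = pattern(perception)
    return max(actions, key=lambda a: (stats[(p, a)][0] + 1) / (stats[(p, a)][1] + 2))

# r-p learning loop: "climb" pays off for high readings, "descend" for
# low ones, regardless of small sensor noise within a pattern.
random.seed(0)
for _ in range(500):
    x = random.uniform(0.0, 10.0)
    a = random.choice(["climb", "descend"])
    update(x, a, success=((a == "climb") == (x > 5.0)))
```

Because 8.2 and 7.9 fall into the same pattern, experience gathered at one reading transfers to the other, which is the generalization benefit the model claims over treating every perception as distinct.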
NASA Technical Reports Server (NTRS)
Jani, Yashvant
1993-01-01
As part of the RICIS project, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS) and programming/testing support from other contractor personnel. This report is the final deliverable, D4, in our project milestones. It provides the test results for the special test case of the approach/docking scenario for the Shuttle and SMM satellite. Based on our experience and analysis with the attitude and translational controllers, we have modified the basic configuration of the reinforcement learning algorithm in ARIC. The Shuttle translational controller and its implementation in ARIC are described in deliverable D3. In order to simulate the final approach and docking operations, we have set up this special test case as described in section 2. The ARIC performance results for these operations are discussed in section 3, and conclusions are provided in section 4 along with a summary of the project.
Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C
2014-03-01
Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors, discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error, but also that this signal rapidly diminishes and propagates to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity and that increased in amplitude with learning.
The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
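The migration of the prediction-error signal from reward delivery to choice presentation can be illustrated with a minimal temporal-difference sketch; the learning rate, trial count, and two-step episode structure are arbitrary choices, not the authors' model:

```python
# Minimal TD(0) sketch of a two-step episode (choice cue -> reward).
# Early in learning, the prediction error occurs at reward delivery;
# with learning it shrinks there and grows at cue presentation,
# mirroring the ERP pattern described above.
alpha, reward = 0.3, 1.0
V_cue = 0.0                      # learned value of the choice cue
err_at_reward, err_at_cue = [], []

for trial in range(200):
    err_at_cue.append(V_cue - 0.0)   # signal at cue onset (baseline expectation 0)
    delta = reward - V_cue           # prediction error at reward delivery
    err_at_reward.append(delta)
    V_cue += alpha * delta           # TD update shifts value back to the cue

print(round(err_at_reward[0], 2), round(err_at_reward[-1], 2))  # 1.0 0.0
print(round(err_at_cue[0], 2), round(err_at_cue[-1], 2))        # 0.0 1.0
```

The reward-time error starts large and decays to zero, while the cue-time signal grows from zero toward the full reward value, the qualitative pattern the abstract reports for the feedback negativity and the reward positivity.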
NASA Technical Reports Server (NTRS)
Yen, John; Wang, Haojin; Daugherity, Walter C.
1992-01-01
Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation of natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result, the creation and modification of fuzzy rules often depend on trial and error or pure experimentation. One approach proposed to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Because the dynamics of controlled processes differ, the performance of a self-learning fuzzy controller is highly contingent on its design, an issue that has not received sufficient attention. The issues related to the design of an SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID controller and a self-tuning fuzzy logic controller.
Motivation and Teaching: A Practical Guide.
ERIC Educational Resources Information Center
Wlodkowski, Raymond J.
Motivational techniques and strategies that may be used by teachers to strengthen student performance and reinforce positive attitudes toward learning are discussed and analyzed. The topic is divided into eight major factors for consideration: (1) why human behavior occurs, what is motivation, the mythology of motivation and learning, and the time…
Preliminary Work for Examining the Scalability of Reinforcement Learning
NASA Technical Reports Server (NTRS)
Clouse, Jeff
1998-01-01
Researchers began studying automated agents that learn to perform multiple-step tasks early in the history of artificial intelligence (Samuel, 1963; Samuel, 1967; Waterman, 1970; Fikes, Hart & Nilsson, 1972). Multiple-step tasks are tasks that can only be solved via a sequence of decisions, such as control problems, robotics problems, classic problem-solving, and game-playing. The objective of agents attempting to learn such tasks is to use the resources they have available in order to become more proficient at the tasks. In particular, each agent attempts to develop a good policy, a mapping from states to actions, that allows it to select actions that optimize a measure of its performance on the task; for example, reducing the number of steps necessary to complete the task successfully. Our study focuses on reinforcement learning, a set of learning techniques where the learner performs trial-and-error experiments in the task and adapts its policy based on the outcome of those experiments. Much of the work in reinforcement learning has focused on a particular, simple representation, where every problem state is represented explicitly in a table, and associated with each state are the actions that can be chosen in that state. A major advantage of this table-lookup representation is that one can prove that certain reinforcement learning techniques will develop an optimal policy for the current task. The drawback is that the representation limits the application of reinforcement learning to multiple-step tasks with relatively small state spaces. There has been a little theoretical work proving that convergence to optimal solutions can be obtained when using generalization structures, but the structures considered are quite simple. The theory says little about complex structures, such as multi-layer, feedforward artificial neural networks (Rumelhart & McClelland, 1986), but empirical results indicate that the use of reinforcement learning with such structures is promising.
These empirical results make no theoretical claims, nor compare the policies produced to optimal policies. A goal of our work is to be able to make the comparison between an optimal policy and one stored in an artificial neural network. A difficulty of performing such a study is finding a multiple-step task that is small enough that one can find an optimal policy using table lookup, yet large enough that, for practical purposes, an artificial neural network is really required. We have identified a limited form of the game OTHELLO as satisfying these requirements. The work we report here is in the very preliminary stages of research, but this paper provides background for the problem being studied and a description of our initial approach to examining the problem. In the remainder of this paper, we first describe reinforcement learning in more detail. Next, we present the game OTHELLO. Finally we argue that a restricted form of the game meets the requirements of our study, and describe our preliminary approach to finding an optimal solution to the problem.
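The table-lookup representation discussed above can be sketched in a few lines of tabular Q-learning. The chain task, parameters, and reward structure here are invented for illustration (the paper's actual domain is a restricted form of OTHELLO):

```python
import random

# Tabular Q-learning on a toy five-state chain: every state-action pair
# has an explicit table entry, the representation for which convergence
# to an optimal policy can be proven. Move "right" to reach the goal.
random.seed(0)
n_states, actions = 5, ["left", "right"]
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy action selection over the table entries.
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda b: Q[(s, b)])
        s2 = min(s + 1, n_states - 1) if a == "right" else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = [max(actions, key=lambda b: Q[(s, b)]) for s in range(n_states - 1)]
print(policy)  # the learned greedy policy: 'right' in every non-terminal state
```

Replacing the dictionary `Q` with a function approximator such as a feedforward network is exactly the step that forfeits the tabular convergence guarantees the abstract mentions.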
Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J
2016-01-01
Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
NASA Astrophysics Data System (ADS)
Radac, Mircea-Bogdan; Precup, Radu-Emil; Roman, Raul-Cristian
2017-04-01
This paper proposes the combination of two model-free controller tuning techniques, namely linear virtual reference feedback tuning (VRFT) and nonlinear state-feedback Q-learning, referred to as a new mixed VRFT-Q learning approach. VRFT is first used to find stabilising feedback controller using input-output experimental data from the process in a model reference tracking setting. Reinforcement Q-learning is next applied in the same setting using input-state experimental data collected under perturbed VRFT to ensure good exploration. The Q-learning controller learned with a batch fitted Q iteration algorithm uses two neural networks, one for the Q-function estimator and one for the controller, respectively. The VRFT-Q learning approach is validated on position control of a two-degrees-of-motion open-loop stable multi input-multi output (MIMO) aerodynamic system (AS). Extensive simulations for the two independent control channels of the MIMO AS show that the Q-learning controllers clearly improve performance over the VRFT controllers.
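The batch fitted Q-iteration step described above can be illustrated with a deliberately tiny sketch. A trivial per-pair averaging "regressor" stands in for the paper's neural-network Q-function estimator, and the transition batch is invented:

```python
# Batch fitted Q-iteration sketch: collect transitions once, then
# repeatedly (1) build regression targets r + gamma * max_a' Q(s', a')
# and (2) refit the Q-function estimator on them. Toy 3-state chain
# with terminal state 2; data and rewards are illustrative only.
gamma = 0.9
D = [(0, "right", 0.0, 1), (1, "right", 1.0, 2),   # (s, a, r, s')
     (0, "left", 0.0, 0), (1, "left", 0.0, 0)]
actions = ["left", "right"]
Q = {(s, a): 0.0 for s in range(3) for a in actions}

for k in range(50):                      # sweeps over the fixed batch
    targets = {}
    for s, a, r, s2 in D:
        terminal = (s2 == 2)
        y = r if terminal else r + gamma * max(Q[(s2, b)] for b in actions)
        targets.setdefault((s, a), []).append(y)
    # "Fit" the estimator: here, just average the targets per (s, a).
    for sa, ys in targets.items():
        Q[sa] = sum(ys) / len(ys)

print(round(Q[(0, "right")], 3))  # 0.9 = gamma * 1.0
print(round(Q[(1, "right")], 3))  # 1.0
```

In the paper's setting, step (2) is a neural-network regression and the batch comes from input-state data collected under the perturbed VRFT controller; the iteration structure is the same.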
ERIC Educational Resources Information Center
Corbett, James J.; Kezim, Boualem; Stewart, James
2010-01-01
This study investigates the effectiveness of a video team-based activity as a learning experience in a sales management course. Students perceived this learning activity approach as a beneficial and effective instructional technique. The benefits of making a video in a marketing course reinforce the understanding and the use of the sales process…
Apprenticeship Learning: Learning to Schedule from Human Experts
2016-06-09
approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning (Busoniu, Babuska, and De…). … via inverse reinforcement learning. In ICML. … Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete… … of tasks with temporal constraints. In Proc. AAAI, 2110–2116. … Odom, P., and Natarajan, S. 2015. Active advice seeking for inverse reinforcement…
Use of Inverse Reinforcement Learning for Identity Prediction
NASA Technical Reports Server (NTRS)
Hayes, Roy; Bao, Jonathan; Beling, Peter; Horowitz, Barry
2011-01-01
We adopt Markov Decision Processes (MDP) to model sequential decision problems, which have the characteristic that the current decision made by a human decision maker has an uncertain impact on future opportunity. We hypothesize that the individuality of decision makers can be modeled as differences in the reward function under a common MDP model. A machine learning technique, Inverse Reinforcement Learning (IRL), was used to learn an individual's reward function based on limited observation of his or her decision choices. This work serves as an initial investigation for using IRL to analyze decision making, conducted through a human experiment in a cyber shopping environment. Specifically, the ability to determine the demographic identity of users is conducted through prediction analysis and supervised learning. The results show that IRL can be used to correctly identify participants, at a rate of 68% for gender and 66% for one of three college major categories.
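A naive version of the IRL idea above, recovering reward functions whose induced optimal policy reproduces the observed decisions, can be sketched by brute-force enumeration. The chain MDP, candidate rewards, and observed policy are all invented, and practical IRL uses linear-programming or maximum-entropy formulations rather than enumeration:

```python
import itertools

# Naive IRL sketch on a 3-state chain: search a small set of candidate
# reward functions and keep those whose optimal policy (computed by
# value iteration) matches the observed decision maker's choices.
n, gamma, actions = 3, 0.9, ["left", "right"]

def step(s, a):
    return min(s + 1, n - 1) if a == "right" else max(s - 1, 0)

def optimal_policy(R):
    V = [0.0] * n
    for _ in range(100):  # value iteration on the known dynamics
        V = [max(R[step(s, a)] + gamma * V[step(s, a)] for a in actions)
             for s in range(n)]
    return [max(actions, key=lambda a: R[step(s, a)] + gamma * V[step(s, a)])
            for s in range(n)]

observed = ["right", "right", "right"]   # the decision maker always moves right
candidates = [list(R) for R in itertools.product([0.0, 1.0], repeat=n)]
consistent = [R for R in candidates if optimal_policy(R) == observed]
print(consistent)  # only reward on the final state explains the behavior
```

This captures the hypothesis in the abstract: individuality lives in the reward function, while the MDP dynamics are shared.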
Miranda-Morales, Roberto Sebastián; Nizhnikov, Michael E.; Spear, Norman E.
2014-01-01
Prenatal ethanol exposure modifies postnatal affinity to the drug, increasing the probability of ethanol use and abuse. The present study tested developing rats (5-day-old) in a novel operant technique to assess the degree of ethanol self-administration as a result of prenatal exposure to low ethanol doses during late gestation. On a single occasion during each of gestational days 17–20, pregnant rats were intragastrically administered ethanol 1 g/kg, or water (vehicle). On postnatal day 5, pups were tested on a novel operant conditioning procedure in which they learned to touch a sensor to obtain 0.1% saccharin, 3% ethanol, or 5% ethanol. Immediately after a 15-min training session, a 6-min extinction session was given in which operant behavior had no consequence. Pups were positioned on a smooth surface and had access to a touch-sensitive sensor. Physical contact with the sensor activated an infusion pump, which served to deliver an intraoral solution as reinforcement (Paired group). A Yoked control animal evaluated at the same time received the reinforcer when its corresponding Paired pup touched the sensor. Operant behavior to gain access to 3% ethanol was facilitated by prenatal exposure to ethanol during late gestation. In contrast, operant learning reflecting ethanol reinforcement did not occur in control animals prenatally exposed to water only. Similarly, saccharin reinforcement was not affected by prenatal ethanol exposure. These results suggest that in 5-day-old rats, prenatal exposure to a low ethanol dose facilitates operant learning reinforced by intraoral administration of a low-concentration ethanol solution. This emphasizes the importance of intrauterine experiences with ethanol in later susceptibility to drug reinforcement. The present operant conditioning technique represents an alternative tool to assess self-administration and seeking behavior during early stages of development. PMID:24355072
An architecture for designing fuzzy logic controllers using neural networks
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
Described here is an architecture for designing fuzzy controllers through a hierarchical process of control rule acquisition and by using special classes of neural network learning techniques. A new method for learning to refine a fuzzy logic controller is introduced. A reinforcement learning technique is used in conjunction with a multi-layer neural network model of a fuzzy controller. The model learns by updating its prediction of the plant's behavior and is related to the Sutton's Temporal Difference (TD) method. The method proposed here has the advantage of using the control knowledge of an experienced operator and fine-tuning it through the process of learning. The approach is applied to a cart-pole balancing system.
NASA Astrophysics Data System (ADS)
Han, Ke-Zhen; Feng, Jian; Cui, Xiaohong
2017-10-01
This paper considers the fault-tolerant optimised tracking control (FTOTC) problem for an unknown discrete-time linear system. A research scheme is proposed on the basis of data-based parity space identification, reinforcement learning and residual compensation techniques. The main characteristic of this scheme lies in the parity-space-identification-based simultaneous tracking control and residual compensation. The technical approach consists of four main components: a subspace-aided method to design an observer-based residual generator; a reinforcement Q-learning approach to solve the optimised tracking control policy; robust H∞ theory to achieve noise attenuation; and fault estimation triggered by the residual generator to perform fault compensation. To clarify the design and implementation procedures, an integrated algorithm is further constructed to link up these four functional units. Detailed analysis and proof are subsequently given to explain the guaranteed FTOTC performance of the proposed scheme. Finally, a case simulation is provided to verify its effectiveness.
Outlining Techniques That Help Disabled Readers.
ERIC Educational Resources Information Center
Giordano, Gerard
1982-01-01
As alternatives to hierarchical outlining, pictorial, topical, and critical outlining kinesthetically reinforce reading comprehension and can be useful in helping older students who are learning disabled or poor readers. Examples of each approach are given. (CL)
The Effects of Interspersal and Reinforcement on Math Fact Accuracy and Learning Rate
ERIC Educational Resources Information Center
Rumberger, Jessica L.
2013-01-01
Mathematics skill acquisition is a crucial component of education and ongoing research is needed to determine quality instructional techniques. A ubiquitous instructional question is how to manage time. This study investigated several flashcard presentation methods to determine the one that would provide the most learning in a set amount of time.…
Application of a model of instrumental conditioning to mobile robot control
NASA Astrophysics Data System (ADS)
Saksida, Lisa M.; Touretzky, D. S.
1997-09-01
Instrumental conditioning is a psychological process whereby an animal learns to associate its actions with their consequences. This type of learning is exploited in animal training techniques such as 'shaping by successive approximations,' which enables trainers to gradually adjust the animal's behavior by giving strategically timed reinforcements. While this is similar in principle to reinforcement learning, the real phenomenon includes many subtle effects not considered in the machine learning literature. In addition, a good deal of domain information is utilized by an animal learning a new task; it does not start from scratch every time it learns a new behavior. For these reasons, it is not surprising that mobile robot learning algorithms have yet to approach the sophistication and robustness of animal learning. A serious attempt to model instrumental learning could prove fruitful for improving machine learning techniques. In the present paper, we develop a computational theory of shaping at a level appropriate for controlling mobile robots. The theory is based on a series of mechanisms for 'behavior editing,' in which pre-existing behaviors, either innate or previously learned, can be dramatically changed in magnitude, shifted in direction, or otherwise manipulated so as to produce new behavioral routines. We have implemented our theory on Amelia, an RWI B21 mobile robot equipped with a gripper and color video camera. We provide results from training Amelia on several tasks, all of which were constructed as variations of one innate behavior, object-pursuit.
A Machine Learning Concept for DTN Routing
NASA Technical Reports Server (NTRS)
Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos
2017-01-01
This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.
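The Q-routing component mentioned above can be sketched as follows (in the spirit of Boyan and Littman's Q-routing, which the paper draws on). The line topology, unit hop delay, update count, and learning rate are invented for illustration:

```python
import random

# Q-routing sketch: each node keeps Q[node][neighbor], an estimated
# delivery time toward a fixed destination via that neighbor, updated
# from the chosen neighbor's own best estimate. Toy 4-node line network.
random.seed(1)
links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dest = 3
Q = {n: {nb: 10.0 for nb in nbs} for n, nbs in links.items()}

def route(n):
    # Forward to the neighbor with the lowest estimated delivery time.
    return min(Q[n], key=Q[n].get)

for _ in range(300):
    n = random.choice([0, 1, 2])          # a packet at a non-destination node
    nb = route(n)
    remaining = 0.0 if nb == dest else min(Q[nb].values())
    Q[n][nb] += 0.5 * (1.0 + remaining - Q[n][nb])  # one hop + downstream estimate

print(route(1))  # node 1 learns to forward toward node 2, the short way to 3
```

The key property, visible even in this toy, is that delivery-time estimates propagate backwards from the destination one neighbor at a time, which is what lets such estimates supplement a contact-graph route decision.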
Multi-Objective Reinforcement Learning for Cognitive Radio-Based Satellite Communications
NASA Technical Reports Server (NTRS)
Ferreira, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.
2016-01-01
Previous research on cognitive radios has addressed the performance of various machine-learning and optimization techniques for decision making of terrestrial link properties. In this paper, we present our recent investigations with respect to reinforcement learning that potentially can be employed by future cognitive radios installed onboard satellite communications systems specifically tasked with radio resource management. This work analyzes the performance of learning, reasoning, and decision making while considering multiple objectives for time-varying communications channels, as well as different cross-layer requirements. Based on the urgent demand for increased bandwidth, which is being addressed by the next generation of high-throughput satellites, the performance of cognitive radio is assessed considering links between a geostationary satellite and a fixed ground station operating at Ka-band (26 GHz). Simulation results show multiple objective performance improvements of more than 3.5 times for clear sky conditions and 6.8 times for rain conditions.
Lee, Elliot; Lavieri, Mariel S; Volk, Michael L; Xu, Yongcai
2015-09-01
We investigate the problem faced by a healthcare system wishing to allocate its constrained screening resources across a population at risk for developing a disease. A patient's risk of developing the disease depends on his/her biomedical dynamics. However, knowledge of these dynamics must be learned by the system over time. Three classes of reinforcement learning policies are designed to address this problem of simultaneously gathering and utilizing information across multiple patients. We investigate a case study based upon the screening for Hepatocellular Carcinoma (HCC), and optimize each of the three classes of policies using the indifference zone method. A simulation is built to gauge the performance of these policies, and their performance is compared to current practice. We then demonstrate how the benefits of learning-based screening policies differ across various levels of resource scarcity and provide metrics of policy performance.
ERIC Educational Resources Information Center
Giannoukos, Georgios; Besas, Georgios; Galiropoulos, Christos; Hioctour, Vasilios
2015-01-01
This paper is concerned with the methods and techniques used in adult education in order to allow the educator to successfully respond to suitable learning experiences on the part of the learner as well as to reinforce interaction between the learners. The strategies adopted, teaching aids and the choice of suitable teaching material also is…
Rational and Mechanistic Perspectives on Reinforcement Learning
ERIC Educational Resources Information Center
Chater, Nick
2009-01-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…
Simultaneous vibration control and energy harvesting using actor-critic based reinforcement learning
NASA Astrophysics Data System (ADS)
Loong, Cheng Ning; Chang, C. C.; Dimitrakopoulos, Elias G.
2018-03-01
Mitigating excessive vibration of civil engineering structures using various types of devices has been a conspicuous research topic in the past few decades. Some devices, such as electromagnetic transducers, which are capable of exerting control forces while simultaneously harvesting energy, have been proposed recently. These devices make possible a self-regenerative system that can semi-actively mitigate structural vibration without the need for external energy. Integrating mechanical and electrical components with control algorithms, these devices open up a new research domain that needs to be addressed. In this study, the feasibility of using an actor-critic based reinforcement learning control algorithm for simultaneous vibration control and energy harvesting for a civil engineering structure is investigated. The actor-critic based reinforcement learning control algorithm is a real-time, model-free adaptive technique that can adjust the controller parameters based on observations and reward signals without knowing the system characteristics. It is suitable for the control of a partially known nonlinear system with uncertain parameters. The feasibility of implementing this algorithm on a building structure equipped with an electromagnetic damper is investigated, and issues related to the modelling of the learning algorithm, initialization, and convergence are presented and discussed.
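The actor-critic structure described above, a critic that learns a value baseline from the reward signal and an actor that adjusts its parameters using the critic's error, can be sketched on a one-state toy problem. The actions, rewards, and learning rates are made up; this is not the electromagnetic-damper plant of the paper:

```python
import math
import random

# Actor-critic sketch: the critic tracks a value baseline; the actor's
# action preferences are nudged by the critic's TD error, with no model
# of the system, which is the "model-free adaptive" property above.
random.seed(0)
prefs = {"damp": 0.0, "idle": 0.0}        # actor's action preferences
V = 0.0                                   # critic's value estimate
alpha_actor, alpha_critic = 0.1, 0.1
reward = {"damp": 1.0, "idle": 0.0}       # hypothetical reward signal

def choose():
    # Softmax (Gibbs) action selection over the actor's preferences.
    z = sum(math.exp(p) for p in prefs.values())
    r, acc = random.random(), 0.0
    for a, p in prefs.items():
        acc += math.exp(p) / z
        if r <= acc:
            return a
    return a  # numerical safety net

for _ in range(2000):
    a = choose()
    delta = reward[a] - V                 # TD error: reward vs. baseline
    V += alpha_critic * delta             # critic update
    prefs[a] += alpha_actor * delta       # actor reinforced when above baseline

print(prefs["damp"] > prefs["idle"])      # the actor learns to prefer damping
```

Nothing about the plant dynamics appears in the update rules; only observations and reward signals drive adaptation, matching the algorithm class the abstract describes.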
Bublitz, Alexander; Weinhold, Severine R.; Strobel, Sophia; Dehnhardt, Guido; Hanke, Frederike D.
2017-01-01
Octopuses (Octopus vulgaris) are generally considered to possess extraordinary cognitive abilities including the ability to successfully perform in a serial reversal learning task. During reversal learning, an animal is presented with a discrimination problem and after reaching a learning criterion, the signs of the stimuli are reversed: the former positive becomes the negative stimulus and vice versa. If an animal improves its performance over reversals, it is ascribed advanced cognitive abilities. Reversal learning has been tested in octopus in a number of studies. However, the experimental procedures adopted in these studies involved pre-training on the new positive stimulus after a reversal, strong negative reinforcement or might have enabled secondary cueing by the experimenter. These procedures could have all affected the outcome of reversal learning. Thus, in this study, serial visual reversal learning was revisited in octopus. We trained four common octopuses (O. vulgaris) to discriminate between 2-dimensional stimuli presented on a monitor in a simultaneous visual discrimination task and reversed the signs of the stimuli each time the animals reached the learning criterion of ≥80% in two consecutive sessions. The animals were trained using operant conditioning techniques including a secondary reinforcer, a rod that was pushed up and down the feeding tube, which signaled the correctness of a response and preceded the subsequent primary reinforcement of food. The experimental protocol did not involve negative reinforcement. One animal completed four reversals and showed progressive improvement, i.e., it decreased its errors to criterion the more reversals it experienced. This animal developed a generalized response strategy. In contrast, another animal completed only one reversal, whereas two animals did not learn to reverse during the first reversal. 
In conclusion, some octopus individuals can learn to reverse in a visual task, demonstrating behavioral flexibility, even with a refined methodology. PMID:28223940
Distributed reinforcement learning for adaptive and robust network intrusion response
NASA Astrophysics Data System (ADS)
Malialis, Kleanthis; Devlin, Sam; Kudenko, Daniel
2015-07-01
Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
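The difference-rewards idea used above (each throttling agent is credited with its marginal contribution to a global objective, estimated by swapping its action for a counterfactual default) can be sketched as follows. The load model, capacity, and default action here are illustrative assumptions, not the paper's implementation.

```python
def global_reward(loads, capacity):
    """Toy system-level reward: negative overload at the victim server."""
    total = sum(loads)
    return -max(0.0, total - capacity)

def difference_reward(loads, capacity, agent_idx, default_load=0.0):
    """D_i = G(z) - G(z_-i): the agent's marginal contribution to the
    global reward, computed by replacing its action with a default."""
    g = global_reward(loads, capacity)
    counterfactual = list(loads)
    counterfactual[agent_idx] = default_load
    g_minus_i = global_reward(counterfactual, capacity)
    return g - g_minus_i

# Three throttling agents currently let through these traffic loads:
loads = [4.0, 3.0, 5.0]
d0 = difference_reward(loads, capacity=10.0, agent_idx=0)
```

Because each agent's shaped reward isolates its own effect on congestion, learning scales better with the number of agents than sharing one global reward.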
Negative reinforcement learning is affected in substance dependence.
Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody
2012-06-01
Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna
2016-10-05
Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Segal, Bertha E.
Materials from a teacher workshop on the Total Physical Response method for teaching English as a second language are presented. The technique mirrors the process of first language acquisition, uses physical activities in the classroom to reinforce learning, and allows a long period of receptive language learning before requiring production. The…
A System for Generating Instructional Computer Graphics.
ERIC Educational Resources Information Center
Nygard, Kendall E.; Ranganathan, Babusankar
1983-01-01
Description of the Tektronix-Based Interactive Graphics System for Instruction (TIGSI), which was developed for generating graphics displays in computer-assisted instruction materials, discusses several applications (e.g., reinforcing learning of concepts, principles, rules, and problem-solving techniques) and presents advantages of the TIGSI…
Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto
2016-01-01
Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401
Oliveira, Emileane C; Hunziker, Maria Helena
2014-07-01
In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.
Rational and mechanistic perspectives on reinforcement learning.
Chater, Nick
2009-12-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms--as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.
Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa
2017-01-01
The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities. PMID:28475581
ERIC Educational Resources Information Center
Brewer, Evelyn J.
1999-01-01
Describes an activity in which students use computers and techniques from Op Art to learn various geometric concepts. Allows them to see the distinct connection between art and mathematics from a personal perspective. Reinforces writing, speaking, and drawing skills while creating slide shows related to the project. (ASK)
ERIC Educational Resources Information Center
Torre, Liz; And Others
Information and accompanying exercises are provided in this learning module to reinforce basic reading, writing, and math skills and, at the same time, introduce personal assessment and job-seeking techniques. The module's first section provides suggestions for assessing personal interests and identifying the assets one has to offer an employer.…
An analysis of intergroup rivalry using Ising model and reinforcement learning
NASA Astrophysics Data System (ADS)
Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo
2014-01-01
Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Different from other simulation studies using the Ising model, the evolution rules of each individual in our model are not static, but have the ability to learn from historical experience using a reinforcement learning technique, which makes the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.
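The combination of an Ising-style interaction with learning agents can be sketched under simplifying assumptions: a fully connected group, a plain alignment payoff, and epsilon-greedy Q-learning standing in for the paper's exact evolution rule.

```python
import random

random.seed(0)

class Individual:
    """A group member whose opinion (spin +1/-1) is chosen by Q-learning
    from historical experience rather than a fixed Ising update rule."""
    def __init__(self, alpha=0.1, epsilon=0.1):
        self.q = {+1: 0.0, -1: 0.0}
        self.alpha, self.epsilon = alpha, epsilon
        self.spin = random.choice([+1, -1])

    def act(self):
        if random.random() < self.epsilon:
            self.spin = random.choice([+1, -1])
        else:
            self.spin = max(self.q, key=self.q.get)
        return self.spin

    def learn(self, reward):
        # Incremental update of the value of the chosen spin.
        self.q[self.spin] += self.alpha * (reward - self.q[self.spin])

def local_field(neighbors):
    """Ising-style payoff signal: net alignment of the other members."""
    return sum(n.spin for n in neighbors)

agents = [Individual() for _ in range(10)]
for _ in range(200):
    for a in agents:
        a.act()
    for a in agents:
        others = [n for n in agents if n is not a]
        a.learn(reward=a.spin * local_field(others))
consensus = abs(sum(a.spin for a in agents)) / len(agents)
```

The epsilon parameter plays a role loosely analogous to temperature (the "degree of social freedom"): higher exploration keeps the group from locking into a one-sided state.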
Racial bias shapes social reinforcement learning.
Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas
2014-03-01
Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial-in-group and out-group individuals.
Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective
Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.
2009-01-01
Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527
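The option construct at the core of hierarchical reinforcement learning (a sub-policy bundled with initiation and termination conditions, so that a skill can be invoked as if it were a single action) can be sketched in a toy corridor task; the task and option definitions below are illustrative, not drawn from the paper.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """An option bundles a sub-policy with initiation and termination
    conditions, letting the agent treat a skill as one temporally
    extended action."""
    name: str
    initiate: Callable[[int], bool]   # states where the option may start
    policy: Callable[[int], str]      # primitive action chosen in a state
    terminate: Callable[[int], bool]  # states where the option ends

# Toy corridor: states 0..4, goal at 4. One option walks right to the
# middle, a second walks right from the middle to the goal.
go_to_middle = Option("go-to-middle", lambda s: s < 2,
                      lambda s: "right", lambda s: s >= 2)
go_to_goal = Option("go-to-goal", lambda s: s >= 2,
                    lambda s: "right", lambda s: s >= 4)

def run_option(state: int, option: Option) -> int:
    """Execute the option's sub-policy until its termination condition."""
    while not option.terminate(state):
        if option.policy(state) == "right":
            state += 1
    return state

state = 0
for opt in (go_to_middle, go_to_goal):
    if opt.initiate(state):
        state = run_option(state, opt)
```

A higher-level learner then chooses among options rather than primitive moves, which is what lets reusable subroutines shorten credit-assignment paths.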
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aziz, H. M. Abdul; Zhu, Feng; Ukkusuri, Satish V.
This research applies an R-Markov Average Reward Technique based reinforcement learning (RL) algorithm, namely RMART, to the vehicular signal control problem, leveraging information sharing among signal controllers in a connected vehicle environment. We implemented the algorithm in a network of 18 signalized intersections and compared the performance of RMART with fixed, adaptive, and variants of the RL schemes. Results show significant improvement in system performance for the RMART algorithm with information sharing over both traditional fixed signal timing plans and real-time adaptive control schemes. Additionally, the comparison with reinforcement learning algorithms including Q-learning and SARSA indicates that RMART performs better at higher congestion levels. Further, a multi-reward structure is proposed that dynamically adjusts the reward function with varying congestion states at the intersection. Finally, the results from test networks show significant reductions in emissions (CO, CO2, NOx, VOC, PM10) when RL algorithms are implemented compared to fixed signal timings and adaptive schemes.
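RMART builds on average-reward reinforcement learning. A minimal sketch of an R-learning-style update, in which action values are measured relative to the running average reward rather than discounted, follows; the two-state example and step sizes are illustrative assumptions, not the paper's implementation.

```python
def r_learning_step(R, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One R-learning update: relative values R and average reward rho.
    Average-reward formulations suit continuing control tasks such as
    signal timing, where no natural episode boundary exists."""
    best_next = max(R[s_next].values())
    best_here = max(R[s].values())
    was_greedy = R[s][a] == best_here
    R[s][a] += alpha * (r - rho + best_next - R[s][a])
    if was_greedy:
        # Update the average-reward estimate only on greedy actions.
        rho += beta * (r + best_next - best_here - rho)
    return rho

# Two states, two actions, zero-initialised relative values.
R = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
rho = 0.0
rho = r_learning_step(R, rho, s=0, a=1, r=5.0, s_next=1)
```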
Does arousal interfere with operant conditioning of spike-wave discharges in genetic epileptic rats?
Osterhagen, Lasse; Breteler, Marinus; van Luijtelaar, Gilles
2010-06-01
One of the ways in which brain-computer interfaces can be used is neurofeedback (NF). Subjects use their brain activation to control an external device, and with this technique it is also possible to learn to control aspects of brain activity through operant conditioning. Beneficial effects of NF training on seizure occurrence have been described in epileptic patients. Little research has been done on differentiating NF effectiveness by type of epilepsy, particularly whether idiopathic generalized seizures are susceptible to NF. In this experiment, seizures that manifest themselves as spike-wave discharges (SWDs) in the EEG were reinforced during 10 sessions in 6 rats of the WAG/Rij strain, an animal model for absence epilepsy. EEGs were recorded before and after the training sessions. Reinforcing SWDs led to decreased SWD occurrence during training; however, the changes were not persistent in the post-training sessions. Because behavioural states are known to influence the occurrence of SWDs, it is proposed that the reinforcement situation increased arousal, which resulted in fewer SWDs. Additional tests supported this hypothesis. The outcomes have implications for the possibility of training SWDs with operant learning techniques. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Better Care Teams: A Stepwise Skill Reinforcement Model.
Christopher, Beth-Anne; Grantner, Mary; Coke, Lola A; Wideman, Marilyn; Kwakwa, Francis
2016-06-01
The Building Healthy Urban Communities initiative presents a path for organizations partnering to improve patient outcomes with continuing education (CE) as a key component. Components of the CE initiative included traditional CE delivery formats with an essential element of adaptability and new methods, with rigorous evaluation over time that included evaluation prior to the course, immediately following the CE session, 6 to 8 weeks after the CE session, and then subsequent monthly "testlets." Outcome measures were designed to allow for ongoing adaptation of content, reinforcement of key learning objectives, and use of innovative concordant testing and retrieval practice techniques. The results after 1 year of programming suggest the stepwise skill reinforcement model is effective for learning and is an efficient use of financial and human resources. More important, its design is one that could be adopted at low cost by organizations willing to work in close partnership. J Contin Educ Nurs. 2016;47(6):283-288. Copyright 2016, SLACK Incorporated.
Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents
ERIC Educational Resources Information Center
Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan
2009-01-01
Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…
GA-based fuzzy reinforcement learning for control of a magnetic bearing system.
Lin, C T; Jou, C P
2000-01-01
This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
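The key TDGAR idea (use the critic's TD prediction error as the GA's fitness function, so evolution can proceed between deliveries of external reward) can be sketched as follows. The one-parameter "action network" and the fabricated critic values are illustrative assumptions, not the paper's controller.

```python
import random
random.seed(1)

def td_internal_signal(v_prev, v_curr, external_r, gamma=0.95):
    """TD prediction error: the critic's internal reinforcement signal,
    available every step even when external reward is delayed."""
    return external_r + gamma * v_curr - v_prev

def ga_step(population, fitness, mutate_scale=0.1):
    """One GA generation: keep the fitter half, refill by mutation."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: len(ranked) // 2]
    children = [p + random.gauss(0.0, mutate_scale) for p in parents]
    return parents + children

# Action network reduced to a single controller gain; the critic's value
# estimates are fabricated for illustration (it prefers a gain near 1),
# so fitness is the internal signal for using that gain.
def fitness(gain):
    v_prev, v_curr = 0.0, -abs(gain - 1.0)
    return td_internal_signal(v_prev, v_curr, external_r=0.0)

pop = [random.uniform(-2.0, 2.0) for _ in range(8)]
for _ in range(30):
    pop = ga_step(pop, fitness)
best = max(pop, key=fitness)
```

Because fitness is computed from the critic's predictions, the GA can advance a generation at every step instead of idling until the plant delivers an external reinforcement signal.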
Theoretical assumptions of Maffesoli's sensitivity and Problem-Based Learning in Nursing Education
Rodríguez-Borrego, María-Aurora; Nitschke, Rosane Gonçalves; do Prado, Marta Lenise; Martini, Jussara Gue; Guerra-Martín, María-Dolores; González-Galán, Carmen
2014-01-01
Objective: to understand the everyday life and the imaginary of nursing students in their knowledge-socialization process through the Problem-Based Learning (PBL) strategy. Method: action research involving 86 students from the second year of an undergraduate nursing program in Spain. A critical-incident questionnaire and a group interview were used, with thematic/categorical analysis and triangulation of researchers, subjects and techniques. Results: the students signal the need for a view from within, reinforcing the criticism of schematic dualism; PBL allows one to learn how to be with the other, with mechanical and organic solidarity, and to feel together, with its emphasis on learning to work in a group and wanting to be close to the person receiving care. Conclusions: the great contradictions experienced by the protagonists of the process, the students, seem to express that group learning is not seen as a way of gaining knowledge, as it makes them lose time to study. Everyday life, execution time and the imaginary of how learning should be do not seem to intersect in the use of Problem-Based Learning. The importance of focusing on the everyday and the imaginary should be reinforced when we consider nursing education. PMID:25029064
Improving Robot Locomotion Through Learning Methods for Expensive Black-Box Systems
2013-11-01
Fragmentary excerpts describe the development of a class of "gradient-free" optimization techniques, including local approaches such as Nelder-Mead simplex search and global approaches such as the Non-dominated Sorting Genetic Algorithm, with reference to model-free inverse reinforcement learning.
YIP: Human In the Loop Statistical Relational Learners
2017-10-23
Fragmentary excerpts describe learning formalisms including inverse reinforcement learning and statistical relational learning, active advice seeking for inverse reinforcement learning (including advice over label preferences), and prior work on advice for sequential decision-making tasks.
Artificial neural networks and approximate reasoning for intelligent control in space
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
A method is introduced for learning to refine the control rules of approximate reasoning-based controllers. A reinforcement-learning technique is used in conjunction with a multi-layer neural network model of an approximate reasoning-based controller. The model learns by updating its prediction of the physical system's behavior. The model can use the control knowledge of an experienced operator and fine-tune it through the process of learning. Some of the space domains suitable for applications of the model such as rendezvous and docking, camera tracking, and tethered systems control are discussed.
A Rent-Seeking Experiment for the Classroom
ERIC Educational Resources Information Center
Strow, Brian Kent; Strow, Claudia Wood
2007-01-01
Recent research has demonstrated that active learning techniques improve student comprehension and retention of abstract economic ideas such as rent seeking. Instructors can reinforce the concept of rent seeking with a classroom game, particularly one involving real money. The authors improve upon a game first introduced by Goeree and Holt (1999)…
NASA Technical Reports Server (NTRS)
Jani, Yashvant
1992-01-01
As part of the RICIS activity, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Max satellite simulation. This activity is carried out in the software technology laboratory utilizing the Orbital Operations Simulator (OOS). This report is deliverable D2, Attitude Control Results; it provides the status of the project after four months of activity and outlines future plans. In section 2, we describe the Fuzzy-Learner system for the attitude control functions. In section 3, we provide a description of the test cases and results in chronological order. In section 4, we summarize our results and conclusions. Our future plans and recommendations are provided in section 5.
Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-07-10
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.
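Feedback-learning effects of this kind are commonly analyzed with a delta-rule model. The sketch below, with separate learning rates for better- and worse-than-expected feedback and a softmax choice rule, is a generic illustration of that modeling approach, not the authors' model.

```python
import math

def update_q(q, reward, alpha_gain=0.3, alpha_loss=0.3):
    """Delta-rule value update with separate learning rates for
    better- and worse-than-expected feedback; an asymmetry between the
    two rates models differential sensitivity to reward vs punishment."""
    delta = reward - q
    alpha = alpha_gain if delta >= 0 else alpha_loss
    return q + alpha * delta

def softmax_choice_prob(q_a, q_b, beta=2.0):
    """Probability of choosing option A over option B."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))

q = 0.0
for r in [1, 1, 0, 1]:          # a mostly rewarded option
    q = update_q(q, r)
p = softmax_choice_prob(q, 0.0)  # preference over a neutral alternative
```

Fitting the learning-rate and inverse-temperature parameters per subject and condition is how medication effects on positive vs negative reinforcement are typically quantified.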
Enhanced Experience Replay for Deep Reinforcement Learning
2015-11-01
ARL-TR-7538, November 2015, US Army Research Laboratory. Enhanced Experience Replay for Deep Reinforcement Learning, by David Doria, Bryan Dawson, and Manuel Vindiola, Computational and Information Sciences Directorate.
Prespeech motor learning in a neural network using reinforcement.
Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough
2013-02-01
Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.
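The reinforcement-gated map update described in this abstract can be sketched as follows (a toy stand-in: the map size, learning rate, and threshold criterion are invented for illustration, and the real model drives a vocal synthesizer with an acoustic criterion such as phonation rather than a threshold check):

```python
import random

# Sketch of the reinforcement-gated learning rule described above
# (simplified stand-in: the real model maps onto a vocal synthesizer and
# uses an acoustic criterion such as phonation; here the "criterion" is
# just a threshold on the mean muscle activation).
random.seed(0)
N_UNITS, DIM, RATE = 10, 3, 0.5

# Each map unit stores a preferred muscle-activation vector.
weights = [[random.random() for _ in range(DIM)] for _ in range(N_UNITS)]

def produce():
    """Spontaneous vocalization: a noisy copy of a random unit's activation."""
    unit = random.choice(weights)
    return [w + random.gauss(0, 0.1) for w in unit]

def meets_criterion(activation):
    """Hypothetical stand-in for the acoustic reinforcement criterion."""
    return sum(activation) / DIM > 0.5

def reinforce(activation):
    """Pull the best-matching unit toward the reinforced activation,
    making similar muscle activations more likely to recur."""
    bmu = min(weights, key=lambda w: sum((wi - ai) ** 2 for wi, ai in zip(w, activation)))
    for i in range(DIM):
        bmu[i] += RATE * (activation[i] - bmu[i])

for _ in range(500):
    act = produce()
    if meets_criterion(act):      # reinforcement is contingent on the criterion
        reinforce(act)

mean_output = sum(sum(w) / DIM for w in weights) / N_UNITS
print(mean_output)
```

Because only productions meeting the criterion are reinforced, units drift toward criterion-satisfying activations, mirroring the contingency effects the abstract reports.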
ERIC Educational Resources Information Center
Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb
2007-01-01
Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…
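The point that TDRL treats extinction as unlearning can be seen in a minimal TD(0) sketch (illustrative learning rate, not the authors' model): the learned value simply decays back to zero during extinction, leaving no trace that could support renewal.

```python
# Minimal TD(0) sketch (illustrative learning rate, not the authors' model).
# Acquisition drives the learned value toward the reward; extinction, in
# plain TDRL, is literally unlearning: the value decays back toward zero,
# leaving no trace that could support renewal.
alpha = 0.2  # learning rate

def td_update(v, reward):
    return v + alpha * (reward - v)

v = 0.0
for _ in range(50):          # acquisition: reward delivered on every trial
    v = td_update(v, 1.0)
acquired = v                 # approaches 1.0
for _ in range(50):          # extinction: reward omitted
    v = td_update(v, 0.0)
extinguished = v             # decays back toward 0.0
print(acquired, extinguished)
```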
Dunsmoor, Joseph E.; Niv, Yael; Daw, Nathaniel; Phelps, Elizabeth A.
2015-01-01
Extinction serves as the leading theoretical framework and experimental model to describe how learned behaviors diminish through absence of anticipated reinforcement. In the past decade, extinction has moved beyond the realm of associative learning theory and behavioral experimentation in animals and has become a topic of considerable interest in the neuroscience of learning, memory, and emotion. Here, we review research and theories of extinction, both as a learning process and as a behavioral technique, and consider whether traditional understandings warrant a re-examination. We discuss the neurobiology, cognitive factors, and major computational theories, and revisit the predominant view that extinction results in new learning that interferes with expression of the original memory. Additionally, we reconsider the limitations of extinction as a technique to prevent the relapse of maladaptive behavior, and discuss novel approaches, informed by contemporary theoretical advances, that augment traditional extinction methods to target and potentially alter maladaptive memories. PMID:26447572
A learning theory account of depression.
Ramnerö, Jonas; Folke, Fredrik; Kanter, Jonathan W
2015-06-11
Learning theory provides a foundation for understanding and deriving treatment principles for impacting a spectrum of functional processes relevant to the construct of depression. While behavioral interventions have been commonplace in the cognitive behavioral tradition, most often conceptualized within a cognitive theoretical framework, recent years have seen renewed interest in more purely behavioral models. These modern learning theory accounts of depression focus on the interchange between behavior and the environment, mainly in terms of lack of reinforcement, extinction of instrumental behavior, and excesses of aversive control, and include a conceptualization of relevant cognitive and emotional variables. These positions, drawn from extensive basic and applied research, cohere with biological theories on reduced reward learning and reward responsiveness and views of depression as a heterogeneous, complex set of disorders. Treatment techniques based on learning theory, often labeled Behavioral Activation (BA), focus on activating the individual in directions that increase contact with potential reinforcers, as defined idiographically with the client. BA is considered an empirically well-established treatment that generalizes well across diverse contexts and populations. The learning theory account is discussed as a parsimonious model and a basis for treatments highly suitable for large scale dissemination. © 2015 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
Behavioral and neural properties of social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ
2011-01-01
Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis (social preferences, response latencies, and modeled neural responses) are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787
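Trial-by-trial modeling of the kind described here is typically of the delta-rule form; a generic sketch (the learning rate and outcome coding are illustrative, not the study's fitted model):

```python
# Generic delta-rule sketch of trial-by-trial learning of the kind used to
# generate prediction-error regressors for fMRI (learning rate and outcome
# coding are illustrative, not the study's fitted model). The expected value
# of each peer (cue) is updated toward the observed social outcome.
ALPHA = 0.3  # illustrative learning rate

def update_expectations(trials, n_cues=3):
    """trials: list of (cue_index, reinforced) tuples, one per trial."""
    value = [0.0] * n_cues
    prediction_errors = []
    for cue, reinforced in trials:
        delta = (1.0 if reinforced else 0.0) - value[cue]   # prediction error
        prediction_errors.append(delta)
        value[cue] += ALPHA * delta
    return value, prediction_errors

trials = [(0, True), (0, True), (1, False), (0, True), (1, False)]
value, pes = update_expectations(trials)
print(value)   # cue 0 rises toward 1.0; cue 1 stays at 0.0
```

In the study's analysis, regressors derived from such prediction errors and expected values are what correlated with striatal and cingulate activity.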
Ilango, A; Wetzel, W; Scheich, H; Ohl, F W
2010-03-31
Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective for maintaining a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.
Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers
ERIC Educational Resources Information Center
Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.
2014-01-01
Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…
The cerebellum: a neural system for the study of reinforcement learning.
Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F
2011-01-01
In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivity with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated, and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed); any neural system that supports reinforcement learning must therefore be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.
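The strict computational sense of reinforcement learning invoked here (an agent maximizing reward through trial-and-error interaction with an environment) can be illustrated with tabular Q-learning on a toy two-state chain (the task and all parameters are invented for illustration):

```python
import random

# Tabular Q-learning sketch of reinforcement learning in the strict
# computational sense: trial-and-error interaction with an environment to
# maximize reward. The two-state chain task and all parameters are
# invented for illustration.
random.seed(0)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
N_STATES, ACTIONS = 2, (0, 1)                  # action 1 advances, action 0 stays
Q = [[0.0, 0.0] for _ in range(N_STATES + 1)]  # includes the terminal state

def step(state, action):
    """Environment: reward 1.0 only on reaching the terminal state."""
    if action == 1:
        nxt = state + 1
        return nxt, (1.0 if nxt == N_STATES else 0.0)
    return state, 0.0

for _ in range(200):                           # episodes of trial and error
    s = 0
    while s != N_STATES:
        if random.random() < EPSILON:          # explore
            a = random.choice(ACTIONS)
        else:                                  # exploit
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # evaluative feedback: move Q toward the bootstrapped return
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

print(Q[0], Q[1])   # advancing is learned to be the better action in both states
```

The three requirements named in the abstract map directly onto the loop: sensing the state `s`, selecting the behavior `a`, and providing evaluative feedback via the TD update.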
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include, for example, outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
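A loose sketch of the idea, not the paper's general formulation: for a Gaussian mean, reinforcing each observation's likelihood by r_i >= 0 under an L1 penalty lam * sum(r_i) yields a simple alternating scheme in which points whose density falls below 1/lam are down-weighted as probable outliers.

```python
import math

# Loose sketch of pointwise probability reinforcement for a robust mean
# (a toy instance, not the paper's general formulation). Each point's
# likelihood p_i is reinforced by r_i >= 0; an L1 penalty lam * sum(r_i)
# controls the total reinforcement, so points whose density falls below
# 1/lam are treated as abnormally frequent and down-weighted.
def gauss(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def robust_mean(xs, lam=5.0, iters=50):
    mu = sum(xs) / len(xs)                            # start at the ordinary mean
    for _ in range(iters):
        p = [gauss(x, mu) for x in xs]
        r = [max(0.0, 1.0 / lam - pi) for pi in p]    # optimal reinforcement per point
        w = [pi / (pi + ri) for pi, ri in zip(p, r)]  # resulting per-point weight
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return mu

data = [0.1, -0.2, 0.05, 0.3, -0.1, 50.0]             # one gross outlier
print(robust_mean(data))                               # near 0, unlike the raw mean
```

The reinforcement values r_i also serve as the per-observation "abnormality degree" mentioned in the abstract: here only the outlier receives a large r_i at convergence.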
Myers, Catherine E.; Moustafa, Ahmed A.; Sheynin, Jony; VanMeenen, Kirsten M.; Gilbertson, Mark W.; Orr, Scott P.; Beck, Kevin D.; Pang, Kevin C. H.; Servatius, Richard J.
2013-01-01
Post-traumatic stress disorder (PTSD) symptoms include behavioral avoidance which is acquired and tends to increase with time. This avoidance may represent a general learning bias; indeed, individuals with PTSD are often faster than controls at acquiring conditioned responses based on physiologically-aversive feedback. However, it is not clear whether this learning bias extends to cognitive feedback, or to learning from both reward and punishment. Here, male veterans with self-reported current, severe PTSD symptoms (PTSS group) or with few or no PTSD symptoms (control group) completed a probabilistic classification task that included both reward-based and punishment-based trials, where feedback could take the form of reward, punishment, or an ambiguous “no-feedback” outcome that could signal either successful avoidance of punishment or failure to obtain reward. The PTSS group outperformed the control group in total points obtained; the PTSS group specifically performed better than the control group on reward-based trials, with no difference on punishment-based trials. To better understand possible mechanisms underlying observed performance, we used a reinforcement learning model of the task, and applied maximum likelihood estimation techniques to derive estimated parameters describing individual participants’ behavior. Estimations of the reinforcement value of the no-feedback outcome were significantly greater in the control group than the PTSS group, suggesting that the control group was more likely to value this outcome as positively reinforcing (i.e., signaling successful avoidance of punishment). This is consistent with the control group’s generally poorer performance on reward trials, where reward feedback was to be obtained in preference to the no-feedback outcome.
Differences in the interpretation of ambiguous feedback may contribute to the facilitated reinforcement learning often observed in PTSD patients, and may in turn provide new insight into how pathological behaviors are acquired and maintained in PTSD. PMID:24015254
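The maximum-likelihood model-fitting step described above can be sketched generically (the delta-rule/softmax form, the fixed learning rate and temperature, and the grid search are assumptions for illustration, not the authors' exact procedure): the subjective value of the ambiguous no-feedback outcome is treated as a free parameter and recovered from choice data.

```python
import math, random

# Sketch of the model-fitting approach described above (the delta-rule /
# softmax form, the fixed learning rate and temperature, and the grid
# search are illustrative assumptions, not the authors' exact procedure).
# The subjective reinforcement value of the ambiguous "no-feedback"
# outcome is a free parameter recovered by maximum likelihood.
random.seed(1)
ALPHA, BETA = 0.3, 5.0          # assumed learning rate and inverse temperature

def simulate(v_nofeedback, n_trials=500):
    """Two-armed task: arm 1 usually avoids punishment (no-feedback outcome)."""
    q, history = [0.0, 0.0], []
    for _ in range(n_trials):
        p1 = 1.0 / (1.0 + math.exp(-BETA * (q[1] - q[0])))
        choice = 1 if random.random() < p1 else 0
        avoided = random.random() < (0.8 if choice == 1 else 0.2)
        r = v_nofeedback if avoided else -1.0     # no-feedback vs punishment
        q[choice] += ALPHA * (r - q[choice])
        history.append((choice, r))
    return history

def neg_log_lik(v_nf, history):
    """Replay the recorded choices under a candidate no-feedback value."""
    q, nll = [0.0, 0.0], 0.0
    for choice, r in history:
        p1 = 1.0 / (1.0 + math.exp(-BETA * (q[1] - q[0])))
        nll -= math.log(p1 if choice == 1 else 1.0 - p1)
        r_eff = v_nf if r >= 0 else r             # re-code the ambiguous outcome
        q[choice] += ALPHA * (r_eff - q[choice])
    return nll

data = simulate(v_nofeedback=0.5)
grid = [i / 10 for i in range(11)]
best = min(grid, key=lambda v: neg_log_lik(v, data))
print(best)
```

Group differences of the kind the abstract reports would then show up as systematically different fitted values of this parameter between PTSS and control participants.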
Towards a genetics-based adaptive agent to support flight testing
NASA Astrophysics Data System (ADS)
Cribbs, Henry Brown, III
Although the benefits of aircraft simulation have been known since the late 1960s, simulation almost always entails interaction with a human test pilot. This "pilot-in-the-loop" simulation process provides useful evaluative information to the aircraft designer and provides a training tool to the pilot. Emulation of a pilot during the early phases of the aircraft design process might provide designers a useful evaluative tool. Machine learning might emulate a pilot in a simulated aircraft/cockpit setting. Preliminary work in the application of machine learning techniques, such as reinforcement learning, to aircraft maneuvering has shown promise. These studies used simplified interfaces between the machine learning agent and the aircraft simulation. The simulations employed low order equivalent system models. High-fidelity aircraft simulations exist, such as the simulations developed by NASA at its Dryden Flight Research Center. To expand the application domain of reinforcement learning to aircraft design, this study presents a series of experiments that examine a reinforcement learning agent in the role of test pilot. The NASA X-31 and F-106 high-fidelity simulations provide realistic aircraft for the agent to maneuver. The approach of the study is to examine an agent possessing a genetic-based, artificial neural network to approximate long-term, expected cost (Bellman value) in a basic maneuvering task. The experiments evaluate different learning methods based on a common feedback function and an identical task. The learning methods evaluated are: Q-learning, Q(lambda)-learning, SARSA learning, and SARSA(lambda) learning. Experimental results indicate that, while prediction errors remained quite high, similar, repeatable behaviors occur in both aircraft. This behavioral similarity demonstrates portability of the agent between aircraft with different handling qualities (dynamics).
Besides the adaptive behavior aspects of the study, the genetic algorithm used in the agent is shown to play an additive role in the shaping of the artificial neural network to the prediction task.
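The four learning methods compared in the study differ only in their one-step updates; written out for a tabular Q (a sketch: the study itself used a genetic neural-network approximation of the Bellman value, not a table):

```python
# The four learning methods evaluated in the study, written as one-step
# updates on a tabular Q (a sketch: the study used a genetic neural-network
# approximation of the Bellman value, not a table).
def q_learning(Q, s, a, r, s2, alpha, gamma):
    """Off-policy: bootstrap from the best action in the next state."""
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

def sarsa(Q, s, a, r, s2, a2, alpha, gamma):
    """On-policy: bootstrap from the action actually taken next."""
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def sarsa_lambda(Q, E, s, a, r, s2, a2, alpha, gamma, lam):
    """SARSA(lambda): eligibility traces spread the TD error over recently
    visited state-action pairs (Q(lambda) is analogous, with the Q-learning
    target and traces cut after exploratory actions)."""
    delta = r + gamma * Q[s2][a2] - Q[s][a]
    E[s][a] += 1.0                       # accumulating trace
    for si in range(len(Q)):
        for ai in range(len(Q[si])):
            Q[si][ai] += alpha * delta * E[si][ai]
            E[si][ai] *= gamma * lam     # traces decay each step

Q = [[0.0, 0.0], [0.0, 0.0]]
q_learning(Q, 0, 1, 1.0, 1, alpha=0.5, gamma=0.9)
print(Q[0][1])   # 0 + 0.5 * (1 + 0.9 * 0 - 0) = 0.5
```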
The role of GABAB receptors in human reinforcement learning.
Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X
2014-10-01
Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50 mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with larger sample sizes are needed to corroborate this conclusion and, furthermore, test this effect in patients with addiction spectrum disorder. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.
Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-01-01
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson’s disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905
ERIC Educational Resources Information Center
Dorrego, Maria Elena
This discussion of programed instruction begins with the fundamental psychological aspects and learning theories behind this teaching method. Negative and positive reinforcement, conditioning, and their relationship to programed instruction are considered. Different types of programs, both linear and branching, are discussed; criticism of the…
ERIC Educational Resources Information Center
Dade County Public Schools, Miami, FL.
The essential elements of grammar required to write business letters, memorandums, and reports are covered in this quinmester course. The course consists of a complete grammar review and the learning of proofreading skills for students in the Cooperative Business Education program in Dade County High Schools. Instruction techniques include group…
Experiments on Adaptive Techniques for Host-Based Intrusion Detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
DRAELOS, TIMOTHY J.; COLLINS, MICHAEL J.; DUGGAN, DAVID P.
2001-09-01
This research explores four experiments of adaptive host-based intrusion detection (ID) techniques in an attempt to develop systems that can detect novel exploits. The technique considered to have the most potential is adaptive critic designs (ACDs) because of their utilization of reinforcement learning, which allows learning exploits that are difficult to pinpoint in sensor data. Preliminary results of ID using an ACD, an Elman recurrent neural network, and a statistical anomaly detection technique demonstrate an ability to learn to distinguish between clean and exploit data. We used the Solaris Basic Security Module (BSM) as a data source and performed considerable preprocessing on the raw data. A detection approach called generalized signature-based ID is recommended as a middle ground between signature-based ID, which has an inability to detect novel exploits, and anomaly detection, which detects too many events including events that are not exploits. The primary results of the ID experiments demonstrate the use of custom data for generalized signature-based intrusion detection and the ability of neural network-based systems to learn in this application environment.
Fear of losing money? Aversive conditioning with secondary reinforcers.
Delgado, M R; Labouliere, C D; Phelps, E A
2006-12-01
Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.
Reinforcement learning and Tourette syndrome.
Palminteri, Stefano; Pessiglione, Mathias
2013-01-01
In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report will be preceded by an introduction aimed at providing the reader with the state of the art of knowledge concerning the neural bases of reinforcement learning at the moment of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that the dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics, despite their evident maladaptive (negative) value. This idea, together with the implications of these results in Tourette therapy and the future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.
Intelligence moderates reinforcement learning: a mini-review of the neural evidence.
Chen, Chong
2015-06-01
Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society. PMID:25185818
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
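The construction can be sketched as follows (details assumed for illustration: a crude quantile rule stands in for K-means, and the eigendecomposition that yields the actual basis functions is omitted): cluster centers subsample the state space, nearby centers are connected with Gaussian weights W, and the graph Laplacian is L = D - W.

```python
import math

# Sketch of the clustering-based graph Laplacian construction (details
# assumed for illustration: a crude quantile rule stands in for K-means,
# and the eigendecomposition that yields the actual basis functions for
# value-function approximation is omitted).
def cluster_centers(samples, k):
    """Stand-in for K-means on 1-D samples: evenly spaced quantiles."""
    xs = sorted(samples)
    return [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]

def graph_laplacian(centers, sigma=1.0):
    """L = D - W with Gaussian edge weights between cluster centers."""
    n = len(centers)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = math.exp(-((centers[i] - centers[j]) ** 2) / (2 * sigma ** 2))
    D = [sum(row) for row in W]
    return [[(D[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

samples = [x / 10 for x in range(100)]        # toy continuous-state samples
L = graph_laplacian(cluster_centers(samples, k=5))
print([round(sum(row), 10) for row in L])     # each row of L sums to zero
```

The smoothest eigenvectors of L would then serve as the automatically generated basis functions that the representation policy iteration algorithms consume.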
Switching Reinforcement Learning for Continuous Action Space
NASA Astrophysics Data System (ADS)
Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi
Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One source of this difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements that trade off against each other: (i) to keep the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing an “entropy” measure. Through computational experiments using robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method is confirmed.
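The entropy-based switching idea can be sketched as follows (the concrete mechanism shown, softmax entropy against a fixed threshold, is an assumption for illustration): the agent keeps the coarse action space while its action-selection probabilities are still near-uniform, and hands over to a finer one once they become confident.

```python
import math

# Sketch of the entropy-based controller switching (the concrete mechanism
# shown, softmax entropy against a fixed threshold, is an assumption for
# illustration). The agent keeps the coarse action space while its policy
# is still near-uniform and switches to the fine one once it is confident.
def softmax(qs, tau=1.0):
    exps = [math.exp(q / tau) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_controller(q_coarse, threshold=0.5):
    """Switch to the fine controller when policy entropy drops below threshold."""
    return "fine" if entropy(softmax(q_coarse)) < threshold else "coarse"

print(choose_controller([0.0, 0.0, 0.0]))   # uniform policy -> "coarse"
print(choose_controller([5.0, 0.0, 0.0]))   # confident policy -> "fine"
```

This mirrors the gross-before-fine developmental ordering: coarse actions are explored first, and refinement is unlocked only after the coarse policy has converged.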
Molina, Michael; Plaza, Victoria; Fuentes, Luis J.; Estévez, Angeles F.
2015-01-01
Memory for medical recommendations is a prerequisite for good adherence to treatment, and therefore to ameliorate the negative effects of the disease, a problem that mainly affects people with memory deficits. We conducted a simulated study to test the utility of a procedure (the differential outcomes procedure, DOP) that may improve adherence to treatment by increasing the patient’s learning and retention of medical recommendations regarding medication. The DOP requires the structure of a conditional discriminative learning task in which correct choice responses to specific stimulus–stimulus associations are reinforced with a particular reinforcer or outcome. In two experiments, participants had to learn and retain in their memory the pills that were associated with particular disorders. To assess whether the DOP improved long-term retention of the learned disorder/pill associations, participants were asked to perform two recognition memory tests, 1 h and 1 week after completing the learning phase. The results showed that compared with the standard non-differential outcomes procedure, the DOP produced better learning and long-term retention of the previously learned associations. These findings suggest that the DOP can be used as a useful complementary technique in intervention programs targeted at increasing adherence to clinical recommendations. PMID:26913010
Tian, Hongfang; Yang, Chao; Tang, Jie; Qin, Qiuguo; Zhao, Mingwen; Zhao, Jiping
2015-07-01
The book Acupuncture-moxibustion Clinical Skills Training is one of the "Twelfth Five-Year Plan" innovative teaching materials published by the People's Medical Publishing House. Studying the commonly used needling and moxibustion techniques in the first half of the book shows that the selection of content is reasonable, with much attention paid to needling and moxibustion techniques; the chapters are well organized and presented in a novel, concise, and intuitive form; and for every technique, great care is taken to standardize the manipulation procedure and clarify its key points, while the safety of acupuncture and moxibustion is also emphasized. The book's characteristics, including its innovativeness and practicability, are highlighted, and it greatly helps to improve students' clinical skills and examination ability.
Reinforcement learning in complementarity game and population dynamics
NASA Astrophysics Data System (ADS)
Jost, Jürgen; Li, Wei
2014-02-01
We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
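The schemes compared above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation; in particular, we assume the modified Roth-Erev rule applies its power exponent when mapping propensities to choice probabilities.

```python
import math
import random

def roth_erev_update(propensities, action, payoff):
    """Basic Roth-Erev reinforcement: add the received payoff to the
    chosen action's propensity."""
    propensities[action] += payoff
    return propensities

def roth_erev_choice(propensities, power=1.5, rng=random):
    """Choose an action with probability proportional to propensity**power.
    power=1.0 is the standard rule; 1.5 is the modified version reported
    to perform best (the placement of the exponent is our assumption)."""
    weights = [q ** power for q in propensities]
    r = rng.random() * sum(weights)
    cum = 0.0
    for action, w in enumerate(weights):
        cum += w
        if r < cum:
            return action
    return len(weights) - 1

def softmax_probs(values, temperature=1.0):
    """SoftMax choice probabilities over action values or propensities."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

Bush-Mosteller, the third scheme studied, instead updates choice probabilities directly, moving them a fraction of the distance toward 0 or 1 after each outcome.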
The prefrontal cortex and hybrid learning during iterative competitive games.
Abe, Hiroshi; Seo, Hyojung; Lee, Daeyeol
2011-12-01
Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning. © 2011 New York Academy of Sciences.
Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian
2013-01-01
Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603
Generalization of value in reinforcement learning by humans.
Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna
2012-04-01
Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. 
Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
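A minimal sketch of how such generalization can be added to a delta-rule learner, assuming anticorrelated reward probabilities within a pair of options; the coupling rule and parameter names below are ours for illustration, not the authors' fitted model.

```python
def update_values(V, chosen, reward, pair_of, alpha=0.1, kappa=0.5):
    """Delta-rule update for the chosen option, plus a generalization
    update for its anticorrelated partner.

    kappa=0 recovers the basic model-free learner; kappa>0 lets binary
    feedback (reward in {0, 1}) generalize across the correlated pair.
    """
    delta = reward - V[chosen]          # reward prediction error
    V[chosen] += alpha * delta
    # a win on the chosen option implies a likely loss on its
    # anticorrelated partner, so push the partner the opposite way
    partner = pair_of[chosen]
    V[partner] += alpha * kappa * ((1.0 - reward) - V[partner])
    return V
```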
Mitchell, D G V; Fine, C; Richell, R A; Newman, C; Lumsden, J; Blair, K S; Blair, R J R
2006-05-01
Previous work has shown that individuals with psychopathy are impaired on some forms of associative learning, particularly stimulus-reinforcement learning (Blair et al., 2004; Newman & Kosson, 1986). Animal work suggests that the acquisition of stimulus-reinforcement associations requires the amygdala (Baxter & Murray, 2002). Individuals with psychopathy also show impoverished reversal learning (Mitchell, Colledge, Leonard, & Blair, 2002). Reversal learning is supported by the ventrolateral and orbitofrontal cortex (Rolls, 2004). In this paper we present experiments investigating stimulus-reinforcement learning and relearning in patients with lesions of the orbitofrontal cortex or amygdala, and individuals with developmental psychopathy without known trauma. The results are interpreted with reference to current neurocognitive models of stimulus-reinforcement learning, relearning, and developmental psychopathy. Copyright (c) 2006 APA, all rights reserved.
Model-based reinforcement learning with dimension reduction.
Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi
2016-12-01
The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.
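The two stages of the approach, estimating a transition model from data and then planning with it, can be sketched in tabular form. The paper's contribution (LSCE) targets continuous, high-dimensional states; this toy version only shows the overall structure.

```python
from collections import defaultdict

def estimate_model(transitions):
    """Estimate P(s' | s, a) by counting observed (s, a, s') triples."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s2 in transitions:
        counts[(s, a)][s2] += 1
    model = {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        model[sa] = {s2: n / total for s2, n in nexts.items()}
    return model

def value_iteration(model, rewards, states, actions, gamma=0.9, iters=100):
    """Planning step: derive state values from the learned model.
    Unobserved (s, a) pairs default to a self-loop."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            V[s] = max(
                sum(p * (rewards.get(s2, 0.0) + gamma * V[s2])
                    for s2, p in model.get((s, a), {s: 1.0}).items())
                for a in actions
            )
    return V
```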
Neural mechanisms of cue-approach training
Bakkour, Akram; Lewis-Peacock, Jarrod A.; Poldrack, Russell A.; Schonberg, Tom
2016-01-01
Biasing choices may prove a useful way to implement behavior change. Previous work has shown that a simple training task (the cue-approach task), which does not rely on external reinforcement, can robustly influence choice behavior by biasing choice toward items that were targeted during training. In the current study, we replicate previous behavioral findings and explore the neural mechanisms underlying the shift in preferences following cue-approach training. Given recent successes in the development and application of machine learning techniques to task-based fMRI data, which have advanced understanding of the neural substrates of cognition, we sought to leverage the power of these techniques to better understand neural changes during cue-approach training that subsequently led to a shift in choice behavior. Contrary to our expectations, we found that machine learning techniques applied to fMRI data during non-reinforced training were unsuccessful in elucidating the neural mechanism underlying the behavioral effect. However, univariate analyses during training revealed that the relationship between BOLD and choices for Go items increases as training progresses compared to choices of NoGo items primarily in lateral prefrontal cortical areas. This new imaging finding suggests that preferences are shifted via differential engagement of task control networks that interact with value networks during cue-approach training. PMID:27677231
Murakoshi, Kazushi; Mizuno, Junya
2004-11-01
In order to rapidly follow unexpected environmental changes, we propose a parameter control method for reinforcement learning that changes each of the learning parameters in an appropriate direction. We determine each appropriate direction on the basis of relationships between behaviors and neuromodulators, taking an emergency as the key concept. Computer experiments show that agents using our proposed method could rapidly respond to unexpected environmental changes, independently of both the reinforcement learning algorithm (Q-learning or the actor-critic (AC) architecture) and the learning problem (discontinuous or continuous state-action problems).
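For concreteness, one-step Q-learning exposes exactly the parameters (learning rate alpha, discount gamma) that such a meta-controller would adjust; the emergency-detection logic itself is the paper's contribution and is not reproduced here.

```python
def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    """One-step Q-learning update. A meta-controller could, e.g.,
    raise alpha after detecting an unexpected environmental change
    so that new contingencies overwrite stale values faster."""
    best_next = max(Q[s2].values()) if Q[s2] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q
```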
Effects of Reinforcement Schedule on Facilitation of Operant Extinction by Chlordiazepoxide
ERIC Educational Resources Information Center
Leslie, Julian C.; Shaw, David; Gregg, Gillian; McCormick, Nichola; Reynolds, David S.; Dawson, Gerard R.
2005-01-01
Learning and memory are central topics in behavioral neuroscience, and inbred mouse strains are widely investigated. However, operant conditioning techniques are not as extensively used in this field as they should be, given the effectiveness of the methodology of the experimental analysis of behavior. In the present study, male C57Bl/6 mice,…
González, Felisa; Garcia-Burgos, David; Hall, Geoffrey
2014-09-01
In Experiment 1 rats were given training in which a mixture of two flavors was paired with sucrose. This established a substantial preference for each of the flavors; however, when rats were given prior experience with just one of the flavors paired with sucrose, training with the compound produced only a weak preference for the other - an example of the blocking effect, well known in other associative learning paradigms. Both the palatable taste of sucrose and its nutrient properties contribute to its ability to reinforce preference acquisition. The role of these two forms of learning was examined in two further experiments in which the reinforcer used was fructose (which is considered to support preference learning because it is palatable but not through its nutrient properties) or maltodextrin (thought to support preference learning by way of its nutrient properties). In neither case was blocking observed. At the theoretical level, this outcome constitutes a challenge to the attempt to explain flavor-preference learning in terms of the standard principles of associative learning theory. Its implication at the level of application is that the potential of the blocking procedure as a technique for preventing the development of unwanted flavor preferences may be limited. Copyright © 2014 Elsevier Ltd. All rights reserved.
Partial Planning Reinforcement Learning
2012-08-31
SUBJECT TERMS: Reinforcement Learning, Bayesian Optimization, Active Learning, Action Model Learning, Decision Theoretic Assistance. Prasad Tadepalli, Alan Fern; Oregon State University, Office of Sponsored Programs.
Ghorbani, Ahmad; Ghazvini, Kiarash
2016-03-01
Many studies have emphasized the incorporation of active learning into classrooms to reinforce didactic lectures for physiology courses. This work aimed to determine if presenting classic papers during didactic lectures improves the learning of physiology among undergraduate students. Twenty-two students of health information technology were randomly divided into the following two groups: 1) didactic lecture only (control group) and 2) didactic lecture plus paper presentation breaks (DLPP group). In the control group, main topics of gastrointestinal and endocrine physiology were taught using only the didactic lecture technique. In the DLPP group, some topics were presented by the didactic lecture method (similar to the control group) and some topics were taught by the DLPP technique (first, concepts were covered briefly in a didactic format and then reinforced with presentation of a related classic paper). The combination of didactic lecture and paper breaks significantly improved learning so that students in the DLPP group showed higher scores on related topics compared with those in the control group (P < 0.001). Comparison of the scores of topics taught by only the didactic lecture and those using both the didactic lecture and paper breaks showed significant improvement only in the DLPP group (P < 0.001). Data obtained from the final exam showed that in the DLPP group, the mean score of the topics taught by the combination of didactic lecture and paper breaks was significantly higher than those taught by only didactic lecture (P < 0.05). In conclusion, the combination of paper presentation breaks and didactic lectures improves the learning of physiology. Copyright © 2016 The American Physiological Society.
Reinforcement learning in supply chains.
Valluri, Annapurna; North, Michael J; Macal, Charles M
2009-10-01
Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.
Reinforcement learning in scheduling
NASA Technical Reports Server (NTRS)
Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad
1994-01-01
The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.
Neural Basis of Reinforcement Learning and Decision Making
Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan
2012-01-01
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543
Effect of reinforcement learning on coordination of multiagent systems
NASA Astrophysics Data System (ADS)
Bukkapatnam, Satish T. S.; Gao, Greg
2000-12-01
For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.
Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D
2013-05-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.
2013-01-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545
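The arbitration idea can be written as a weighted mixture of the two systems' action values. A linear mixture is common in this literature, but the specific form here is illustrative; in this reading, a demanding concurrent task would push the weight w toward the model-free end.

```python
def hybrid_value(q_mb, q_mf, w):
    """Mix model-based (q_mb) and model-free (q_mf) action values.
    w near 1.0 means model-based control dominates; executive-function
    load would lower w, shifting control toward the model-free system."""
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mb}
```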
Fernandes, Henrique; Zhang, Hai; Figueiredo, Alisson; Malheiros, Fernando; Ignacio, Luis Henrique; Sfarra, Stefano; Ibarra-Castanedo, Clemente; Guimaraes, Gilmar; Maldague, Xavier
2018-01-19
The use of fiber reinforced materials such as randomly-oriented strands has grown in recent years, especially for manufacturing of aerospace composite structures. This growth is mainly due to their advantageous properties: they are lighter and more resistant to corrosion when compared to metals and are more easily shaped than continuous fiber composites. The resistance and stiffness of these materials are directly related to their fiber orientation. Thus, efficient approaches to assess their fiber orientation are in demand. In this paper, a non-destructive evaluation method is applied to assess the fiber orientation on laminates reinforced with randomly-oriented strands. More specifically, a method called pulsed thermal ellipsometry combined with an artificial neural network, a machine learning technique, is used in order to estimate the fiber orientation on the surface of inspected parts. Results showed that the method can be potentially used to inspect large areas with good accuracy and speed.
Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.
Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin
2018-06-01
In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes the most advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. With comparison to deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
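A rough sketch of how the two DCRL ingredients could combine into a single replay-sampling score; the exact functional forms in the paper differ, and the names below are ours.

```python
import math

def dcrl_priority(td_error, visits, difficulty, curriculum_level, lam=0.5):
    """Illustrative sampling score for a replay transition.

    Self-paced priority: large temporal-difference error is favored,
    but discounted when the transition is harder than the current
    curriculum level. Coverage penalty: transitions already replayed
    many times are damped to keep the sample diverse."""
    self_paced = abs(td_error) * math.exp(-max(0.0, difficulty - curriculum_level))
    coverage = 1.0 / (1.0 + lam * visits)
    return self_paced * coverage
```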
Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap S.
1992-01-01
Current reinforcement learning algorithms require long training periods which generally limit their applicability to small size problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to control of dynamic systems and it is demonstrated that it is possible to start with an approximate prior knowledge and learn to refine it through experiments using reinforcement learning.
Reinforcement learning in multidimensional environments relies on attention mechanisms.
Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C
2015-05-27
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors 0270-6474/15/358145-13$15.00/0.
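The representation-learning idea can be illustrated with a learner whose stimulus value is an attention-weighted sum over dimensions, so that prediction errors are credited mainly to attended dimensions. The update rule below is a simplification of the models tested in this literature, not the study's exact model.

```python
def stimulus_value(features, weights, attention):
    """Value of a multidimensional stimulus as an attention-weighted
    sum of per-dimension feature values; 'features' gives the feature
    present on each dimension (e.g., ('red', 'circle'))."""
    return sum(attention[d] * weights[d][f] for d, f in enumerate(features))

def learn(weights, attention, features, reward, alpha=0.2):
    """Delta-rule update, with credit gated by attention so that only
    reward-relevant dimensions accumulate learning."""
    delta = reward - stimulus_value(features, weights, attention)
    for d, f in enumerate(features):
        weights[d][f] += alpha * attention[d] * delta
    return weights
```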
Changes in corticostriatal connectivity during reinforcement learning in humans.
Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S
2015-02-01
Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.
Hisey, Erin; Kearney, Matthew Gene; Mooney, Richard
2018-04-01
The complex skills underlying verbal and musical expression can be learned without external punishment or reward, indicating their learning is internally guided. The neural mechanisms that mediate internally guided learning are poorly understood, but a circuit comprising dopamine-releasing neurons in the midbrain ventral tegmental area (VTA) and their targets in the basal ganglia are important to externally reinforced learning. Juvenile zebra finches copy a tutor song in a process that is internally guided and, in adulthood, can learn to modify the fundamental frequency (pitch) of a target syllable in response to external reinforcement with white noise. Here we combined intersectional genetic ablation of VTA neurons, reversible blockade of dopamine receptors in the basal ganglia, and singing-triggered optogenetic stimulation of VTA terminals to establish that a common VTA-basal ganglia circuit enables internally guided song copying and externally reinforced syllable pitch learning.
Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto
2015-11-02
Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit-flies concluded that dopamine neurons mediate both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not in appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in the neurotransmitters mediating appetitive reinforcement between crickets and fruit-flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory.
Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.
Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna
2016-09-01
Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.
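The trial-by-trial learning signals described above are typically derived from a delta-rule (Rescorla-Wagner style) update. A minimal sketch of that idea follows; the target names, feedback reliabilities, learning rate, and trial counts are illustrative assumptions, not the models fitted in the study.

```python
import random

def delta_rule_update(value, reward, alpha=0.1):
    """V <- V + alpha * (reward - V): move the estimate toward the feedback."""
    return value + alpha * (reward - value)

random.seed(0)
# Probability that a guess about each (hypothetical) target is confirmed.
reliabilities = {"visual-cue target": 0.9, "verbal-cue target": 0.9, "uninformative target": 0.5}
values = {name: 0.5 for name in reliabilities}
late_values = {name: [] for name in reliabilities}

for trial in range(1000):
    for name, p in reliabilities.items():
        feedback = 1.0 if random.random() < p else 0.0   # correct / incorrect
        values[name] = delta_rule_update(values[name], feedback)
        if trial >= 800:
            late_values[name].append(values[name])       # record settled values

# Learned values come to track each target's feedback reliability.
avg = {name: sum(v) / len(v) for name, v in late_values.items()}
```

After training, the learned value for a reliable target sits well above the value for the uninformative one, mirroring the judgment updating described in the abstract.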
Human-level control through deep reinforcement learning.
Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis
2015-02-26
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
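The core of the deep Q-network is the temporal-difference target r + gamma * max_a' Q(s', a'), applied to minibatches drawn from an experience replay buffer and bootstrapped from a periodically synced target network. A toy sketch of those ingredients, substituting a tabular Q-function on an assumed 5-state chain (reward on reaching the last state) for the deep network:

```python
import random

random.seed(3)
n, gamma, alpha = 5, 0.9, 0.2
W = [[0.0, 0.0] for _ in range(n)]         # online Q: W[s][a]; a=0 left, a=1 right
W_target = [row[:] for row in W]           # periodically synced target network
replay = []                                # experience replay buffer

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else s + 1
    return s_next, (1.0 if s_next == n - 1 else 0.0)

for episode in range(200):
    s = 0
    while s != n - 1:
        a = random.choice((0, 1))          # random exploration policy
        s_next, r = step(s, a)
        replay.append((s, a, r, s_next))
        # Minibatch update with the DQN target r + gamma * max_a' Q_target(s', a').
        for bs, ba, br, bs2 in random.sample(replay, min(8, len(replay))):
            target = br if bs2 == n - 1 else br + gamma * max(W_target[bs2])
            W[bs][ba] += alpha * (target - W[bs][ba])
        s = s_next
    if episode % 10 == 0:
        W_target = [row[:] for row in W]   # sync the target network
```

The greedy policy learned this way moves right in every non-terminal state; the DQN of the abstract replaces the table with a convolutional network over raw pixels but uses the same target.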
Learning New Basic Movements for Robotics
NASA Astrophysics Data System (ADS)
Kober, Jens; Peters, Jan
Obtaining novel skills is one of the most important problems in robotics. Machine learning techniques may be a promising approach for automatic and autonomous acquisition of movement policies. However, this requires both an appropriate policy representation and suitable learning algorithms. Employing the most recent form of the dynamical systems motor primitives originally introduced by Ijspeert et al. [1], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning, and present our current best performing learning algorithms. Finally, we show that it is possible to include a start-up phase in rhythmic primitives. We apply our approach to two elementary movements, i.e., Ball-in-a-Cup and Ball-Paddling, which can be learned on a real Barrett WAM robot arm at a pace similar to human learning.
Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders
Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.
2017-01-01
Background: Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning; however, there have been few experimental studies taking this perspective. Methods: We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state-space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results: Both groups learned the task after training. However, there were group differences in early learning in the first task block, where individuals with ASDs acquired the most reliably reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state-space learning curves; and outperformed typically developing individuals on the near-chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions: Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243
A framework for learning and planning against switching strategies in repeated games
NASA Astrophysics Data System (ADS)
Hernandez-Leal, Pablo; Munoz de Cote, Enrique; Sucar, L. Enrique
2014-04-01
Intelligent agents, human or artificial, often change their behaviour as they interact with other agents. For an agent to optimise its performance when interacting with such agents, it must be capable of detecting such changes and adapting to them. This work presents an approach for dealing effectively with non-stationary switching opponents in a repeated game context. Our main contribution is a framework for online learning and planning against opponents that switch strategies. We show how two opponent modelling techniques work within the framework and demonstrate the usefulness of the approach experimentally in the iterated prisoner's dilemma, when the opponent is modelled as an agent that switches between different strategies (e.g. TFT, Pavlov and Bully). The results of the two models were compared against each other and against a state-of-the-art non-stationary reinforcement learning technique. The results show that our approach obtains competitive performance without needing an offline training phase, as opposed to the state-of-the-art techniques.
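One simple way to handle a strategy-switching opponent, sketched here as an assumption rather than the paper's actual framework, is to score each candidate strategy by how well it predicts the opponent's recent moves and adopt the best-scoring model:

```python
# Iterated prisoner's dilemma sketch. C = cooperate, D = defect.

def tit_for_tat(own_moves, other_moves):
    return other_moves[-1] if other_moves else "C"

def bully(own_moves, other_moves):
    return "D"                                    # always defect

def best_model(opp_moves, my_moves, models, window=5):
    """Return the candidate strategy that best explains the opponent's
    last `window` moves."""
    scores = {}
    for name, strategy in models.items():
        start = max(1, len(opp_moves) - window)
        scores[name] = sum(
            strategy(opp_moves[:t], my_moves[:t]) == opp_moves[t]
            for t in range(start, len(opp_moves))
        )
    return max(scores, key=scores.get)

models = {"TFT": tit_for_tat, "Bully": bully}
my_moves, opp_moves, detected = [], [], []
for t in range(40):
    opponent = tit_for_tat if t < 20 else bully   # opponent switches at t = 20
    opp_move = opponent(opp_moves, my_moves)
    my_moves.append("C")                          # our agent cooperates throughout
    opp_moves.append(opp_move)
    detected.append(best_model(opp_moves, my_moves, models))
```

Shortly after the switch at round 20, the detector's best model flips from TFT to Bully; a planner would then re-plan against the new model.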
Role of dopamine D2 receptors in human reinforcement learning.
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-09-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.
The role of punishment in the in-patient treatment of psychiatrically disturbed children.
Alderton, H R
1967-02-01
The role of punishment in the psychiatric in-patient treatment of nonpsychotic latency-age children with behaviour disorders is discussed. Punishment is defined as the removal of previously existing positive reinforcers or the administration of aversive stimuli. Ways in which appropriate social behaviour may be acquired are briefly considered. These include reinforcement of desirable responses, non-reinforcement of undesirable responses, reinforcement of incompatible responses and imitative learning. The reported effects of punishment on behaviour are reviewed, and the psychological functions necessary before punishment can have the intended effects are considered. For seriously disturbed children punishment is ineffective as a treatment technique. It reinforces pathological perceptions of self and adults even if it successfully suppresses behaviour. The frame of reference of the seriously disturbed child contraindicates the removal of positive reinforcers and verbal as well as physical aversive stimuli. Controls and punishments must be clearly distinguished. Controls continue only as long as the behaviour towards which they are directed. They do not include the deliberate establishment of an unpleasant state by the adult as a result of particular behaviour. Control techniques such as removal from a group may be necessary but when possible should be avoided in favour of techniques less likely to be misinterpreted. Avoidance of punishment in treatment makes explicit expectations and the provision of realistic controls even more important. Natural laws may result in unpleasant experiences as an unavoidable result of certain behaviour. By definition such results can never be imposed by the adult. Treatment considerations may necessitate that the child be protected from the results of his actions. Avoidance of punishment requires a higher staff/child ratio and more mature, better trained staff.
Sometimes children have previously been deterred from serious community acting out only by punishment. Should the therapeutic endeavours outlined not prevent such behaviour, treatment in a closed setting without punishment is indicated, not the use of punishment in an open centre.
Reinforcement learning in computer vision
NASA Astrophysics Data System (ADS)
Bernstein, A. V.; Burnaev, E. V.
2018-04-01
Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that, when solving computer vision tasks, we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as the processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning technology and its use for solving computer vision problems.
Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms
Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.
2015-01-01
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331
Network congestion control algorithm based on Actor-Critic reinforcement learning model
NASA Astrophysics Data System (ADS)
Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2018-04-01
Aiming at the network congestion control problem, a congestion control algorithm based on the Actor-Critic reinforcement learning model is designed. By incorporating a genetic algorithm into the congestion control strategy, network congestion can be better detected and prevented. A simulation experiment of the network congestion control algorithm is designed according to Actor-Critic reinforcement learning. The simulation experiments verify that the AQM controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is adopted to optimize network performance, and the dropping probability of packets is adaptively adjusted so as to improve network performance and avoid congestion. Based on the above findings, it is concluded that the network congestion control algorithm based on the Actor-Critic reinforcement learning model can effectively avoid the occurrence of TCP network congestion.
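The Actor-Critic scheme named above maintains a critic (a value estimate) and an actor (action preferences), both updated from the same TD error. A minimal single-state sketch with two illustrative actions and assumed reward probabilities (no AQM or genetic-algorithm component, which are specific to the paper):

```python
import math
import random

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    return [e / sum(exps) for e in exps]

random.seed(0)
reward_probs = [0.8, 0.2]   # action 0 pays off more often (illustrative)
H = [0.0, 0.0]              # actor: action preferences
V = 0.0                     # critic: state-value estimate
alpha, beta = 0.1, 0.1      # critic and actor learning rates

for _ in range(2000):
    probs = softmax(H)
    a = 0 if random.random() < probs[0] else 1
    r = 1.0 if random.random() < reward_probs[a] else 0.0
    delta = r - V           # TD error (single state, so no bootstrap term)
    V += alpha * delta      # critic update
    H[a] += beta * delta    # actor update: reinforce actions that beat V
```

The actor comes to prefer the better action while the critic's V settles near the average obtained reward; in the paper's setting the "action" would be an adjustment to the packet-dropping probability.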
From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model
ERIC Educational Resources Information Center
Fu, Wai-Tat; Anderson, John R.
2006-01-01
The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ
2014-01-01
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063
Stochastic Reinforcement Benefits Skill Acquisition
ERIC Educational Resources Information Center
Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.
2014-01-01
Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…
A reward optimization method based on action subrewards in hierarchical reinforcement learning.
Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming
2014-01-01
Reinforcement learning (RL) is one kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, leading to slow convergence. The method can greatly reduce the state space and choose actions purposefully and efficiently so as to optimize the reward function and speed up convergence. Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is markedly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also mitigated to some extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.
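The idea of action subrewards can be sketched as augmenting a sparse terminal reward with small per-action rewards, so informative feedback arrives at every step. The chain task, subreward values, and learning rate below are illustrative assumptions, not the paper's Tetris setup:

```python
import random

random.seed(2)
n, gamma, alpha = 10, 0.95, 0.5
Q = [[0.0, 0.0] for _ in range(n)]          # actions: 0 = left, 1 = right

def subreward(s, s_next):
    """Small per-action reward for progress toward the goal state n-1."""
    return 0.1 if s_next > s else -0.1

for _ in range(300):
    s, steps = 0, 0
    while s != n - 1 and steps < 150:
        a = random.choice((0, 1))           # random exploration policy
        s_next = max(0, s - 1) if a == 0 else s + 1
        # Sparse terminal reward plus the dense subreward.
        r = (1.0 if s_next == n - 1 else 0.0) + subreward(s, s_next)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s, steps = s_next, steps + 1
```

Because every step is rewarded or penalized, value information propagates along the chain much sooner than it would from the terminal reward alone, which is the convergence-speed benefit the abstract claims.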
Multi-agent Reinforcement Learning Model for Effective Action Selection
NASA Astrophysics Data System (ADS)
Youk, Sang Jo; Lee, Bong Keun
Reinforcement learning is a subarea of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the multi-agent case especially, the state and action spaces become enormous compared to the single-agent case, so the most effective available action-selection strategy must be adopted for reinforcement learning to be effective. This paper proposes a multi-agent reinforcement learning model based on a fuzzy inference system in order to improve learning convergence speed and select effective actions in a multi-agent setting. The paper verifies the effectiveness of the action-selection strategy through evaluation tests based on RoboCup Keepaway, one of the standard test-beds for multi-agent systems. Our proposed model can be applied to evaluate the efficiency of various intelligent multi-agents and also to the strategy and tactics of robot soccer systems.
Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models
Najnin, Shamima; Banerjee, Bonny
2018-01-01
Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027
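A toy sketch of the table-based variant of this setup: the agent guesses which object a word refers to, and reward for correct pairings drives a simple Q table over (word, object) pairs. The vocabulary and parameters are invented for illustration; the paper's models also exploit social cues and neural function approximators.

```python
import random

random.seed(4)
words = ["ball", "cup", "dog"]
objects = ["BALL", "CUP", "DOG"]
true_pair = {"ball": "BALL", "cup": "CUP", "dog": "DOG"}   # hypothetical ground truth
Q = {(w, o): 0.0 for w in words for o in objects}
alpha, epsilon = 0.2, 0.2

for _ in range(1500):
    w = random.choice(words)
    if random.random() < epsilon:
        o = random.choice(objects)                          # explore
    else:
        o = max(objects, key=lambda obj: Q[(w, obj)])       # exploit
    r = 1.0 if true_pair[w] == o else 0.0                   # reward correct pairing
    Q[(w, o)] += alpha * (r - Q[(w, o)])                    # one-step (bandit-style) update

learned = {w: max(objects, key=lambda obj: Q[(w, obj)]) for w in words}
```

With enough exposures, the greedy mapping recovers the word-object pairs, which is the core reward-maximization loop that the paper's SARSA and deep Q-network variants elaborate.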
Reinforcement learning improves behaviour from evaluative feedback
NASA Astrophysics Data System (ADS)
Littman, Michael L.
2015-05-01
Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
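The defining feature of evaluative feedback is that the agent observes only the outcome of the action it actually took, which is why exploration matters. A minimal epsilon-greedy bandit sketch with illustrative arm values:

```python
import random

random.seed(5)
true_means = [0.2, 0.5, 0.8]     # hidden reward probabilities (illustrative)
estimates = [0.0, 0.0, 0.0]      # sample-average value estimates
counts = [0, 0, 0]
epsilon = 0.1

for _ in range(3000):
    if random.random() < epsilon:
        a = random.randrange(3)                              # explore
    else:
        a = max(range(3), key=lambda i: estimates[i])        # exploit
    # Evaluative feedback: only arm a's reward is observed.
    r = 1.0 if random.random() < true_means[a] else 0.0
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]           # incremental mean
```

Occasional exploration is what lets the agent discover that the third arm is best rather than settling for whichever arm happened to pay off first.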
Knowledge-Based Reinforcement Learning for Data Mining
NASA Astrophysics Data System (ADS)
Kudenko, Daniel; Grzes, Marek
Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. 
For example, human experts have developed heuristics that help them in planning and scheduling resources in their workplace. However, this domain knowledge is often rough and incomplete. When the domain knowledge is used directly by an automated expert system, the solutions are often sub-optimal, due to the incompleteness of the knowledge, the uncertainty of environments, and the possibility of encountering unexpected situations. RL, on the other hand, can overcome the weaknesses of the heuristic domain knowledge and produce optimal solutions. In this talk we propose two techniques, which represent first steps in the area of knowledge-based RL (KBRL). The first technique [1] uses high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping when it is used in its basic form. We showed that MDP-based reward shaping may fail, and successful experiments with STRIPS-based shaping suggest modifications that can overcome the problems encountered. The STRIPS-based method we propose allows expressing the same domain knowledge in a different way, and the domain expert can choose whether to define an MDP or STRIPS planning task. We also evaluated the robustness of the proposed STRIPS-based technique to errors in the plan knowledge. If STRIPS knowledge is not available, we propose a second technique [2] that shapes the reward with hierarchical tile coding. Where the Q-function is represented with low-level tile coding, a V-function with coarser tile coding can be learned in parallel and used to approximate the potential for ground states. In the context of data mining, our KBRL approaches can also be used for any data collection task where the acquisition of data may incur considerable cost.
In addition, observing the data collection agent in specific scenarios may lead to new insights into optimal data collection behaviour in the respective domains. In future work, we intend to demonstrate and evaluate our techniques on concrete real-world data mining applications.
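The plan-based shaping idea in this record can be made concrete with potential-based reward shaping, where an extra reward F(s, s') = γΦ(s') − Φ(s) is added to the environment reward without changing the optimal policy. The sketch below is our illustration, not the authors' code: the chain task, the potential Φ (standing in for progress through an abstract STRIPS plan), and all parameters are invented.

```python
import random

# Potential-based reward shaping on a 6-state chain (illustrative sketch).
# Phi encodes rough plan knowledge ("later plan steps are better"); the
# shaping term F = gamma*Phi(s') - Phi(s) provably preserves optimal policies.
N_STATES, GOAL, GAMMA, ALPHA = 6, 5, 0.95, 0.1

def phi(s):
    return s  # hypothetical potential: distance along an abstract plan

def step(s, a):          # a = +1 (right) or -1 (left)
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def train(shaped, episodes=300, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            # epsilon-greedy action selection
            a = rng.choice((-1, 1)) if rng.random() < 0.2 else max((-1, 1), key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            if shaped:
                r += GAMMA * phi(s2) - phi(s)   # add the shaping term
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, -1)], Q[(s2, 1)]) - Q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return Q

Q = train(shaped=True)
policy = [max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)
```

Because the shaping term telescopes along any trajectory, the greedy policy learned with shaping is the same one plain Q-learning would eventually find, only reached with far less exploration.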
Quantum-Enhanced Machine Learning
NASA Astrophysics Data System (ADS)
Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.
2016-09-01
The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.
The drift diffusion model as the choice rule in reinforcement learning.
Pedersen, Mads Lund; Frank, Michael J; Biele, Guido
2017-08-01
Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
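The hybrid this abstract describes can be sketched roughly as follows: Q-values learned by a delta rule set the drift rate of a diffusion process, which then produces both the choice and the response time on each trial. This is our simplified illustration with invented parameters (a two-armed bandit and a random-walk diffusion), not the authors' hierarchical Bayesian implementation.

```python
import random

# RL-DDM sketch: the difference in learned Q-values drives a diffusion
# process to one of two bounds; the bound reached gives the choice, the
# hitting time gives the response time, and a delta rule updates the
# chosen option's value.
def rl_ddm(trials=500, alpha=0.1, scale=2.0, threshold=1.0, dt=0.01, noise=1.0, seed=1):
    rng = random.Random(seed)
    q = {"A": 0.0, "B": 0.0}
    p_reward = {"A": 0.8, "B": 0.2}     # hypothetical bandit probabilities
    choices, rts = [], []
    for _ in range(trials):
        v = scale * (q["A"] - q["B"])   # drift toward the currently better option
        x, t = 0.0, 0.0
        while abs(x) < threshold:       # diffuse until a bound is hit
            x += v * dt + noise * rng.gauss(0.0, dt ** 0.5)
            t += dt
        choice = "A" if x > 0 else "B"
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])   # delta-rule value update
        choices.append(choice)
        rts.append(t)
    return choices, rts, q

choices, rts, q = rl_ddm()
print(q["A"] > q["B"], sum(c == "A" for c in choices[-100:]))
```

As learning separates the two Q-values, the drift rate grows, so late-session choices become both more accurate and (on average) faster, which is the within- and across-trial coupling the model exploits.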
Can model-free reinforcement learning explain deontological moral judgments?
Ayars, Alisabeth
2016-05-01
Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.
General functioning predicts reward and punishment learning in schizophrenia.
Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A
2011-04-01
Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performances. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.
Agent-based traffic management and reinforcement learning in congested intersection network.
DOT National Transportation Integrated Search
2012-08-01
This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programmi...
Operant conditioning of enhanced pain sensitivity by heat-pain titration.
Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert
2008-11-15
Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior and not verbal pain report was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.
Machine learning in cardiovascular medicine: are we there yet?
Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P
2018-01-19
Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals
Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan
2017-01-01
Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976
Place preference and vocal learning rely on distinct reinforcers in songbirds.
Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H
2018-04-30
In reinforcement learning (RL), agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.
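The multiobjective idea here, distinct sub-agents shaped by distinct reinforcers, can be illustrated with two independent value updates driven by the same event. This is our hypothetical sketch, not the study's model; the action names, signs, and learning rate are invented.

```python
# Two sub-agents inside one "animal": a place agent and a song agent,
# each a simple bandit updated only by its own reinforcement signal.
# The same event (a noise burst) rewards one agent and punishes the other.
def update(values, action, reinforcement, alpha=0.2):
    values[action] += alpha * (reinforcement - values[action])

place = {"left": 0.0, "right": 0.0}
song = {"syllable_hi": 0.0, "syllable_lo": 0.0}

for _ in range(50):
    # noise burst delivered when the bird sits left and sings the high syllable
    update(place, "left", +1.0)        # noise positively reinforces place...
    update(song, "syllable_hi", -1.0)  # ...but punishes the syllable

print(place["left"] > place["right"], song["syllable_hi"] < song["syllable_lo"])
```

Because each value table only ever sees its own reinforcement signal, the two behaviors diverge in opposite directions from the identical event, which is the credit-assignment separation the abstract describes.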
ERIC Educational Resources Information Center
Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias
2011-01-01
Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…
Machine Learning Control For Highly Reconfigurable High-Order Systems
2015-01-02
develop and flight test a Reinforcement Learning-based approach for autonomous tracking of ground targets using a fixed wing Unmanned... Reinforcement Learning-based algorithms are developed for learning agents' time dependent dynamics while also learning to control them. Three algorithms... to a wide range of engineering-based problems. Implementation of these solutions, however, is often complicated by the hysteretic, non-linear,
Reinforcement and inference in cross-situational word learning.
Tilles, Paulo F C; Fontanari, José F
2013-01-01
Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
Autonomous learning based on cost assumptions: theoretical studies and experiments in robot control.
Ribeiro, C H; Hemerly, E M
2000-02-01
Autonomous learning techniques are based on experience acquisition. In most realistic applications, experience is time-consuming: it implies sensor reading, actuator control and algorithmic update, constrained by the learning system dynamics. The information crudeness upon which classical learning algorithms operate makes such problems too difficult and unrealistic. Nonetheless, additional information for facilitating the learning process ideally should be embedded in such a way that the structural, well-studied characteristics of these fundamental algorithms are maintained. We investigate in this article a more general formulation of the Q-learning method that allows for a spreading of information derived from single updates towards a neighbourhood of the instantly visited state and converges to optimality. We show how this new formulation can be used as a mechanism to safely embed prior knowledge about the structure of the state space, and demonstrate it in a modified implementation of a reinforcement learning algorithm in a real robot navigation task.
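The spreading mechanism described here can be sketched as a Q-learning variant in which each temporal-difference update is also applied, scaled by a similarity kernel σ, to neighbouring states. This is our simplified illustration with an invented chain task and kernel, not the article's exact formulation.

```python
import random

# Spreading Q-updates: each TD update at the visited state is also applied,
# scaled by sigma, to nearby states. The kernel encodes the prior knowledge
# that adjacent states on the chain have similar action values.
N, GOAL, GAMMA, ALPHA = 10, 9, 0.9, 0.2

def sigma(s, u):
    return {0: 1.0, 1: 0.3}.get(abs(s - u), 0.0)   # hypothetical spreading kernel

def train(spread, episodes=150, seed=3):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]     # actions: index 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        for _ in range(40):
            a = rng.randrange(2) if rng.random() < 0.3 else int(Q[s][1] >= Q[s][0])
            s2 = min(max(s + (1 if a else -1), 0), N - 1)
            r = 1.0 if s2 == GOAL else 0.0
            delta = r + GAMMA * max(Q[s2]) - Q[s][a]
            for u in (range(N) if spread else [s]):
                Q[u][a] += ALPHA * sigma(s, u) * delta   # spread to similar states
            s = s2
            if s == GOAL:
                break
    return Q

Q = train(spread=True)
print([int(q[1] > q[0]) for q in Q[:GOAL]])
```

Each real transition now informs a neighbourhood of states rather than a single entry, which is how prior structural knowledge accelerates learning without altering the fixed point of the underlying algorithm.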
Evolution with Reinforcement Learning in Negotiation
Zou, Yi; Zhan, Wenjie; Shao, Yuan
2014-01-01
Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stableness than their counterparts using classic evolutionary algorithm. PMID:25048108
Overcoming Learned Helplessness in Community College Students.
ERIC Educational Resources Information Center
Roueche, John E.; Mink, Oscar G.
1982-01-01
Reviews research on the effects of repeated experiences of helplessness and on locus of control. Identifies conditions necessary for overcoming learned helplessness; i.e., the potential for learning to occur; consistent reinforcement; relevant, valued reinforcers; and favorable psychological situation. Recommends eight ways for teachers to…
Michaelides, Michael; Miller, Michael L; DiNieri, Jennifer A; Gomez, Juan L; Schwartz, Elizabeth; Egervari, Gabor; Wang, Gene Jack; Mobbs, Charles V; Volkow, Nora D; Hurd, Yasmin L
2017-11-01
Appetitive drive is influenced by coordinated interactions between brain circuits that regulate reinforcement and homeostatic signals that control metabolism. Glucose modulates striatal dopamine (DA) and regulates appetitive drive and reinforcement learning. Striatal DA D2 receptors (D2Rs) also regulate reinforcement learning and are implicated in glucose-related metabolic disorders. Nevertheless, interactions between striatal D2R and peripheral glucose have not been previously described. Here we show that manipulations involving striatal D2R signaling coincide with perseverative and impulsive-like responding for sucrose, a disaccharide consisting of fructose and glucose. Fructose conveys orosensory (ie, taste) reinforcement but does not convey metabolic (ie, nutrient-derived) reinforcement. Glucose however conveys orosensory reinforcement but unlike fructose, it is a major metabolic energy source, underlies sustained reinforcement, and activates striatal circuitry. We found that mice with deletion of dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) exclusively in D2R-expressing cells exhibited preferential D2R changes in the nucleus accumbens (NAc), a striatal region that critically regulates sucrose reinforcement. These changes coincided with perseverative and impulsive-like responding for sucrose pellets and sustained reinforcement learning of glucose-paired flavors. These mice were also characterized by significant glucose intolerance (ie, impaired glucose utilization). Systemic glucose administration significantly attenuated sucrose operant responding and D2R activation or blockade in the NAc bidirectionally modulated blood glucose levels and glucose tolerance. 
Collectively, these results implicate NAc D2R in regulating both peripheral glucose levels and glucose-dependent reinforcement learning behaviors and highlight the notion that glucose metabolic impairments arising from disrupted NAc D2R signaling are involved in compulsive and perseverative feeding behaviors.
Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...
2015-01-31
Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.
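In coordinated reinforcement learning, each agent keeps a local Q-function that depends on its neighbours' actions, and the greedy joint action maximizes the sum of the local Q-functions; the junction tree algorithm performs that joint maximization efficiently in large networks. The toy sketch below is our own (two intersections, invented rewards, exact enumeration in place of JTA) and only illustrates the idea.

```python
import itertools
import random

rng = random.Random(7)
# Local Q-tables: Q[i][(s_i, a_i, a_j)] for intersections i = 0, 1
Q = [dict(), dict()]
for key in itertools.product((0, 1), repeat=3):
    Q[0][key] = Q[1][key] = 0.0
ALPHA = 0.2

def local_reward(s, a_own, a_other):
    # 1 for serving the queued approach, +0.5 bonus for synchronized phases
    return (1.0 if a_own == s else 0.0) + (0.5 if a_own == a_other else 0.0)

def joint_greedy(s1, s2):
    # exact maximization over the sum of local Qs (the role JTA plays in
    # networks too large to enumerate)
    return max(itertools.product((0, 1), repeat=2),
               key=lambda a: Q[0][(s1, a[0], a[1])] + Q[1][(s2, a[1], a[0])])

for _ in range(2000):
    s1, s2 = rng.randrange(2), rng.randrange(2)      # which approach is queued
    if rng.random() < 0.2:                            # joint exploration
        a1, a2 = rng.randrange(2), rng.randrange(2)
    else:
        a1, a2 = joint_greedy(s1, s2)
    Q[0][(s1, a1, a2)] += ALPHA * (local_reward(s1, a1, a2) - Q[0][(s1, a1, a2)])
    Q[1][(s2, a2, a1)] += ALPHA * (local_reward(s2, a2, a1) - Q[1][(s2, a2, a1)])

print(joint_greedy(0, 0), joint_greedy(1, 1))
```

With matching queues, the learned joint policy serves both queues with synchronized phases, the kind of coordinated joint action independent Q-learners have no way to represent.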
The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.
ERIC Educational Resources Information Center
Dockstader, Steven L.
The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…
Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning
Ramayya, Ashwin G.; Misra, Amrit
2014-01-01
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643
1987-09-01
Luthans (28) expanded the concept of learning as follows: 1. Learning involves a change, though not necessarily an improvement, in behaviour. Learning... that results in an unpleasant outcome is not likely to be repeated (36:244). Luthans and Kreitner (27) described the various forms of reinforcement as... four alternatives (defined previously on page 24 and taken from Luthans) of positive reinforcement, negative reinforcement, extinction and punishment
Reinforcement learning techniques for controlling resources in power networks
NASA Astrophysics Data System (ADS)
Kowli, Anupama Sunil
As power grids transition towards increased reliance on renewable generation, energy storage and demand response resources, an effective control architecture is required to harness the full functionalities of these resources. There is a critical need for control techniques that recognize the unique characteristics of the different resources and exploit the flexibility afforded by them to provide ancillary services to the grid. The work presented in this dissertation addresses these needs. Specifically, new algorithms are proposed, which allow control synthesis in settings wherein the precise distribution of the uncertainty and its temporal statistics are not known. These algorithms are based on recent developments in Markov decision theory, approximate dynamic programming and reinforcement learning. They impose minimal assumptions on the system model and allow the control to be "learned" based on the actual dynamics of the system. Furthermore, they can accommodate complex constraints such as capacity and ramping limits on generation resources, state-of-charge constraints on storage resources, comfort-related limitations on demand response resources and power flow limits on transmission lines. Numerical studies demonstrating applications of these algorithms to practical control problems in power systems are discussed. Results demonstrate how the proposed control algorithms can be used to improve the performance and reduce the computational complexity of the economic dispatch mechanism in a power network. We argue that the proposed algorithms are eminently suitable to develop operational decision-making tools for large power grids with many resources and many sources of uncertainty.
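One of the settings this dissertation mentions, storage control under price uncertainty with state-of-charge constraints, can be sketched with tabular Q-learning in which infeasible actions are simply masked out. This is a hypothetical illustration with an invented two-level price process and parameters, not the dissertation's algorithms.

```python
import random

# Q-learning for storage arbitrage: charge (+1), hold (0), or discharge (-1),
# with state-of-charge limits enforced by masking infeasible actions.
rng = random.Random(2)
SOC_MAX, GAMMA, ALPHA, EPS = 3, 0.95, 0.05, 0.2
PRICES = {"low": 1.0, "high": 5.0}
Q = {(p, s, a): 0.0 for p in PRICES for s in range(SOC_MAX + 1) for a in (-1, 0, 1)}

def feasible(soc):
    # state-of-charge constraint: never overcharge or over-discharge
    return [a for a in (-1, 0, 1) if 0 <= soc + a <= SOC_MAX]

for _ in range(30000):
    price = rng.choice(list(PRICES))        # i.i.d. 50/50 price process
    soc = rng.randrange(SOC_MAX + 1)        # exploring starts over charge levels
    acts = feasible(soc)
    a = rng.choice(acts) if rng.random() < EPS else max(acts, key=lambda x: Q[(price, soc, x)])
    r = -a * PRICES[price]                  # pay to charge, earn to discharge
    price2 = rng.choice(list(PRICES))
    v_next = max(Q[(price2, soc + a, x)] for x in feasible(soc + a))
    Q[(price, soc, a)] += ALPHA * (r + GAMMA * v_next - Q[(price, soc, a)])

best_low = max(feasible(1), key=lambda x: Q[("low", 1, x)])
best_high = max(feasible(1), key=lambda x: Q[("high", 1, x)])
print(best_low, best_high)   # learned policy: charge when cheap, discharge when dear
```

The constraint handling is the point of the sketch: because maximization is always taken over the feasible action set, the learned policy respects the state-of-charge limits by construction rather than through penalty terms.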
Conformity does not perpetuate suboptimal traditions in a wild population of songbirds
Aplin, Lucy M.; Sheldon, Ben C.; McElreath, Richard
2017-01-01
Social learning is important to the life history of many animals, helping individuals to acquire new adaptive behavior. However, despite long-running debate, it remains an open question whether a reliance on social learning can also lead to mismatched or maladaptive behavior. In a previous study, we experimentally induced traditions for opening a bidirectional door puzzle box in replicate subpopulations of the great tit Parus major. Individuals were conformist social learners, resulting in stable cultural behaviors. Here, we vary the rewards gained by these techniques to ask to what extent established behaviors are flexible to changing conditions. When subpopulations with established foraging traditions for one technique were subjected to a reduced foraging payoff, 49% of birds switched their behavior to a higher-payoff foraging technique after only 14 days, with younger individuals showing a faster rate of change. We elucidated the decision-making process for each individual, using a mechanistic learning model to demonstrate that, perhaps surprisingly, this population-level change was achieved without significant asocial exploration and without any evidence for payoff-biased copying. Rather, by combining conformist social learning with payoff-sensitive individual reinforcement (updating of experience), individuals and populations could both acquire adaptive behavior and track environmental change. PMID:28739943
Mastery Learning through Individualized Instruction: A Reinforcement Strategy
ERIC Educational Resources Information Center
Sagy, John; Ravi, R.; Ananthasayanam, R.
2009-01-01
The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…
Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia
2018-01-01
Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein are scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822
Instructional control of reinforcement learning: A behavioral and neurocomputational investigation
Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.
2011-01-01
Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
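The best-fitting "confirmation bias" account lends itself to a compact sketch. The update below is illustrative only: the amplification and diminution factors, the learning rate, and the dictionary representation are assumptions, not the authors' fitted Q-learning model.

```python
def instructed_q_update(q, action, reward, instructed, lr=0.3,
                        amplify=1.5, diminish=0.5):
    # Standard reward prediction error for the chosen stimulus.
    delta = reward - q[action]
    # Hypothetical confirmation bias: on the instructed stimulus,
    # outcomes consistent with the instruction are amplified and
    # inconsistent outcomes are diminished before learning occurs.
    if action == instructed:
        delta *= amplify if delta > 0 else diminish
    q[action] += lr * delta
    return q
```

Under this scheme the instructed stimulus accumulates value faster from confirming feedback and loses it more slowly from disconfirming feedback, so instructed choices can persist despite contrary experience.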
van den Akker, Karolien; Havermans, Remco C; Bouton, Mark E; Jansen, Anita
2014-10-01
Animals and humans can easily learn to associate an initially neutral cue with food intake through classical conditioning, but extinction of learned appetitive responses can be more difficult. Intermittent or partial reinforcement of food cues causes especially persistent behaviour in animals: after exposure to such learning schedules, the decline in responding that occurs during extinction is slow. After extinction, increases in responding with renewed reinforcement of food cues (reacquisition) might be less rapid after acquisition with partial reinforcement. In humans, it may be that the eating behaviour of some individuals resembles partial reinforcement schedules to a greater extent, possibly affecting dieting success by interacting with extinction and reacquisition. Furthermore, impulsivity has been associated with less successful dieting, and this association might be explained by impulsivity affecting the learning and extinction of appetitive responses. In the present two studies, the effects of different reinforcement schedules and impulsivity on the acquisition, extinction, and reacquisition of appetitive responses were investigated in a conditioning paradigm involving food rewards in healthy humans. Overall, the results indicate both partial reinforcement schedules and, possibly, impulsivity to be associated with worse extinction performance. A new model of dieting success is proposed: learning histories and, perhaps, certain personality traits (impulsivity) can interfere with the extinction and reacquisition of appetitive responses to food cues and they may be causally related to unsuccessful dieting. Copyright © 2014 Elsevier Ltd. All rights reserved.
Regulating recognition decisions through incremental reinforcement learning.
Han, Sanghoon; Dobbins, Ian G
2009-06-01
Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback had been altogether removed. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.
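The criterion-shift finding can be caricatured as an incremental update. Everything here is hypothetical scaffolding (the step size, the feedback coding); it shows only the direction in which selectively reinforcing "old" or "new" judgments would push a signal-detection criterion.

```python
def shift_criterion(criterion, judgment, rewarded, step=0.05):
    # If an "old" judgment is reinforced, relax the criterion (more
    # liberal); if a "new" judgment is reinforced, raise it (more
    # conservative). The step size is an arbitrary illustrative constant.
    if rewarded:
        criterion += -step if judgment == "old" else step
    return criterion

def simulate(feedback_bias, trials=100):
    # feedback_bias names which judgment type receives (possibly
    # spurious) reinforcement on every trial of this toy run.
    c = 0.0
    for _ in range(trials):
        c = shift_criterion(c, feedback_bias, True)
    return c
```

Biasing feedback toward "old" judgments drifts the criterion liberal; biasing it toward "new" judgments drifts it conservative, mirroring the reported shifts in the caution subjects exercise.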
Infant Contingency Learning in Different Cultural Contexts
ERIC Educational Resources Information Center
Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika
2012-01-01
Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…
A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control
NASA Astrophysics Data System (ADS)
Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu
This study proposes a robust cooperated control method combining reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model of the system; however, it does not guarantee stability. Robust control, on the other hand, guarantees stability and robustness but requires a model. We employ the actor-critic method, a kind of reinforcement learning with a minimal amount of computation that can control continuous-valued actions, together with traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method (the actor-critic alone) through computer simulation of controlling the angle and the position of a crane system, and the simulation results showed the effectiveness of the proposed method.
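For readers unfamiliar with the actor-critic half of this hybrid, a tabular toy version (not the crane system or the H∞ component of the paper) looks like the following; the chain environment, learning rates, and discount are invented for the sketch.

```python
import math
import random

def run_actor_critic(episodes=200, alpha=0.2, beta=0.2, gamma=0.95, seed=0):
    # Toy chain: states 0 -> 1 -> 2 (goal). Action 0 advances,
    # action 1 falls back to state 0. Reaching the goal pays 1.
    transitions = {0: [1, 0], 1: [2, 0]}
    rng = random.Random(seed)
    V = {0: 0.0, 1: 0.0, 2: 0.0}            # critic: state values
    prefs = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # actor: action preferences
    for _ in range(episodes):
        s = 0
        while s != 2:
            # softmax action selection from the actor's preferences
            exps = [math.exp(p) for p in prefs[s]]
            u = rng.random() * sum(exps)
            a = 0 if u < exps[0] else 1
            s2 = transitions[s][a]
            r = 1.0 if s2 == 2 else 0.0
            td = r + gamma * V[s2] - V[s]   # one TD error drives both parts
            V[s] += alpha * td              # critic update
            prefs[s][a] += beta * td        # actor update
            s = s2
    return V, prefs
```

The same TD error trains both the value estimate and the policy, which is what keeps the actor-critic's per-step computation minimal.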
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662
Punishment insensitivity and impaired reinforcement learning in preschoolers.
Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S
2014-01-01
Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.
Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P
2016-08-04
Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E
2014-03-01
Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.
Microstimulation of the human substantia nigra alters reinforcement learning.
Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J
2014-05-14
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors.
Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories
Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien
2013-01-01
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244
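As context for the batch-mode setting, a tabular baseline that learns from a fixed sample of transitions (and is emphatically not the paper's artificial-trajectory method, which targets continuous state spaces) can be written as:

```python
def batch_q_iteration(transitions, n_actions, gamma=0.9, lr=0.5, sweeps=50):
    # Batch-mode RL: the sample of (state, action, reward, next_state)
    # tuples is fixed in advance; we sweep over it repeatedly instead of
    # interacting with the environment. All constants are illustrative.
    q = {}
    states = {s for s, _, _, _ in transitions} | {s2 for _, _, _, s2 in transitions}
    for s in states:
        for a in range(n_actions):
            q[(s, a)] = 0.0
    for _ in range(sweeps):
        for s, a, r, s2 in transitions:
            best_next = max(q[(s2, b)] for b in range(n_actions))
            q[(s, a)] += lr * (r + gamma * best_next - q[(s, a)])
    return q
```

In continuous state spaces the tabular lookup above must be replaced by a function approximator, which is exactly the dependency the paper's synthesis of artificial trajectories is designed to avoid.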
Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.
2009-01-01
Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and BOLD activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces. PMID:19864565
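The contrast between the two competing models is easy to state in code. The sketch below is illustrative, not the authors' fitted model; the hand labels, rewards, and learning rate are invented.

```python
def update_unitary(q, pair, total_reward, lr=0.2):
    # Unitary model: one learned value per bimanual action *pair*,
    # updated toward the combined reward for both hands.
    old = q.get(pair, 0.0)
    q[pair] = old + lr * (total_reward - old)
    return q

def update_decomposed(q_left, q_right, left, right, r_left, r_right, lr=0.2):
    # Decomposed model: separate per-effector values, each updated only
    # by its own hand's reward feedback.
    old_l = q_left.get(left, 0.0)
    q_left[left] = old_l + lr * (r_left - old_l)
    old_r = q_right.get(right, 0.0)
    q_right[right] = old_r + lr * (r_right - old_r)
    return q_left, q_right
```

The decomposed learner generalizes across pairings: having learned about the left hand's movement in one combination, it carries that estimate into a never-tried combination, whereas the unitary learner must start each new pair from scratch. This is the "divide and conquer" advantage over the high-dimensional joint action space.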
Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.
Bouton, Mark E; Woods, Amanda M; Todd, Travis P
2014-01-01
Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.
Levy, I Martin; Pryor, Karen W; McKeon, Theresa R
2016-04-01
A surgical procedure is a complex behavior that can be constructed from foundation or component behaviors. Both the component and the composite behaviors built from them are much more likely to recur if they are reinforced (operant learning). Behaviors in humans have been successfully reinforced using the acoustic stimulus from a mechanical clicker, where the clicker serves as a conditioned reinforcer that communicates in a way that is language- and judgment-free; however, to our knowledge, the use of operant-learning principles has not been formally evaluated for acquisition of surgical skills. Two surgical tasks were taught and compared using two teaching strategies: (1) an operant learning methodology using a conditioned, acoustic reinforcer (a clicker) for positive reinforcement; and (2) a more classical approach using demonstration alone. Our goal was to determine whether a group that is taught a surgical skill using an operant learning procedure would more precisely perform that skill than a group that is taught by demonstration alone. Two specific behaviors, "tying the locking, sliding knot" and "making a low-angle drill hole," were taught to the 2014 Postgraduate Year (PGY)-1 class and first- and second-year medical students, using an operant learning procedure incorporating precise scripts along with acoustic feedback. The control groups, composed of PGY-1 and -2 nonorthopaedic surgical residents and first- and second-year medical students, were taught using demonstration alone. The precision and speed of each behavior was recorded for each individual by a single experienced surgeon, skilled in operant learning. The groups were then compared. The operant learning group achieved better precision tying the locking, sliding knot than did the control group.
Twelve of the 12 test group learners tied the knot and precisely performed all six component steps, whereas only four of the 12 control group learners tied the knot and correctly performed all six component steps (the test group median was 10 [range, 10-10], the control group median was 0 [range, 0-10], p = 0.004). However, the median "time to tie the first knot" for the test group was longer than for the control group (test group median 271 seconds [range, 184-626 seconds], control group median 163 seconds [range 93-900 seconds], p = 0.017), whereas the "time to tie 10 of the locking, sliding knots" was the same for both groups (test group mean 95 seconds ± SD = 15 [range, 67-120 seconds], control group mean 95 seconds ± SD = 28 [range, 62-139 seconds], p = 0.996). For the low-angle drill hole test, the test group more consistently achieved the ideal six-step behavior for precisely drilling the low-angle hole compared with the control group (p = 0.006 for the median number of technique success comparison with an odds ratio [at the 95% confidence interval] of 82.3 [29.1-232.8]). The mean time to drill 10 low-angle holes was not different between the test group (mean 193 seconds ± SD = 26 [range, 153-222 seconds]) and the control group (mean 146 seconds ± SD = 63 [range, 114-294 seconds]) (p = 0.084). Operant learning reinforces a complex behavior as it is constructed; its benefit is measured not in time saved but in the accuracy of the final composite behavior. Level II, therapeutic study.
Autonomous reinforcement learning with experience replay.
Wawrzyński, Paweł; Tanwani, Ajay Kumar
2013-05-01
This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
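A generic experience replay loop, with a tabular Q-function standing in for the paper's actor-critic and none of its on-line step-size estimation, might look like this; the environment interface, buffer size, and rates are invented.

```python
import random

def train_with_replay(env_step, n_steps, buffer_size=100, batch=8,
                      gamma=0.9, lr=0.1, seed=0):
    # Experience replay: every transition is stored, and each control
    # step re-fits the Q-table on a random minibatch of past samples,
    # reusing old experience many times over.
    rng = random.Random(seed)
    buffer, q = [], {}
    s = 0
    for _ in range(n_steps):
        a = rng.choice([0, 1])                 # random exploration policy
        s2, r = env_step(s, a)
        buffer.append((s, a, r, s2))
        if len(buffer) > buffer_size:
            buffer.pop(0)                      # bounded replay memory
        for s_, a_, r_, s2_ in rng.sample(buffer, min(batch, len(buffer))):
            best = max(q.get((s2_, b), 0.0) for b in (0, 1))
            old = q.get((s_, a_), 0.0)
            q[(s_, a_)] = old + lr * (r_ + gamma * best - old)
        s = s2
    return q
```

Replaying stored samples is what buys the sample efficiency the abstract emphasizes; the paper's additional contribution, estimating the step-sizes on-line, is omitted here.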
Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J
2016-06-01
Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (N=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.
Utilising reinforcement learning to develop strategies for driving auditory neural implants.
Lee, Geoffrey W; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G
2016-08-01
In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provides a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model's function. We show the ability to effectively learn stimulation patterns which mimic the cochlea's ability to convert acoustic frequencies to neural activity. Learning effective replication using neural stimulation takes less than 20 min under continuous testing. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient's current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
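A plain epsilon-greedy n-armed bandit (the unmodified textbook version, not the authors' modified solution) captures the skeleton of the approach. Here each arm stands for a candidate stimulation pattern, and `reward_fn` is a stand-in for the match between the evoked and target neural responses; all constants are invented.

```python
import random

def run_bandit(reward_fn, n_arms, n_trials=1000, epsilon=0.1, seed=0):
    # Epsilon-greedy n-armed bandit with incremental mean value estimates.
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for a in range(n_arms):        # sample each arm once to initialize
        counts[a] = 1
        values[a] = reward_fn(a)
    for _ in range(n_trials):
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)          # explore
        else:
            a = values.index(max(values))      # exploit best estimate
        r = reward_fn(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean
    return values
```

In the closed-loop setting the reward would come from comparing recorded IC responses against the acoustic target, so the bandit gradually concentrates trials on the stimulation pattern that best replicates the acoustic response.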
Embedded Incremental Feature Selection for Reinforcement Learning
2012-05-01
Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al…). In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature…
Social Learning, Reinforcement and Crime: Evidence from Three European Cities
ERIC Educational Resources Information Center
Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina
2012-01-01
This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…
Novelty and Inductive Generalization in Human Reinforcement Learning
Gershman, Samuel J.; Niv, Yael
2015-01-01
In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: what is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: how can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. PMID:25808176
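The temporal difference learning algorithms this abstract relates to the Bayesian model update value estimates from the discrepancy between successive predictions. A minimal TD(0) sketch on a toy deterministic chain follows; the environment, reward, and parameter values are illustrative assumptions, not the authors' task.

```python
def td0_chain(n_states=5, episodes=500, alpha=0.1, gamma=1.0):
    """TD(0) value learning on a left-to-right chain: the agent starts in
    state 0, steps right each time, and receives reward 1 on exiting the end."""
    v = [0.0] * n_states
    for _ in range(episodes):
        s = 0
        while s < n_states:
            s2 = s + 1                                   # deterministic step right
            r = 1.0 if s2 == n_states else 0.0           # terminal reward only
            v_next = 0.0 if s2 == n_states else v[s2]
            v[s] += alpha * (r + gamma * v_next - v[s])  # TD error update
            s = s2
    return v

values = td0_chain()
```

With an undiscounted deterministic chain, all state values converge toward 1; the point of the sketch is that credit propagates backward from the reward purely through successive prediction differences.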
Learning with incomplete information and the mathematical structure behind it.
Kühn, Reimer; Stamatescu, Ion-Olimpiu
2007-07-01
We investigate the problem of learning with incomplete information as exemplified by learning with delayed reinforcement. We study a two-phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase depending on a graded reinforcement signal. The reinforcement signal quantifies the success rate globally for a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to associations learnt in that phase. Learning according to this model is studied via simulations, and analytically within a student-teacher scenario, for both single-layer networks and a committee machine. Success and speed of learning depend on the ratio lambda of the learning rates used for the associative Hebbian learning phase and for the unlearning correction in response to the reinforcement signal, respectively. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value lambda_c, in which case the generalization error exhibits a power-law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter lambda. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications are also provided: a robot learning to navigate a field containing obstacles, and the problem of identifying a specific component in a collection of stimuli.
Neural Modularity Helps Organisms Evolve to Learn New Skills without Forgetting Old Skills
Ellefsen, Kai Olav; Mouret, Jean-Baptiste; Clune, Jeff
2015-01-01
A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills better and because they have a separate reinforcement learning module.
Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting. PMID:25837826
Neural modularity helps organisms evolve to learn new skills without forgetting old skills.
Ellefsen, Kai Olav; Mouret, Jean-Baptiste; Clune, Jeff
2015-04-01
A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills better and because they have a separate reinforcement learning module.
Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting.
2014-09-29
Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance. W. Bradley Knox...positive a trainer's reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task...learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent's performance on the task the trainer
Fuzzy Q-Learning for Generalization of Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1996-01-01
Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for incremental Dynamic Programming using a society of intelligent agents which are controlled at the top level by Fuzzy Q-Learning, while at the local level each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of the input space by using fuzzy rules, and bridges the gap between Q-Learning and rule-based intelligent systems.
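For reference, the crisp (non-fuzzy) Q-Learning that Fuzzy Q-Learning extends can be sketched as a tabular update on a toy corridor task. The environment, parameters, and reward scheme below are illustrative assumptions; this is not GARIC-Q itself.

```python
import random

def q_learning(n_states=4, episodes=300, alpha=0.2, gamma=0.9, epsilon=0.2, seed=1):
    """Tabular Q-learning on a toy corridor: action 0 = left, action 1 = right.
    Reward 1 for stepping right from the last state (the goal), 0 otherwise."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection (ties favour going right).
            a = rng.randrange(2) if rng.random() < epsilon else (0 if Q[s][0] > Q[s][1] else 1)
            if a == 1 and s == n_states - 1:
                r, s2, done = 1.0, s, True
            else:
                r, s2 = 0.0, (max(0, s - 1) if a == 0 else min(n_states - 1, s + 1))
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # Q-learning bootstrap update
            s = s2
    return Q

Q = q_learning()
```

Fuzzy Q-Learning generalizes this table over a continuous input space by attaching Q-values to fuzzy rules rather than to discrete states.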
Moral learning: Psychological and philosophical perspectives.
Cushman, Fiery; Kumar, Victor; Railton, Peter
2017-10-01
The past 15 years occasioned an extraordinary blossoming of research into the cognitive and affective mechanisms that support moral judgment and behavior. This growth in our understanding of moral mechanisms overshadowed a crucial and complementary question, however: How are they learned? As this special issue of the journal Cognition attests, a new crop of research into moral learning has now firmly taken root. This new literature draws on recent advances in formal methods developed in other domains, such as Bayesian inference, reinforcement learning and other machine learning techniques. Meanwhile, it also demonstrates how learning and deciding in a social domain-and especially in the moral domain-sometimes involves specialized cognitive systems. We review the contributions to this special issue and situate them within the broader contemporary literature. Our review focuses on how we learn moral values and moral rules, how we learn about personal moral character and relationships, and the philosophical implications of these emerging models. Copyright © 2017 Elsevier B.V. All rights reserved.
Framework for robot skill learning using reinforcement learning
NASA Astrophysics Data System (ADS)
Wei, Yingzi; Zhao, Mingyang
2003-09-01
Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skill. The reinforcement function is the critical component, for its effect of evaluating the action and guiding the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward, beyond the conventional reward, provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot shaping policy dissolves the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal but practical framework for robot skill learning, and illustrate with an example the utility of the method for learning skilled robot control on line.
Proactivity and Reinforcement: The Contingency of Social Behavior
ERIC Educational Resources Information Center
Williams, J. Sherwood; And Others
1976-01-01
This paper analyzes the development of group structure in terms of the stimulus-sampling perspective. Learning is the continual sampling of possibilities, with reinforced possibilities increasing in probability of occurrence. This contingency learning approach is tested experimentally. (NG)
Fagen, Ariel; Acharya, Narayan; Kaufman, Gretchen E
2014-01-01
Many trainers of animals in the zoo now rely on positive reinforcement training to teach animals to voluntarily participate in husbandry and veterinary procedures in an effort to improve behavioral reliability, captive management, and welfare. However, captive elephant handlers in Nepal still rely heavily on punishment- and aversion-based methods. The aim of this project was to determine the effectiveness of secondary positive reinforcement (SPR) in training free-contact elephants in Nepal to voluntarily participate in a trunk wash for the purpose of tuberculosis testing. Five female elephants, 4 juveniles and 1 adult, were enrolled in the project. Data were collected in the form of minutes of training, number of offers made for each training task, and success rate for each task in performance tests. Four out of 5 elephants, all juveniles, successfully learned the trunk wash in 35 sessions or fewer, with each session lasting a mean duration of 12 min. The elephants' performance improved from a mean success rate of 39.0% to 89.3% during the course of the training. This study proves that it is feasible to efficiently train juvenile, free-contact, traditionally trained elephants in Nepal to voluntarily and reliably participate in a trunk wash using only SPR techniques.
Fagen, Ariel; Acharya, Narayan; Kaufman, Gretchen E.
2016-01-01
Many trainers of animals in the zoo now rely on positive reinforcement training to teach animals to voluntarily participate in husbandry and veterinary procedures in an effort to improve behavioral reliability, captive management, and welfare. However, captive elephant handlers in Nepal still rely heavily on punishment- and aversion-based methods. The aim of this project was to determine the effectiveness of secondary positive reinforcement (SPR) in training free-contact elephants in Nepal to voluntarily participate in a trunk wash for the purpose of tuberculosis testing. Five female elephants, 4 juveniles and 1 adult, were enrolled in the project. Data were collected in the form of minutes of training, number of offers made for each training task, and success rate for each task in performance tests. Four out of 5 elephants, all juveniles, successfully learned the trunk wash in 35 sessions or fewer, with each session lasting a mean duration of 12 min. The elephants’ performance improved from a mean success rate of 39.0% to 89.3% during the course of the training. This study proves that it is feasible to efficiently train juvenile, free-contact, traditionally trained elephants in Nepal to voluntarily and reliably participate in a trunk wash using only SPR techniques. PMID:24410366
Mobile robots exploration through cnn-based reinforcement learning.
Tai, Lei; Liu, Ming
2016-01-01
Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outline a reinforcement learning method aimed at solving the exploration problem in a corridor environment. The learning model took the depth image from an RGB-D sensor as the only input. The feature representation of the depth image was extracted through a pre-trained convolutional-neural-network model. Based on the recent success of deep Q-networks in artificial intelligence, the robot controller achieved exploration and obstacle avoidance abilities in several different simulated environments. This is the first time that reinforcement learning has been used to build an exploration strategy for mobile robots from raw sensor information.
Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.
Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M
2018-05-12
Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside of the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. 29 adult females (15 PTSD with exposure to assaultive violence, 14 controls) underwent a neutral reinforcement-learning task (i.e., a two-armed bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify timecourses for large-scale a priori brain networks. We found that an anticorrelated and risk-sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group demonstrated significantly less neural encoding of prediction errors in both a ventral striatum/mPFC and anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside of the context of fear stimuli, suggests the possibility of a broader conceptualization of learning differences in PTSD than currently proposed in current neurocircuitry models of PTSD. Copyright © 2018 Elsevier Ltd. All rights reserved.
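The Rescorla-Wagner model fitted in this study updates an option's value by a fraction of the prediction error. A minimal single-cue sketch follows; the learning rate and outcome sequence are illustrative assumptions, and the paper's anticorrelated, risk-sensitive variants add further terms not shown here.

```python
def rescorla_wagner(rewards, alpha=0.3):
    """Rescorla-Wagner delta rule for a single cue: V <- V + alpha * (r - V).
    Returns the final value estimate and the trial-by-trial prediction errors."""
    v = 0.0
    prediction_errors = []
    for r in rewards:
        pe = r - v                  # prediction error (delta)
        prediction_errors.append(pe)
        v += alpha * pe             # value update scaled by the learning rate
    return v, prediction_errors

v, pes = rescorla_wagner([1, 1, 0, 1, 1, 1])
```

In the fMRI analysis it is this trial-by-trial prediction error trace, not the value itself, that is regressed against network timecourses.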
ERIC Educational Resources Information Center
Hwang, Kuo-An; Yang, Chia-Hao
2009-01-01
Most courses based on distance learning focus on the cognitive domain of learning. Because students are sometimes inattentive or tired, they may neglect the attention goal of learning. This study proposes an auto-detection and reinforcement mechanism for the distance-education system based on the reinforcement teaching strategy. If a student is…
When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition
ERIC Educational Resources Information Center
Janssen, Christian P.; Gray, Wayne D.
2012-01-01
Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning
Cashaback, Joshua G. A.; McGregor, Heather R.; Mohatarem, Ayman; Gribble, Paul L.
2017-01-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.
Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L
2017-07-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
Adaptive Fuzzy Systems in Computational Intelligence
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1996-01-01
In recent years, the interest in computational intelligence techniques, which currently include neural networks, fuzzy systems, and evolutionary programming, has grown significantly and a number of their applications have been developed in government and industry. In the future, an essential element in these systems will be fuzzy systems that can learn from experience by using neural networks in refining their performance. The GARIC architecture, introduced earlier, is an example of a fuzzy reinforcement learning system which has been applied in several control domains such as cart-pole balancing, simulation of Space Shuttle orbital operations, and tether control. A number of examples from GARIC's applications in these domains will be demonstrated.
Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.
2012-01-01
Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989
Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer.
Zhou, Luowei; Yang, Pei; Chen, Chunlin; Gao, Yang
2017-05-01
Reinforcement learning has significant applications for multiagent systems, especially in unknown dynamic environments. However, most multiagent reinforcement learning (MARL) algorithms suffer from such problems as exponential computation complexity in the joint state-action space, which makes it difficult to scale up to realistic multiagent problems. In this paper, a novel algorithm named negotiation-based MARL with sparse interactions (NegoSI) is presented. In contrast to traditional sparse-interaction-based MARL algorithms, NegoSI adopts the equilibrium concept and makes it possible for agents to select the nonstrict equilibrium-dominating strategy profile (nonstrict EDSP) or meta equilibrium for their joint actions. The presented NegoSI algorithm consists of four parts: 1) the equilibrium-based framework for sparse interactions; 2) the negotiation for the equilibrium set; 3) the minimum variance method for selecting one joint action; and 4) the knowledge transfer of local Q-values. In this integrated algorithm, three techniques, i.e., unshared value functions, equilibrium solutions, and sparse interactions are adopted to achieve privacy protection, better coordination and lower computational complexity, respectively. To evaluate the performance of the presented NegoSI algorithm, two groups of experiments are carried out regarding three criteria: 1) steps of each episode; 2) rewards of each episode; and 3) average runtime. The first group of experiments is conducted using six grid world games and shows fast convergence and high scalability of the presented algorithm. Then in the second group of experiments NegoSI is applied to an intelligent warehouse problem and simulated results demonstrate the effectiveness of the presented NegoSI algorithm compared with other state-of-the-art MARL algorithms.
An operant approach to rehabilitation medicine: overcoming learned nonuse by shaping.
Taub, E; Crago, J E; Burgio, L D; Groomes, T E; Cook, E W; DeLuca, S C; Miller, N E
1994-03-01
A new approach to the rehabilitation of movement, based primarily on the principles of operant conditioning, was derived from research with deafferented monkeys. The analysis suggests that a certain proportion of excess motor disability after certain types of injury involves a learned suppression of movement and may be termed learned nonuse. Learned nonuse can be overcome by changing the contingencies of reinforcement so that they strongly favor use of an affected upper extremity in the chronic postinjury situation. The techniques employed here involved 2 weeks of restricting movement of the opposite (unaffected) extremity and training of the affected limb. Initial work with humans has been with chronic stroke patients for whom the approach has yielded large improvements in motor ability and functional independence. We report here preliminary data suggesting that shaping with verbal feedback further enhances the motor recovery.
A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI
NASA Astrophysics Data System (ADS)
Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro
Recently, the brain-computer interface (BCI), a direct connecting pathway between a human brain and an external device such as a computer or a robot, has gotten a lot of attention. Since a BCI can control machines such as robots using brain activity without using the voluntary muscles, the BCI may become a useful communication tool for handicapped persons, for instance, amyotrophic lateral sclerosis patients. However, in order to realize a BCI system which can perform precise tasks in various environments, it is necessary to design control rules that adapt to the dynamic environments. Reinforcement learning is one approach to the design of such control rules. If this reinforcement learning can be performed from brain activity, it leads to the attainment of a BCI that has general versatility. In this research, we paid attention to the P300 event-related potential as an alternative signal for the reward of reinforcement learning. We discriminated between the success and the failure trials from the P300 of the EEG of a single trial by using the proposed discrimination algorithm based on support vector machines. The possibility of reinforcement learning was examined from the viewpoint of the number of correctly discriminated trials. It was shown that there was a possibility to be able to learn in most subjects.
Comparative learning theory and its application in the training of horses.
Cooper, J J
1998-11-01
Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species, but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by removal of, or decrease in, intensity of an unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.
The nature of sexual reinforcement.
Crawford, L L; Holloway, K S; Domjan, M
1993-01-01
Sexual reinforcers are not part of a regulatory system involved in the maintenance of critical metabolic processes, they differ for males and females, they differ as a function of species and mating system, and they show ontogenetic and seasonal changes related to endocrine conditions. Exposure to a member of the opposite sex without copulation can be sufficient for sexual reinforcement. However, copulatory access is a stronger reinforcer, and copulatory opportunity can serve to enhance the reinforcing efficacy of stimulus features of a sexual partner. Conversely, under certain conditions, noncopulatory exposure serves to decrease reinforcer efficacy. Many common learning phenomena such as acquisition, extinction, discrimination learning, second-order conditioning, and latent inhibition have been demonstrated in sexual conditioning. These observations extend the generality of findings obtained with more conventional reinforcers, but the mechanisms of these effects and their gender and species specificity remain to be explored. PMID:8354970
Mesolimbic confidence signals guide perceptual learning in the absence of external feedback
Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp
2016-01-01
It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
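The confidence-based reinforcement signal proposed in this study can be sketched as a delta-rule update in which internal confidence stands in for external reward. The function name, confidence sequence, and learning rate below are illustrative assumptions, not the authors' computational model.

```python
def confidence_guided_learning(confidences, alpha=0.2):
    """Delta-rule learning where the teaching signal is the confidence
    prediction error: observed confidence minus anticipated confidence."""
    expected_conf = 0.5            # anticipated confidence before each decision
    errors = []
    for c in confidences:
        pe = c - expected_conf     # confidence prediction error
        errors.append(pe)
        expected_conf += alpha * pe  # update anticipation, as with external reward
    return expected_conf, errors

final_conf, conf_errors = confidence_guided_learning([0.6, 0.7, 0.8, 0.8])
```

The key idea this sketch illustrates is that no external feedback enters the loop: the same anticipation and prediction-error structure used for reward-based learning is driven entirely by the internally monitored decision variable.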
Deep patch technique for landslide repair. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helwany, B.M.
1994-10-01
The report describes the laboratory testing of the "USFS deep patch" technique and a CTI modification of this technique for repairing landslides with geosynthetic reinforcement. The technique involves replacing sections of roadway lost to landslides on top of a geosynthetically-reinforced embankment. The CTI modification involves replacing the reinforced slope with a geosynthetically-reinforced retaining wall with a truncated base. Both techniques rely on the cantilevering ability of the reinforced mass to limit the load on the foundation with a high slide potential. The tests with road base showed that (1) both the USFS and CTI repairs effectively reduced the adverse effects of local landsliding on the highway pavement by preventing crack propagation; (2) the USFS repair increased the stability of the repaired slope, which was in progressive failure, by reducing the stresses exerted on it; and (3) the CTI repair produced substantially greater stresses on its foundation due to the truncated base of the reinforced mass.
Vicarious extinction learning during reconsolidation neutralizes fear memory.
Golkar, Armita; Tjaden, Cathelijn; Kindt, Merel
2017-05-01
Previous studies have suggested that fear memories can be updated when recalled, a process referred to as reconsolidation. Given the beneficial effects of model-based safety learning (i.e. vicarious extinction) in preventing the recovery of short-term fear memory, we examined whether consolidated long-term fear memories could be updated with safety learning accomplished through vicarious extinction learning initiated within the reconsolidation time-window. We assessed this in a final sample of 19 participants that underwent a three-day within-subject fear-conditioning design, using fear-potentiated startle as our primary index of fear learning. On day 1, two fear-relevant stimuli (reinforced CSs) were paired with shock (US) and a third stimulus served as a control (CS). On day 2, one of the two previously reinforced stimuli (the reminded CS) was presented once in order to reactivate the fear memory 10 min before vicarious extinction training was initiated for all CSs. The recovery of the fear memory was tested 24 h later. Vicarious extinction training conducted within the reconsolidation time window specifically prevented the recovery of the reactivated fear memory (p = 0.03), while leaving fear-potentiated startle responses to the non-reactivated cue intact (p = 0.62). These findings are relevant to both basic and clinical research, suggesting that a safe, non-invasive model-based exposure technique has the potential to enhance the efficiency and durability of anxiolytic therapies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter topic has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm, and the obtained simulation results show the convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
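As an illustration of the cluster-selection idea, here is a bandit-style simplification in which a node learns by trial and error which cluster minimizes its combined energy and sensing cost (the paper's full formulation is an MDP; the cluster ids, costs, and parameters below are invented):

```python
import random

def q_learning_cluster_selection(costs, episodes=2000, alpha=0.1, eps=0.1, seed=0):
    """Learn which cluster minimizes (energy + sensing) cost.
    `costs` maps cluster id -> mean per-round cost; reward = -cost."""
    rng = random.Random(seed)
    q = {c: 0.0 for c in costs}
    for _ in range(episodes):
        # epsilon-greedy exploration over candidate clusters
        c = rng.choice(list(costs)) if rng.random() < eps else max(q, key=q.get)
        r = -costs[c] + rng.gauss(0.0, 0.01)   # noisy observed reward
        q[c] += alpha * (r - q[c])             # incremental value update
    return max(q, key=q.get)                   # cluster with best learned value
```

With enough episodes the learned values approach the negated mean costs, so the returned cluster is the cheapest one to join.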
Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca
2013-03-01
An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
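The hypothesized learning signal can be written as a standard TD update whose reward term sums the extrinsic and intrinsic components (a sketch; the symbols and parameter values are ours):

```python
def td_update(v, r_ext, r_int, v_next, alpha=0.1, gamma=0.9):
    """TD error driven jointly by a permanent extrinsic reward and a
    transient intrinsic (novelty) reward, per the integrated hypothesis."""
    delta = (r_ext + r_int) + gamma * v_next - v   # combined TD error
    return v + alpha * delta, delta
```

As the environment becomes familiar, the intrinsic term decays toward zero and the update reduces to ordinary reward-driven TD learning.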
Collins, Anne G E; Frank, Michael J
2018-03-06
Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.
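A minimal sketch of the kind of RL-plus-WM mixture model used in this literature, in which choices blend a fast working-memory policy with an incremental RL policy (our simplification; the mixture weight, inverse temperature, and names are illustrative):

```python
import math

def _softmax(vals, beta):
    """Numerically stable softmax over a list of values."""
    m = max(vals)
    exps = [math.exp(beta * (v - m)) for v in vals]
    z = sum(exps)
    return [e / z for e in exps]

def rlwm_policy(q_rl, wm_vals, w, beta=5.0):
    """Choice probabilities as a weighted mixture of a working-memory
    policy (weight w) and an incremental RL policy (weight 1 - w)."""
    p_rl = _softmax(q_rl, beta)
    p_wm = _softmax(wm_vals, beta)
    return [w * pw + (1 - w) * pr for pw, pr in zip(p_wm, p_rl)]
```

Setting w near 1 makes behavior track one-shot memory of recent outcomes; w near 0 makes it track slowly accumulated values.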
Learning and tuning fuzzy logic controllers through reinforcements.
Berenji, H R; Khedkar, P
1992-01-01
A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that the generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; introduces a new conjunction operator for computing the rule strengths of fuzzy control rules; introduces a new localized mean of maximum (LMOM) method for combining the conclusions of several firing control rules; and learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Impairments in action-outcome learning in schizophrenia.
Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W
2018-03-03
Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.
ERIC Educational Resources Information Center
Heitzman, Andrew J.
The New York State Center for Migrant Studies conducted this 1968 study which investigated effects of token reinforcers on reading and arithmetic skills learnings of migrant primary school students during a 6-week summer school session. Students (Negro and Caucasian) received plastic tokens to reward skills learning responses. Tokens were traded…
ERIC Educational Resources Information Center
Neu, Jessica Adele
2013-01-01
I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…
ERIC Educational Resources Information Center
Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela
2011-01-01
Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…
The Identification and Establishment of Reinforcement for Collaboration in Elementary Students
ERIC Educational Resources Information Center
Darcy, Laura
2017-01-01
In Experiment 1, I conducted a functional analysis of student rate of learning with and without a peer-yoked contingency for 12 students in Kindergarten through 2nd grade in order to determine if they had conditioned reinforcement for collaboration. Using an ABAB reversal design, I compared rate of learning as measured by learn units to criterion…
Stress enhances model-free reinforcement learning only after negative outcome
Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung
2017-01-01
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one’s ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts. PMID:28723943
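In two-stage task analyses like this one, behavior is commonly fit with a weighted hybrid of model-based and model-free values, and manipulations such as stress are read off as shifts in the weight. A sketch (the transition model, values, and names below are invented for illustration):

```python
def model_based_values(transitions, q_stage2):
    """Q_MB for each first-stage action: expected value of the best
    second-stage option under the learned transition probabilities."""
    return [sum(p * max(q_stage2[s2]) for s2, p in t.items())
            for t in transitions]

def hybrid_values(q_mb, q_mf, w):
    """Weighted mixture of model-based and model-free values;
    a lower w means more model-free (habitual) control."""
    return [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]
```

The finding reported here corresponds to w dropping under stress specifically after negative outcomes.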
Dillon, Laura; Collins, Meaghan; Conway, Maura; Cunningham, Kate
2013-01-01
Three experiments examined the implicit learning of sequences under conditions in which the elements comprising a sequence were equated in terms of reinforcement probability. In Experiment 1 cotton-top tamarins (Saguinus oedipus) experienced a five-element sequence displayed serially on a touch screen in which reinforcement probability was equated across elements at .16 per element. Tamarins demonstrated learning of this sequence with higher latencies during a random test as compared to baseline sequence training. In Experiments 2 and 3, manipulations of the procedure used in the first experiment were undertaken to rule out a confound owing to the fact that the elements in Experiment 1 bore different temporal relations to the intertrial interval (ITI), an inhibitory period. The results of Experiments 2 and 3 indicated that the implicit learning observed in Experiment 1 was not due to temporal proximity between some elements and the inhibitory ITI. The results taken together support two conclusions: first, that tamarins engaged in sequence learning whether or not there was contingent reinforcement for learning the sequence; and second, that this learning was not due to subtle differences in associative strength between the elements of the sequence. PMID:23344718
Improving the Science Excursion: An Educational Technologist's View
ERIC Educational Resources Information Center
Balson, M.
1973-01-01
Analyzes the nature of the learning process and attempts to show how the three components of a reinforcement contingency (the stimulus, the response, and the reinforcement) can be utilized to increase the efficiency of a typical science learning experience, the excursion. (JR)
NASA Astrophysics Data System (ADS)
Insua-Arevalo, Juan M.; Alvarez-Gomez, Jose A.; Castiñeiras, Pedro; Tejero-Lopez, Rosa; Martinez-Diaz, Jose J.; Rodriguez-Peces, Martin J.
2017-04-01
STEREOVIDEO channel (https://www.youtube.com/user/geostereovideo) is a YouTube channel of short educational videos (<5 min) focused on learning the handling of the stereographic projection technique applied to Structural Geology (and also to Engineering Geology). These videos aim to reinforce traditional classroom lessons with communication technology resources. This reinforcement makes it possible to go deeper into conceptual aspects once the students have mastered the representation tool, helping them to develop their own critical thinking skills. Three years after the channel was launched online (in 2014), we analyze its broadcast and acceptance by the academic community. For this purpose we have taken into account two different sources: (1) the analytics tool from YouTube (subscriptions, views, countries, comments from the users, type of device used for viewing), and (2) our own survey among users (students and teachers) to get their opinion about the videos. By January 2017 (when this abstract was submitted), the channel had a total of 650 subscriptions, with more than 85,000 views all around the world, mainly in Spanish-speaking countries (as the videos are in Spanish). The main devices for viewing the videos are PCs, but the use of smart phones and tablets is noteworthy. The video users, both students and teachers, value this type of content positively.
Vernetti, Angélina; Smith, Tim J.; Senju, Atsushi
2017-01-01
While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue–reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue–reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. PMID:28250186
Shephard, E; Jackson, G M; Groom, M J
2014-01-01
This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults', and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.
Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation
NASA Astrophysics Data System (ADS)
Satoh, Hideki
An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with a multi-dimensional continuous state space. First, the basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method for adapting an orthonormal basis can modify a basis while maintaining orthonormality in accordance with changes in the environment, improving the performance of reinforcement learning and eliminating the adverse effects of redundant noisy states.
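The replacement step can be illustrated as: drop the least-active basis element, insert a candidate, and restore orthonormality by Gram-Schmidt (a sketch under our own naming; the paper's activity index and candidate generation are more elaborate):

```python
import numpy as np

def swap_and_orthonormalize(basis, activities, candidate):
    """Replace the least-active row of an orthonormal basis with
    `candidate`, then re-orthonormalize by modified Gram-Schmidt.
    Assumes the candidate is not in the span of the kept rows."""
    b = basis.copy()
    i = int(np.argmin(activities))      # least-active element
    b[i] = candidate
    for r in range(len(b)):             # modified Gram-Schmidt over rows
        for s in range(r):
            b[r] = b[r] - (b[r] @ b[s]) * b[s]
        b[r] = b[r] / np.linalg.norm(b[r])
    return b
```

The result is again an orthonormal basis, so the linear function approximator keeps its well-conditioned representation while the basis tracks the changing environment.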
Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration
Liu, Bo; Chen, Sanfeng; Li, Shuai; Liang, Yongsheng
2012-01-01
In this paper a new framework, called Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making with uncertainty is proposed, incorporating non-adaptive, data-independent Random Projections and nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction framework in which high-dimensional data is projected onto a random lower-dimensional subspace via spherically random rotation and coordination sampling. KLSPI introduces the kernel trick into the LSPI framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In this approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random basis functions. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of Singular Value Decomposition (SVD). Finally, simulation results are presented on benchmark MDP domains, confirming gains both in computation time and in performance in large feature spaces. PMID:22736969
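The random-projection step is simple to sketch: multiply the feature matrix by a random matrix to land in a lower-dimensional subspace (we use Gaussian entries rather than the paper's exact rotation-and-sampling construction; names are ours):

```python
import numpy as np

def random_projection(features, d, seed=0):
    """Project n-dimensional feature rows onto a random d-dimensional
    subspace; scaling entries by 1/sqrt(d) roughly preserves norms."""
    rng = np.random.default_rng(seed)
    n = features.shape[1]
    proj = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n, d))
    return features @ proj
```

Policy iteration then runs on the projected features, which is where the computational savings in large feature spaces come from.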
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian
2017-11-01
We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
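A toy stand-in for the navigation problem: tabular Q-learning on a one-dimensional "climb to the top" task (the paper's setting is a continuous flow field with mechanical cues; everything below is our simplification):

```python
import random

def train_climber(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=1):
    """Tabular Q-learning: reach the highest state from the bottom.
    Actions: 0 = down, 1 = up; reward 1.0 on reaching the top."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy chooses "up" in every state, the analogue of a swimmer having learned an effective ascent strategy from experience alone.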
A neural model of hierarchical reinforcement learning.
Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris
2017-01-01
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.
Neural correlates of reinforcement learning and social preferences in competitive bidding.
van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M
2013-01-30
In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
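The "biased reward representation" can be sketched as a subjective utility that discounts monetary gain when others do better (an inequity-aversion-style term; the function name and weight are ours, not the paper's exact model):

```python
def subjective_reward(monetary, others_mean, envy=0.5):
    """Reward biased by social preference: utility is reduced when
    others' average payoff exceeds one's own."""
    return monetary - envy * max(others_mean - monetary, 0.0)
```

Feeding this subjective reward, instead of the raw monetary payoff, into a standard reinforcement learning update is one way to capture how social preferences shape bidding.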
Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo
2016-02-01
Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
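The three mechanisms in the best-fitting model can be sketched as a softmax over feature values plus a value-independent stickiness bonus for the previous selection (parameter names and values are ours):

```python
import math

def choice_probs(values, prev_choice, beta=3.0, kappa=1.0):
    """Softmax over task-relevant feature values with a
    value-independent stickiness bonus for the previous choice."""
    logits = [beta * v + (kappa if i == prev_choice else 0.0)
              for i, v in enumerate(values)]
    m = max(logits)                      # stabilize the exponentials
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

Even with identical values, the previously selected feature is chosen more often, which is exactly the perseveration-like selection error the model needed to explain.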
Full reinforcement operators in aggregation techniques.
Yager, R R; Rybalov, A
1998-01-01
We introduce the concept of upward reinforcement in aggregation as one in which a collection of high scores can reinforce or corroborate each other to give an even higher score than any of the individual arguments. The concept of downward reinforcement is also introduced as one in which low scores reinforce each other. Our concern is with full reinforcement aggregation operators, those exhibiting both upward and downward reinforcement. It is shown that the t-norm and t-conorm operators are not full reinforcement operators. A class of operators called fixed identity MICA operators are shown to exhibit the property of full reinforcement. We present some families of these operators. We use the fuzzy system modeling technique to provide further examples of these operators.
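For readers unfamiliar with full reinforcement, the classic symmetric-sum uninorm with identity 0.5 makes the idea concrete. This is our illustration of the property, not necessarily one of the paper's fixed identity MICA families:

```python
# A classic operator exhibiting full reinforcement: the symmetric-sum uninorm
# with identity 0.5. Two high scores combine to something higher than either
# (upward reinforcement); two low scores combine to something lower
# (downward reinforcement). Offered as an illustration of the concept.

def sym_sum(a, b):
    """Symmetric-sum uninorm on [0, 1] with identity 0.5."""
    num = a * b
    den = num + (1 - a) * (1 - b)
    return num / den  # undefined only at the conflicting corners (0,1) and (1,0)
```

Note the contrast with t-norms (which can only pull scores down) and t-conorms (which can only pull them up): this single operator does both, depending on which side of the identity the arguments lie.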
Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.
Chan, C K J; Harris, Justin A
2017-08-01
Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction.
ERIC Educational Resources Information Center
Dunn-Kenney, Maylan
2010-01-01
Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…
Deep Gate Recurrent Neural Network
2016-11-22
Schmidhuber. A system for robotic heart surgery that learns to tie knots using recurrent neural networks. In IEEE International Conference on... tasks, such as Machine Translation (Bahdanau et al. (2015)) or Robot Reinforcement Learning (Bakker (2001)). The main idea behind these networks is to... and J. Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32:1238–1274, 2013. ISSN 0278-3649.
Walker, Brendan M.
2013-01-01
This article represents one of five contributions focusing on the topic “Plasticity and neuroadaptive responses within the extended amygdala in response to chronic or excessive alcohol exposure” that were developed by awardees participating in the Young Investigator Award Symposium at the “Alcoholism and Stress: A Framework for Future Treatment Strategies” conference in Volterra, Italy on May 3–6, 2011 that was organized/chaired by Drs. Antonio Noronha and Fulton Crews and sponsored by the National Institute on Alcohol Abuse and Alcoholism. This review discusses the dependence-induced neuroadaptations in affective systems that provide a basis for negative reinforcement learning and presents evidence demonstrating that escalated alcohol consumption during withdrawal is a learned, plasticity-dependent process. The review concludes by identifying changes within extended amygdala dynorphin/kappa-opioid receptor systems that could serve as the foundation for the occurrence of negative reinforcement processes. While some evidence contained herein may be specific to alcohol dependence-related learning and plasticity, much of the information will be of relevance to any addictive disorder involving negative reinforcement mechanisms. Collectively, the information presented within this review provides a framework to assess the negative reinforcing effects of alcohol in a manner that distinguishes neuroadaptations produced by chronic alcohol exposure from the actual plasticity that is associated with negative reinforcement learning in dependent organisms. PMID:22459874
Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer
Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R.
2010-01-01
Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a “clinical reinforcement trial”) of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients. PMID:21385164
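The backward-induction logic behind clinical Q-learning can be sketched compactly. In this hedged illustration we replace the paper's censoring-aware support vector regression with simple per-(state, action) averaging, and the toy patient trajectories in the example are invented:

```python
from collections import defaultdict

# Hedged sketch of two-stage clinical Q-learning: the second-line Q-function
# is estimated first, then first-line returns are augmented with the value of
# the best second-line action. Per-(state, action) averaging stands in for the
# paper's censoring-aware support vector regression.

def fit_q(transitions):
    """Average observed return for each (state, action) pair."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s, a, ret in transitions:
        sums[(s, a)] += ret
        counts[(s, a)] += 1
    return {k: sums[k] / counts[k] for k in sums}

def backward_induction(stage2, stage1):
    """stage2: (state2, action2, survival); stage1: (state1, action1, state2)."""
    q2 = fit_q(stage2)

    def v2(s2):  # value of the best second-line action in state s2
        vals = [q for (s, _), q in q2.items() if s == s2]
        return max(vals) if vals else 0.0

    q1 = fit_q([(s1, a1, v2(s2)) for s1, a1, s2 in stage1])
    return q1, q2
```

The key design point is that first-line actions are credited with the survival achievable under the *optimal* second-line policy, not the policy actually followed in the trial.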
Hierarchical extreme learning machine based reinforcement learning for goal localization
NASA Astrophysics Data System (ADS)
AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini
2017-03-01
The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Few regions need to be processed by the agent to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning (RL) method was utilized to find an optimal series of actions to localize the goal region. The visual data, a set of images, are high-dimensional unstructured data and need to be represented efficiently to obtain a robust detector. Various deep reinforcement learning models have been used to localize a goal, but most of them take a long time to learn. This long learning time results from the weight fine-tuning stage that is applied iteratively to find an accurate model. The Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that does not fine-tune the weights: hidden weights are generated randomly and output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. Simulations and analyses were performed in MATLAB.
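The core computational claim — random hidden weights plus analytically solved output weights — is easy to demonstrate for a single layer. The sketch below is our flat simplification of an ELM, not the authors' H-ELM implementation, which stacks such layers:

```python
import numpy as np

# Minimal single-layer extreme learning machine: hidden weights are drawn
# randomly and never tuned; only the output weights are solved analytically,
# here via least squares. This flat version is our simplification of the
# hierarchical (stacked) H-ELM described above.

def elm_fit(X, y, n_hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, untrained hidden weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # analytic output weights
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

Because no iterative weight fine-tuning occurs, training cost is a single linear solve — the source of the learning-speed advantage claimed in the abstract.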
Depression, Activity, and Evaluation of Reinforcement
ERIC Educational Resources Information Center
Hammen, Constance L.; Glass, David R., Jr.
1975-01-01
This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)
What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics
NASA Astrophysics Data System (ADS)
Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj
Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will explore what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities that easily defy intuition based on equilibrium physics.
Kinesthetic Reinforcement-Is It a Boon to Learning?
ERIC Educational Resources Information Center
Bohrer, Roxilu K.
1970-01-01
Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…
Reinforcing Basic Skills Through Social Studies. Grades 4-7.
ERIC Educational Resources Information Center
Lewis, Teresa Marie
Arranged into seven parts, this document provides a variety of games and activities, bulletin board ideas, overhead transparencies, student handouts, and learning station ideas to help reinforce basic social studies skills in the intermediate grades. In part 1, students learn about timelines, first constructing their own life timeline, then a…
Effects of Reinforcement on Peer Imitation in a Small Group Play Context
ERIC Educational Resources Information Center
Barton, Erin E.; Ledford, Jennifer R.
2018-01-01
Children with disabilities often have deficits in imitation skills, particularly in imitating peers. Imitation is considered a behavioral cusp--which, once learned, allows a child to access additional and previously unavailable learning opportunities. In the current study, researchers examined the efficacy of contingent reinforcement delivered…
Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.
Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria
2016-03-01
Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. Neurofeedback (NFB) using an auditory stimulus as a reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.
Reinforcement learning agents providing advice in complex video games
NASA Astrophysics Data System (ADS)
Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa
2014-01-01
This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
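One plausible budgeted-advice heuristic in the spirit of this framework is to spend advice only in "important" states, those with a large gap between the teacher's best and worst action values. The class below is our illustration; the threshold rule, class name, and Q-values are assumptions, not the paper's algorithms:

```python
# Hedged sketch of importance-based budgeted advising: the teacher spends a
# unit of its limited advice budget only in states it judges important (a
# large gap between its best and worst action values). All specifics here are
# illustrative assumptions, not the paper's algorithms.

class Teacher:
    def __init__(self, q_table, budget, threshold=1.0):
        self.q = q_table          # teacher's own Q-values: state -> {action: value}
        self.budget = budget      # how many pieces of advice remain
        self.threshold = threshold

    def advise(self, state):
        """Return the teacher's best action, or None if not worth the budget."""
        if self.budget <= 0 or state not in self.q:
            return None
        vals = self.q[state]
        if max(vals.values()) - min(vals.values()) < self.threshold:
            return None           # unimportant state: save the budget
        self.budget -= 1
        return max(vals, key=vals.get)
```

The abstract's finding that identical amounts of advice have different effects depending on timing corresponds here to *when* the budget is spent, not how large it is.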
Harris, Justin A; Kwok, Dorothy W S
2018-01-01
During magazine approach conditioning, rats do not discriminate between a conditional stimulus (CS) that is consistently reinforced with food and a CS that is occasionally (partially) reinforced, as long as the CSs have the same overall reinforcement rate per second. This implies that rats are indifferent to the probability of reinforcement per trial. However, in the same rats, the per-trial reinforcement rate will affect subsequent extinction: responding extinguishes more rapidly for a CS that was consistently reinforced than for a partially reinforced CS. Here, we trained rats with consistently and partially reinforced CSs that were matched for overall reinforcement rate per second. We measured conditioned responding both during and immediately after the CSs. Differences in the per-trial probability of reinforcement did not affect the acquisition of responding during the CS but did affect subsequent extinction of that responding, and also affected the post-CS response rates during conditioning. Indeed, CSs with the same probability of reinforcement per trial evoked the same amount of post-CS responding even when they differed in overall reinforcement rate and thus evoked different amounts of responding during the CS. We conclude that reinforcement rate per second controls rats' acquisition of responding during the CS, but at the same time, rats also learn specifically about the probability of reinforcement per trial. The latter learning affects the rats' expectation of reinforcement as an outcome of the trial, which influences their ability to detect retrospectively that an opportunity for reinforcement was missed, and, in turn, drives extinction.
ERIC Educational Resources Information Center
Zrinzo, Michelle; Greer, R. Douglas
2013-01-01
Prior research has demonstrated the establishment of reinforcers for learning and maintenance with young children as a function of social learning where a peer and an adult experimenter were present. The presence of an adult experimenter was eliminated in the present study to test if the effect produced in the prior studies would occur with only…
Structure identification in fuzzy inference using reinforcement learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1993-01-01
In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.
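The splitting and merging of linguistic labels can be illustrated with triangular membership functions. The halving geometry below is an assumption for illustration; the paper does not commit to this exact scheme:

```python
# Illustrative sketch of label splitting/merging for triangular membership
# functions (left, centre, right). Splitting replaces one label with two
# narrower ones over the same support; merging does the reverse. The halving
# geometry is our assumption, not the paper's procedure.

def triangle(x, left, centre, right):
    """Triangular membership degree of x for the label (left, centre, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= centre:
        return (x - left) / (centre - left)
    return (right - x) / (right - centre)

def split_label(left, centre, right):
    """Split one triangular label into two finer ones covering the same support."""
    return [(left, (left + centre) / 2, centre),
            (centre, (centre + right) / 2, right)]

def merge_labels(l1, l2):
    """Merge two adjacent labels into a single more general one."""
    return (min(l1[0], l2[0]), (l1[1] + l2[1]) / 2, max(l1[2], l2[2]))
```

In GARIC terms, each split adds a hidden-layer node to the action selection network and each merge removes one.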
Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris
2017-01-01
Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral measures, computational modeling, and event-related brain potential (ERP) data. MDD patients showed a learning rate comparable to that of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged.
ERIC Educational Resources Information Center
Punnett, Audrey F.; Steinhauer, Gene D.
1984-01-01
Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…
Learning the specific quality of taste reinforcement in larval Drosophila.
Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram
2015-01-27
The only property of reinforcement that insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing, in any brain.
The evolution of continuous learning of the structure of the environment
Kolodny, Oren; Edelman, Shimon; Lotem, Arnon
2014-01-01
Continuous, ‘always on’, learning of structure from a stream of data is studied mainly in the fields of machine learning or language acquisition, but its evolutionary roots may go back to the first organisms that were internally motivated to learn and represent their environment. Here, we study under what conditions such continuous learning (CL) may be more adaptive than simple reinforcement learning and examine how it could have evolved from the same basic associative elements. We use agent-based computer simulations to compare three learning strategies: simple reinforcement learning; reinforcement learning with chaining (RL-chain) and CL that applies the same associative mechanisms used by the other strategies, but also seeks statistical regularities in the relations among all items in the environment, regardless of the initial association with food. We show that a sufficiently structured environment favours the evolution of both RL-chain and CL and that CL outperforms the other strategies when food is relatively rare and the time for learning is limited. This advantage of internally motivated CL stems from its ability to capture statistical patterns in the environment even before they are associated with food, at which point they immediately become useful for planning. PMID:24402920
The partial-reinforcement extinction effect and the contingent-sampling hypothesis.
Hochman, Guy; Erev, Ido
2013-12-01
The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
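The contingent-sampling model admits a very small sketch: choose the option that paid best in past trials whose recent-outcome context matches the current one. The four-outcome window follows the best-fit result reported above; everything else here, including all names and the fallback rule, is our simplification:

```python
from collections import defaultdict

# Illustrative contingent-sampling chooser: pick the option that paid best, on
# average, in past trials whose recent-outcome context (the last four
# outcomes) matched the current one. The 4-outcome window follows the best-fit
# result reported above; the other details are our simplification.

def contingent_choice(history, current_context, options):
    """history: list of (context, option, payoff); context = tuple of last 4 outcomes."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ctx, opt, pay in history:
        if ctx == current_context:
            sums[opt] += pay
            counts[opt] += 1
    if not counts:
        return options[0]  # no similar past experience: fall back to a default
    means = {o: sums[o] / counts[o] for o in counts}
    return max(means, key=means.get)
```

The model's explanation of the PREE pattern falls out of this similarity rule: under partial reinforcement, extinction-like contexts resemble training contexts that sometimes paid off, sustaining choice of the promoted option.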
The effects of aging on the interaction between reinforcement learning and attention.
Radulescu, Angela; Daniel, Reka; Niv, Yael
2016-11-01
Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system.
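The attention-constrained learning model can be sketched as feature values updated in proportion to the attention each feature receives; "more focused attention" then corresponds to concentrating weight on fewer features. This is our illustration, with invented parameter values:

```python
# Hedged sketch of attention-weighted feature RL in the spirit of the model
# described above: a stimulus's value is an attention-weighted sum of feature
# values, and the prediction error updates each feature in proportion to the
# attention it receives. Parameter values are illustrative assumptions.

def stimulus_value(features, values, attention):
    """Attention-weighted sum of the values of a stimulus's features."""
    return sum(attention[f] * values.get(f, 0.0) for f in features)

def update(features, values, attention, reward, alpha=0.3):
    """Delta-rule update scaled per feature by attention; returns the prediction error."""
    pe = reward - stimulus_value(features, values, attention)
    for f in features:
        values[f] = values.get(f, 0.0) + alpha * attention[f] * pe
    return pe
```

Under this scheme, low-attention features barely contribute to predictions and barely learn — the mechanism by which a "narrower filter" restricts learning to fewer features.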
Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.
Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne
2016-05-01
We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to foreground stimuli over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.
Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.
Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla
2014-12-01
This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new models of coordination for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and the robot soccer simulation. The results obtained demonstrated that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.
Refining Linear Fuzzy Rules by Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil
1996-01-01
Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space, which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used in refining these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning, which can be applied in domains where supervised input-output data is not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step toward applying reinforcement learning methods in domains where only limited input-output data are available.
Atlas, Lauren Y; Doll, Bradley B; Li, Jian; Daw, Nathaniel D; Phelps, Elizabeth A
2016-01-01
Socially-conveyed rules and instructions strongly shape expectations and emotions. Yet most neuroscientific studies of learning consider reinforcement history alone, irrespective of knowledge acquired through other means. We examined fear conditioning and reversal in humans to test whether instructed knowledge modulates the neural mechanisms of feedback-driven learning. One group was informed about contingencies and reversals. A second group learned only from reinforcement. We combined quantitative models with functional magnetic resonance imaging and found that instructions induced dissociations in the neural systems of aversive learning. Responses in striatum and orbitofrontal cortex updated with instructions and correlated with prefrontal responses to instructions. Amygdala responses were influenced by reinforcement similarly in both groups and did not update with instructions. Results extend work on instructed reward learning and reveal novel dissociations that have not been observed with punishments or rewards. Findings support theories of specialized threat-detection and may have implications for fear maintenance in anxiety. DOI: http://dx.doi.org/10.7554/eLife.15192.001 PMID:27171199
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca
2017-04-01
Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.
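The underlying algorithm is standard tabular Q-learning. The toy below is our illustration: it replaces the fluid environment with a one-dimensional "column" of five states in which action 1 means swimming up and reward is granted for reaching the top; the real study of course uses continuous flows and mechanical cues:

```python
import random

# Toy tabular Q-learning sketch in the spirit of the navigation studies above.
# States discretise the swimmer's position in a 1-D column (an invented stand-in
# for local flow cues); action 1 swims up, action 0 down; reward is granted for
# reaching the top. All parameters are illustrative assumptions.

def train(n_episodes=200, alpha=0.5, gamma=0.9, eps=0.5, seed=1):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
    for _ in range(n_episodes):
        s = rng.randrange(4)                      # random start below the surface
        for _ in range(20):
            if rng.random() < eps:
                a = rng.choice((0, 1))            # explore
            else:
                a = max((0, 1), key=lambda act: q[(s, act)])  # exploit
            s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == 4 else 0.0           # reward: highest altitude reached
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)])
            s = s2
            if s == 4:
                break
    return q
```

After training, upward actions dominate near the goal; in the actual flow problems, the analogous learned preferences encode nontrivial escape strategies from trapping flow structures.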
Learning and tuning fuzzy logic controllers through reinforcements
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1992-01-01
A new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. In particular, our Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture: (1) learns and tunes a fuzzy logic controller even when only weak reinforcements, such as a binary failure signal, are available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto, Sutton, and Anderson to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and has demonstrated significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
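GARIC builds on the adaptive heuristic critic (AHC) idea of learning from a bare failure signal. As a stripped-down sketch of that core loop (no fuzzy rules, and a toy one-dimensional "balancing" task whose dynamics and constants are invented for illustration), an actor-critic that only ever observes -1 on failure still learns to push against the lean:

```python
import math
import random

def train_balancer(episodes=300, alpha=0.2, beta=0.5, gamma=0.95, seed=1):
    """AHC-style actor-critic on a toy balancing task: the pole 'leans'
    toward its current side each step, and the only feedback is a binary
    failure signal (-1 on falling), echoing GARIC's weak-reinforcement setting."""
    rng = random.Random(seed)
    states, actions = (-1, 0, 1), (-1, +1)
    V = {s: 0.0 for s in states}                         # critic: state values
    p = {(s, a): 0.0 for s in states for a in actions}   # actor: preferences

    def act(s):  # softmax over the actor's preferences
        w = [math.exp(p[(s, a)]) for a in actions]
        return rng.choices(actions, weights=w)[0]

    for _ in range(episodes):
        s = 0
        for _ in range(50):
            a = act(s)
            s2 = s + a + ((s > 0) - (s < 0))             # action plus lean
            if abs(s2) > 1:                              # fell over
                delta = -1.0 - V[s]                      # TD error at failure
            else:
                delta = gamma * V[s2] - V[s]             # zero reward otherwise
            V[s] += alpha * delta                        # critic update
            p[(s, a)] += beta * delta                    # actor update
            if abs(s2) > 1:
                break
            s = s2
    return p
```

The learned preferences favor pushing back toward center from either lean, despite the reward being silent except at failure.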
Shellenberger, Sylvia; Seale, J Paul; Harris, Dona L; Johnson, J Aaron; Dodrill, Carrie L; Velasquez, Mary M
2009-03-01
Educational research demonstrates little evidence of long-term retention from traditional lectures in residency programs. Team-based learning (TBL), an alternative, active learning technique, incites competition and generates discussion. This report presents data evaluating the ability of TBL to reinforce and enhance concepts taught during initial training in a National Institutes of Health-funded alcohol screening and brief intervention (SBI) program conducted in eight residency programs from 2005 to 2007 under the auspices of Mercer University School of Medicine. After initial training of three hours, the authors conducted three TBL booster sessions of one and a quarter hours, spaced four months apart at each site. They assessed feasibility through the amount of preparation time for faculty and staff, residents' evaluations of their training, self-reported use of SBI, residents' performance on individual quizzes compared with group quizzes, booster session evaluations, and levels of confidence in conducting SBI. After initial training and three TBL reinforcement sessions, 42 residents (63%) reported that they performed SBI and that their levels of confidence in performing interventions in their current and future practices were moderately high. Participants preferred TBL formats over lectures. Group performance was superior to individual performance on initial assessments. When invited to select a model for conducting SBI in current and future practices, all residents opted for procedures that included clinician involvement. Faculty found TBL to be efficient but labor-intensive for training large groups. TBL was well received by residents and helped maintain a newly learned clinical skill. Future research should compare TBL to other learning methods.
The Design of Collectives of Agents to Control Non-Markovian Systems
NASA Technical Reports Server (NTRS)
Lawson, John W.; Wolpert, David H.
2004-01-01
The Collective Intelligence (COIN) framework concerns the design of collectives of reinforcement-learning agents such that their interaction causes a provided "world" utility function concerning the entire collective to be maximized. Previously, we applied that framework to scenarios involving Markovian dynamics where no re-evolution of the system from counter-factual initial conditions (an often expensive calculation) is permitted. This approach sets the individual utility function of each agent to be both aligned with the world utility, and at the same time, easy for the associated agents to optimize. Here we extend that approach to systems involving non-Markovian dynamics. In computer simulations, we compare our techniques with each other and with conventional "team games". We show that, whereas performance in team games often degrades badly with time, it steadily improves when our techniques are used. We also investigate situations where the system's dimensionality is effectively reduced. We show that this leads to difficulties in the agents' ability to learn. The implication is that learning is a property only of high-enough dimensional systems.
The Design of Collectives of Agents to Control Non-Markovian Systems
NASA Technical Reports Server (NTRS)
Lawson, John W.; Wolpert, David H.; Clancy, Daniel (Technical Monitor)
2002-01-01
The 'Collective Intelligence' (COIN) framework concerns the design of collectives of reinforcement-learning agents such that their interaction causes a provided 'world' utility function concerning the entire collective to be maximized. Previously, we applied that framework to scenarios involving Markovian dynamics where no re-evolution of the system from counter-factual initial conditions (an often expensive calculation) is permitted. This approach sets the individual utility function of each agent to be both aligned with the world utility, and at the same time, easy for the associated agents to optimize. Here we extend that approach to systems involving non-Markovian dynamics. In computer simulations, we compare our techniques with each other and with conventional 'team games'. We show that, whereas performance in team games often degrades badly with time, it steadily improves when our techniques are used. We also investigate situations where the system's dimensionality is effectively reduced. We show that this leads to difficulties in the agents' ability to learn. The implication is that 'learning' is a property only of high-enough dimensional systems.
ERIC Educational Resources Information Center
Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman
2011-01-01
By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…
Reinforcement Learning in Young Adults with Developmental Language Impairment
ERIC Educational Resources Information Center
Lee, Joanna C.; Tomblin, J. Bruce
2012-01-01
The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…
A neural model of hierarchical reinforcement learning
Rasmussen, Daniel; Eliasmith, Chris
2017-01-01
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions. PMID:28683111
Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.
2015-01-01
In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
Reinforcement learning state estimator.
Morimoto, Jun; Doya, Kenji
2007-03-01
In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single pendulum dynamics and showed that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.
Reciprocity Family Counseling: A Multi-Ethnic Model.
ERIC Educational Resources Information Center
Penrose, David M.
The Reciprocity Family Counseling Method involves learning principles of behavior modification including selective reinforcement, behavioral contracting, self-correction, and over-correction. Selective reinforcement refers to the recognition and modification of parent/child responses and reinforcers. Parents and children are asked to identify…
Reinforcement learning: Solving two case studies
NASA Astrophysics Data System (ADS)
Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto
2012-09-01
Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, we describe RL algorithms applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.
Reinforcement active learning in the vibrissae system: optimal object localization.
Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud
2013-01-01
Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.
An intelligent agent for optimal river-reservoir system management
NASA Astrophysics Data System (ADS)
Rieker, Jeffrey D.; Labadie, John W.
2012-09-01
A generalized software package is presented for developing an intelligent agent for stochastic optimization of complex river-reservoir system management and operations. Reinforcement learning is an approach to artificial intelligence for developing a decision-making agent that learns the best operational policies without the need for explicit probabilistic models of hydrologic system behavior. The agent learns these strategies experientially in a Markov decision process through observational interaction with the environment and simulation of the river-reservoir system using well-calibrated models. The graphical user interface for the reinforcement learning process controller includes numerous learning method options and dynamic displays for visualizing the adaptive behavior of the agent. As a case study, the generalized reinforcement learning software is applied to developing an intelligent agent for optimal management of water stored in the Truckee river-reservoir system of California and Nevada for the purpose of streamflow augmentation for water quality enhancement. The intelligent agent successfully learns long-term reservoir operational policies that specifically focus on mitigating water temperature extremes during persistent drought periods that jeopardize the survival of threatened and endangered fish species.
A comparison of differential reinforcement procedures with children with autism.
Boudreau, Brittany A; Vladescu, Jason C; Kodak, Tiffany M; Argott, Paul J; Kisamore, April N
2015-12-01
The current evaluation compared the effects of 2 differential reinforcement arrangements and a nondifferential reinforcement arrangement on the acquisition of tacts for 3 children with autism. Participants learned in all reinforcement-based conditions, and we discuss areas for future research in light of these findings and potential limitations. © Society for the Experimental Analysis of Behavior.
Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis
2013-01-01
Background Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
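The key modeling move in this meta-analysis is separating reward sensitivity from learning rate. A minimal sketch of a standard parameterization (my notation; the paper's exact equations may differ): the prediction error is the scaled reward minus the current expectation, so reward sensitivity sets the asymptote of the learned value, while the learning rate sets only how fast that asymptote is approached.

```python
def update_value(v, reward, lr, reward_sensitivity):
    """One trial: the prediction error on the scaled reward drives learning."""
    prediction_error = reward_sensitivity * reward - v
    return v + lr * prediction_error

def learned_value(n_trials, lr, reward_sensitivity, reward=1.0):
    """Value estimate after repeatedly receiving the same reward."""
    v = 0.0
    for _ in range(n_trials):
        v = update_value(v, reward, lr, reward_sensitivity)
    return v
```

In this scheme, blunted reward sensitivity (the MDD/anhedonia pattern) lowers the asymptote no matter how many trials are run, whereas a lowered learning rate (the pramipexole-like pattern) leaves the asymptote intact but slows convergence, which is why the two contributions are behaviorally separable.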
Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P
2007-11-21
The computational framework of reinforcement learning has been used to advance our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.
Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki
2016-07-01
Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
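The aspiration-learning rule sketched in the abstract is simple to state. Here is one reading of the general scheme (exact update forms vary across papers, and the learning rate and labels below are invented): the propensity to cooperate moves toward the last action if its payoff beat the aspiration level, and away from it otherwise.

```python
def aspiration_update(p_coop, action, payoff, aspiration, lr=0.2):
    """One step of aspiration learning for a binary cooperate/defect choice.
    If the payoff exceeded the aspiration level, the last action ("C" or "D")
    is reinforced; otherwise it is anti-reinforced."""
    satisfied = payoff > aspiration
    # Move toward cooperating (target 1) when a satisfying outcome followed
    # cooperation or an unsatisfying outcome followed defection; else toward 0.
    target = 1.0 if (action == "C") == satisfied else 0.0
    return p_coop + lr * (target - p_coop)
```

Because reinforcement depends only on one's own payoff relative to the aspiration, the rule needs no information about what other players did, which is exactly the paper's point about conditional cooperation emerging without explicit conditional rules.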
Social stress reactivity alters reward and punishment learning
Frank, Michael J.; Allen, John J. B.
2011-01-01
To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems. PMID:20453038
Social stress reactivity alters reward and punishment learning.
Cavanagh, James F; Frank, Michael J; Allen, John J B
2011-06-01
To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.
Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N
2015-08-01
There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. 
These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis. (c) 2015 APA, all rights reserved.
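The three competing models above differ only in how choice values are composed. As a hedged sketch of the hybrid account (a generic formulation with invented constants, and a two-armed 80%/20% task standing in for the probabilistic stimulus selection task): an actor-critic whose action propensities are mixed with Q-learned expected values before the softmax choice.

```python
import math
import random

def hybrid_agent(n_trials=2000, mix=0.5, lr=0.1, beta=3.0, seed=0):
    """Hybrid actor-critic / Q-learning agent on a two-armed bandit.
    mix=0 recovers the pure actor-critic; mix=1 recovers pure Q-learning."""
    rng = random.Random(seed)
    reward_p = {0: 0.8, 1: 0.2}
    actor = {0: 0.0, 1: 0.0}   # actor weights, trained on the critic's PE
    Q = {0: 0.0, 1: 0.0}       # expected reward values, trained on their own PE
    V = 0.0                    # critic's state value
    for _ in range(n_trials):
        score = {a: (1 - mix) * actor[a] + mix * Q[a] for a in (0, 1)}
        wts = [math.exp(beta * score[a]) for a in (0, 1)]
        a = rng.choices((0, 1), weights=wts)[0]
        r = 1.0 if rng.random() < reward_p[a] else 0.0
        critic_pe = r - V          # the critic's prediction error drives...
        V += lr * critic_pe
        actor[a] += lr * critic_pe  # ...both the critic and the actor weights
        Q[a] += lr * (r - Q[a])     # the Q component learns its own values
    return actor, Q
```

Fitting which of the three compositions best predicts trial-by-trial choices is what licenses the paper's conclusion that SZ behavior looks like the pure actor-critic (prediction error signaling alone), while BD and controls look like the hybrid.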
Motor Learning Enhances Use-Dependent Plasticity
2017-01-01
Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. 
However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961
Quantum machine learning: a classical perspective
NASA Astrophysics Data System (ADS)
Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard
2018-01-01
Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.
Quantum machine learning: a classical perspective
Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Severini, Simone; Wossnig, Leonard
2018-01-01
Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed. PMID:29434508
Quantum machine learning: a classical perspective.
Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard
2018-01-01
Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.
Salvador, Alexandre; Worbe, Yulia; Delorme, Cécile; Coricelli, Giorgio; Gaillard, Raphaël; Robbins, Trevor W; Hartmann, Andreas; Palminteri, Stefano
2017-07-24
The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
Somatosensory Contribution to the Initial Stages of Human Motor Learning
Bernardi, Nicolò F.; Darainy, Mohammad
2015-01-01
The early stages of motor skill acquisition are often marked by uncertainty about the sensory and motor goals of the task, as is the case in learning to speak or learning the feel of a good tennis serve. Here we present an experimental model of this early learning process, in which targets are acquired by exploration and reinforcement rather than sensory error. We use this model to investigate the relative contribution of motor and sensory factors to human motor learning. Participants make active reaching movements or matched passive movements to an unseen target using a robot arm. We find that learning through passive movements paired with reinforcement is comparable with learning associated with active movement, both in terms of magnitude and durability, with improvements due to training still observable at a 1 week retest. Motor learning is also accompanied by changes in somatosensory perceptual acuity. No stable changes in motor performance are observed for participants that train, actively or passively, in the absence of reinforcement, or for participants who are given explicit information about target position in the absence of somatosensory experience. These findings indicate that the somatosensory system dominates learning in the early stages of motor skill acquisition. SIGNIFICANCE STATEMENT The research focuses on the initial stages of human motor learning, introducing a new experimental model that closely approximates the key features of motor learning outside of the laboratory. The finding indicates that it is the somatosensory system rather than the motor system that dominates learning in the early stages of motor skill acquisition. This is important given that most of our computational models of motor learning are based on the idea that learning is motoric in origin. 
This is also a valuable finding for rehabilitation of patients with limited mobility as it shows that reinforcement in conjunction with passive movement results in benefits to motor learning that are as great as those observed for active movement training. PMID:26490869
Damage source identification of reinforced concrete structure using acoustic emission technique.
Panjsetooni, Alireza; Bunnori, Norazura Muhamad; Vakili, Amir Hossein
2013-01-01
Acoustic emission (AE) technique is one of the nondestructive evaluation (NDE) techniques that have been considered as the prime candidate for structural health and damage monitoring in loaded structures. This technique was employed to investigate the damage process in reinforced concrete (RC) frame specimens. A number of RC frames were tested under cyclic loading and were simultaneously monitored using AE. The AE test data were analyzed using the AE source location analysis method. The results showed that the AE technique is suitable for identifying the source locations of damage in RC structures. PMID:23997681
NASA Astrophysics Data System (ADS)
Yoshida, Yuki; Karakida, Ryo; Okada, Masato; Amari, Shun-ichi
2017-04-01
Weight normalization, an optimization method for neural networks recently proposed by Salimans and Kingma (2016), decomposes the weight vector of a neural network into a radial length and a direction vector, and the decomposed parameters follow their own steepest descent updates. They reported that learning with weight normalization converges faster in several tasks, including image recognition and reinforcement learning, than learning with the conventional parameterization. However, it has remained theoretically unexplained how weight normalization improves convergence speed. In this study, we applied a statistical mechanical technique to analyze on-line learning in single-layer linear and nonlinear perceptrons with weight normalization. By deriving order parameters of the learning dynamics, we confirmed quantitatively that weight normalization achieves fast convergence by automatically tuning the effective learning rate, regardless of the nonlinearity of the neural network. This property is realized when the initial value of the radial length is near the global minimum; our theory therefore suggests that it is important to choose the initial value of the radial length appropriately when using weight normalization.
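The decomposition described in this abstract can be sketched in a few lines. The following is a minimal full-batch gradient-descent example on a single linear unit, not the paper's experimental setup: the data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

# Weight normalization reparameterizes a weight vector w as w = g * v/||v||
# and runs gradient descent on (g, v) instead of w.  Sketch on a single
# linear unit trained to match a teacher; all settings are illustrative.

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))                 # inputs
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true                                # teacher outputs

g = 1.0                                       # radial length
v = np.ones(5)                                # direction parameters
lr = 0.05

for _ in range(500):
    v_norm = np.linalg.norm(v)
    w = g * v / v_norm                        # effective weights
    grad_w = X.T @ (X @ w - y) / len(X)       # gradient of 0.5*MSE w.r.t. w
    grad_g = grad_w @ v / v_norm              # chain rule: dL/dg
    grad_v = (g / v_norm) * (grad_w - (grad_w @ v / v_norm**2) * v)  # dL/dv
    g -= lr * grad_g
    v -= lr * grad_v

print(np.round(g * v / np.linalg.norm(v), 2))
```

Note that the gradient with respect to v is orthogonal to v itself, which is the mechanism the paper analyzes: the radial length g effectively rescales the step taken in the direction parameters.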
Hacisalihoglu, Gokhan; Stephens, Desmond; Johnson, Lewis; Edington, Maurice
2018-01-01
Active learning is a pedagogical approach that involves students engaging in collaborative learning, which enables them to take more responsibility for their learning and improve their critical thinking skills. While prior research examined student performance at majority universities, this study focuses on specifically Historically Black Colleges and Universities (HBCUs) for the first time. Here we present work that focuses on the impact of active learning interventions at Florida A&M University, where we measured the impact of active learning strategies coupled with a SCALE-UP (Student Centered Active Learning Environment with Upside-down Pedagogies) learning environment on student success in General Biology. In biology sections where active learning techniques were employed, students watched online videos and completed specific activities before class covering information previously presented in a traditional lecture format. In-class activities were then carefully planned to reinforce critical concepts and enhance critical thinking skills through active learning techniques such as the one-minute paper, think-pair-share, and the utilization of clickers. Students in the active learning and control groups covered the same topics, took the same summative examinations and completed identical homework sets. In addition, the same instructor taught all of the sections included in this study. Testing demonstrated that these interventions increased learning gains by as much as 16%, and students reported an increase in their positive perceptions of active learning and biology. Overall, our results suggest that active learning approaches coupled with the SCALE-UP environment may provide an added opportunity for student success when compared with the standard modes of instruction in General Biology.
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.
Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J
2016-11-16
As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. 
In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. Copyright © 2016 the authors 0270-6474/16/3611682-11$15.00/0.
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning
Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F.
2016-01-01
As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. SIGNIFICANCE STATEMENT In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. 
In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. PMID:27852776
Promoting response variability and stimulus generalization in martial arts training.
Harding, Jay W; Wacker, David P; Berg, Wendy K; Rick, Gary; Lee, John F
2004-01-01
The effects of reinforcement and extinction on response variability and stimulus generalization in the punching and kicking techniques of 2 martial arts students were evaluated across drill and sparring conditions. During both conditions, the students were asked to demonstrate different techniques in response to an instructor's punching attack. During baseline, the students received no feedback on their responses in either condition. During the intervention phase, the students received differential reinforcement in the form of instructor feedback for each different punching or kicking technique they performed during a session of the drill condition, but no reinforcement was provided for techniques in the sparring condition. Results showed that both students increased the number of different techniques they performed when reinforcement and extinction procedures were conducted during the drill condition, and that this increase in response variability generalized to the sparring condition. PMID:15293637
What is the optimal task difficulty for reinforcement learning of brain self-regulation?
Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza
2016-09-01
The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent to β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
The Computational Development of Reinforcement Learning during Adolescence
Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne
2016-01-01
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574
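A counterfactual learning module of the kind contrasted in this study can be sketched as a two-armed bandit learner that updates the unchosen option from the forgone outcome under complete feedback. The reward probabilities, learning rates, and choice rule below are illustrative assumptions, not parameters estimated from the participants.

```python
import random

# Factual + counterfactual value updates for a two-armed bandit with
# complete feedback, in the spirit of the paradigm described above.

random.seed(1)
p_reward = [0.8, 0.2]            # arm 0 pays off more often
Q = [0.0, 0.0]                   # option values
alpha_f, alpha_cf = 0.1, 0.1     # factual / counterfactual learning rates

for _ in range(1000):
    # epsilon-greedy choice keeps some exploration
    choice = random.randrange(2) if random.random() < 0.1 else (0 if Q[0] >= Q[1] else 1)
    unchosen = 1 - choice
    r_f = 1.0 if random.random() < p_reward[choice] else 0.0     # obtained outcome
    r_cf = 1.0 if random.random() < p_reward[unchosen] else 0.0  # forgone outcome
    Q[choice] += alpha_f * (r_f - Q[choice])          # factual update
    Q[unchosen] += alpha_cf * (r_cf - Q[unchosen])    # counterfactual update

print([round(q, 2) for q in Q])
```

Setting alpha_cf to zero recovers the basic reinforcement learning algorithm that, per the study, better explains adolescent behaviour: only the chosen option is updated and complete feedback confers no benefit.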
Robust reinforcement learning.
Morimoto, Jun; Doya, Kenji
2005-02-01
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically from linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
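The min-max formulation behind robust RL can be illustrated with a toy example: a control agent maximizes value while a disturbing agent picks the worst disturbance, whose magnitude is penalized as in H-infinity control. The chain MDP, disturbance penalty, and discount below are illustrative assumptions, not the pendulum task from the letter.

```python
# Min-max value iteration on a 5-state chain: the controller moves toward
# the terminal goal state, an adversary adds a disturbance d, and the
# adversary is charged K per unit |d| (so unbounded disturbance is not free).

GOAL, GAMMA, K = 4, 0.95, 2.0      # K: cost charged to the disturber per unit |d|
V = [0.0] * (GOAL + 1)             # V[GOAL] stays 0 (terminal)

def step(s, u, d):
    """Disturbed transition on the chain, clipped to [0, GOAL]."""
    return max(0, min(GOAL, s + u + d))

for _ in range(200):               # min-max value iteration to a fixed point
    newV = [0.0] * (GOAL + 1)
    for s in range(GOAL):
        best = float("-inf")
        for u in (-1, 1):          # control input
            worst = float("inf")
            for d in (-1, 0, 1):   # adversarial disturbance
                r = -1.0 + K * abs(d)           # step cost; disturbance penalized
                worst = min(worst, r + GAMMA * V[step(s, u, d)])
            best = max(best, worst)
        newV[s] = best
    V = newV

print([round(val, 2) for val in V])
```

The resulting value function is the one the controller can guarantee against the worst admissible disturbance, which is the min-max quantity the letter's online algorithms estimate.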
Kerr, Robert R.; Grayden, David B.; Thomas, Doreen A.; Gilson, Matthieu; Burkitt, Anthony N.
2014-01-01
A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments. PMID:24475240
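The asymmetry condition this model identifies, that reward must strengthen LTP more than LTD, can be illustrated with a schematic pair-based STDP kernel. The amplitudes, gains, and time constant below are illustrative assumptions, not the fitted parameters of the RSTDP model.

```python
import math

# A reward signal that gates LTP more strongly than LTD (K_LTP > K_LTD)
# biases an otherwise balanced pair-based STDP kernel toward net
# potentiation, the condition for firing-rate reinforcement in the model.

A_LTP = A_LTD = 1.0          # baseline STDP amplitudes (balanced)
K_LTP, K_LTD = 2.0, 1.2      # reward gain on LTP vs LTD
TAU = 20.0                   # STDP time constant (ms)
LR = 0.01

def dw(dt, reward):
    """Weight change for spike lag dt = t_post - t_pre (ms)."""
    if dt > 0:               # pre before post: potentiation, strongly reward-gated
        return LR * (A_LTP + K_LTP * reward) * math.exp(-dt / TAU)
    return -LR * (A_LTD + K_LTD * reward) * math.exp(dt / TAU)   # depression

# Integrate over symmetric lags: without reward the kernel balances to ~0;
# with reward, the LTP side dominates and the net drift is positive.
lags = [k * 0.5 for k in range(-100, 101) if k != 0]
drift_no_reward = sum(dw(dt, 0.0) for dt in lags)
drift_reward = sum(dw(dt, 1.0) for dt in lags)
print(round(drift_no_reward, 4), round(drift_reward, 4))
```

With K_LTP equal to K_LTD the reward-on drift would also cancel, which is why a uniform reward gain cannot by itself produce firing-rate reinforcement in this scheme.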
Fee, Michale S.
2012-01-01
In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources. PMID:22754501
Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P
2003-01-01
The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement and reward systems, reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning depending on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task, as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydroxyphenylacetic acid concentrations and a higher dihydroxyphenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydroxyphenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory-promoting and memory-suppressive effects via distinct and independent mechanisms, and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.
ERIC Educational Resources Information Center
Kahnt, Thorsten; Park, Soyoung Q.; Cohen, Michael X.; Beck, Anne; Heinz, Andreas; Wrase, Jana
2009-01-01
It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to…
Autonomous Inter-Task Transfer in Reinforcement Learning Domains
2008-08-01
Twentieth International Joint Conference on Artificial Intelligence, 2007. Fumihide Tanaka and Masayuki Yamamura. Multitask reinforcement learning... Functions; Artificial Neural Networks; Instance-based...tures [Laird et al., 1986, Choi et al., 2007]. However, TL for RL tasks has only recently been gaining attention in the artificial intelligence
A look at Behaviourism and Perceptual Control Theory in Interface Design
1998-02-01
behaviours such as response variability, instinctive drift, autoshaping, etc. Perceptual Control Theory (PCT) postulates that behaviours result from the...internal variables. Behaviourism, on the other hand, cannot account for variability in responses, instinctive drift, autoshaping, etc. Researchers... Autoshaping. Animals appear to learn without reinforcement. However, conditioning theory speculates that learning results only when reinforcement
BEHAVIORAL MECHANISMS UNDERLYING NICOTINE REINFORCEMENT
Rupprecht, Laura E.; Smith, Tracy T.; Schassburger, Rachel L.; Buffalari, Deanne M.; Sved, Alan F.; Donny, Eric C.
2015-01-01
Cigarette smoking is the leading cause of preventable deaths worldwide, and nicotine, the primary psychoactive constituent in tobacco, drives sustained use. The behavioral actions of nicotine are complex and extend well beyond the actions of the drug as a primary reinforcer. Stimuli that are consistently paired with nicotine can, through associative learning, take on reinforcing properties as conditioned stimuli. These conditioned stimuli can then impact the rate and probability of behavior and even function as conditioned reinforcers that maintain behavior in the absence of nicotine. Nicotine can also act as a conditioned stimulus, predicting the delivery of other reinforcers, which may allow nicotine to acquire value as a conditioned reinforcer. These associative effects, establishing non-nicotine stimuli as conditioned stimuli with discriminative stimulus and conditioned reinforcing properties as well as establishing nicotine as a conditioned stimulus, are predicted by basic conditioning principles. However, nicotine can also act non-associatively. Nicotine directly enhances the reinforcing efficacy of other reinforcing stimuli in the environment, an effect that does not require a temporal or predictive relationship between nicotine and either the stimulus or the behavior. Hence, the reinforcing actions of nicotine stem both from the primary reinforcing actions of the drug (and the subsequent associative learning effects) and from the reinforcement-enhancement action of nicotine, which is non-associative in nature. Gaining a better understanding of how nicotine impacts behavior will allow for maximally effective tobacco control efforts aimed at reducing the harm associated with tobacco use by reducing and/or treating its addictiveness. PMID:25638333
Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter
2013-11-01
The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. 
Copyright © 2013 Elsevier Ltd. All rights reserved.
Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.
Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E
2017-03-15
Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. 
Copyright © 2017 the authors 0270-6474/17/373009-09$15.00/0.
Feature Reinforcement Learning: Part I. Unstructured MDPs
NASA Astrophysics Data System (ADS)
Hutter, Marcus
2009-12-01
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II (Hutter, 2009c). The role of POMDPs is also considered there.
The role of within-compound associations in learning about absent cues.
Witnauer, James E; Miller, Ralph R
2011-05-01
When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue-outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue-outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127-151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association.
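The Van Hamme and Wasserman extension described here can be sketched numerically: present cues update with a positive associability, while absent cues retrieved through a within-compound association update with a negative one, which produces retrospective revaluation. The following toy simulation is a hedged illustration of that idea only; cue names, trial counts, and parameter values are invented.

```python
def vhw_update(V, present, retrieved, reward,
               alpha_p=0.3, alpha_a=-0.1, beta=1.0):
    """One Van Hamme & Wasserman-style trial update.

    Present cues change with positive associability alpha_p; absent
    cues retrieved via a within-compound association change with
    negative associability alpha_a. The error term is the usual
    Rescorla-Wagner summed-prediction error over present cues.
    """
    error = reward - sum(V[c] for c in present)
    for c in present:
        V[c] += alpha_p * beta * error
    for c in retrieved:
        V[c] += alpha_a * beta * error
    return V

V = {"A": 0.0, "B": 0.0}
# Phase 1: compound AB reinforced (also forms a within-compound association)
for _ in range(30):
    vhw_update(V, present=["A", "B"], retrieved=[], reward=1.0)
vB_before = V["B"]
# Phase 2: A alone extinguished; absent B is retrieved through the
# within-compound association and is revalued in the opposite direction
for _ in range(30):
    vhw_update(V, present=["A"], retrieved=["B"], reward=0.0)
```

After phase 2, A's value falls toward zero while B's value rises above its phase-1 level: the unovershadowing pattern that, per the simulations in the abstract, requires retrieval through within-compound associations.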
Pleasurable music affects reinforcement learning according to the listener
Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira
2013-01-01
Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875
Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.
Blake, David T
2017-06-18
The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, by the strength of response within that activated area, and by the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests other factors also initiate changes. Studies that control learning find support of reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to a strengthening of cortical response strength and enlarging of response area with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning. © 2017 American Physiological Society. Compr Physiol 7:977-1008, 2017. Copyright © 2017 John Wiley & Sons, Inc.
Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D
2017-05-01
Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlies poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.
Lin, Yun; Wang, Chao; Wang, Jiaxing; Dou, Zheng
2016-10-12
Cognitive radio sensor networks are one kind of application in which cognitive techniques can be adopted, and they present many potential applications, challenges, and future research trends. According to research surveys, dynamic spectrum access is an important and necessary technology for future cognitive sensor networks. Traditional methods of dynamic spectrum access are based on spectrum holes and have some drawbacks, such as low accessibility and high interruptibility, which negatively affect the transmission performance of the sensor networks. To address this problem, in this paper a new initialization mechanism is proposed to establish a communication link and set up a sensor network without adopting spectrum holes to convey control information. Specifically, first a transmission channel model for analyzing the maximum accessible capacity under three different policies in a fading environment is discussed. Second, a hybrid spectrum access algorithm based on a reinforcement learning model is proposed for the power allocation problem of both the transmission channel and the control channel. Finally, extensive simulations have been conducted, and the simulation results show that the new algorithm provides a significant improvement in terms of the tradeoff between control channel reliability and transmission channel efficiency.
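The abstract does not specify the hybrid algorithm's internals, so as a rough, hypothetical stand-in for reinforcement-learning-based channel selection, a simple epsilon-greedy action-value learner could look like the following; the channel success probabilities and parameters are invented for illustration.

```python
import random

def select_channels(success_prob, trials=3000, epsilon=0.1, seed=2):
    """Epsilon-greedy action-value learning for channel selection.

    The agent estimates each channel's transmission success rate by
    an incremental sample average and increasingly exploits the best
    one, while epsilon keeps a trickle of exploration going.
    """
    rng = random.Random(seed)
    n = len(success_prob)
    q = [0.0] * n       # estimated success rate per channel
    counts = [0] * n    # times each channel was used
    for _ in range(trials):
        if rng.random() < epsilon:
            a = rng.randrange(n)                       # explore
        else:
            a = max(range(n), key=lambda i: q[i])      # exploit
        r = 1.0 if rng.random() < success_prob[a] else 0.0
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental sample average
    return q, counts

q, counts = select_channels([0.2, 0.5, 0.9])
```

A real spectrum-access learner would condition on channel state and handle power levels jointly, but the explore/exploit trade-off it must resolve is the one shown here.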
Franklin, Nicholas T; Frank, Michael J
2015-12-25
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
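At the computational level, the trade-off described here (long pauses giving robustness to spurious outcomes, short pauses giving responsiveness to change-points) amounts to gating the learning rate by uncertainty. The following is a speculative toy sketch of such uncertainty-gated learning on a drifting reward probability, not the paper's model; all parameter values are invented.

```python
import random

def adaptive_lr_bandit(p_schedule, trials=400, lr_lo=0.05, lr_hi=0.5,
                       tau=0.2, seed=1):
    """Track a drifting reward probability with an uncertainty-gated
    learning rate.

    Surprises larger than the running expectation (plus a margin tau)
    switch to the fast rate, analogous to responsiveness under short
    TAN pauses; otherwise the slow rate keeps the estimate robust to
    spurious outcomes, analogous to long pauses.
    """
    rng = random.Random(seed)
    v = 0.5                   # current reward-probability estimate
    expected_surprise = 0.5   # running average of |prediction error|
    estimates = []
    for t in range(trials):
        r = 1.0 if rng.random() < p_schedule(t) else 0.0
        error = r - v
        lr = lr_hi if abs(error) > expected_surprise + tau else lr_lo
        v += lr * error
        expected_surprise += 0.05 * (abs(error) - expected_surprise)
        estimates.append(v)
    return estimates

# Reward probability jumps from 0.8 to 0.2 at trial 200 (a change-point)
est = adaptive_lr_bandit(lambda t: 0.8 if t < 200 else 0.2)
```

The estimate hovers near the true probability before the change-point and re-converges quickly after it, which is the self-tuning behavior the feedback control system in the model is meant to achieve.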
Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S
2011-01-01
As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
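The actor-critic pattern underlying this approach (a policy improved by a prediction error from a learned critic, driven only by a scalar feedback signal) can be sketched in miniature. This is an illustrative two-action example, not the continuous myoelectric controller itself; the reward probabilities and learning rates are assumptions.

```python
import math
import random

def actor_critic_bandit(reward_prob=(0.2, 0.8), trials=2000,
                        alpha_actor=0.1, alpha_critic=0.1, seed=4):
    """Minimal actor-critic on a two-action task.

    A softmax actor holds preferences over actions; a critic tracks
    the average reward. The critic's prediction error (reward minus
    baseline) trains both components, using only scalar feedback.
    """
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # actor preferences
    baseline = 0.0       # critic's running reward estimate
    for _ in range(trials):
        exps = [math.exp(p) for p in prefs]
        probs = [e / sum(exps) for e in exps]
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < reward_prob[a] else 0.0
        delta = r - baseline            # critic's prediction error
        baseline += alpha_critic * delta
        for i in range(2):              # policy-gradient-style update
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += alpha_actor * delta * grad
    return prefs

prefs = actor_critic_bandit()
```

In the prosthesis setting, the actor outputs continuous joint commands and the scalar signal comes from the human user, but the division of labor between actor and critic is the same.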
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.
Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane
2017-01-01
Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.
Learning and tuning fuzzy logic controllers through reinforcements
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1992-01-01
This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Learning the specific quality of taste reinforcement in larval Drosophila
Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram
2015-01-01
The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing—in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533
Evidence for a neural law of effect.
Athalye, Vivek R; Santos, Fernando J; Carmena, Jose M; Costa, Rui M
2018-03-02
Thorndike's law of effect states that actions that lead to reinforcements tend to be repeated more often. Accordingly, neural activity patterns leading to reinforcement are also reentered more frequently. Reinforcement relies on dopaminergic activity in the ventral tegmental area (VTA), and animals shape their behavior to receive dopaminergic stimulation. Seeking evidence for a neural law of effect, we found that mice learn to reenter more frequently motor cortical activity patterns that trigger optogenetic VTA self-stimulation. Learning was accompanied by gradual shaping of these patterns, with participating neurons progressively increasing and aligning their covariance to that of the target pattern. Motor cortex patterns that lead to phasic dopaminergic VTA activity are progressively reinforced and shaped, suggesting a mechanism by which animals select and shape actions to reliably achieve reinforcement. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Boctor, Lisa
2013-03-01
The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created with students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that students did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.
Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf
2016-02-01
Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA), and the ventral striatum (VS). Based on the functional connectivity of the VS, both the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) with model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value-function component of the RL agent. Regarding the model used for model-based computations, we turn to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network, the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames when computing reward expectation, compiling the stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based reward expectations into the value signal is further supported by efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.
Incorporating Dispositional Traits into the Treatment of Anorexia Nervosa
Herzog, David; Moskovich, Ashley; Merwin, Rhonda; Lin, Tammy
2014-01-01
We provide a general framework to guide the development of interventions that aim to address persistent features in eating disorders that may preclude effective treatment. Using perfectionism as an exemplar, we draw from research in cognitive neuroscience regarding attention and reinforcement learning, from learning theory and social psychology regarding vicarious learning and implications for the role modeling of significant others, and from clinical psychology on the importance of verbal narratives as barriers that may influence expectations and shape reinforcement schedules. PMID:21243482
Hybrid learning in signalling games
NASA Astrophysics Data System (ADS)
Barrett, Jeffrey A.; Cochran, Calvin T.; Huttegger, Simon; Fujiwara, Naoki
2017-09-01
Lewis-Skyrms signalling games have been studied under a variety of low-rationality learning dynamics. Reinforcement dynamics are stable but slow and prone to evolving suboptimal signalling conventions. A low-inertia trial-and-error dynamics like win-stay/lose-randomise is fast and reliable at finding perfect signalling conventions but unstable in the context of noise or agent error. Here we consider a low-rationality hybrid of reinforcement and win-stay/lose-randomise learning that exhibits the virtues of both. This hybrid dynamics is reliable, stable and exceptionally fast.
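One component of the hybrid, win-stay/lose-randomise, is easy to sketch in a two-state Lewis signalling game. The exact hybrid rule studied in the paper is not reproduced here; the game size, horizon, and seed below are illustrative assumptions.

```python
import random

def wsls_signalling(rounds=2000, n=2, seed=3):
    """Win-stay/lose-randomise in an n-state Lewis signalling game.

    The sender maps states to signals and the receiver maps signals
    to acts. After a miscoordinated round, each agent re-randomises
    only the rule it just used; after success, nothing changes. A
    perfect signalling convention is therefore absorbing.
    """
    rng = random.Random(seed)
    sender = [rng.randrange(n) for _ in range(n)]    # state -> signal
    receiver = [rng.randrange(n) for _ in range(n)]  # signal -> act
    for _ in range(rounds):
        state = rng.randrange(n)
        signal = sender[state]
        act = receiver[signal]
        if act != state:  # lose: randomise the rules just used
            sender[state] = rng.randrange(n)
            receiver[signal] = rng.randrange(n)
    return sender, receiver

sender, receiver = wsls_signalling()
```

This low-inertia rule typically reaches a perfect signalling convention quickly, illustrating the speed the abstract attributes to win-stay/lose-randomise; the hybrid adds reinforcement weights to keep that convention stable under noise.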
An automated technique for manufacturing thermoplastic stringers in continuous length
NASA Astrophysics Data System (ADS)
Pantelakis, Sp.; Baxevani, E.; Spelz, U.
In the present work an automated Continuous Compression Moulding Technique for the manufacture of stringers in continuous length is presented. The method combines pultrusion and hot-pressing. The technique is utilized for the production of L-shape stringers which are widely applied in aerospace constructions. The investigation was carried out on carbon reinforced PEEK (C/PEEK), as well as, for comparison, on the thermoplastic composites carbon reinforced polyethersulfon (C/PES), glass and carbon reinforced polyphenylene-sulfide (G/PPS, C/PPS) and Kevlar reinforced Polyamide 6 (K/PA 6). For the materials investigated the optimized process parameters for manufacturing the L-shape stringers were derived experimentally. To achieve this goal, the quality of the produced parts was controlled by using non-destructive testing techniques. Parts providing satisfactory quality were also tested destructively to measure their mechanical properties. The investigation results have shown the suitability of the technique to produce continuous length stringers.
Sitaraman, Divya; Kramer, Elizabeth F.; Kahsai, Lily; Ostrowski, Daniela; Zars, Troy
2017-01-01
Feedback mechanisms in operant learning are critical for animals to increase reward or reduce punishment. However, not all conditions have a behavior that can readily resolve an event. Animals must then try out different behaviors to better their situation through outcome learning. This form of learning allows for novel solutions and with positive experience can lead to unexpected behavioral routines. Learned helplessness, as a type of outcome learning, manifests in part as increases in escape latency in the face of repeated unpredicted shocks. Little is known about the mechanisms of outcome learning. When fruit flies (Drosophila melanogaster) are exposed to unpredicted high temperatures in a place learning paradigm, they both increase escape latencies and show higher memory when given control of a place/temperature contingency. Here we describe discrete serotonin neuronal circuits that mediate aversive reinforcement, escape latencies, and memory levels after place learning in the presence and absence of unexpected aversive events. The results show that two features of learned helplessness depend on the same modulatory system as aversive reinforcement. Moreover, changes in aversive reinforcement and escape latency depend on local neural circuit modulation, while memory enhancement requires larger modulation of multiple behavioral control circuits. PMID:29321732
Ellwood, Ian T.; Patel, Tosha; Wadia, Varun; Lee, Anthony T.; Liptak, Alayna T.
2017-01-01
Dopamine neurons in the ventral tegmental area (VTA) encode reward prediction errors and can drive reinforcement learning through their projections to striatum, but much less is known about their projections to prefrontal cortex (PFC). Here, we studied these projections and observed phasic VTA–PFC fiber photometry signals after the delivery of rewards. Next, we studied how optogenetic stimulation of these projections affects behavior using conditioned place preference and a task in which mice learn associations between cues and food rewards and then use those associations to make choices. Neither phasic nor tonic stimulation of dopaminergic VTA–PFC projections elicited place preference. Furthermore, substituting phasic VTA–PFC stimulation for food rewards was not sufficient to reinforce new cue–reward associations nor maintain previously learned ones. However, the same patterns of stimulation that failed to reinforce place preference or cue–reward associations were able to modify behavior in other ways. First, continuous tonic stimulation maintained previously learned cue–reward associations even after they ceased being valid. Second, delivering phasic stimulation either continuously or after choices not previously associated with reward induced mice to make choices that deviated from previously learned associations. In summary, despite the fact that dopaminergic VTA–PFC projections exhibit phasic increases in activity that are time locked to the delivery of rewards, phasic activation of these projections does not necessarily reinforce specific actions. Rather, dopaminergic VTA–PFC activity can control whether mice maintain or deviate from previously learned cue–reward associations. SIGNIFICANCE STATEMENT Dopaminergic inputs from ventral tegmental area (VTA) to striatum encode reward prediction errors and reinforce specific actions; however, it is currently unknown whether dopaminergic inputs to prefrontal cortex (PFC) play similar or distinct roles. 
Here, we used bulk Ca2+ imaging to show that unexpected rewards or reward-predicting cues elicit phasic increases in the activity of dopaminergic VTA–PFC fibers. However, in multiple behavioral paradigms, we failed to observe reinforcing effects after stimulation of these fibers. In these same experiments, we did find that tonic or phasic patterns of stimulation caused mice to maintain or deviate from previously learned cue–reward associations, respectively. Therefore, although they may exhibit similar patterns of activity, dopaminergic inputs to striatum and PFC can elicit divergent behavioral effects. PMID:28739583
ERIC Educational Resources Information Center
Galbreath, Joy; Feldman, David
The relationship of reading comprehension accuracy and a contingently administered token reinforcement program used with an elementary level learning disabled student in the classroom was examined. The S earned points for each correct answer made after oral reading sessions. At the conclusion of the class he could exchange his points for rewards.…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clayton, Dwight A.; Santos-Villalobos, Hector J.; Baba, Justin S.
By the end of 1996, 109 Nuclear Power Plants were operating in the United States, producing 22% of the Nation’s electricity [1]. At present, more than two thirds of these power plants are more than 40 years old. The purpose of the U.S. Department of Energy Office of Nuclear Energy’s Light Water Reactor Sustainability (LWRS) Program is to develop technologies and other solutions that can improve the reliability, sustain the safety, and extend the operating lifetimes of nuclear power plants (NPPs) beyond 60 years [2]. The most important safety structures in an NPP are constructed of concrete. These structures generally do not allow for destructive evaluation, and access is limited to one side of the concrete element. Therefore, there is a need for techniques and technologies that can assess the internal health of complex, reinforced concrete structures nondestructively. Previously, we documented the challenges associated with Non-Destructive Evaluation (NDE) of thick, reinforced concrete sections and prioritized conceptual designs of specimens that could be fabricated to represent NPP concrete structures [3]. Consequently, a concrete specimen 7 feet tall, 7 feet wide, and 3 feet 4 inches thick was constructed with 2.257-inch- and 1-inch-diameter rebar every 6 to 12 inches. In addition, defects were embedded in the specimen to assess the performance of existing and future NDE techniques. The defects were designed to give a mix of realistic and controlled defects for assessment of the measures needed to overcome the challenges of more heavily reinforced concrete structures. Information on the embedded defects is documented in [4]. We also documented the superiority of Frequency Banded Decomposition (FBD) Synthetic Aperture Focusing Technique (SAFT) over conventional SAFT when probing defects under deep concrete cover.
Improvements include revealing an intensity corresponding to a defect that is not visible at all in regular, full-frequency-content SAFT, and improved contrast over conventional SAFT reconstructed images. This report documents our efforts on four fronts: 1) a comparative study between traditional SAFT and FBD SAFT for concrete specimens with and without Alkali-Silica Reaction (ASR) damage, 2) improvement of our Model-Based Iterative Reconstruction (MBIR) for thick reinforced concrete [5], 3) development of a universal framework for sharing, reconstruction, and visualization of ultrasound NDE datasets, and 4) application of machine learning techniques for automated detection of ASR inside concrete. Our comparative study between FBD and traditional SAFT reconstruction images shows a clear difference between images of ASR and non-ASR specimens. In particular, the left first harmonic shows increased contrast and sensitivity to ASR damage. For MBIR, we show the superiority of model-based techniques over delay-and-sum techniques such as SAFT. Improvements include elimination of artifacts caused by direct arrival signals, and increased contrast and signal-to-noise ratio. For the universal framework, we document a format for data storage based on the HDF5 file format, and also propose a modular Graphical User Interface (GUI) for easy customization of data conversion, reconstruction, and visualization routines. Finally, two techniques for automated ASR detection are presented. The first technique is based on an analysis of the frequency content using a Hilbert Transform Indicator (HTI), and the second employs Artificial Neural Network (ANN) techniques for training and classification of ultrasound data into ASR and non-ASR damaged classes. The ANN technique shows great potential, with classification accuracy above 95%. These approaches are extensible to the detection of additional defects and damage in reinforced, thick concrete.
Cocaine addiction as a homeostatic reinforcement learning disorder.
Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H
2017-03-01
Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
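The core mechanism of the theory described above, reward as drive reduction plus a slowly drifting setpoint, can be caricatured in a few lines. The greedy dosing policy and all parameter values below are invented for illustration; the published model is a full homeostatic reinforcement learning agent.

```python
def simulate(days=60, dose=1.0, overnight_decay=0.2, drift=0.02, setpoint=5.0):
    """Toy escalation model: each dose is rewarded by the drive reduction
    it produces, while chronic intake slowly raises the homeostatic
    setpoint, so more drug is needed over time (hypothetical parameters)."""
    h = 0.0                                  # internal (drug) state
    daily_intake = []
    for _ in range(days):
        h *= overnight_decay                 # drug clears overnight
        intake = 0.0
        while True:
            # Drive-reduction reward of one more dose: drive is the
            # distance of the internal state from the setpoint.
            r = abs(setpoint - h) - abs(setpoint - (h + dose))
            if r <= 0:
                break                        # no further dose is rewarding
            h += dose
            intake += dose
        setpoint += drift * intake           # slow allostatic setpoint shift
        daily_intake.append(intake)
    return daily_intake
```

Because the setpoint climbs with cumulative intake, daily intake in the final week exceeds that of the first week, which is the escalation signature the theory targets.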
Challenges in the Verification of Reinforcement Learning Algorithms
NASA Technical Reports Server (NTRS)
Van Wesel, Perry; Goodloe, Alwyn E.
2017-01-01
Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.
Refining fuzzy logic controllers with machine learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1994-01-01
In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning, a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential to be applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.
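As a flavour of the fuzzy side of such an architecture, here is a minimal one-input fuzzy controller whose label centers are exactly the kind of parameter a GARIC-style reinforcement signal would nudge. The labels, rule outputs, and widths are invented, and GARIC's actual action-selection and evaluation networks are considerably richer.

```python
def triangle(x, center, width):
    # Triangular membership function for a fuzzy label.
    return max(0.0, 1.0 - abs(x - center) / width)

def fuzzy_controller(error, centers):
    """Weighted-average (centroid-style) defuzzification over three
    rules: 'if error is neg/zero/pos then output -1/0/+1'."""
    outputs = {"neg": -1.0, "zero": 0.0, "pos": 1.0}
    num = den = 0.0
    for name, out in outputs.items():
        w = triangle(error, centers[name], 1.0)  # rule firing strength
        num += w * out
        den += w
    return num / den if den else 0.0
```

Reinforcement learning in this setting would adjust the entries of `centers` (and the widths) after each performance evaluation, which is the "modifies the labels in the fuzzy rules" step the abstract mentions.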
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.
Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang
2017-01-01
We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-Learning algorithm to get the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall score.
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning
Zhang, Hongjun; Chen, Gang
2017-01-01
We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-Learning algorithm to get the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall score. PMID:28894463
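Stripped of the neural encoders, the control part of the method above is a two-step decision process trained with Q-learning: a terminal reward for a correct (entity, relation) pair must be propagated back to the first-step decision. The toy below replaces the LSTM/Tree-LSTM states with invented symbolic ones and an invented reward, purely to show how credit flows back through the two steps.

```python
import random

def q_learning_two_step(episodes=3000, alpha=0.1, gamma=0.9, eps=0.1, seed=1):
    """Tabular Q-learning on a toy two-step decision process."""
    rng = random.Random(seed)
    Q = {}                      # (state, action) -> estimated value
    actions = [0, 1]

    def choose(state):
        if rng.random() < eps:  # epsilon-greedy exploration
            return rng.choice(actions)
        return max(actions, key=lambda a: Q.get((state, a), 0.0))

    for _ in range(episodes):
        s1 = "start"
        a1 = choose(s1)                   # step 1: 'entity' decision
        s2 = ("after", a1)                # step-1 choice shapes step 2
        a2 = choose(s2)                   # step 2: 'relation' decision
        r = 1.0 if (a1 == 1 and a2 == 0) else 0.0   # invented terminal reward
        # Back up the terminal reward through both decisions.
        Q[(s2, a2)] = Q.get((s2, a2), 0.0) + alpha * (r - Q.get((s2, a2), 0.0))
        target = gamma * max(Q.get((s2, a), 0.0) for a in actions)
        Q[(s1, a1)] = Q.get((s1, a1), 0.0) + alpha * (target - Q.get((s1, a1), 0.0))
    return Q
```

After training, the first-step value of the action leading toward the rewarded pair dominates, even though reward only arrives after the second step.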
Investigation of a Reinforcement-Based Toilet Training Procedure for Children with Autism.
ERIC Educational Resources Information Center
Cicero, Frank R.; Pfadt, Al
2002-01-01
This study evaluated the effectiveness of a reinforcement-based toilet training intervention with three children with autism. Procedures included positive reinforcement, graduated guidance, scheduled practice trials, and forward prompting. All three children reduced urination accidents to zero and learned to request bathroom use spontaneously…
Sex Differences in Reinforcement and Punishment on Prime-Time Television.
ERIC Educational Resources Information Center
Downs, A. Chris; Gowan, Darryl C.
1980-01-01
Television programs were analyzed for frequencies of positive reinforcement and punishment exchanged among performers varying in age and sex. Females were found to more often exhibit and receive reinforcement, whereas males more often exhibited and received punishment. These findings have implications for children's learning of positive and…
Increasing the linguistic competence of the nurse with limited English proficiency.
Guttman, Minerva S
2004-01-01
Teaching linguistic competence to nursing students educated in the United States but whose families are recent immigrants is a difficult task for nurse educators. Although students may easily learn the mechanics of a new language, the cultural differences must also be addressed. In the face of the current nursing shortage, it is critically important that strategies to improve linguistic competence be incorporated into curricular efforts. This article describes integrated skills reinforcement as one academic strategy to improve reading, speaking, listening, and writing skills. Suggestions are made for incorporating and evaluating these techniques.
NASA Technical Reports Server (NTRS)
Smith, Barry T.
1990-01-01
Damage in composite materials was studied with through-the-thickness reinforcements. As a first step it was necessary to develop new ultrasonic imaging technology to better assess internal damage of the composite. A useful ultrasonic imaging technique was successfully developed to assess the internal damage of composite panels. The ultrasonic technique accurately determines the size of the internal damage. It was found that the ultrasonic imaging technique was better able to assess the damage in a composite panel with through-the-thickness reinforcements than by destructively sectioning the specimen and visual inspection under a microscope. Five composite compression-after-impact panels were tested. The compression-after-impact strength of the panels with the through-the-thickness reinforcements was almost twice that of the comparable panel without through-the-thickness reinforcement.
Separation of Time-Based and Trial-Based Accounts of the Partial Reinforcement Extinction Effect
Bouton, Mark E.; Woods, Amanda M.; Todd, Travis P.
2013-01-01
Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or “time-accumulation” view (e.g., Gallistel & Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially- and continuously-reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. PMID:23962669
Progressive Learning of Topic Modeling Parameters: A Visual Analytics Framework.
El-Assady, Mennatallah; Sevastjanova, Rita; Sperrle, Fabian; Keim, Daniel; Collins, Christopher
2018-01-01
Topic modeling algorithms are widely used to analyze the thematic composition of text corpora but remain difficult to interpret and adjust. Addressing these limitations, we present a modular visual analytics framework, tackling the understandability and adaptability of topic models through a user-driven reinforcement learning process which does not require a deep understanding of the underlying topic modeling algorithms. Given a document corpus, our approach initializes two algorithm configurations based on a parameter space analysis that enhances document separability. We abstract the model complexity in an interactive visual workspace for exploring the automatic matching results of two models, investigating topic summaries, analyzing parameter distributions, and reviewing documents. The main contribution of our work is an iterative decision-making technique in which users provide a document-based relevance feedback that allows the framework to converge to a user-endorsed topic distribution. We also report feedback from a two-stage study which shows that our technique results in topic model quality improvements on two independent measures.
Schifani, Christin; Sukhanov, Ilya; Dorofeikova, Mariia; Bespalov, Anton
2017-07-28
There is a need to develop cognitive tasks that address valid neuropsychological constructs implicated in disease mechanisms and can be used in animals and humans to guide novel drug discovery. The present experiments aimed to characterize a novel reinforcement learning task based on a classical operant behavioral phenomenon observed in multiple species: differences in response patterning under variable-interval (VI) vs fixed-interval (FI) schedules of reinforcement. Wistar rats were trained to press a lever for food under VI30s, and weekly test sessions were later introduced in which the reinforcement schedule was switched to FI30s. During the FI30s test session, post-reinforcement pauses (PRPs) gradually grew towards the end of the session, reaching 22-43% of the initial values. Animals could be retrained under VI30s conditions, and FI30s test sessions were repeated over a period of several months without appreciable signs of a practice effect. Administration of the non-competitive N-methyl-d-aspartate (NMDA) receptor antagonist MK-801 ((5S,10R)-(+)-5-Methyl-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5,10-imine maleate) prior to FI30s sessions prevented the adjustment of PRPs associated with the change from the VI to the FI schedule. This effect was most pronounced at the highest tested dose of MK-801 and appeared to be independent of the effects of this dose on response rates. These results provide initial evidence for the possibility of using different response patterning under VI and FI schedules with equivalent reinforcement density to study effects of drug treatment on reinforcement learning. Copyright © 2017 Elsevier B.V. All rights reserved.
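The schedule manipulation at the heart of this task is easy to state in code: an FI schedule arms the reinforcer a fixed time after the previous one, while a VI schedule draws intervals at random (commonly exponential, giving a constant hazard) with the same mean, so the two match in reinforcement density. A minimal sketch of the two interval generators:

```python
import random

def fixed_interval(mean_s):
    # FI schedule: the reinforcer arms a fixed time after the last one.
    while True:
        yield mean_s

def variable_interval(mean_s, rng):
    # VI schedule: exponentially distributed intervals with the same
    # mean give a constant hazard of the reinforcer arming.
    while True:
        yield rng.expovariate(1.0 / mean_s)
```

With `mean_s=30` both schedules deliver, on average, one reinforcer per 30 s of responding; the behavioral difference (steady VI response rates vs growing FI post-reinforcement pauses) therefore reflects learned temporal structure rather than reward density.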
Brain Research: Implications for Learning.
ERIC Educational Resources Information Center
Soares, Louise M.; Soares, Anthony T.
Brain research has illuminated several areas of the learning process: (1) learning as association; (2) learning as reinforcement; (3) learning as perception; (4) learning as imitation; (5) learning as organization; (6) learning as individual style; and (7) learning as brain activity. The classic conditioning model developed by Pavlov advanced…
Active Learning to Understand Infectious Disease Models and Improve Policy Making
Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel
2014-01-01
Modeling plays a major role in policy making, especially for infectious disease interventions, but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. The insight provided is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as an emulator to improve rapid policy making in various settings. PMID:24743387
Active learning to understand infectious disease models and improve policy making.
Willem, Lander; Stijven, Sean; Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel
2014-04-01
Modeling plays a major role in policy making, especially for infectious disease interventions, but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. The insight provided is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as an emulator to improve rapid policy making in various settings.
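The iterative surrogate-modeling loop described above can be sketched in miniature. The sketch below substitutes a piecewise-linear interpolant for symbolic regression and uses the crudest possible acquisition rule, sampling the midpoint of the widest unexplored gap; both are invented simplifications of the paper's machinery.

```python
def surrogate_active_learning(f, lo=0.0, hi=1.0, rounds=8):
    """Active learning of a 1-D 'expensive' model f on [lo, hi]."""
    xs = [lo, hi]                       # design points sampled so far
    for _ in range(rounds):
        # Acquisition: run the expensive model where the design is sparsest.
        gaps = [(b - a, a, b) for a, b in zip(xs, xs[1:])]
        _, a, b = max(gaps)
        xs.insert(xs.index(b), (a + b) / 2)
    def surrogate(x):
        # Cheap stand-in model: piecewise-linear interpolation of f.
        for a, b in zip(xs, xs[1:]):
            if a <= x <= b:
                t = (x - a) / (b - a)
                return (1 - t) * f(a) + t * f(b)
    return xs, surrogate
```

Once fitted, the surrogate can be queried at no further simulation cost, which is the role the abstract assigns to emulators in rapid policy making.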
Adaptivity in Agent-Based Routing for Data Networks
NASA Technical Reports Server (NTRS)
Wolpert, David H.; Kirshner, Sergey; Merz, Chris J.; Tumer, Kagan
2000-01-01
Adaptivity, both of the individual agents and of the interaction structure among the agents, seems indispensable for scaling up multi-agent systems (MASs) in noisy environments. One important consideration in designing adaptive agents is choosing their action spaces to be as amenable as possible to machine learning techniques, especially to reinforcement learning (RL) techniques. One important way to have the interaction structure connecting agents itself be adaptive is to have the intentions and/or actions of the agents be in the input spaces of the other agents, much as in Stackelberg games. We consider both kinds of adaptivity in the design of a MAS to control network packet routing. We demonstrate on the OPNET event-driven network simulator the perhaps surprising fact that simply changing the action space of the agents to be better suited to RL can result in very large improvements in their potential performance: at their best settings, our learning-amenable router agents achieve throughputs up to three and one half times better than that of the standard Bellman-Ford routing algorithm, even when the Bellman-Ford protocol traffic is maintained. We then demonstrate that much of that potential improvement can be realized by having the agents learn their settings when the agent interaction structure is itself adaptive.
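A concrete version of RL-based packet routing in this spirit is Q-routing (Boyan and Littman), in which each node learns an estimated delivery cost per (destination, neighbour) pair from its neighbours' own estimates. The tiny network, one-unit-per-hop cost, and parameters below are invented for illustration and are far simpler than the OPNET experiments described above.

```python
import random

def q_routing(edges, episodes=5000, alpha=0.2, eps=0.1, seed=0):
    """Minimal Q-routing sketch: Q[node][(dest, nbr)] estimates hops
    from node to dest when forwarding via nbr."""
    rng = random.Random(seed)
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    Q = {n: {} for n in nbrs}

    def best(n, dest):
        # Greedy forwarding: neighbour with the lowest cost estimate.
        return min(nbrs[n], key=lambda m: Q[n].get((dest, m), 0.0))

    nodes = sorted(nbrs)
    for _ in range(episodes):
        src, dest = rng.sample(nodes, 2)
        node, hops = src, 0
        while node != dest and hops < 50:
            nxt = rng.choice(nbrs[node]) if rng.random() < eps else best(node, dest)
            # One hop costs 1; the rest is the neighbour's own best estimate.
            future = 0.0 if nxt == dest else min(
                Q[nxt].get((dest, m), 0.0) for m in nbrs[nxt])
            old = Q[node].get((dest, nxt), 0.0)
            Q[node][(dest, nxt)] = old + alpha * (1.0 + future - old)
            node, hops = nxt, hops + 1
    return Q, best
```

On a small graph with a shortcut, the greedy policy learned this way follows the shortest path, even though no node ever sees the whole topology.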
Hassani, S. A.; Oemisch, M.; Balcarras, M.; Westendorff, S.; Ardid, S.; van der Meer, M. A.; Tiesinga, P.; Womelsdorf, T.
2017-01-01
Noradrenaline is believed to support cognitive flexibility through the alpha 2A noradrenergic receptor (a2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the a2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single-subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task, evaluating performance using Bayesian and reinforcement learning models. In an initial dose-testing phase we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose, we examined the faster feature-based reversal learning under Guanfacine with single-subject computational modeling. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework. PMID:28091572
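The parameter story above, a higher learning rate plus stronger suppression of non-chosen feature information, corresponds to a simple two-parameter value update. The rule below is a generic illustration with invented parameter values, not the paper's fitted model.

```python
def update(values, chosen, reward, alpha=0.3, decay=0.5):
    """One step of a feature-value update: the chosen feature moves
    toward the obtained reward at rate alpha, while values of
    non-chosen features are suppressed toward zero at rate decay."""
    new = {}
    for feature, v in values.items():
        if feature == chosen:
            new[feature] = v + alpha * (reward - v)   # prediction-error update
        else:
            new[feature] = (1.0 - decay) * v          # suppress the rest
    return new
```

On the drug, the modeling suggests, both alpha and decay are effectively larger, so rewarded feature values rise faster and distracting feature values fade faster, speeding reversal learning.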
Electrophysiological correlates of observational learning in children.
Rodriguez Buritica, Julia M; Eppinger, Ben; Schuck, Nicolas W; Heekeren, Hauke R; Li, Shu-Chen
2016-09-01
Observational learning is an important mechanism for cognitive and social development. However, the neurophysiological mechanisms underlying observational learning in children are not well understood. In this study, we used a probabilistic reward-based observational learning paradigm to compare behavioral and electrophysiological markers of individual and observational reinforcement learning in 8- to 10-year-old children. Specifically, we manipulated the amount of observable information as well as children's similarity in age to the observed person (same-aged child vs. adult) to examine the effects of similarity in age on the integration of observed information in children. We show that the feedback-related negativity (FRN) during individual reinforcement learning reflects the valence of outcomes of own actions. Furthermore, we found that the feedback-related negativity during observational reinforcement learning (oFRN) showed a similar distinction between outcome valences of observed actions. This suggests that the oFRN can serve as a measure of observational learning in middle childhood. Moreover, during observational learning children profited from the additional social information and imitated the choices of their own peers more than those of adults, indicating that children have a tendency to conform more with similar others (e.g. their own peers) compared to dissimilar others (adults). Taken together, our results show that children can benefit from integrating observable information and that oFRN may serve as a measure of observational learning in children. © 2015 John Wiley & Sons Ltd.
Mechanisms and time course of vocal learning and consolidation in the adult songbird.
Warren, Timothy L; Tumer, Evren C; Charlesworth, Jonathan D; Brainard, Michael S
2011-10-01
In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. 
The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry.
ERIC Educational Resources Information Center
Kral, Paul A.; And Others
Investigates the effect of delay of reinforcement upon human discrimination learning, with particular emphasis on the form of the gradient within the first few seconds of delay. In previous studies, subjects are usually required to make an instrumental response to a stimulus; this is followed by the delay interval and, finally, the reinforcement…
Learning Theory and the Typewriter Teacher
ERIC Educational Resources Information Center
Wakin, B. Bertha
1974-01-01
Eight basic principles of learning are described and discussed in terms of practical learning strategies for typewriting. Described are goal setting, preassessment, active participation, individual differences, reinforcement, practice, transfer of learning, and evaluation. (SC)
Pechtel, Pia; Pizzagalli, Diego A.
2013-01-01
Context Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite grave epidemiological data, the mechanisms underlying these maladaptive outcomes remain poorly understood. Objective We examined whether CSA history, particularly in conjunction with past MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Design Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Setting Academic setting; participants recruited from the community. Participants Fifteen remitted depressed females with CSA history (CSA+rMDD), 16 remitted depressed females without CSA history (rMDD), and 18 healthy females. Main Outcome Measures Participants’ preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity (FRN) and error-related negativity (ERN)–hypothesized to reflect activation in the anterior cingulate cortex–were used as electrophysiological indices of reinforcement learning. Results No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA+rMDD group showed (1) lower accuracy (relative to both controls and rMDD), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to rMDD). CSA history was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance on previously rewarded–but not previously punished–trials.
Conclusions Irrespective of past MDD, women with CSA histories showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision-making in the absence of feedback (blunted “Go learning”). While the current study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine if disrupted positive reinforcement learning predicts high-risk behavior following CSA. PMID:23487253
Reinforcement Learning Deficits in People with Schizophrenia Persist after Extended Trials
Cicero, David C.; Martin, Elizabeth A.; Becker, Theresa M.; Kerns, John G.
2014-01-01
Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited number of acquisition trials to learn the reward contingencies and then tested about what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingencies. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive and negative feedback. These reward learning deficits persisted even when people with schizophrenia were given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. PMID:25172610
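The training phase of the Probabilistic Selection Task can be simulated with a simple delta-rule learner. This is a generic sketch, not the authors' analysis code: the pair reward probabilities follow the standard task, but the learning rate, softmax temperature, and trial count are illustrative assumptions.

```python
import math
import random

def simulate_pst(n_trials=300, alpha=0.2, beta=5.0, seed=3):
    """Delta-rule learner on the PST training pairs: A/B, C/D, E/F are
    rewarded with probability 80/20, 70/30, and 60/40, respectively."""
    rng = random.Random(seed)
    p_win = {'A': 0.8, 'B': 0.2, 'C': 0.7, 'D': 0.3, 'E': 0.6, 'F': 0.4}
    q = {s: 0.5 for s in p_win}          # values start at indifference
    pairs = [('A', 'B'), ('C', 'D'), ('E', 'F')]
    for _ in range(n_trials):
        s1, s2 = rng.choice(pairs)
        # softmax choice between the two stimuli of the presented pair
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[s1] - q[s2])))
        choice = s1 if rng.random() < p1 else s2
        r = 1.0 if rng.random() < p_win[choice] else 0.0
        q[choice] += alpha * (r - q[choice])
    return q

q = simulate_pst()
```

After training, the learned values for A and B separate, which is what the task's "choose A" and "avoid B" test trials probe.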
Warren, Christopher M.; Holroyd, Clay B.
2012-01-01
We applied the event-related brain potential (ERP) technique to investigate the involvement of two neuromodulatory systems in learning and decision making: The locus coeruleus–norepinephrine system (NE system) and the mesencephalic dopamine system (DA system). We have previously presented evidence that the N2, a negative deflection in the ERP elicited by task-relevant events that begins approximately 200 ms after onset of the eliciting stimulus and that is sensitive to low-probability events, is a manifestation of cortex-wide noradrenergic modulation recruited to facilitate the processing of unexpected stimuli. Further, we hold that the impact of DA reinforcement learning signals on the anterior cingulate cortex (ACC) produces a component of the ERP called the feedback-related negativity (FRN). The N2 and the FRN share a similar time range, a similar topography, and similar antecedent conditions. We varied factors related to the degree of cognitive deliberation across a series of experiments to dissociate these two ERP components. Across four experiments we varied the demand for a deliberative strategy, from passively watching feedback, to more complex/challenging decision tasks. Consistent with our predictions, the FRN was largest in the experiment involving active learning and smallest in the experiment involving passive learning, whereas the N2 exhibited the opposite effect. Within each experiment, when subjects attended to color, the N2 was maximal at frontal–central sites, and when they attended to gender it was maximal over lateral-occipital areas, whereas the topography of the FRN was frontal–central in both task conditions. We conclude that both the DA system and the NE system act in concert when learning from rewards that vary in expectedness, but that the DA system is relatively more exercised when subjects are relatively more engaged by the learning task. PMID:22493568
Reinforcement learning of periodical gaits in locomotion robots
NASA Astrophysics Data System (ADS)
Svinin, Mikhail; Yamada, Kazuyaki; Ushio, S.; Ueda, Kanji
1999-08-01
Emergence of stable gaits in locomotion robots is studied in this paper. A classifier system, implementing an instance-based reinforcement learning scheme, is used for sensory-motor control of an eight-legged mobile robot. An important feature of the classifier system is its ability to work with a continuous sensor space. The robot has no prior knowledge of the environment, of its own internal model, or of the goal coordinates. It is only assumed that the robot can acquire stable gaits by learning how to reach a light source. During the learning process, the control system is self-organized by reinforcement signals. Reaching the light source defines a global reward. Forward motion gets a local reward, while stepping back and falling down get a local punishment. Feasibility of the proposed self-organized system is tested in simulation and experiment. The control actions are specified at the leg level. It is shown that, as learning progresses, the number of action rules in the classifier system stabilizes at a certain level, corresponding to the acquired gait patterns.
Biases in probabilistic category learning in relation to social anxiety
Abraham, Anna; Hermann, Christiane
2015-01-01
Instrumental learning paradigms are rarely employed to investigate the mechanisms underlying acquired fear responses in social anxiety. Here, we adapted a probabilistic category learning paradigm to assess information processing biases as a function of the degree of social anxiety traits in a sample of healthy individuals without a diagnosis of social phobia. Participants were presented with three pairs of neutral faces with differing probabilistic accuracy contingencies (A/B: 80/20, C/D: 70/30, E/F: 60/40). Upon making their choice, negative and positive feedback was conveyed using angry and happy faces, respectively. The highly socially anxious group showed a strong tendency to be more accurate at learning the probability contingency associated with the most ambiguous stimulus pair (E/F: 60/40). Moreover, when pairing the most positively reinforced stimulus or the most negatively reinforced stimulus with all the other stimuli in a test phase, the highly socially anxious group avoided the most negatively reinforced stimulus significantly more than the control group. The results are discussed with reference to avoidance learning and hypersensitivity to negative socially evaluative information associated with social anxiety. PMID:26347685
Surprise beyond prediction error
Chumbley, Justin R; Burke, Christopher J; Stephan, Klaas E; Friston, Karl J; Tobler, Philippe N; Fehr, Ernst
2014-01-01
Surprise drives learning. Various neural “prediction error” signals are believed to underpin surprise-based reinforcement learning. Here, we report a surprise signal that reflects reinforcement learning but is neither un/signed reward prediction error (RPE) nor un/signed state prediction error (SPE). To exclude these alternatives, we measured surprise responses in the absence of RPE and accounted for a host of potential SPE confounds. This new surprise signal was evident in ventral striatum, primary sensory cortex, frontal poles, and amygdala. We interpret these findings via a normative model of surprise. PMID:24700400
Lee, Bang Yeon; Kang, Su-Tae; Yun, Hae-Bum; Kim, Yun Yong
2016-01-12
The distribution of fiber orientation is an important factor in determining the mechanical properties of fiber-reinforced concrete. This study proposes a new image analysis technique for improving the evaluation accuracy of fiber orientation distribution in the sectional image of fiber-reinforced concrete. A series of tests on the accuracy of fiber detection and the estimation performance of fiber orientation was performed on artificial fiber images to assess the validity of the proposed technique. The validation test results showed that the proposed technique estimates the distribution of fiber orientation more accurately than the direct measurement of fiber orientation by image analysis.
Tulip, Jennifer; Zimmermann, Jonas B; Farningham, David; Jackson, Andrew
2017-06-15
Behavioural training through positive reinforcement techniques is a well-recognised refinement to laboratory animal welfare. Behavioural neuroscience research requires subjects to be trained to perform repetitions of specific behaviours for food/fluid reward. Some animals fail to perform at a sufficient level, limiting the amount of data that can be collected and increasing the number of animals required for each study. We have implemented automated positive reinforcement training systems (comprising a button press task with variable levels of difficulty using LED cues and a fluid reward) at the breeding facility and research facility, to compare performance across these different settings, to pre-screen animals for selection and refine training protocols. Animals learned 1- and 4-choice button tasks within weeks of home enclosure training, with some inter-individual differences. High performance levels (∼200-300 trials per 60-min session at ∼80% correct) were obtained without food or fluid restriction. Moreover, training quickly transferred to a laboratory version of the task. Animals that acquired the task at the breeding facility subsequently performed better both in early home enclosure sessions upon arrival at the research facility, and also in laboratory sessions. Automated systems at the breeding facility may be used to pre-screen animals for suitability for behavioural neuroscience research. In combination with conventional training, both the breeding and research facility systems facilitate acquisition and transference of learning. Automated systems have the potential to refine training protocols and minimise requirements for food/fluid control. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Optimal and Autonomous Control Using Reinforcement Learning: A Survey.
Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L
2018-06-01
This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal regulation and tracking problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
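For the discrete-time case, the tabular Q-learning iteration that the survey treats as a core algorithm can be sketched minimally. The chain MDP and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning: update Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

def chain_step(s, a, n=5):
    """Toy chain MDP: action 1 moves right, action 0 moves left;
    reaching the last state pays 1 and ends the episode."""
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n - 1 else 0.0), s2 == n - 1

Q = q_learning(5, 2, chain_step)
```

Note the method is model-free and online in exactly the sense the abstract describes: only sampled transitions (s, a, r, s') along trajectories are used, never the transition model itself.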
Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.
Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C
2013-12-01
Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
Suppression of Striatal Prediction Errors by the Prefrontal Cortex in Placebo Hypoalgesia.
Schenk, Lieven A; Sprenger, Christian; Onat, Selim; Colloca, Luana; Büchel, Christian
2017-10-04
Classical learning theories predict extinction after the discontinuation of reinforcement through prediction errors. However, placebo hypoalgesia, although mediated by associative learning, has been shown to be resistant to extinction. We tested the hypothesis that this is mediated by the suppression of prediction error processing through the prefrontal cortex (PFC). We compared pain modulation through treatment cues (placebo hypoalgesia, treatment context) with pain modulation through stimulus intensity cues (stimulus context) during functional magnetic resonance imaging in 48 male and female healthy volunteers. During acquisition, our data show that expectations are correctly learned and that this is associated with prediction error signals in the ventral striatum (VS) in both contexts. However, in the nonreinforced test phase, pain modulation and expectations of pain relief persisted to a larger degree in the treatment context, indicating that the expectations were not correctly updated in the treatment context. Consistently, we observed significantly stronger neural prediction error signals in the VS in the stimulus context compared with the treatment context. A connectivity analysis revealed negative coupling between the anterior PFC and the VS in the treatment context, suggesting that the PFC can suppress the expression of prediction errors in the VS. Consistent with this, a participant's conceptual views and beliefs about treatments influenced the pain modulation only in the treatment context. Our results indicate that in placebo hypoalgesia contextual treatment information engages prefrontal conceptual processes, which can suppress prediction error processing in the VS and lead to reduced updating of treatment expectancies, resulting in less extinction of placebo hypoalgesia. SIGNIFICANCE STATEMENT In aversive and appetitive reinforcement learning, learned effects show extinction when reinforcement is discontinued. 
This is thought to be mediated by prediction errors (i.e., the difference between expectations and outcome). Although reinforcement learning has been central in explaining placebo hypoalgesia, placebo hypoalgesic effects show little extinction and persist after the discontinuation of reinforcement. Our results support the idea that conceptual treatment beliefs bias the neural processing of expectations in a treatment context compared with a more stimulus-driven processing of expectations with stimulus intensity cues. We provide evidence that this is associated with the suppression of prediction error processing in the ventral striatum by the prefrontal cortex. This provides a neural basis for persisting effects in reinforcement learning and placebo hypoalgesia. Copyright © 2017 the authors 0270-6474/17/379715-09$15.00/0.
Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia
Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.
2014-01-01
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
Learning in Mental Retardation: A Comprehensive Bibliography.
ERIC Educational Resources Information Center
Gardner, James M.; And Others
The bibliography on learning in mentally handicapped persons is divided into the following topic categories: applied behavior change, classical conditioning, discrimination, generalization, motor learning, reinforcement, verbal learning, and miscellaneous. An author index is included. (KW)
Asynchronous Gossip for Averaging and Spectral Ranking
NASA Astrophysics Data System (ADS)
Borkar, Vivek S.; Makhijani, Rahul; Sundaresan, Rajesh
2014-08-01
We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
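The classical randomized gossip scheme that both variants build on can be sketched in a few lines. This is a generic toy on an assumed four-node ring, not the paper's reinforcement-based correction; it shows the baseline behavior (sum conservation, contraction to the average) that the asynchronous variant can lose.

```python
import random

def gossip_average(values, edges, rounds=2000, seed=0):
    """Randomized pairwise gossip: repeatedly pick a random edge (i, j)
    and set both endpoint values to their midpoint. The node sum is
    conserved, so values contract toward the global average."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x

# Ring of four nodes holding [0, 4, 8, 12]; the true average is 6.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = gossip_average([0.0, 4.0, 8.0, 12.0], ring)
```
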
Reinforcement learning in professional basketball players
Neiman, Tal; Loewenstein, Yonatan
2011-01-01
Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3-point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. PMID:22146388
Autonomous Performance Monitoring System: Monitoring and Self-Tuning (MAST)
NASA Technical Reports Server (NTRS)
Peterson, Chariya; Ziyad, Nigel A.
2000-01-01
Maintaining the long-term performance of software onboard a spacecraft can be a major factor in the cost of operations. In particular, the task of controlling and maintaining a future mission of distributed spacecraft will undoubtedly pose a great challenge, since the complexity of multiple spacecraft flying in formation grows rapidly as the number of spacecraft in the formation increases. Eventually, new approaches will be required in developing viable control systems that can handle the complexity of the data and that are flexible, reliable and efficient. In this paper we propose a methodology that aims to maintain the accuracy of flight software, while reducing the computational complexity of software tuning tasks. The proposed Monitoring and Self-Tuning (MAST) method consists of two parts: a flight software monitoring algorithm and a tuning algorithm. The dependency on the software being monitored is mostly contained in the monitoring process, while the tuning process is a generic algorithm independent of detailed knowledge of the software. This architecture will enable MAST to be applicable to different onboard software controlling various dynamics of the spacecraft, such as attitude self-calibration and formation control. An advantage of MAST over conventional techniques such as filtering or batch least squares is that the tuning algorithm uses a machine learning approach to handle uncertainty in the problem domain, reducing overall computational complexity. The underlying concept of this technique is a reinforcement learning scheme based on cumulative probability generated by the historical performance of the system. The success of MAST will depend heavily on the reinforcement scheme used in the tuning algorithm, which guarantees that tuning solutions exist.
Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games
NASA Astrophysics Data System (ADS)
Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong
2012-06-01
One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model—a reinforcement learning model with a dynamic aspiration level—is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating this reinforcement learning rule in the spatial prisoner's dilemma game and the public goods game, we find that moderate greediness (i.e., a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations.
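The aspiration-based update at the core of such models can be sketched for a single, non-spatial player. This is a hypothetical two-armed stand-in, not the paper's spatial game: the payoffs, learning rate, aspiration-tracking rate, and probability clamping are all illustrative assumptions.

```python
import random

def aspiration_learning(payoffs, rounds=2000, lr=0.1, h=0.2, seed=1):
    """Win-stay/lose-shift with a dynamic aspiration level: an action whose
    payoff beats the current aspiration is reinforced, and the aspiration
    itself tracks a running average of experienced payoffs."""
    rng = random.Random(seed)
    p0 = 0.5          # probability of choosing action 0
    aspiration = 0.0  # time-dependent payoff expectation
    for _ in range(rounds):
        a = 0 if rng.random() < p0 else 1
        r = payoffs[a]
        satisfied = r - aspiration > 0
        delta = lr if satisfied else -lr
        p0 += delta if a == 0 else -delta  # stay if satisfied, shift if not
        p0 = min(max(p0, 0.01), 0.99)      # keep a little exploration
        aspiration += h * (r - aspiration) # aspiration adapts to experience
    return p0

# Action 0 pays more than action 1, so the rule should come to prefer it.
p = aspiration_learning(payoffs=[1.0, 0.2])
```

The rate `h` plays the role of the "greediness" knob: it sets how quickly the aspiration chases recent payoffs and hence how easily the player is satisfied.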
Decker, Johannes H.; Otto, A. Ross; Daw, Nathaniel D.; Hartley, Catherine A.
2016-01-01
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enables estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was evident in choice behavior across all age groups, evidence of a model-based strategy only emerged during adolescence and continued to increase into adulthood. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. PMID:27084852
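The distinction can be illustrated with a toy outcome-devaluation example (hypothetical states and rewards, not the sequential task used in the study): a model-free cache keeps its trained value after the outcome is devalued, while a model-based evaluation reflects the change immediately.

```python
# Toy world: 'press' leads to 'food', 'wait' leads to 'nothing'.
transitions = {'press': 'food', 'wait': 'nothing'}
rewards = {'food': 1.0, 'nothing': 0.0}

def model_free(history, alpha=0.3):
    """Model-free: cache a scalar action value from experienced rewards."""
    q = 0.0
    for r in history:
        q += alpha * (r - q)   # delta-rule update from reward history alone
    return q

def model_based(action):
    """Model-based: evaluate the action on the fly from the world model."""
    return rewards[transitions[action]]

q_mf = model_free([1.0] * 20)      # trained while food was rewarding
q_mb = model_based('press')

rewards['food'] = 0.0              # outcome devaluation
q_mf_after = q_mf                  # the cached value is blind to the change
q_mb_after = model_based('press')  # the model reflects it immediately
```
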
Reinforcement learning or active inference?
Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J
2009-07-29
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
Franklin, Nicholas T; Frank, Michael J
2015-01-01
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments. DOI: http://dx.doi.org/10.7554/eLife.12029.001 PMID:26705698
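The core computational claim, that TAN pauses effectively scale the striatal learning rate with uncertainty, can be caricatured in a few lines. This is our illustration with invented parameter values, not the paper's spiking model:

```python
# Sketch: an effective learning rate modulated by outcome uncertainty,
# analogous to longer or shorter TAN pauses scaling the striatal population
# response to reinforcement.
def adaptive_update(value, reward, uncertainty, base_lr=0.1, gain=0.8):
    """Scale the learning rate by current uncertainty in [0, 1]."""
    lr = base_lr + gain * uncertainty      # high uncertainty -> faster updating
    return value + lr * (reward - value)

# Stable environment (low uncertainty): small updates, robust to spurious outcomes.
v_stable = adaptive_update(0.5, 1.0, uncertainty=0.05)
# After a change-point (high uncertainty): larger, more responsive updates.
v_change = adaptive_update(0.5, 1.0, uncertainty=0.9)
```

In the paper's feedback-control version, the uncertainty term would itself be estimated from the spiny neuron population rather than supplied by hand as here.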
Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.
Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu; Weirong Liu; Peng Zhuang; Hao Liang; Jun Peng; Zhiwu Huang; Liu, Weirong; Liang, Hao; Peng, Jun; Zhuang, Peng; Huang, Zhiwu
2018-06-01
Microgrids incorporating distributed generation (DG) units and energy storage (ES) devices are expected to play increasingly important roles in future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Using a learning algorithm avoids the difficulty of stochastic modeling and its high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in the microgrid only needs to communicate with its local neighbors, without relying on any centralized controller. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
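A hedged sketch of a "combine then adapt" diffusion update of the general kind the abstract describes, in which each node mixes its neighbors' parameter vectors before taking its own local learning step; the three-node topology, mixing weights, and zero gradients are invented for illustration:

```python
import numpy as np

# Row-stochastic neighbor weights: each node only sees its local neighbors,
# so no centralized controller is needed.
W = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

theta = np.array([[1.0], [0.0], [-1.0]])  # per-node parameter estimates (3 nodes)

def diffuse(theta, W):
    return W @ theta                      # combine: average over local neighbors

def adapt(theta, grads, lr=0.1):
    return theta - lr * grads             # adapt: local learning step

# One round with zero local gradients isolates the diffusion effect:
theta = adapt(diffuse(theta, W), grads=np.zeros_like(theta))
# Repeated rounds drive the nodes toward consensus on the dispatch parameters.
```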
Fourier-Mellin moment-based intertwining map for image encryption
NASA Astrophysics Data System (ADS)
Kaur, Manjit; Kumar, Vijay
2018-03-01
In this paper, a robust image encryption technique that utilizes Fourier-Mellin moments and an intertwining logistic map is proposed. The Fourier-Mellin moment-based intertwining logistic map has been designed to overcome the issue of low sensitivity to the input image. A Multi-objective Non-Dominated Sorting Genetic Algorithm (NSGA-II) based on Reinforcement Learning (MNSGA-RL) has been used to optimize the required parameters of the intertwining logistic map. Fourier-Mellin moments are used to make the secret keys more secure. Thereafter, permutation and diffusion operations are carried out on the input image using the secret keys. The performance of the proposed image encryption technique has been evaluated on five well-known benchmark images and compared with seven well-known existing encryption techniques. The experimental results reveal that the proposed technique outperforms the others in terms of entropy, correlation analysis, unified average changing intensity (UACI), and number of pixel change rate (NPCR). The simulation results reveal that the proposed technique provides a high level of security and robustness against various types of attacks.
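As a rough illustration of chaotic-map-based diffusion in general (not the paper's intertwining map or its moment-derived keys), a plain logistic map can serve as a keystream for reversible XOR diffusion; `x0` and `r` here stand in for the optimized secret parameters:

```python
# Minimal sketch only: x -> r*x*(1-x) iterated as a keystream, then
# XORed with pixel bytes. The paper's intertwining logistic map is a
# more elaborate multi-equation variant.
def logistic_keystream(x0, r, n):
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)            # chaotic iteration
        out.append(int(x * 255) & 0xFF)  # quantize to one byte
    return out

def xor_diffuse(pixels, key):
    return [p ^ k for p, k in zip(pixels, key)]

pixels = [10, 200, 55, 91]               # toy grayscale values
key = logistic_keystream(x0=0.3141, r=3.99, n=len(pixels))
cipher = xor_diffuse(pixels, key)        # decryption is the same XOR again
```

The sensitivity the abstract targets comes from the chaotic map: a tiny change in `x0` or `r` yields a completely different keystream, which is why those parameters are worth optimizing and protecting.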
Effects of Intrinsic Motivation on Feedback Processing During Learning
DePasque, Samantha; Tricomi, Elizabeth
2015-01-01
Learning commonly requires feedback about the consequences of one’s actions, which can drive learners to modify their behavior. Motivation may determine how sensitive an individual might be to such feedback, particularly in educational contexts where some students value academic achievement more than others. Thus, motivation for a task might influence the value placed on performance feedback and how effectively it is used to improve learning. To investigate the interplay between intrinsic motivation and feedback processing, we used functional magnetic resonance imaging (fMRI) during feedback-based learning before and after a novel manipulation based on motivational interviewing, a technique for enhancing treatment motivation in mental health settings. Because of its role in the reinforcement learning system, the striatum is situated to play a significant role in the modulation of learning based on motivation. Consistent with this idea, motivation levels during the task were associated with sensitivity to positive versus negative feedback in the striatum. Additionally, heightened motivation following a brief motivational interview was associated with increases in feedback sensitivity in the left medial temporal lobe. Our results suggest that motivation modulates neural responses to performance-related feedback, and furthermore that changes in motivation facilitate processing in areas that support learning and memory. PMID:26112370
Shape and Reinforcement Optimization of Underground Tunnels
NASA Astrophysics Data System (ADS)
Ghabraie, Kazem; Xie, Yi Min; Huang, Xiaodong; Ren, Gang
Design of the support system and selection of an optimum shape for the opening are two important steps in designing excavations in rock masses. Currently, selecting the shape and support design is mainly based on the designer's judgment and experience. Both of these problems can be viewed as material distribution problems in which one needs to find the optimum distribution of a material in a domain. Topology optimization techniques have proved useful in solving these kinds of problems in structural design. Recently the application of topology optimization techniques to reinforcement design around underground excavations has been studied by some researchers. In this paper a three-phase material model is introduced, switching between normal rock, reinforced rock, and void. Using such a material model, the problems of shape and reinforcement design can be solved together. A well-known topology optimization technique used in structural design is bi-directional evolutionary structural optimization (BESO). In this paper the BESO technique is extended to simultaneously optimize the shape of the opening and the distribution of reinforcements. The validity and capability of the proposed approach have been investigated through several examples.
Guitars, Keyboards, Strobes, and Motors -- From Vibrational Motion to Active Research
NASA Astrophysics Data System (ADS)
Tagg, Randall; Carlson, John; Asadi-Zeydabadi, Masoud; Busley, Brad; Law-Balding, Katie; Juengel, Mattea
2013-01-01
Physics First is offered to ninth graders at high schools in Aurora, CO. A unique new asset of this school system is an embedded research lab called the "Innovation Hyperlab." The goal of the lab is to connect secondary school teaching to ongoing university scientific research, supporting the school district's aim to create opportunities to integrate P-20 (preschool to graduate school) learning. This paper is an example of how we create research connections in the context of introductory physics lessons on vibrations and waves. Key to the process is the use of several different types of technical resources, hence the name "hyperlab." Students learn many practical experimental techniques, reinforcing their knowledge of fundamentals and preparing them to work effectively on open-ended research or engineering projects.
Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances.
Song, Ruizhuo; Lewis, Frank L; Wei, Qinglai; Zhang, Huaguang
2016-05-01
An optimal control method is developed for unknown continuous-time systems with unknown disturbances in this paper. The integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control. Off-policy learning is used to allow the dynamics to be completely unknown. Neural networks are used to construct critic and action networks. It is shown that if there are unknown disturbances, off-policy IRL may not converge or may be biased. To reduce the influence of unknown disturbances, a disturbance compensation controller is added. It is proven that the weight errors are uniformly ultimately bounded based on Lyapunov techniques. Convergence of the Hamiltonian function is also proven. The simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.
Research progress of microbial corrosion of reinforced concrete structure
NASA Astrophysics Data System (ADS)
Li, Shengli; Li, Dawang; Jiang, Nan; Wang, Dongwei
2011-04-01
Microbial corrosion of reinforced concrete structures is a new branch of learning. It is an interdisciplinary area spanning civil engineering, environmental engineering, biology, chemistry, materials science, and related fields. Research progress on the causes, research methods, and contents of microbial corrosion of reinforced concrete structures is described. Research in this field is just beginning, and concerted effort is needed to probe further into the mechanism of microbial corrosion, assess the security and service life of reinforced concrete structures under these special conditions, and put forward protective methods.
Reinforcement learning with Marr.
Niv, Yael; Langdon, Angela
2016-10-01
To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization that readily prescribes algorithmic solutions evidencing striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
Time-Extended Policies in Multi-Agent Reinforcement Learning
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Agogino, Adrian K.
2004-01-01
Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works toward maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall of having rewards aligned with the global reward lead to utilities not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result they outperform both global (e.g., team game) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
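Aligned-and-learnable per-agent rewards of the kind described above are often built from difference rewards; the sketch below illustrates the core subtraction on a toy coverage utility (the utility function and agent actions are invented for illustration, not taken from the paper):

```python
# Difference reward: subtract the global utility computed without agent i,
# so agent i's reward reflects only its own contribution. This keeps the
# reward aligned with the global utility while being far more learnable
# than handing every agent the full global reward.
def global_utility(actions):
    # Toy global utility: number of distinct gridworld cells covered.
    return len(set(actions))

def difference_reward(actions, i):
    without_i = actions[:i] + actions[i + 1:]
    return global_utility(actions) - global_utility(without_i)

actions = ["a", "b", "b"]       # cells chosen by agents 0, 1, 2
r0 = difference_reward(actions, 0)   # agent 0 uniquely covers "a"
r2 = difference_reward(actions, 2)   # agent 2 duplicates agent 1
```

Because `r2` is zero, agent 2 receives no credit for redundant work, which is exactly the signal that steers agents away from utilities misaligned with the global one.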
Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.
Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J
2017-10-01
Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Arnold, Megan A; Newland, M Christopher
2018-06-16
Behavioral inflexibility is often assessed using reversal learning tasks, which require a relatively low degree of response variability. No studies have assessed sensitivity to reinforcement contingencies that specifically select highly variable response patterns in mice, let alone in models of neurodevelopmental disorders involving limited response variation. Operant variability and incremental repeated acquisition (IRA) were used to assess unique aspects of behavioral variability in two mouse strains: BALB/c, a model of some deficits in ASD, and C57Bl/6. On the operant variability task, BALB/c mice responded more repetitively during adolescence than C57Bl/6 mice when reinforcement did not require variability but responded more variably when reinforcement required variability. During IRA testing in adulthood, both strains acquired an unchanging performance sequence equally well. Strain differences emerged, however, after novel learning sequences began alternating with the performance sequence: BALB/c mice substantially outperformed C57Bl/6 mice. Using litter-mate controls, it was found that adolescent experience with variability did not affect either learning or performance on the IRA task in adulthood. These findings constrain the use of BALB/c mice as a model of ASD, but once again reveal that this strain is highly sensitive to reinforcement contingencies and that these mice are fast, robust learners. Copyright © 2018. Published by Elsevier B.V.
Preventing Learned Helplessness.
ERIC Educational Resources Information Center
Hoy, Cheri
1986-01-01
To prevent learned helplessness in learning disabled students, teachers can share responsibilities with the students, train students to reinforce themselves for effort and self control, and introduce opportunities for changing counterproductive attitudes. (CL)
Bouton, Mark E.; Winterbauer, Neil E.; Todd, Travis P.
2012-01-01
It is widely recognized that extinction (the procedure in which a Pavlovian conditioned stimulus or an instrumental action is repeatedly presented without its reinforcer) weakens behavior without erasing the original learning. Most of the experiments that support this claim have focused on several “relapse” effects that occur after Pavlovian extinction, which collectively suggest that the original learning is saved through extinction. However, although such effects do occur after instrumental extinction, they have not been explored there in as much detail. This article reviews recent research in our laboratory that has investigated three relapse effects that occur after the extinction of instrumental (operant) learning. In renewal, responding returns after extinction when the behavior is tested in a different context; in resurgence, responding recovers when a second response that has been reinforced during extinction of the first is itself put on extinction; and in rapid reacquisition, extinguished responding returns rapidly when the response is reinforced again. The results provide new insights into extinction and relapse, and are consistent with principles that have been developed to explain extinction and relapse as they occur after Pavlovian conditioning. Extinction of instrumental learning, like Pavlovian learning, involves new learning that is relatively dependent on the context for expression. PMID:22450305
Enhanced appetitive learning and reversal learning in a mouse model for Prader-Willi syndrome.
Relkovic, Dinko; Humby, Trevor; Hagan, Jim J; Wilkinson, Lawrence S; Isles, Anthony R
2012-06-01
Prader-Willi syndrome (PWS) is caused by lack of paternally derived gene expression from the imprinted gene cluster on human chromosome 15q11-q13. PWS is characterized by severe hypotonia, a failure to thrive in infancy and, on emerging from infancy, evidence of learning disabilities and overeating behavior due to an abnormal satiety response and increased motivation by food. We have previously shown that an imprinting center deletion mouse model (PWS-IC) is quicker to acquire a preference for, and consumes more of, a palatable food. Here we examined how the use of this palatable food as a reinforcer influences learning in PWS-IC mice performing a simple appetitive learning task. On a nonspatial maze-based task, PWS-IC mice reached criterion much more quickly, making fewer errors during initial acquisition and also during reversal learning. A manipulation in which the reinforcer was devalued impaired wild-type performance but had no effect on PWS-IC mice. This suggests that increased motivation for the reinforcer in PWS-IC mice may underlie their enhanced learning. This supports previous findings in PWS patients and is the first behavioral study of an animal model of PWS in which the motivation of behavior by food rewards has been examined. © 2012 American Psychological Association
Economic decision-making in the ultimatum game by smokers.
Takahashi, Taiki
2007-10-01
No study to date has compared degrees of inequity aversion in economic decision-making in the ultimatum game between non-addictive and addictive reinforcers. The comparison is potentially important in neuroeconomics and the reinforcement learning theory of addiction. We compared the degrees of inequity aversion in the ultimatum game between money and cigarettes in habitual smokers. Smokers avoided inequity in the ultimatum game more dramatically for money than for cigarettes; i.e., there was a "domain effect" in ultimatum-game decision-making. Reward-processing neural activities in the brain for non-addictive and addictive reinforcers may be distinct, and insula activation due to cue-induced craving may conflict with unfair-offer-induced insula activation. Future studies in the neuroeconomics of addiction should employ game-theoretic decision tasks to elucidate reinforcement learning processes in dopaminergic neural circuits.
Fun While Learning and Earning. A Look Into Chattanooga Public Schools' Token Reinforcement Program.
ERIC Educational Resources Information Center
Smith, William F.; Sanders, Frank J.
A token reinforcement program was used by the Piney Woods Research and Demonstration Center in Chattanooga, Tennessee. Children who were from economically deprived homes received tokens for positive behavior. The tokens were redeemable for recess privileges, ice cream, candy, and other such reinforcers. All tokens were spent on the day earned so…
ERIC Educational Resources Information Center
Raska, David; Keller, Eileen Weisenbach; Shaw, Doris
2014-01-01
Curriculum-Faculty-Reinforcement (CFR) alignment is the alignment among fundamental marketing concepts that are integral to the mastery of knowledge expected of marketing graduates, the perceived importance of those concepts to the faculty, and their level of reinforcement throughout the core marketing courses required to obtain a marketing degree. This research…
ERIC Educational Resources Information Center
Diegelmann, Soeren; Zars, Melissa; Zars, Troy
2006-01-01
Memories can have different strengths, largely dependent on the intensity of reinforcers encountered. The relationship between reinforcement and memory strength is evident in asymptotic memory curves, with the level of the asymptote related to the intensity of the reinforcer. Although this is likely a fundamental property of memory formation,…
ERIC Educational Resources Information Center
Moreno-Fernandez, Maria M.; Abad, Maria J. F.; Ramos-Alvarez, Manuel M.; Rosas, Juan M.
2011-01-01
Predictive value for continuously reinforced cues is affected by context changes when they are trained within a context in which a different cue undergoes partial reinforcement. An experiment was conducted with the goal of exploring the mechanisms underlying this context-switch effect. Human participants were trained in a predictive learning…
The Use of Reinforcement Procedures in Teaching Reading to Rural Culturally Deprived Children.
ERIC Educational Resources Information Center
Egeland, Byron
A group of culturally deprived children with severe reading and behavior problems was systematically given tangible reinforcers while learning to read. Twelve second-grade and 12 third-grade boys from a rural and lower socioeconomic background were taught reading with the use of tangible reinforcers (E group). Four similar control groups (C group)…
Tabassum, Heena; Frey, Julietta U
2013-12-01
Hippocampal long-term potentiation (LTP) is a cellular model of learning and memory. An early form of LTP (E-LTP) can be reinforced into its late form (L-LTP) by various behavioral interactions within a specific time window ("behavioral LTP-reinforcement"). Depending on the type and procedure used, various studies have shown that stress differentially affects synaptic plasticity. Under low stress, such as novelty detection or mild foot shocks, E-LTP can be transformed into L-LTP in the rat dentate gyrus (DG). A reinforcing effect of a 2-min swim, however, has so far only been shown in a handful of studies (Korz and Frey (2003) J Neurosci 23:7281-7287; Korz and Frey (2005) J Neurosci 25:7393-7400; Ahmed et al. (2006) J Neurosci 26:3951-3958; Sajikumar et al. (2007) J Physiol 584.2:389-400). We reinvestigated these studies using the same recording technique as well as an improved one, which allowed recording of field excitatory postsynaptic potentials (fEPSP) and the population spike amplitude (PSA) at their places of generation in freely moving rats. We show that acute swim stress led to long-term depression (LTD) in baseline values of the PSA and partially of the fEPSP. In contrast to the earlier studies, LTP-reinforcement by swimming could never be reproduced. Our results indicate that 2-min swim stress negatively influenced synaptic potentials as well as E-LTP. Copyright © 2013 Wiley Periodicals, Inc.
Katnani, Husam A; Patel, Shaun R; Kwon, Churl-Su; Abdel-Aziz, Samer; Gale, John T; Eskandar, Emad N
2016-01-04
The primate brain has the remarkable ability to map sensory stimuli onto motor behaviors that can lead to positive outcomes. We have previously shown that during the reinforcement of visual-motor behavior, activity in the caudate nucleus is correlated with the rate of learning. Moreover, phasic microstimulation in the caudate during the reinforcement period was shown to enhance associative learning, demonstrating the importance of temporal specificity in manipulating learning-related changes. Here we present evidence that extends our previous finding by demonstrating that temporally coordinated phasic deep brain stimulation across both the nucleus accumbens and caudate can further enhance associative learning. Monkeys performed a visual-motor associative learning task and received stimulation at time points critical to learning-related changes. Resulting performance revealed an enhancement in the rate, ceiling, and reaction times of learning. Stimulation of each brain region alone or at different time points did not generate the same effect.
Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning
Taylor, Jordan A.; Ivry, Richard B.
2014-01-01
Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295
Modeling the behavioral substrates of associative learning and memory - Adaptive neural models
NASA Technical Reports Server (NTRS)
Lee, Chuen-Chien
1991-01-01
Three adaptive single-neuron models based on neural analogies of behavior modification episodes are proposed, which attempt to bridge the gap between psychology and neurophysiology. The proposed models capture the predictive nature of Pavlovian conditioning, which is essential to the theory of adaptive/learning systems. The models learn to anticipate the occurrence of a conditioned response before the presence of a reinforcing stimulus when training is complete. Furthermore, each model can find the most nonredundant and earliest predictor of reinforcement. The behavior of the models accounts for several aspects of basic animal learning phenomena in Pavlovian conditioning beyond previous related models. Computer simulations show how well the models fit empirical data from various animal learning paradigms.
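The predictive character these neuron models capture is shared with the classic Rescorla-Wagner rule, sketched below; the blocking demonstration (a redundant cue fails to gain associative strength because the reinforcer is already fully predicted) is our illustration, not the paper's simulations:

```python
# Rescorla-Wagner update: learning is driven by prediction error, so a cue
# only gains associative strength when the reinforcer is surprising.
def rw_trial(weights, present, reward, lr=0.5):
    prediction = sum(weights[c] for c in present)
    error = reward - prediction            # surprise term
    for c in present:
        weights[c] += lr * error
    return weights

w = {"light": 0.0, "tone": 0.0}
for _ in range(20):
    rw_trial(w, ["light"], reward=1.0)     # light alone comes to predict reward
for _ in range(20):
    rw_trial(w, ["light", "tone"], reward=1.0)  # tone is added but redundant
# The tone is "blocked": the earlier, nonredundant predictor keeps the credit,
# matching the models' ability to find the earliest predictor of reinforcement.
```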
Stimulus discriminability may bias value-based probabilistic learning.
Schutte, Iris; Slagter, Heleen A; Collins, Anne G E; Frank, Michael J; Kenemans, J Leon
2017-01-01
Reinforcement learning tasks are often used to assess participants' tendency to learn more from the positive or more from the negative consequences of their actions. However, this assessment often requires comparison of learning performance across different task conditions, which may differ in the relative salience or discriminability of the stimuli associated with more and less rewarding outcomes, respectively. To address this issue, in a first set of studies, participants were subjected to two versions of a common probabilistic learning task. The two versions differed with respect to the stimulus (Hiragana) characters associated with reward probability. The assignment of character to reward probability was fixed within version but reversed between versions. We found that performance was highly influenced by task version, which could be explained by the relative perceptual discriminability of characters assigned to high or low reward probabilities, as assessed by a separate discrimination experiment. Participants were more reliable in selecting rewarding characters that were more discriminable, leading to differences in learning curves and their sensitivity to reward probability. This difference in experienced reinforcement history was accompanied by performance biases in a test phase assessing the ability to learn from positive vs. negative outcomes. In a subsequent large-scale web-based experiment, this impact of task version on learning and test measures was replicated and extended. Collectively, these findings imply a key role for perceptual factors in guiding reward learning and underscore the need to control stimulus discriminability when making inferences about individual differences in reinforcement learning.
Interactive computer simulations of knee-replacement surgery.
Gunther, Stephen B; Soto, Gabriel E; Colman, William W
2002-07-01
Current surgical training programs in the United States are based on an apprenticeship model. This model is outdated because it does not provide conceptual scaffolding, promote collaborative learning, or offer constructive reinforcement. Our objective was to create a more useful approach by preparing students and residents for operative cases using interactive computer simulations of surgery. Total-knee-replacement surgery (TKR) is an ideal procedure to model on the computer because there is a systematic protocol for the procedure. Also, this protocol is difficult to learn by the apprenticeship model because of the multiple instruments that must be used in a specific order. We designed an interactive computer tutorial to teach medical students and residents how to perform knee-replacement surgery. We also aimed to reinforce the specific protocol of the operative procedure. Our final goal was to provide immediate, constructive feedback. We created a computer tutorial by generating three-dimensional wire-frame models of the surgical instruments. Next, we applied a surface to the wire-frame models using three-dimensional modeling. Finally, the three-dimensional models were animated to simulate the motions of an actual TKR. The tutorial teaches and tests, step by step, the correct sequence of a TKR. The student or resident must select the correct instruments in the correct order. The learner is encouraged to learn the stepwise surgical protocol through repetitive use of the computer simulation. Constructive feedback is acquired through a grading system, which rates the student's or resident's ability to perform the task in the correct order. The grading system also accounts for the time required to perform the simulated procedure. We evaluated the efficacy of this teaching technique by testing medical students who learned by the computer simulation and those who learned by reading the surgical protocol manual.
Both groups then performed TKR on manufactured bone models using real instruments. Their technique was graded with the standard protocol. The students who learned on the computer simulation performed the task in a shorter time and with fewer errors than the control group. They were also more engaged in the learning process. Surgical training programs generally lack a consistent approach to preoperative education related to surgical procedures. This interactive computer tutorial has allowed us to make a quantum leap in medical student and resident teaching in our orthopedic department because the students actually participate in the entire process. Our technique provides a linear, sequential method of skill acquisition and direct feedback, which is ideally suited for learning stepwise surgical protocols. Since our initial evaluation has shown the efficacy of this program, we have implemented this teaching tool into our orthopedic curriculum. Our plans for future work with this simulator include modeling procedures involving other anatomic areas of interest, such as the hip and shoulder.
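A toy sketch of how order-plus-time grading of the kind described could be scored; the protocol step names, weighting, and time limit are invented for illustration, not taken from the tutorial:

```python
# Hypothetical grader: score a trainee's instrument sequence against a fixed
# protocol, weighting positional accuracy against elapsed time.
PROTOCOL = ["expose", "cut_femur", "cut_tibia", "trial_fit", "cement", "implant"]

def grade(steps, seconds, time_limit=600):
    # Fraction of steps performed in the correct position of the sequence.
    in_order = sum(1 for done, expected in zip(steps, PROTOCOL) if done == expected)
    accuracy = in_order / len(PROTOCOL)
    # Faster completion earns a bonus, clamped at zero past the time limit.
    time_factor = max(0.0, 1.0 - seconds / time_limit)
    return round(100 * (0.8 * accuracy + 0.2 * time_factor), 1)

perfect = grade(PROTOCOL, seconds=300)            # correct order, half the time
swapped = grade(["expose", "cut_tibia", "cut_femur",
                 "trial_fit", "cement", "implant"], seconds=300)
```

Any real implementation would also need to handle repeated attempts and partial steps; the point here is only the shape of an order-sensitive, time-aware score.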
Dai, Qing; Sheng, Xiesun; Chen, Feng
2017-04-12
Reinforcing and reducing manipulation at different acupoints is a class of acupuncture manipulations with satisfactory clinical therapeutic effects when combined with proper needling techniques. The reinforcing method is applied at upper acupoints and the reducing method at lower ones, and distal acupoints are combined with nearby acupoints. Local acupoints, or those adjacent to the affected area, are regarded as nearby acupoints, e.g. the acupoints in the upper region; distant acupoints and acupoints on the hand and foot are termed distal acupoints, e.g. the acupoints in the lower region. In the reinforcing manipulation, the needle is inserted shallowly, along the running direction of the meridian; in the reducing manipulation, the needle is inserted deeply, against the running direction of the meridian. The yin-yang couple needling technique is used with the combination of the front-mu and back-shu points. In the first option, the reinforcing and reducing method with the rotating technique predominates at the front-mu points, while that with the lifting and thrusting technique predominates at the back-shu points. In the second option, when needling the back-shu points, the needling sensation is transmitted along the transverse segment toward the chest and abdomen. These two forms of integrated acupoint combination and needling technique have clear clinical significance for improving the therapeutic effects of acupuncture.
High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.
Zars, Melissa; Zars, Troy
2006-07-01
Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.
Another View on "Reinforcement in Developmentally Appropriate Early Childhood Classrooms."
ERIC Educational Resources Information Center
Wolfgang, Charles H.
2001-01-01
Contrasts the use of behavioral and developmental theories to address a child's aggression. Presents concerns about the use of social reinforcers, activity reinforcers, and tangible reinforcers. Asserts that behavioral techniques that shape children's surface behaviors without placing the behaviors within a developmental context may interfere with…
Martínez-Velázquez, Eduardo S; Ramos-Loyo, Julieta; González-Garrido, Andrés A; Sequeira, Henrique
2015-01-21
Feedback-related negativity (FRN) is a negative deflection over frontocentral regions that appears around 250 ms after gain or loss feedback on chosen alternatives in a gambling task. Few studies have reported FRN enhancement in adolescents compared with adults in a gambling task without probabilistic reinforcement learning, despite the fact that learning from positive or negative consequences is crucial for decision-making during adolescence. Therefore, the aim of the present research was to identify differences in FRN amplitude and latency between adolescents and adults on a gambling task with favorable and unfavorable probabilistic reinforcement learning conditions, in addition to a nonlearning condition with monetary gains and losses. Higher rate scores of high-magnitude choices during the final 30 trials compared with the first 30 trials were observed during the favorable condition, whereas lower rates were observed during the unfavorable condition in both groups. Higher FRN amplitude in all conditions and longer latency in the nonlearning condition were observed in adolescents compared with adults and in relation to losses. Results indicate that both the adolescents and the adults improved their performance in relation to positive and negative feedback. However, the FRN findings suggest an increased sensitivity to external feedback to losses in adolescents compared with adults, irrespective of the presence or absence of probabilistic reinforcement learning. These results reflect processing differences on the neural monitoring system and provide new perspectives on the dynamic development of an adolescent's brain.
Pechtel, Pia; Pizzagalli, Diego A
2013-05-01
Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite the epidemiological data available, the mechanisms underlying these maladaptive outcomes remain poorly understood. We examined whether a history of CSA, particularly in conjunction with a past episode of MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Academic setting; participants recruited from the community. Fifteen women with a history of CSA and remitted MDD (CSA + rMDD), 16 women with remitted MDD with no history of CSA (rMDD), and 18 healthy women (controls). Three or more episodes of coerced sexual contact (mean [SD] duration, 3.00 [2.20] years) between the ages of 7 and 12 years by at least 1 male perpetrator. Participants' preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity and error-related negativity-hypothesized to reflect activation in the anterior cingulate cortex-were used as electrophysiological indices of reinforcement learning. No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA + rMDD group showed (1) lower accuracy (relative to both controls and the rMDD group), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to the rMDD group). A history of CSA was not associated with impairments in avoiding the most punished stimulus. 
Self-harm and suicidal behaviors correlated with poorer performance of previously rewarded, but not previously punished, trials. Irrespective of past MDD episodes, women with a history of CSA showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision making in the absence of feedback (blunted "Go learning"). Although our study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine if disrupted positive reinforcement learning predicts high-risk behavior following CSA.
Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang
2013-01-01
Bio-robots based on brain-computer interfaces (BCI) suffer from a failure to consider the characteristics of the animal during navigation. This paper proposes a new method for automatic bio-robot navigation that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of the animal. Given graded electrical rewards, the animal, e.g. a rat, seeks to maximize reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge on an optimal route by co-learning. This work provides significant inspiration for the practical development of bio-robot navigation with hybrid intelligence.
Emotion-based learning systems and the development of morality.
Blair, R J R
2017-10-01
In this paper it is proposed that important components of moral development and moral judgment rely on two forms of emotional learning: stimulus-reinforcement and response-outcome learning. Data in support of this position will be primarily drawn from work with individuals with the developmental condition of psychopathy as well as fMRI studies with healthy individuals. Individuals with psychopathy show impairment on moral judgment tasks and a pronounced increased risk for instrumental antisocial behavior. It will be argued that these impairments are developmental consequences of impaired stimulus-aversive conditioning on the basis of distress cue reinforcers and response-outcome learning in individuals with this disorder. Copyright © 2017. Published by Elsevier B.V.
Feedback-related brain activity predicts learning from feedback in multiple-choice testing.
Ernst, Benjamin; Steinhauser, Marco
2012-06-01
Different event-related potentials (ERPs) have been shown to correlate with learning from feedback in decision-making tasks and with learning in explicit memory tasks. In the present study, we investigated which ERPs predict learning from corrective feedback in a multiple-choice test, which combines elements from both paradigms. Participants worked through sets of multiple-choice items of a Swahili-German vocabulary task. Whereas the initial presentation of an item required the participants to guess the answer, corrective feedback could be used to learn the correct response. Initial analyses revealed that corrective feedback elicited components related to reinforcement learning (FRN), as well as to explicit memory processing (P300) and attention (early frontal positivity). However, only the P300 and early frontal positivity were positively correlated with successful learning from corrective feedback, whereas the FRN was even larger when learning failed. These results suggest that learning from corrective feedback crucially relies on explicit memory processing and attentional orienting to corrective feedback, rather than on reinforcement learning.
More Than the Sum of Its Parts: A Role for the Hippocampus in Configural Reinforcement Learning.
Duncan, Katherine; Doll, Bradley B; Daw, Nathaniel D; Shohamy, Daphna
2018-05-02
People often perceive configurations rather than the elements they comprise, a bias that may emerge because configurations often predict outcomes. But how does the brain learn to associate configurations with outcomes and how does this learning differ from learning about individual elements? We combined behavior, reinforcement learning models, and functional imaging to understand how people learn to associate configurations of cues with outcomes. We found that configural learning depended on the relative predictive strength of elements versus configurations and was related to both the strength of BOLD activity and patterns of BOLD activity in the hippocampus. Configural learning was further related to functional connectivity between the hippocampus and nucleus accumbens. Moreover, configural learning was associated with flexible knowledge about associations and differential eye movements during choice. Together, this suggests that configural learning is associated with a distinct computational, cognitive, and neural profile that is well suited to support flexible and adaptive behavior. Copyright © 2018 Elsevier Inc. All rights reserved.
DOT National Transportation Integrated Search
2015-07-01
The effects of steel reinforcement and chloride-induced corrosion initiation on the electrical resistivity measurements using the Wenner : probe technique were studied experimentally on custom-designed reinforced concrete slabs. Investigation paramet...
The Interaction of Temporal Generalization Gradients Predicts the Context Effect
ERIC Educational Resources Information Center
de Castro, Ana Catarina; Machado, Armando
2012-01-01
In a temporal double bisection task, animals learn two discriminations. In the presence of Red and Green keys, responses to Red are reinforced after 1-s samples and responses to Green are reinforced after 4-s samples; in the presence of Blue and Yellow keys, responses to Blue are reinforced after 4-s samples and responses to Yellow are reinforced…
Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.
Costa, Vincent D; Dal Monte, Olga; Lucas, Daniel R; Murray, Elisabeth A; Averbeck, Bruno B
2016-10-19
Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy trade-off that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. Published by Elsevier Inc.
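The per-trial learning models used to characterize bandit behavior in studies like this are typically a prediction-error value update paired with a softmax choice rule. Below is a minimal illustrative sketch of such an agent on a two-arm reversal bandit; the task settings and parameter values are assumptions for illustration, not the authors' actual model.

```python
import math
import random

def softmax(qs, beta):
    """Convert action values into choice probabilities (inverse temperature beta)."""
    exps = [math.exp(beta * q) for q in qs]
    total = sum(exps)
    return [e / total for e in exps]

def run_bandit(n_trials=400, alpha=0.3, beta=5.0, p_reward=0.8, seed=0):
    """Simulate a Q-learning agent on a two-arm bandit with a mid-session reversal."""
    rng = random.Random(seed)
    q = [0.5, 0.5]           # action values
    best = 0                 # index of the currently better arm
    correct = 0
    for t in range(n_trials):
        if t == n_trials // 2:
            best = 1 - best  # reversal: the other arm becomes better
        probs = softmax(q, beta)
        choice = 0 if rng.random() < probs[0] else 1
        p = p_reward if choice == best else 1 - p_reward
        reward = 1.0 if rng.random() < p else 0.0
        q[choice] += alpha * (reward - q[choice])   # prediction-error update
        correct += (choice == best)
    return correct / n_trials
```

The learning rate alpha scales the prediction-error update, and the inverse temperature beta governs choice consistency; these are the kinds of parameters that lesion analyses compare across groups.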
Working memory contributions to reinforcement learning impairments in schizophrenia.
Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J
2014-10-08
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. Copyright © 2014 the authors 0270-6474/14/3413747-10$15.00/0.
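The dissociation this task exploits can be sketched computationally: a fast but capacity-limited working-memory store operating alongside a slow, incremental RL learner. The following toy simulation is a sketch under assumed parameters, not the authors' computational model.

```python
import math
import random

def run_block(set_size, wm_capacity=3, alpha=0.1, beta=3.0, n_reps=10, seed=1):
    """Toy learner mixing capacity-limited WM (one-shot recall of up to
    wm_capacity stimuli) with slow incremental RL (noisy softmax over values)."""
    rng = random.Random(seed)
    n_actions = 3
    correct = {s: rng.randrange(n_actions) for s in range(set_size)}
    q = {(s, a): 1.0 / n_actions for s in range(set_size) for a in range(n_actions)}
    wm = {}                                   # stimulus -> last rewarded action
    trials = [s for _ in range(n_reps) for s in range(set_size)]
    rng.shuffle(trials)
    hits = 0
    for s in trials:
        if s in wm:                           # fast, reliable WM retrieval
            a = wm[s]
        else:                                 # slow, noisy RL policy (softmax)
            w = [math.exp(beta * q[(s, x)]) for x in range(n_actions)]
            draw, cum, a = rng.random() * sum(w), 0.0, n_actions - 1
            for x in range(n_actions):
                cum += w[x]
                if draw < cum:
                    a = x
                    break
        r = 1.0 if a == correct[s] else 0.0
        q[(s, a)] += alpha * (r - q[(s, a)])  # incremental RL update
        if r == 1.0:
            wm[s] = a
            if len(wm) > wm_capacity:         # capacity limit: evict oldest
                wm.pop(next(iter(wm)))
        hits += int(r)
    return hits / len(trials)
```

With small set sizes, WM covers most stimuli and performance is high; with set sizes beyond capacity, eviction forces reliance on the slower RL system, producing the set-size effect the study measures.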
Functional Contour-following via Haptic Perception and Reinforcement Learning.
Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J
2018-01-01
Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
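A contextual bandit learner of the kind described balances exploration against exploitation per (context, action) pair. The following generic epsilon-greedy sketch illustrates the idea; the reward table and parameters are invented for illustration, and the paper's C-MAB algorithm differs in its exploration strategy.

```python
import random

def cmab_run(n_rounds=2000, n_contexts=3, n_actions=4, epsilon=0.1, seed=0):
    """Epsilon-greedy contextual bandit: keep a running mean reward per
    (context, action) pair; explore with probability epsilon."""
    rng = random.Random(seed)
    best = {c: c % n_actions for c in range(n_contexts)}   # assumed reward table
    counts = {(c, a): 0 for c in range(n_contexts) for a in range(n_actions)}
    means = {(c, a): 0.0 for c in range(n_contexts) for a in range(n_actions)}
    total = 0.0
    for _ in range(n_rounds):
        c = rng.randrange(n_contexts)                      # observe a context
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)                   # explore
        else:
            a = max(range(n_actions), key=lambda x: means[(c, x)])  # exploit
        p = 0.9 if a == best[c] else 0.2
        r = 1.0 if rng.random() < p else 0.0
        counts[(c, a)] += 1
        means[(c, a)] += (r - means[(c, a)]) / counts[(c, a)]  # incremental mean
        total += r
    return total / n_rounds
```

Because the learner conditions its value estimates on the observed context, it can assign different best actions to different sensed states, unlike a context-free bandit.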
Quantum reinforcement learning.
Dong, Daoyi; Chen, Chunlin; Li, Hanxiong; Tarn, Tzyh-Jong
2008-10-01
The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
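The amplitude-based action selection can be simulated classically: hold a vector of amplitudes over actions, "observe" an action with probability equal to its squared amplitude, then amplify the amplitude of rewarded actions and renormalize. This is a toy sketch under assumed update constants, not the paper's exact algorithm.

```python
import math
import random

def qrl_choose(amps, rng):
    """'Measure' the superposition: action i is observed with probability |amp_i|^2."""
    draw, cum = rng.random(), 0.0
    for i, a in enumerate(amps):
        cum += a * a
        if draw < cum:
            return i
    return len(amps) - 1

def qrl_run(n_steps=500, k=0.05, seed=0):
    """Toy quantum-inspired RL on a 2-action problem: amplitudes grow in
    proportion to reward, then are renormalized so probabilities sum to one."""
    rng = random.Random(seed)
    amps = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # uniform superposition
    for _ in range(n_steps):
        a = qrl_choose(amps, rng)
        reward = 1.0 if a == 1 else 0.0           # action 1 is always better here
        amps[a] += k * reward                     # amplify the rewarded eigen action
        norm = math.sqrt(sum(x * x for x in amps))
        amps = [x / norm for x in amps]
    return amps[1] ** 2                           # final probability of action 1
```

Unrewarded actions are never amplified, so their selection probability shrinks through normalization, giving the exploration-exploitation tradeoff a natural expression in the probability amplitudes.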
Mattingly, B A; Zolman, J F
1980-08-01
The effect of the number of prepunishment acquisition trials on the age dependency of passive avoidance (PA) learning of the Vantress X Arbor Acre chick was determined in both key-peck and runway tests. In nine experiments, 1- and 4-day-old chicks were first trained to respond for heat reward, and then, following a variable number of reinforced acquisition trials, the chicks' responses were punished with aversive wing shocks. The major finding of these experiments was that the age dependency of PA learning of the young chick is related specifically to the number of reinforced training trials given prior to PA testing. When a large number of prepunishment acquisition trials were given, 1-day-old chicks learned as quickly as 4-day-old chicks to withhold responding when punished. However, when only a few acquisition trials preceded PA testing, 1-day-old chicks showed significantly less response suppression than 4-day-old chicks. These acquisition effects indicate that the age-dependent changes in PA learning of the chick are not solely due to developmental changes in general inhibitory ability. Rather, these PA results suggest that the 1-day-old chick, compared with the 4-day-old chick, is deficient in learning, or detecting changes in, stimulus- and/or response-reinforcement contingencies.
Silvetti, Massimo; Wiersema, Jan R; Sonuga-Barke, Edmund; Verguts, Tom
2013-10-01
Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts; probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood. Copyright © 2013 Elsevier Ltd. All rights reserved.
Robotic action acquisition with cognitive biases in coarse-grained state space.
Uragami, Daisuke; Kohno, Yu; Takahashi, Tatsuji
2016-07-01
Some of the authors have previously proposed a cognitively inspired reinforcement learning architecture (LS-Q) that mimics cognitive biases in humans. LS-Q adaptively learns under uniform, coarse-grained state division and performs well without parameter tuning in a giant-swing robot task. However, these results were shown only in simulations. In this study, we test the validity of the LS-Q implemented in a robot in a real environment. In addition, we analyze the learning process to elucidate the mechanism by which the LS-Q adaptively learns under the partially observable environment. We argue that the LS-Q may be a versatile reinforcement learning architecture, which is, despite its simplicity, easily applicable and does not require well-prepared settings. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control
NASA Technical Reports Server (NTRS)
Bernstein, Daniel S.; Zilberstein, Shlomo
2003-01-01
Weakly-coupled Markov decision processes can be decomposed into subprocesses that interact only through a small set of bottleneck states. We study a hierarchical reinforcement learning algorithm designed to take advantage of this particular type of decomposability. To test our algorithm, we use a decision-making problem faced by autonomous planetary rovers. In this problem, a Mars rover must decide which activities to perform and when to traverse between science sites in order to make the best use of its limited resources. In our experiments, the hierarchical algorithm performs better than Q-learning in the early stages of learning, but unlike Q-learning it converges to a suboptimal policy. This suggests that it may be advantageous to use the hierarchical algorithm when training time is limited.
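The Q-learning baseline referenced here uses the standard tabular update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A self-contained sketch on a toy chain MDP follows; the chain world, reward placement, and parameters are invented for illustration and are far simpler than the rover domain.

```python
import random

def q_learning_chain(n_states=6, n_episodes=500, alpha=0.5, gamma=0.95,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP: states 0..n_states-1, actions
    left/right, reward 1.0 only on reaching the rightmost state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]; 0=left, 1=right
    for _ in range(n_episodes):
        s = 0
        while s != n_states - 1:
            if rng.random() < epsilon:
                a = rng.randrange(2)                 # explore
            else:
                a = 0 if q[s][0] > q[s][1] else 1    # exploit (ties go right)
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Standard update: Q(s,a) += alpha * (r + gamma * max Q(s') - Q(s,a))
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    # The greedy policy should now prefer moving right in every nonterminal state.
    return all(q[s][1] >= q[s][0] for s in range(n_states - 1))
```

A hierarchical variant would learn separate policies within each subprocess and a higher-level policy over the bottleneck states, which is what lets it outperform flat Q-learning early in training.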
A state of the art review on reinforced concrete beams with openings retrofitted with FRP
NASA Astrophysics Data System (ADS)
Osman, Bashir H.; Wu, Erjun; Ji, Bohai; S Abdelgader, Abdeldime M.
2016-09-01
The use of externally bonded fiber-reinforced polymer (FRP) sheets, strips, or steel plates is a modern and convenient way to strengthen reinforced concrete (RC) beams. Several studies have examined reinforced concrete beams with web openings strengthened using fiber-reinforced polymer composites. Most focused on shear strengthening rather than flexural strengthening, while others studied the effects of openings on shear and flexure separately under various loadings. This paper reviews more than sixty articles on reinforced concrete beams with openings, with and without FRP strengthening. Moreover, important practical issues that contribute to the shear strengthening of beams with different techniques, such as steel plates and FRP laminates, are discussed in detail together with various design approaches. Furthermore, a simple technique combining fiber-reinforced polymer with a steel plate for strengthening RC beams with openings under different load applications is proposed. Directions for future research based on gaps in the existing work are presented.
Reinforced soil structures. Volume I. Design and construction guidelines
DOT National Transportation Integrated Search
1990-11-01
This report presents comprehensive guidelines for evaluating and using soil reinforcement techniques in the construction of retaining walls, embankment slopes, and natural or cut slopes. A variety of available systems for reinforced soil including in...
Pombo, Nuno; Garcia, Nuno; Bousson, Kouamana
2017-03-01
Sleep apnea syndrome (SAS), which can significantly decrease quality of life, is associated with major health risks such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review describing significant applications of computational intelligence to SAS, including performance, beneficial and challenging effects, and modeling for decision-making across multiple scenarios. This study systematically reviews the literature on systems for the detection and/or prediction of apnea events using a classification model. The forty-five included studies revealed a combination of classification techniques for the diagnosis of apnea: threshold-based models (14.75%) and machine learning (ML) models (85.25%). The ML models, clustered in a mind map, include neural networks (44.26%), regression (4.91%), instance-based methods (11.47%), Bayesian algorithms (1.63%), reinforcement learning (4.91%), dimensionality reduction (8.19%), ensemble learning (6.55%), and decision trees (3.27%). A classification model should be auto-adaptive and free of dependency on external human action. In addition, the accuracy of classification models is related to effective feature selection. New high-quality studies based on randomized controlled trials and validation of models using large, multi-sample data are recommended. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Dopaminergic Contributions to Vocal Learning
Hoffmann, Lukas A.; Saravanan, Varun; Wood, Alynda N.; He, Li
2016-01-01
Although the brain relies on auditory information to calibrate vocal behavior, the neural substrates of vocal learning remain unclear. Here we demonstrate that lesions of the dopaminergic inputs to a basal ganglia nucleus in a songbird species (Bengalese finches, Lonchura striata var. domestica) greatly reduced the magnitude of vocal learning driven by disruptive auditory feedback in a negative reinforcement task. These lesions produced no measurable effects on the quality of vocal performance or the amount of song produced. Our results suggest that dopaminergic inputs to the basal ganglia selectively mediate reinforcement-driven vocal plasticity. In contrast, dopaminergic lesions produced no measurable effects on the birds' ability to restore song acoustics to baseline following the cessation of reinforcement training, suggesting that different forms of vocal plasticity may use different neural mechanisms. SIGNIFICANCE STATEMENT During skill learning, the brain relies on sensory feedback to improve motor performance. However, the neural basis of sensorimotor learning is poorly understood. Here, we investigate the role of the neurotransmitter dopamine in regulating vocal learning in the Bengalese finch, a songbird with an extremely precise singing behavior that can nevertheless be reshaped dramatically by auditory feedback. Our findings show that reduction of dopamine inputs to a region of the songbird basal ganglia greatly impairs vocal learning but has no detectable effect on vocal performance. These results suggest a specific role for dopamine in regulating vocal plasticity. PMID:26888928
Characteristics of implicit chaining in cotton-top tamarins (Saguinus oedipus).
Locurto, Charles; Gagne, Matthew; Nutile, Lauren
2010-07-01
In human cognition there has been considerable interest in observing the conditions under which subjects learn material without explicit instructions to learn. In the present experiments, we adapted this issue to nonhumans by asking what subjects learn in the absence of explicit reinforcement for correct responses. Two experiments examined the acquisition of sequence information by cotton-top tamarins (Saguinus oedipus) when such learning was not demanded by the experimental contingencies. An implicit chaining procedure was used in which visual stimuli were presented serially on a touchscreen. Subjects were required to touch one stimulus to advance to the next stimulus. Stimulus presentations followed a pattern, but learning the pattern was not necessary for reinforcement. In Experiment 1 the chain consisted of five different visual stimuli that were presented in the same order on each trial. Each stimulus could occur at any one of six touchscreen positions. In Experiment 2 the same visual element was presented serially in the same five locations on each trial, thereby allowing a behavioral pattern to be correlated with the visual pattern. In this experiment two new tests, a Wild-Card test and a Running-Start test, were used to assess what was learned in this procedure. Results from both experiments indicated that tamarins acquired more information from an implicit chain than was required by the contingencies of reinforcement. These results contribute to the developing literature on nonhuman analogs of implicit learning.
The paper discusses several projects to measure hydrocarbon emissions associated with the manufacture of fiberglass-reinforced plastics. The main purpose of the projects was to evaluate pollution prevention techniques to reduce emissions by altering raw materials, application equ...
Understanding Optimal Decision-Making
2015-06-01
Task (IGT) (Bechara, Damasio, Damasio, & Anderson, 1994), a very common test of reinforcement learning that has been used in hundreds of psychology ... psychology task that elicits reinforcement learning (Bechara et al., 1994) and has been used in hundreds of studies (Krain et al., 2006). Subjects...
Dere, E; Frisch, C; De Souza Silva, M A; Gödecke, A; Schrader, J; Huston, J P
2001-01-01
Proceeding from previous findings of a beneficial effect of endothelial nitric oxide synthase (eNOS) gene inactivation on negatively reinforced water maze performance, we asked whether this improvement in place learning capacities also holds for a positively reinforced radial maze task. Unlike its beneficial effects on the water maze task, eNOS gene inactivation did not facilitate radial maze performance. The acquisition performance over the days of place learning did not differ between eNOS knockout (eNOS-/-) and wild-type mice (eNOS+/+). eNOS-/- mice displayed a slight and eNOS+/+ mice a more severe working memory deficit in the place learning version of the radial maze compared to the genetic background C57BL/6 strain. Possible differential effects of eNOS inactivation, related to differences in reinforcement contingencies between the Morris water maze and radial maze tasks, behavioral strategy requirements, or to different emotional and physiological concomitants inherent in the two tasks are discussed. These task-unique characteristics might be differentially affected by the reported anxiogenic and hypertensive effects of eNOS gene inactivation. Post-mortem determination of acetylcholine concentrations in diverse brain structures revealed that acetylcholine and choline contents were not different between eNOS-/- and eNOS+/+ mice, but were increased in eNOS+/+ mice compared to C57BL/6 mice in the frontal cortex. Our findings demonstrate that phenotyping of learning and memory capacities should not rely on one learning task only, but should include tasks employing both negative and positive reinforcement contingencies in order to allow valid statements regarding differences in learning capacities between rodent strains.
Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
La Camera, Giancarlo; Richmond, Barry J.
2008-01-01
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys. 
PMID:18688266
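The standard temporal-difference method that the abstract contrasts its model with can be sketched minimally. Below is a toy TD(0) learner on a three-step schedule ending in reward; the chain length and the `ALPHA`/`GAMMA` values are illustrative assumptions, not the paper's task parameters.

```python
# TD(0) on a toy three-step "schedule": states 0..2 lead to reward at the end.
ALPHA, GAMMA = 0.1, 0.9
values = [0.0, 0.0, 0.0, 0.0]    # V(s) for states 0..2 plus terminal state 3
REWARDS = [0.0, 0.0, 1.0]        # reward received on leaving each state

def run_episode(values):
    """One pass through the chain, applying the TD(0) update at each step."""
    for s in range(3):
        target = REWARDS[s] + GAMMA * values[s + 1]  # bootstrapped target
        values[s] += ALPHA * (target - values[s])    # prediction-error update

for _ in range(500):
    run_episode(values)
# V converges toward GAMMA**(steps to reward): roughly 0.81, 0.9, 1.0.
```

Because value falls off geometrically with distance from reward in this scheme, plain TD assigns the same value to trials equally distant from reward regardless of schedule length, which is why the schedule-length effect reported above requires a modification of the rule.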
Sclafani, Anthony; Ackroff, Karen
2015-01-01
Intragastric (IG) flavor conditioning studies in rodents indicate that isocaloric sugar infusions differ in their reinforcing actions, with glucose and sucrose more potent than fructose. Here we determined if the sugars also differ in their ability to maintain operant self-administration by licking an empty spout for IG infusions. Food-restricted C57BL/6J mice were trained 1 h/day to lick a food-baited spout, which triggered IG infusions of 16% sucrose. In testing, the mice licked an empty spout, which triggered IG infusions of different sugars. Mice shifted from sucrose to 16% glucose increased dry licking, whereas mice shifted to 16% fructose rapidly reduced licking to low levels. Other mice shifted from sucrose to IG water reduced licking more slowly but reached the same low levels. Thus IG fructose, like water, is not reinforcing to hungry mice. The more rapid decline in licking induced by fructose may be due to the sugar's satiating effects. Further tests revealed that the Glucose mice increased their dry licking when shifted from 16% to 8% glucose, and reduced their dry licking when shifted to 32% glucose. This may reflect caloric regulation and/or differences in satiation. The Glucose mice did not maintain caloric intake when tested with different sugars. They self-infused less sugar when shifted from 16% glucose to 16% sucrose, and even more so when shifted to 16% fructose. Reduced sucrose self-administration may occur because the fructose component of the disaccharide reduces its reinforcing potency. FVB mice also reduced operant licking when tested with 16% fructose, yet learned to prefer a flavor paired with IG fructose. These data indicate that sugars differ substantially in their ability to support IG self-administration and flavor preference learning. The same post-oral reinforcement process appears to mediate operant licking and flavor learning, although flavor learning provides a more sensitive measure of sugar reinforcement. PMID:26485294
Optimal control in microgrid using multi-agent reinforcement learning.
Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin
2012-11-01
This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limit of units in a microgrid with grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for microgrid is proposed. Then, a state variable "Average Electricity Price Trend" which is used to express the most possible transitions of the system is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on change rate of key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial to handle the problem of "curse of dimensionality" and speed up learning in the unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid with grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
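As a minimal illustration of the kind of reward-driven control described above, here is a one-state tabular Q-learning sketch in which reward is the negative electricity cost. The two actions, their costs, and all parameters are toy assumptions, not the paper's microgrid model.

```python
import random

random.seed(0)
ACTIONS = ["buy_from_grid", "use_local_gen"]
COST = {"buy_from_grid": 5.0, "use_local_gen": 2.0}  # cost per step (toy values)
q = {a: 0.0 for a in ACTIONS}        # action-value estimates
ALPHA, EPSILON = 0.1, 0.2

for _ in range(2000):
    # epsilon-greedy exploration over the single-state action set
    if random.random() < EPSILON:
        a = random.choice(ACTIONS)
    else:
        a = max(q, key=q.get)
    reward = -COST[a]                 # reward = negative electricity cost
    q[a] += ALPHA * (reward - q[a])   # one-state Q-learning (bandit) update

best = max(q, key=q.get)
# After enough steps the greedy choice settles on the cheaper action.
```

The paper's full method adds multiple agents, a learned price-trend state variable, and a dynamic hierarchy on top of this basic value-update loop.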
Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A
2016-06-01
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. © The Author(s) 2016.
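Analyses of such two-step tasks commonly blend the two strategies as a weighted mixture of action values. The sketch below uses invented values and weights to show how raising the model-based weight w changes choice; it is a common hybrid form, not the study's fitted model.

```python
def net_values(q_mb, q_mf, w):
    """Blend model-based and model-free action values: Q = w*Q_MB + (1-w)*Q_MF."""
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mb}

# Toy case where planning knowledge and raw reward history disagree:
q_mb = {"left": 0.8, "right": 0.2}   # model-based evaluation favors "left"
q_mf = {"left": 0.3, "right": 0.6}   # reward history favors "right"

child_q = net_values(q_mb, q_mf, w=0.0)   # no model-based contribution
adult_q = net_values(q_mb, q_mf, w=0.7)   # strong model-based contribution
child_choice = max(child_q, key=child_q.get)
adult_choice = max(adult_q, key=adult_q.get)
# With w = 0 choice follows reward history ("right");
# with w = 0.7 it follows the model-based evaluation ("left").
```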
Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments
NASA Technical Reports Server (NTRS)
Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette
2015-01-01
We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data driven approach to autonomous obstacle avoidance is explored for simulated forest environments, furthering research on autonomous flight under the tree canopy. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.
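The iterative correct-and-aggregate loop described above can be sketched abstractly. The one-dimensional "feature", the stand-in expert rule, and the nearest-neighbor policy below are all invented placeholders for the vision features and the learned control model.

```python
def expert_yaw(feature):
    """Stand-in for the human corrector: steer away from the nearer obstacle."""
    return -1.0 if feature > 0.5 else 1.0

def policy(dataset, feature):
    """1-nearest-neighbor lookup over the aggregated (feature, yaw) pairs."""
    if not dataset:
        return 0.0
    f, yaw = min(dataset, key=lambda d: abs(d[0] - feature))
    return yaw

dataset = []
states = [0.1, 0.3, 0.6, 0.9]          # features encountered while flying
for _ in range(3):                      # aggregation rounds
    for f in states:
        _ = policy(dataset, f)          # policy flies (rollout would pick states)
        dataset.append((f, expert_yaw(f)))  # human correction is logged
# The policy now reproduces the corrector's yaw commands on nearby features.
```

In the real system each round's rollout determines which states are visited, so the corrector labels precisely the situations the current model gets itself into.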
CENTRAL REINFORCING EFFECTS OF ETHANOL ARE BLOCKED BY CATALASE INHIBITION
Nizhnikov, Michael Edward; Molina, Juan Carlos; Spear, Norman
2007-01-01
Recent studies have systematically indicated that newborn rats are highly sensitive to ethanol’s positive reinforcing effects. Central administrations of ethanol (25–200 mg %) associated with an olfactory conditioned stimulus (CS) promote subsequent conditioned approach to the CS as evaluated through the newborn’s response to a surrogate nipple scented with the CS. It has been shown that ethanol’s first metabolite, acetaldehyde, exerts significant reinforcing effects in the central nervous system. A significant amount of acetaldehyde is derived from ethanol metabolism via the catalase system. In newborn rats catalase levels are particularly high in several brain structures. The present study tested the effect of catalase inhibition on central ethanol reinforcement. In the first experiment, pups experienced lemon odor either paired or unpaired with intracisternal (i.c.) administrations of 100 mg% ethanol. Half of the animals corresponding to each learning condition were pretreated with i.c. administrations of either physiological saline or a catalase inhibitor (sodium-azide). Catalase inhibition completely suppressed ethanol reinforcement in paired groups without affecting responsiveness to the CS during conditioning or responding by unpaired control groups. A second experiment tested whether these effects were specific to ethanol reinforcement or due instead to general impairment in learning and expression capabilities. Central administration of an endogenous kappa opioid receptor agonist (dynorphin A-13) was employed as an alternative source of reinforcement. Inhibition of the catalase system had no effect on the reinforcing properties of dynorphin. The present results support the hypothesis that ethanol metabolism regulated by the catalase system plays a critical role in determination of ethanol reinforcement in newborn rats. PMID:17980789
Dunne, Simon; D'Souza, Arun; O'Doherty, John P
2016-06-01
A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning. Copyright © 2016 the American Physiological Society.
Zhu, Lusha; Mathewson, Kyle E.; Hsu, Ming
2012-01-01
Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents’ beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs. PMID:22307594
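The two learning signals the model dissociates can be sketched as parallel delta-rule updates: a reward prediction error on the agent's own payoff, and a belief-based prediction error on the opponent's observed action (fictitious-play style). The learning rate and the values below are illustrative assumptions, not the study's fitted parameters.

```python
ALPHA = 0.2  # shared learning rate (illustrative)

def rl_update(value, reward):
    """Standard RL update driven by the reward prediction error."""
    pe = reward - value
    return value + ALPHA * pe, pe

def belief_update(belief, opponent_played):
    """Belief update driven by the error in predicting the opponent's action."""
    pe = float(opponent_played) - belief
    return belief + ALPHA * pe, pe

value, belief = 0.5, 0.5
value, rpe = rl_update(value, reward=1.0)          # agent was rewarded
belief, bpe = belief_update(belief, opponent_played=1)  # opponent chose action 1
# rpe and bpe are distinct error signals; in the study they correlated with
# partially overlapping striatal and rACC activity, respectively.
```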
Mi, Misa; Gould, Douglas
2014-01-01
A Wiki group project was integrated into a neuroscience course for first-year medical students. The project was developed as a self-directed, collaborative learning task to help medical students review course content and make clinically important connections. The goals of the project were to enhance students' understanding of key concepts in neuroscience, promote active learning, and reinforce their information literacy skills. The objective of the exploratory study was to provide a formative evaluation of the Wiki group project and to examine how Wiki technology was utilized to enhance active and collaborative learning of first-year medical students in the course and to reinforce information literacy skills.
Gormally, Cara; Sullivan, Carol Subiño; Szeinbaum, Nadia
2016-05-01
Inquiry-based teaching approaches are increasingly being adopted in biology laboratories. Yet teaching assistants (TAs), often novice teachers, teach the majority of laboratory courses in US research universities. This study analyzed the perspectives of TAs and their students and used classroom observations to uncover challenges faced by TAs during their first year of inquiry-based teaching. Our study revealed three insights about barriers to effective inquiry teaching practices: 1) TAs lack sufficient facilitation skills; 2) TAs struggle to share control over learning with students as they reconcile long-standing teaching beliefs with newly learned approaches, consequently undermining their fledgling ability to use inquiry approaches; and 3) student evaluations reinforce teacher-centered behaviors as TAs receive positive feedback conflicting with inquiry approaches. We make recommendations, including changing instructional feedback to focus on learner-centered teaching practices. We urge TA mentors to engage TAs in discussions to uncover teaching beliefs underlying teaching choices and support TAs through targeted feedback and practice.
The Effects of Basolateral Amygdala Lesions on Unblocking
Chang, Stephen E.; McDannald, Michael A.; Wheeler, Daniel S.; Holland, Peter C.
2012-01-01
Prior reinforcement of a neutral stimulus often blocks subsequent conditioning of a new stimulus if a compound of the original and new cues is paired with the same reinforcer. However, if the value of the reinforcer is altered when the compound is presented, the new cue typically acquires conditioning, a result called unblocking. Blocking, unblocking and related phenomena have been attributed to variations in processing of either the reinforcer, for example, the Rescorla-Wagner (1972) model, or cues, for example, the Pearce-Hall (1980) model. Here, we examined the effects of lesions of the basolateral amygdala on the occurrence of unblocking when the food reinforcer was increased in quantity at the time of introduction of the new cue. The lesions had no effects on unblocking in a simple design (Experiment 1), which did not distinguish between unblocking produced by variations in reward or cue processing. However, in a procedure that distinguished between unblocking due to direct conditioning by the added reinforcer, consistent with the Rescorla-Wagner (1972) model, and that due to increases in conditioning to the original reinforcer, consistent with the Pearce-Hall (1980) and other models of learning, the lesions prevented unblocking of the latter type. These results were discussed in the context of roles of the basolateral amygdala in coding and using reward prediction error information in associative learning. PMID:22448857
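The two accounts contrasted above differ in what the prediction error updates, and a minimal sketch makes the distinction concrete. All parameter values are illustrative: the Rescorla-Wagner rule lets the surprising extra reward condition the cue directly, while the Pearce-Hall rule converts surprise into associability that speeds the cue's later learning.

```python
def rescorla_wagner(v, reward, alpha=0.3):
    """RW (1972): learning is driven by the reinforcer prediction error itself."""
    return v + alpha * (reward - v)

def pearce_hall_associability(assoc, v, reward, gamma=0.5):
    """PH (1980): surprise |reward - v| updates the cue's associability,
    which then gates how fast that cue enters new associations."""
    return (1 - gamma) * assoc + gamma * abs(reward - v)

v, assoc = 0.2, 0.1            # current associative strength and associability
bigger_reward = 1.0            # e.g., reinforcer quantity is increased
v_new = rescorla_wagner(v, bigger_reward)
assoc_new = pearce_hall_associability(assoc, v, bigger_reward)
# RW: unblocking comes from direct conditioning by the added reward (v rises).
# PH: surprise raises associability, so the cue learns faster on later trials.
```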
Iijima, Yudai; Takano, Keisuke; Boddez, Yannick; Raes, Filip; Tanno, Yoshihiko
2017-01-01
Learning theories of depression have proposed that depressive cognitions, such as negative thoughts with reference to oneself, can develop through a reinforcement learning mechanism. This negative self-reference is considered to be positively reinforced by rewarding experiences such as genuine support from others after negative self-disclosure, and negatively reinforced by avoidance of potential aversive situations. The learning account additionally predicts that negative self-reference would be maintained by an inability to adjust one’s behavior when negative self-reference no longer leads to such reward. To test this prediction, we designed an adapted version of the reversal-learning task. In this task, participants were reinforced to choose and engage in either negative or positive self-reference by probabilistic economic reward and punishment. Although participants were initially trained to choose negative self-reference, the stimulus-reward contingencies were reversed to prompt a shift toward positive self-reference (Study 1) and a further shift toward negative self-reference (Study 2). Model-based computational analyses showed that depressive symptoms were associated with a low learning rate of negative self-reference, indicating a high level of reward expectancy for negative self-reference even after the contingency reversal. Furthermore, the difficulty in updating outcome predictions of negative self-reference was significantly associated with the extent to which one possesses negative self-images. These results suggest that difficulty in adjusting action-outcome estimates for negative self-reference increases the chance to be faced with negative aspects of self, which may result in depressive symptoms. PMID:28824511
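The "low learning rate" finding can be illustrated with a plain delta-rule expectancy tracker; the trial counts and alpha values below are invented for illustration, not the study's fitted estimates.

```python
def track_expectancy(alpha, outcomes, v=0.0):
    """Update expected reward v trial by trial with a delta rule."""
    for r in outcomes:
        v += alpha * (r - v)   # prediction-error update
    return v

# 30 rewarded trials of (say) negative self-reference, then a reversal
# to 10 unrewarded trials:
history = [1.0] * 30 + [0.0] * 10

v_low = track_expectancy(alpha=0.05, outcomes=history)
v_high = track_expectancy(alpha=0.5, outcomes=history)
# The slow learner (v_low) still expects substantial reward for the
# now-unrewarded response; the fast learner (v_high) has adjusted.
```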
Hogarth, Lee; He, Zhimin; Chase, Henry W; Wills, Andy J; Troisi, Joseph; Leventhal, Adam M; Mathew, Amanda R; Hitsman, Brian
2015-09-01
Two theories explain how negative mood primes smoking behaviour. The stimulus-response (S-R) account argues that in the negative mood state, smoking is experienced as more reinforcing, establishing a direct (automatic) association between the negative mood state and smoking behaviour. By contrast, the incentive learning account argues that in the negative mood state smoking is expected to be more reinforcing, which integrates with instrumental knowledge of the response required to produce that outcome. One differential prediction is that whereas the incentive learning account anticipates that negative mood induction could augment a novel tobacco-seeking response in an extinction test, the S-R account could not explain this effect because the extinction test prevents S-R learning by omitting experience of the reinforcer. To test this, overnight-deprived daily smokers (n = 44) acquired two instrumental responses for tobacco and chocolate points, respectively, before smoking to satiety. Half then received negative mood induction to raise the expected value of tobacco, opposing satiety, whilst the remainder received positive mood induction. Finally, a choice between tobacco and chocolate was measured in extinction to test whether negative mood could augment tobacco choice, opposing satiety, in the absence of direct experience of tobacco reinforcement. Negative mood induction not only abolished the devaluation of tobacco choice, but participants with a significant increase in negative mood increased their tobacco choice in extinction, despite satiety. These findings suggest that negative mood augments drug-seeking by raising the expected value of the drug through incentive learning, rather than through automatic S-R control.
Cella, Matteo; Bishara, Anthony J.; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til
2014-01-01
Objective: Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention—cognitive remediation. Method: Using computational modelling, 3 reinforcement learning parameters based on the Wisconsin Card Sorting Test (WCST) trial-by-trial performance were estimated: R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). Results: In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. R and P parameter change correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Conclusion: Schizophrenia reinforcement learning difficulties negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. PMID:24214932
'Proactive' use of cue-context congruence for building reinforcement learning's reward function.
Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf
2016-10-28
Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly an agent determines the state value as the sum of immediate reward and of the discounted value of future states. Thus the value of a state is determined by agent related attributes (action set, policy, discount factor) and the agent's knowledge of the environment embodied by the reward function and hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control either using, or not using a model. In the present paper, using the proactive model of reinforcement learning we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and action. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Furthermore, clinical implications for cognitive behavioral interventions are discussed.
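The Bellman equation described above (state value as the sum of immediate reward and the discounted value of successor states) can be made concrete with value iteration on a toy deterministic chain; the states, actions, and rewards are invented for illustration.

```python
# Value iteration for V(s) = max_a [ R(s,a) + GAMMA * V(s') ] on a toy MDP.
GAMMA = 0.9
# transitions[s][a] = (reward, next_state); state 2 is terminal.
transitions = {
    0: {"advance": (0.0, 1), "stay": (0.0, 0)},
    1: {"advance": (1.0, 2), "stay": (0.0, 1)},
}
V = {0: 0.0, 1: 0.0, 2: 0.0}

for _ in range(100):  # sweep until the values stop changing
    for s, acts in transitions.items():
        V[s] = max(r + GAMMA * V[s2] for r, s2 in acts.values())
# Converges to V[1] = 1.0 and V[0] = GAMMA * V[1] = 0.9: the value of a
# state is its immediate reward plus the discounted value of what follows.
```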
Schiffino, Felipe L; Zhou, Vivian; Holland, Peter C
2014-02-01
Within most contemporary learning theories, reinforcement prediction error, the difference between the obtained and expected reinforcer value, critically influences associative learning. In some theories, this prediction error determines the momentary effectiveness of the reinforcer itself, such that the same physical event produces more learning when its presentation is surprising than when it is expected. In other theories, prediction error enhances attention to potential cues for that reinforcer by adjusting cue-specific associability parameters, biasing the processing of those stimuli so that they more readily enter into new associations in the future. A unique feature of these latter theories is that such alterations in stimulus associability must be represented in memory in an enduring fashion. Indeed, considerable data indicate that altered associability may be expressed days after its induction. Previous research from our laboratory identified brain circuit elements critical to the enhancement of stimulus associability by the omission of an expected event, and to the subsequent expression of that altered associability in more rapid learning. Here, for the first time, we identified a brain region, the posterior parietal cortex, as a potential site for a memorial representation of altered stimulus associability. In three experiments using rats and a serial prediction task, we found that intact posterior parietal cortex function was essential during the encoding, consolidation, and retrieval of an associability memory enhanced by surprising omissions. We discuss these new results in the context of our previous findings and additional plausible frontoparietal and subcortical networks. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Multiagent cooperation and competition with deep reinforcement learning.
Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul
2017-01-01
Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.
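The learning rule at the core of this setup is the Q-learning backup, which a Deep Q-Network approximates with a convolutional network over raw pixels. The tabular form can be sketched as follows (an illustrative sketch with hypothetical state labels, not the authors' code):

```python
# Minimal tabular sketch of the Q-learning backup that a DQN approximates:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# State ids below stand in for raw-pixel frames, which the DQN would instead
# encode with a convolutional network. The reward r is where the multiagent
# Pong study intervenes: re-weighting each agent's reward on scoring events
# shifts behavior between competitive and collaborative regimes.

from collections import defaultdict

def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """One Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)
actions = ["up", "down", "stay"]

# Hypothetical episode fragment: the second transition is rewarded.
trajectory = [("s0", "up", 0.0, "s1"), ("s1", "up", 1.0, "s2")]
for s, a, r, s2 in trajectory:
    q_update(Q, s, a, r, s2, actions)

print(Q[("s1", "up")] > Q[("s0", "down")])  # rewarded action gains value -> True
```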
Emotional Multiagent Reinforcement Learning in Spatial Social Dilemmas.
Yu, Chao; Zhang, Minjie; Ren, Fenghui; Tan, Guozhen
2015-12-01
Social dilemmas have attracted extensive interest in the research of multiagent systems in order to study the emergence of cooperative behaviors among selfish agents. Understanding how agents can achieve cooperation in social dilemmas through learning from local experience is a critical problem that has motivated researchers for decades. This paper investigates the possibility of exploiting emotions in agent learning in order to facilitate the emergence of cooperation in social dilemmas. In particular, the spatial version of social dilemmas is considered to study the impact of local interactions on the emergence of cooperation in the whole system. A double-layered emotional multiagent reinforcement learning framework is proposed to endow agents with internal cognitive and emotional capabilities that can drive these agents to learn cooperative behaviors. Experimental results reveal that various network topologies and agent heterogeneities have significant impacts on agent learning behaviors in the proposed framework, and under certain circumstances, high levels of cooperation can be achieved among the agents.
A self-learning rule base for command following in dynamical systems
NASA Technical Reports Server (NTRS)
Tsai, Wei K.; Lee, Hon-Mun; Parlos, Alexander
1992-01-01
In this paper, a self-learning Rule Base for command following in dynamical systems is presented. The learning is accomplished through reinforcement learning using an associative memory called SAM. The main advantage of SAM is that it is a function approximator with explicit storage of training samples. A learning algorithm patterned after dynamic programming is proposed. Two artificially created, unstable dynamical systems are used for testing, and the Rule Base was used to generate feedback control to improve the command-following ability of the otherwise uncontrolled systems. The numerical results are very encouraging. The controlled systems exhibit more stable behavior and a better capability to follow reference commands. The rules resulting from the reinforcement learning are explicitly stored, and they can be modified or augmented by human experts. Due to the overlapping storage scheme of SAM, the stored rules are similar to fuzzy rules.
Constructing Temporally Extended Actions through Incremental Community Detection
Li, Ge
2018-01-01
Hierarchical reinforcement learning works on temporally extended actions or skills to facilitate learning. How to automatically form such abstraction is challenging, and many efforts tackle this issue in the options framework. While various approaches exist to construct options from different perspectives, few of them concentrate on options' adaptability during learning. This paper presents an algorithm to create options and enhance their quality online. Both aspects operate on detected communities of the learning environment's state transition graph. We first construct options from initial samples as the basis of online learning. Then a rule-based community revision algorithm is proposed to update graph partitions, based on which existing options can be continuously tuned. Experimental results in two problems indicate that options from initial samples may perform poorly in more complex environments, and our presented strategy can effectively improve options and get better results compared with flat reinforcement learning. PMID:29849543
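In the options framework this abstract builds on, a temporally extended action is conventionally a triple: an initiation set, an intra-option policy, and a termination condition. The sketch below (assumed structure and hypothetical state names, not the paper's code) shows how a detected community would supply the initiation set, with a "doorway" state bordering the next community serving as the subgoal.

```python
# Illustrative sketch of the options formalism: an option is a triple
# (initiation set I, intra-option policy pi, termination condition beta).
# Community detection on the state-transition graph would supply I (the
# states of one community) and the subgoal (a doorway to the next community).
# All state names below are hypothetical.

from dataclasses import dataclass
from typing import Callable, Dict, Set

@dataclass
class Option:
    initiation: Set[str]                # states where the option may be invoked
    policy: Dict[str, str]              # intra-option policy: state -> action
    termination: Callable[[str], bool]  # beta(s): stop when this fires

# Two-room toy layout: community A = {"a1", "a2"}, doorway state = "door".
to_door = Option(
    initiation={"a1", "a2"},
    policy={"a1": "right", "a2": "right"},
    termination=lambda s: s == "door",
)

def run_option(opt, s, step):
    """Follow the option's internal policy until its termination condition fires."""
    while not opt.termination(s):
        s = step(s, opt.policy[s])
    return s

# Toy deterministic transition function for the layout above.
step = lambda s, a: {"a1": "a2", "a2": "door"}[s]
print(run_option(to_door, "a1", step))  # -> door
```

Revising the graph partition online, as the paper proposes, would then amount to updating these initiation sets and subgoals as new communities are detected.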
Deep Direct Reinforcement Learning for Financial Signal Representation and Trading.
Deng, Yue; Bao, Feng; Kong, Youyong; Ren, Zhiquan; Dai, Qionghai
2017-03-01
Can we train a computer to beat experienced traders at financial asset trading? In this paper, we address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model is inspired by two biologically related learning concepts: deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. The RL module then interacts with the deep representations and makes trading decisions to accumulate the ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both deep and recurrent structures. Hence, we propose a task-aware backpropagation-through-time method to cope with the vanishing-gradient problem in deep training. The robustness of the neural system is verified on both stock and commodity futures markets under broad testing conditions.
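The reward that direct-RL trading systems of this kind accumulate is typically the per-step trading profit: the held position times the price change, minus a cost whenever the position changes (this follows the general direct-reinforcement formulation associated with Moody and Saffell; the function and values below are illustrative assumptions, not the paper's exact model).

```python
# Hedged sketch of the accumulated-reward objective in direct reinforcement
# trading. Positions are in {-1, 0, +1} (short, flat, long); a transaction
# cost is charged on each position change. Numbers are made up for illustration.

def trading_returns(prices, positions, cost=0.001):
    """Total reward: sum over t of position * price change - cost * |position change|."""
    total = 0.0
    prev_pos = 0
    for t in range(1, len(prices)):
        price_change = prices[t] - prices[t - 1]
        pos = positions[t - 1]  # position held over the interval (t-1, t]
        total += pos * price_change - cost * abs(pos - prev_pos)
        prev_pos = pos
    return total

prices = [100.0, 101.0, 100.5, 102.0]
long_always = [1, 1, 1]  # stay long throughout; one entry cost at the start
print(round(trading_returns(prices, long_always), 4))
```

In the paper's framework, the recurrent network's output plays the role of the position sequence, and gradients of this accumulated reward are propagated back through time to train it.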
"Learned Helplessness" or "Learned Incompetence"?
ERIC Educational Resources Information Center
Sergent, Justine; Lambert, Wallace E.
Studies in the past have shown that reinforcements independent of the subjects' actions may induce a feeling of helplessness. Most experiments on learned helplessness have led researchers to believe that uncontrollability (non-contingency of feedback upon response) was the determining feature of learned helplessness, although in most studies…
Implicit learning in cotton-top tamarins (Saguinus oedipus) and pigeons (Columba livia).
Locurto, Charles; Fox, Maura; Mazzella, Andrea
2015-06-01
There is considerable interest in the conditions under which human subjects learn patterned information without explicit instructions to learn that information. This form of learning, termed implicit or incidental learning, can be approximated in nonhumans by exposing subjects to patterned information but delivering reinforcement randomly, thereby not requiring the subjects to learn the information in order to be reinforced. Following acquisition, nonhuman subjects are queried as to what they have learned about the patterned information. In the present experiment, we extended the study of implicit learning in nonhumans by comparing two species, cotton-top tamarins (Saguinus oedipus) and pigeons (Columba livia), on an implicit learning task that used an artificial grammar to generate the patterned elements for training. We equated the conditions of training and testing as much as possible between the two species. The results indicated that both species demonstrated approximately the same magnitude of implicit learning, judged both by a random test and by choice tests between pairs of training elements. This finding suggests that the ability to extract patterned information from situations in which such learning is not demanded is of longstanding origin.
Social learning through prediction error in the brain
NASA Astrophysics Data System (ADS)
Joiner, Jessica; Piva, Matthew; Turrin, Courtney; Chang, Steve W. C.
2017-06-01
Learning about the world is critical to survival and success. In social animals, learning about others is a necessary component of navigating the social world, ultimately contributing to increasing evolutionary fitness. How humans and nonhuman animals represent the internal states and experiences of others has long been a subject of intense interest in the developmental psychology tradition, and, more recently, in studies of learning and decision making involving self and other. In this review, we explore how psychology conceptualizes the process of representing others, and how neuroscience has uncovered correlates of reinforcement learning signals to explore the neural mechanisms underlying social learning from the perspective of representing reward-related information about self and other. In particular, we discuss self-referenced and other-referenced types of reward prediction errors across multiple brain structures that effectively allow reinforcement learning algorithms to mediate social learning. Prediction-based computational principles in the brain may be strikingly conserved between self-referenced and other-referenced information.
DOT National Transportation Integrated Search
2014-03-01
This report describes a research project to investigate accelerated aging protocols for fiber-reinforced : polymer (FRP) reinforcement of concrete. This research was conducted in three stages. In the first : stage, various spectroscopic techniques we...
Effects of intrinsic motivation on feedback processing during learning.
DePasque, Samantha; Tricomi, Elizabeth
2015-10-01
Learning commonly requires feedback about the consequences of one's actions, which can drive learners to modify their behavior. Motivation may determine how sensitive an individual might be to such feedback, particularly in educational contexts where some students value academic achievement more than others. Thus, motivation for a task might influence the value placed on performance feedback and how effectively it is used to improve learning. To investigate the interplay between intrinsic motivation and feedback processing, we used functional magnetic resonance imaging (fMRI) during feedback-based learning before and after a novel manipulation based on motivational interviewing, a technique for enhancing treatment motivation in mental health settings. Because of its role in the reinforcement learning system, the striatum is situated to play a significant role in the modulation of learning based on motivation. Consistent with this idea, motivation levels during the task were associated with sensitivity to positive versus negative feedback in the striatum. Additionally, heightened motivation following a brief motivational interview was associated with increases in feedback sensitivity in the left medial temporal lobe. Our results suggest that motivation modulates neural responses to performance-related feedback, and furthermore that changes in motivation facilitate processing in areas that support learning and memory. Copyright © 2015. Published by Elsevier Inc.
Synchronization of Chaotic Systems without Direct Connections Using Reinforcement Learning
NASA Astrophysics Data System (ADS)
Sato, Norihisa; Adachi, Masaharu
In this paper, we propose a control method for the synchronization of chaotic systems that does not require the systems to be connected, unlike existing methods such as that proposed by Pecora and Carroll in 1990. The method is based on a reinforcement learning algorithm. We apply our method to two discrete-time chaotic systems with mismatched parameters and achieve M-step delay synchronization. Moreover, we extend the proposed method to the synchronization of continuous-time chaotic systems.
Effective Classroom Demonstration of Soil Reinforcing Techniques.
ERIC Educational Resources Information Center
Williams, John Wharton; Fox, Dennis James
1986-01-01
Presents a model for demonstrating soil mass stabilization. Explains how this approach can assist students in understanding the various types of soil reinforcement techniques, their relative contribution to increased soil strength, and some of their limitations. A working drawing of the model and directives for construction are included. (ML)
Effects of Ventral Striatum Lesions on Stimulus-Based versus Action-Based Reinforcement Learning.
Rothenhoefer, Kathryn M; Costa, Vincent D; Bartolo, Ramón; Vicario-Feliciano, Raquel; Murray, Elisabeth A; Averbeck, Bruno B
2017-07-19
Learning the values of actions versus stimuli may depend on separable neural circuits. In the current study, we evaluated the performance of rhesus macaques with ventral striatum (VS) lesions on a two-arm bandit task that had randomly interleaved blocks of stimulus-based and action-based reinforcement learning (RL). Compared with controls, monkeys with VS lesions had deficits in learning to select rewarding images but not rewarding actions. We used a RL model to quantify learning and choice consistency and found that, in stimulus-based RL, the VS lesion monkeys were more influenced by negative feedback and had lower choice consistency than controls. Using a Bayesian model to parse the groups' learning strategies, we also found that VS lesion monkeys defaulted to an action-based choice strategy. Therefore, the VS is involved specifically in learning the value of stimuli, not actions. SIGNIFICANCE STATEMENT Reinforcement learning models of the ventral striatum (VS) often assume that it maintains an estimate of state value. This suggests that it plays a general role in learning whether rewards are assigned based on a chosen action or stimulus. In the present experiment, we examined the effects of VS lesions on monkeys' ability to learn that choosing a particular action or stimulus was more likely to lead to reward. We found that VS lesions caused a specific deficit in the monkeys' ability to discriminate between images with different values, whereas their ability to discriminate between actions with different values remained intact. Our results therefore suggest that the VS plays a specific role in learning to select rewarded stimuli. Copyright © 2017 the authors 0270-6474/17/376902-13$15.00/0.
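The "choice consistency" parameter in RL models of this kind is conventionally the inverse temperature of a softmax over learned values; the sketch below (illustrative parameter values, not the authors' fitted estimates) shows how a lower inverse temperature pulls choices toward chance, the pattern reported for the lesion group in stimulus-based learning.

```python
# Sketch of the softmax choice rule used in standard RL models of bandit
# choice. beta (inverse temperature) captures choice consistency: high beta
# means near-deterministic choice of the higher-valued option, low beta means
# near-random choice. Values below are illustrative, not fitted parameters.

import math

def softmax_choice_prob(values, beta):
    """P(choose option 0) under a softmax with consistency parameter beta."""
    exps = [math.exp(beta * v) for v in values]
    return exps[0] / sum(exps)

values = [0.6, 0.4]  # learned values for two images in a stimulus-based block
p_consistent = softmax_choice_prob(values, beta=10.0)   # control-like: high beta
p_inconsistent = softmax_choice_prob(values, beta=1.0)  # lesion-like: low beta

# Lower beta pulls choice probability toward chance (0.5).
print(p_consistent > p_inconsistent > 0.5)  # -> True
```

Combined with separate learning rates for positive and negative feedback, this two-parameter structure is sufficient to express both effects the abstract reports: greater influence of negative feedback and lower choice consistency after VS lesions.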
Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.
Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi
2018-03-26
For a natural social human-robot interaction, it is essential for a robot to learn the human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent gets the intrinsic motivation-based rewards through the action-conditional predictive model. By using the proposed method, the robot learned the social skills from the human-robot interaction experiences gathered in the real uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also took more human-like decisions, on a test dataset, than a robot which received direct rewards for the task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhou, Changjiu; Meng, Qingchun; Guo, Zhongwen; Qu, Wiefen; Yin, Bo
2002-04-01
Robot learning in unstructured environments has proved to be an extremely challenging problem, mainly because of the many uncertainties always present in the real world. Human beings, on the other hand, seem to cope very well with uncertain and unpredictable environments, often relying on perception-based information. Furthermore, human beings can also utilize perceptions to guide their learning on those parts of the perception-action space that are actually relevant to the task. Therefore, we conducted research aimed at improving robot learning through the incorporation of both perception-based and measurement-based information. For this reason, a fuzzy reinforcement learning (FRL) agent is proposed in this paper. Based on a neural-fuzzy architecture, different kinds of information can be incorporated into the FRL agent to initialise its action network, critic network and evaluation feedback module so as to accelerate its learning. By making use of the global optimisation capability of GAs (genetic algorithms), a GA-based FRL (GAFRL) agent is presented to solve the local minima problem in traditional actor-critic reinforcement learning. On the other hand, with the prediction capability of the critic network, GAs can perform a more effective global search. Different GAFRL agents are constructed and verified by using the simulation model of a physical biped robot. The simulation analysis shows that the biped learning rate for dynamic balance can be improved by incorporating perception-based information on biped balancing and walking evaluation. The biped robot can find application in ocean exploration, detection and sea-rescue activity, as well as military maritime activity.
Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.
2018-01-01
In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both retention and transfer exercises, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared with early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638
ERIC Educational Resources Information Center
Peterson, Craig M.
A system of task analysis and positive reinforcement was used in the vocational training of a 19-year-old trainable retarded youth (MA=6 years). The task of polishing shoe skates was analyzed and programmed into 29 steps and was reinforced with praise and money. The trainee learned the task in 13 sessions (approximately 1 month) and was employed…
Training in Industry--The Management of Learning.
ERIC Educational Resources Information Center
BASS, BERNARD M.; VAUGHAN, JAMES A.
The principles of learning behavior derived through laboratory study can be extended to explain much of the complex learning required in industrial training programs. A review of the basic principles of human learning introduces four basic concepts--drive, stimulus, response, and reinforcer--and discusses classical and instrumental conditioning…
What Does a Cultural Approach Offer to Research on Learning?
ERIC Educational Resources Information Center
Hatano, Giyoo; Miyake, Naomi
1991-01-01
The articles of this issue argue that taking culture into consideration in research on learning can enhance understanding of a variety of phenomena related to learning. Discussion reinforces this position and further considers learning processes that are or are not determined by culture. (SLD)
DOT National Transportation Integrated Search
2014-09-01
Corrosion of steel-reinforced concrete bridges is a serious problem facing the WVDOT. This : paper provides an overview of techniques for evaluating the condition of reinforced concrete : bridge elements; methods for modeling the remaining service li...
Learning Theory and Prosocial Behavior
ERIC Educational Resources Information Center
Rosenhan, D. L.
1972-01-01
Although theories of learning which stress the role of reinforcement can help us understand altruistic behaviors, it seems clear that a more complete comprehension calls for an expansion of our notions of learning, such that they incorporate affect and cognition. (Author/JM)
Local Service Learning in Teacher Preparation Program
ERIC Educational Resources Information Center
Nuangchalerm, Prasart
2016-01-01
Local knowledge can be integrated into education and the learning process. This study aims to promote local knowledge in schools through service learning. The learning process employed herbal plants to help students learn how to sustain local knowledge alongside modern life and the 21st-century classroom. Participants consisted of 42 pre-service…