Sample records for reinforcement learning methods

  1. A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.

    PubMed

    Murakoshi, Kazushi; Mizuno, Junya

    2004-11-01

    In order to rapidly follow unexpected environmental changes, we propose a parameter control method in reinforcement learning that adjusts each learning parameter in an appropriate direction. We determine each direction on the basis of relationships between behaviors and neuromodulators, taking "emergency" as the key concept. Computer experiments show that agents using the proposed method respond rapidly to unexpected environmental changes regardless of the reinforcement learning algorithm (Q-learning or the actor-critic (AC) architecture) and of the learning problem (discontinuous or continuous state-action problems).

  2. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    PubMed

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, as well as the problem of low convergence speed. The method can greatly reduce the state space and choose actions purposefully and efficiently, so as to optimize the reward function and increase convergence speed. The method is applied to online learning in the game of Tetris, and the experimental results show that combining hierarchical reinforcement learning with action subrewards clearly increases convergence speed. The hierarchical method also alleviates the "curse of dimensionality" to a certain extent. Performance under different parameter settings is compared and analyzed as well.

  3. Mastery Learning through Individualized Instruction: A Reinforcement Strategy

    ERIC Educational Resources Information Center

    Sagy, John; Ravi, R.; Ananthasayanam, R.

    2009-01-01

    The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…

  4. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    PubMed

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
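    The central idea of TDGAR, using the critic's TD prediction error as an internal reinforcement signal that can score GA candidates even before any external reward arrives, can be sketched as follows. This is a minimal illustration under stated assumptions: the critic is tabular rather than a neural network, the five-state chain task and the `fitness` helper are hypothetical, and the "chromosomes" are reduced to two fixed policies.

```python
import numpy as np


class TDCritic:
    """Tabular TD(0) critic (a simplification; TDGAR uses a neural critic network)."""

    def __init__(self, n_states, gamma=0.9, lr=0.1):
        self.v = np.zeros(n_states)
        self.gamma = gamma
        self.lr = lr

    def td_error(self, s, r, s_next, done):
        # Internal reinforcement: r + gamma * V(s') - V(s); a terminal state has value 0.
        target = r if done else r + self.gamma * self.v[s_next]
        return target - self.v[s]

    def update(self, s, r, s_next, done):
        delta = self.td_error(s, r, s_next, done)
        self.v[s] += self.lr * delta
        return delta


def chain_step(s, action, goal=4):
    # Hypothetical chain: +1 moves right, -1 moves left (floored at 0);
    # the only external reward (1.0) arrives on reaching the goal.
    s_next = max(0, s + action)
    done = s_next == goal
    return s_next, (1.0 if done else 0.0), done


def fitness(policy, critic, start=1, steps=3):
    """Hypothetical GA fitness: mean internal reinforcement along a short rollout.
    The critic is only queried here, not updated."""
    s, total = start, 0.0
    for _ in range(steps):
        s_next, r, done = chain_step(s, policy(s))
        total += critic.td_error(s, r, s_next, done)
        if done:
            break
        s = s_next
    return total / steps


# Train the critic on episodes that walk right from state 0 to the goal.
critic = TDCritic(n_states=5)
for _ in range(500):
    s, done = 0, False
    while not done:
        s_next, r, done = chain_step(s, +1)
        critic.update(s, r, s_next, done)
        s = s_next

# Score two candidate "chromosomes" without waiting for external reward.
fit_right = fitness(lambda s: 1, critic)
fit_left = fitness(lambda s: -1, critic)
```

    With the trained critic, moving away from the goal produces negative TD errors, so the left-moving candidate receives a lower fitness even over a rollout that never reaches the reward.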

  5. Reinforcement learning in scheduling

    NASA Technical Reports Server (NTRS)

    Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad

    1994-01-01

    The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.

  6. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    NASA Astrophysics Data System (ADS)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperative control method that combines reinforcement learning with robust control. A notable characteristic of reinforcement learning is that it does not require a model of the system; however, it does not guarantee the stability of the system. Robust control, on the other hand, guarantees stability and robustness but requires a model. We combine the actor-critic method, a reinforcement learning method that controls continuous-valued actions with a minimal amount of computation, with traditional robust control, namely H∞ control. The proposed system was compared with a conventional control method that uses the actor-critic alone, through computer simulations of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  7. Model-based reinforcement learning with dimension reduction.

    PubMed

    Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi

    2016-12-01

    The goal of reinforcement learning is to learn an optimal policy that controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data, which is difficult to obtain. To overcome this difficulty, in this paper we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks, including real humanoid robot control.
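    The first stage of the model-based approach, learning a transition model of the environment from data, can be illustrated with an ordinary least-squares linear model. This is a deliberately simple stand-in, not LSCE: it performs no dimension reduction, and the dynamics matrices `A_true` and `B_true` and the random exploration data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear dynamics s' = A s + B a (unknown to the learner).
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# Collect (s, a, s') transitions from random exploration.
S = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
S_next = S @ A_true.T + U @ B_true.T

# Fit s' ~ [s, a] @ W by least squares; W stacks A^T over B^T.
X = np.hstack([S, U])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
A_hat, B_hat = W[:2].T, W[2:].T


def predict(s, a):
    """One-step prediction from the learned model, usable by a planner or policy search."""
    return A_hat @ s + B_hat @ a
```

    In high-dimensional settings this plain regression needs many samples, which is the difficulty the abstract says LSCE addresses by estimating the model and a low-dimensional representation together.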

  8. Rational and mechanistic perspectives on reinforcement learning.

    PubMed

    Chater, Nick

    2009-12-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms, as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.

  9. On the integration of reinforcement learning and approximate reasoning for control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1991-01-01

    The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.

  10. Learning and tuning fuzzy logic controllers through reinforcements.

    PubMed

    Berenji, H R; Khedkar, P

    1992-01-01

    A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. The generalized approximate-reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

  11. Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders

    PubMed Central

    Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.

    2017-01-01

    Background: Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning; however, there have been few experimental studies taking this perspective. Methods: We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring them to learn relationships within three stimulus pairs consisting of Japanese characters, with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state–space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results: Both groups learned the task after training. However, there were group differences in early learning in the first task block: individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals, exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state–space learning curves, and outperformed typically developing individuals on the near-chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions: Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbitofrontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243
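    A common way to model a learner on this kind of probabilistic selection task is a delta-rule (Q-learning style) value update with softmax choice. The sketch below is not the authors' analysis, which used univariate and Bayesian state-space methods; the learning rate `alpha`, inverse temperature `beta`, trial counts, and the letters A to F standing in for the Japanese characters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each pair maps to the probability that choosing its first stimulus is rewarded;
# choosing the second stimulus is rewarded with probability 1 - p.
PAIRS = {"AB": 0.8, "CD": 0.7, "EF": 0.6}


def simulate(alpha=0.1, beta=5.0, trials_per_pair=300):
    """Simulate a hypothetical delta-rule learner with softmax choice."""
    q = {s: 0.0 for s in "ABCDEF"}
    for _ in range(trials_per_pair):
        for pair, p in PAIRS.items():
            good, bad = pair[0], pair[1]
            # Softmax over two options reduces to a logistic function of the value difference.
            p_good = 1.0 / (1.0 + np.exp(-beta * (q[good] - q[bad])))
            choice = good if rng.random() < p_good else bad
            p_reward = p if choice == good else 1.0 - p
            reward = 1.0 if rng.random() < p_reward else 0.0
            # Only the chosen stimulus's value moves toward the observed outcome.
            q[choice] += alpha * (reward - q[choice])
    return q


q = simulate()
```

    Fitting `alpha` separately for gains and losses is one way such models probe whether learners use positive feedback to exploit rewarded choices; that extension is not shown here.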

  12. Deep imitation learning for 3D navigation tasks.

    PubMed

    Hussein, Ahmed; Elyan, Eyad; Gaber, Mohamed Medhat; Jayne, Chrisina

    2018-01-01

    Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently gained popularity as a method to train intelligent agents, the use of deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks and asynchronous advantage actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and learning from experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on four navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target while providing only delayed (usually terminal) rewards to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, while methods that learn from experience fail to learn an effective policy. Moreover, active learning is shown to significantly improve the performance of the initially learned policy using a small number of active samples.

  13. Reinforcement Learning Trees

    PubMed Central

    Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

    2015-01-01

    In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) in high-dimensional settings. The innovations are threefold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction process. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with the largest marginal effect from the immediate split, the constructed tree uses the available samples more efficiently. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that toward terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss the rationale in general settings. PMID:26903687
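    The contrast between choosing the split with the largest immediate (marginal) effect and choosing the one that enables the greatest improvement in later splits is easiest to see on an XOR target. The sketch below uses a one-level information-gain lookahead on a hypothetical toy dataset; it illustrates the motivation only and is not the RLT algorithm itself.

```python
import itertools
import math


def entropy(labels):
    """Binary entropy (bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def split(rows, j):
    """Partition (features, label) rows on binary feature j."""
    left = [r for r in rows if r[0][j] == 0]
    right = [r for r in rows if r[0][j] == 1]
    return left, right


def gain(rows, j):
    """Immediate (marginal) information gain of a split on feature j."""
    n = len(rows)
    children = sum(len(c) / n * entropy([y for _, y in c]) for c in split(rows, j))
    return entropy([y for _, y in rows]) - children


def lookahead_gain(rows, j, n_features):
    """Gain after splitting on j and then on the best feature inside each child."""
    n = len(rows)
    remaining = 0.0
    for child in split(rows, j):
        if child:
            best = max(gain(child, k) for k in range(n_features))
            remaining += len(child) / n * (entropy([y for _, y in child]) - best)
    return entropy([y for _, y in rows]) - remaining


# Hypothetical toy data: y = x0 XOR x1, while x2 is irrelevant.
rows = [((x0, x1, x2), x0 ^ x1) for x0, x1, x2 in itertools.product((0, 1), repeat=3)]
marginal = [gain(rows, j) for j in range(3)]
lookahead = [lookahead_gain(rows, j, 3) for j in range(3)]
best_lookahead = max(range(3), key=lambda j: lookahead[j])
```

    Every marginal gain is zero, so the immediate criterion cannot tell the irrelevant x2 apart from x0 or x1, whereas the lookahead score is 1 bit for x0 and x1 and 0 for x2.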

  14. Reinforcement learning state estimator.

    PubMed

    Morimoto, Jun; Doya, Kenji

    2007-03-01

    In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that estimation errors cannot be observed directly. However, by treating errors in observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single-pendulum dynamics and showed that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that a state estimator can be acquired for the pendulum swing-up task while a swing-up controller is simultaneously acquired by reinforcement learning. Furthermore, we demonstrate that the dynamics of the pendulum itself can be estimated while the hidden variables are estimated in the swing-up task. An application of the proposed method to a two-link biped model is also presented.

  15. Effect of reinforcement learning on coordination of multiagent systems

    NASA Astrophysics Data System (ADS)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement and study its effect on coordination through an application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price in the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on both the individual agent's and the system's performance.

  16. Refining Linear Fuzzy Rules by Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil

    1996-01-01

    Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used in refining these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning which can be applied in domains where supervised input-output data is not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step in closing the gap between the application of reinforcement learning methods in the domains where only some limited input-output data is available.

  17. Learning and tuning fuzzy logic controllers through reinforcements

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1992-01-01

    A new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. In particular, our Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture: (1) learns and tunes a fuzzy logic controller even when only weak reinforcements, such as a binary failure signal, are available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto, Sutton, and Anderson to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and has demonstrated significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

  18. Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation

    NASA Astrophysics Data System (ADS)

    Satoh, Hideki

    An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with a multi-dimensional continuous state space. First, the basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method can modify a basis while maintaining its orthonormality in accordance with changes in the environment, improving the performance of reinforcement learning and eliminating the adverse effects of redundant noisy states.

  19. Framework for robot skill learning using reinforcement learning

    NASA Astrophysics Data System (ADS)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL), here an online actor-critic method, allows a robot to develop its skill. The reinforcement function is a critical component because it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward beyond the conventional reward provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to achieve smooth and swift motion switching. We present a formal but practical framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control online.
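    One standard way to add heuristic prior knowledge to a conventional reward in a difference form is potential-based reward shaping (Ng, Harada and Russell, 1999), sketched below. It is a swapped-in technique for illustration, not necessarily the authors' augmented reward function; the potential `potential`, the one-dimensional task, and `GOAL` are hypothetical.

```python
GOAL = 10  # hypothetical goal position on a 1-D line


def potential(s):
    """Hypothetical prior knowledge: states closer to the goal are better."""
    return -abs(GOAL - s)


def shaped_reward(r, s, s_next, potential, gamma=0.9, done=False):
    """Conventional reward plus the difference term F = gamma * phi(s') - phi(s).
    The potential of a terminal state is taken as 0."""
    phi_next = 0.0 if done else potential(s_next)
    return r + gamma * phi_next - potential(s)


# Compare discounted returns along one trajectory that walks from 0 to the goal.
gamma = 0.9
states = list(range(GOAL + 1))
ret_plain, ret_shaped = 0.0, 0.0
for t, (s, s_next) in enumerate(zip(states[:-1], states[1:])):
    done = s_next == GOAL
    r = 1.0 if done else 0.0
    ret_plain += gamma**t * r
    ret_shaped += gamma**t * shaped_reward(r, s, s_next, potential, gamma, done)
```

    Because the shaping terms telescope, the shaped return differs from the plain return only by the constant `-potential(0)` (here 10), which is why this form adds heuristic guidance without changing which policy is optimal.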

  20. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning

    PubMed Central

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when multiple actions are available. Recent research has demonstrated that when the learning contents were controlled to be the same, different tutorial tactics made a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques used in previous studies to induce tutorial tactics are insufficient for large problems and hence were used offline. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task that generates new rules from old ones, which increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018

  2. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    PubMed

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing a per-step reward function, our proposed method can pass information from entity extraction to relation extraction and obtain feedback, so that entities and relations are extracted simultaneously. First, we use a bidirectional LSTM to model context information, which performs preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method, with a 2.4% increase in recall.
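    The last step, learning a control policy π with Q-learning over a two-step decision process with per-step rewards, can be sketched with a tabular learner. The states, action sets, and rewards below are hypothetical placeholders: in the paper the states come from BiLSTM, attention, and Tree-LSTM representations rather than being enumerated, and the learning rate, discount, and exploration rate here are assumptions.

```python
import random

random.seed(0)

# Hypothetical two-step process: step 1 picks an entity decision, step 2 a relation label.
ENTITY_ACTIONS = [0, 1]       # assume 1 = correct entity pair
RELATION_ACTIONS = [0, 1, 2]  # assume 2 = correct relation


def reward(step, entity, relation=None):
    """Hypothetical per-step rewards; step 2 only pays off if step 1 was correct."""
    if step == 1:
        return 1.0 if entity == 1 else 0.0
    return 1.0 if (entity == 1 and relation == 2) else 0.0


Q1 = {a: 0.0 for a in ENTITY_ACTIONS}                                  # Q(s0, a1)
Q2 = {(e, a): 0.0 for e in ENTITY_ACTIONS for a in RELATION_ACTIONS}  # Q(s1 = e, a2)


def eps_greedy(value, actions, eps):
    """Random action with probability eps, otherwise the greedy one (ties -> first)."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=value)


alpha, gamma, eps = 0.1, 1.0, 0.2
for _ in range(2000):
    # Step 1: choose an entity decision, then bootstrap from the next state's best value.
    e = eps_greedy(Q1.get, ENTITY_ACTIONS, eps)
    target = reward(1, e) + gamma * max(Q2[(e, a)] for a in RELATION_ACTIONS)
    Q1[e] += alpha * (target - Q1[e])
    # Step 2: choose a relation label; the episode ends, so the target is the reward alone.
    rel = eps_greedy(lambda a: Q2[(e, a)], RELATION_ACTIONS, eps)
    Q2[(e, rel)] += alpha * (reward(2, e, rel) - Q2[(e, rel)])

policy = (max(ENTITY_ACTIONS, key=Q1.get), max(RELATION_ACTIONS, key=lambda a: Q2[(1, a)]))
```

    Because the step-1 update bootstraps from the step-2 values, a correct entity decision is credited with the reward that only becomes available once the relation is also correct, which is how per-step rewards pass information from entity extraction to relation extraction in this toy setting.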

  3. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    PubMed Central

    Zhang, Hongjun; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing a per-step reward function, our proposed method can pass information from entity extraction to relation extraction and obtain feedback, so that entities and relations are extracted simultaneously. First, we use a bidirectional LSTM to model context information, which performs preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method, with a 2.4% increase in recall. PMID:28894463

  4. Synchronization of Chaotic Systems without Direct Connections Using Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Sato, Norihisa; Adachi, Masaharu

    In this paper, we propose a control method for the synchronization of chaotic systems that does not require the systems to be connected, unlike existing methods such as the one proposed by Pecora and Carroll in 1990. The method is based on a reinforcement learning algorithm. We apply our method to two discrete-time chaotic systems with mismatched parameters and achieve M-step delay synchronization. Moreover, we extend the proposed method to the synchronization of continuous-time chaotic systems.

  5. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    PubMed

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with a coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) makes the most of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The complexity criteria in DCRL consist of a self-paced priority and a coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency; the coverage penalty accounts for sample diversity. DCRL is evaluated on Atari 2600 games against the deep Q-network (DQN) and prioritized experience replay (PER) methods, and the experimental results show that it outperforms DQN and PER on most of these games. Further results show that the curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and the dueling network. All the experimental results demonstrate that DCRL can improve the training efficiency and robustness of deep reinforcement learning.
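    Selecting replay transitions by a TD-error-based priority while discouraging transitions that have already been replayed often can be sketched as below. The exponent `alpha`, the `1 / (1 + penalty * count)` coverage term, and the function name `sampling_probs` are assumptions in the style of prioritized experience replay; they are not DCRL's actual self-paced priority or coverage-penalty formulas.

```python
import numpy as np


def sampling_probs(td_errors, replay_counts, alpha=0.6, penalty=0.5, eps=1e-6):
    """Hypothetical replay distribution: |TD error| priority times a coverage discount."""
    priority = (np.abs(td_errors) + eps) ** alpha          # larger TD error -> higher priority
    coverage = 1.0 / (1.0 + penalty * np.asarray(replay_counts, dtype=float))
    score = priority * coverage                            # often-replayed items are down-weighted
    return score / score.sum()


rng = np.random.default_rng(0)
td = np.array([0.5, 0.5, 0.1])
counts = [0, 10, 0]
probs = sampling_probs(td, counts)
batch = rng.choice(len(td), size=4, p=probs)  # indices of transitions to replay
```

    The first two transitions have the same TD error, but the second has been replayed ten times and so is sampled less often, which is the kind of trade-off between sample efficiency and sample diversity the abstract describes.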

  6. Learning and tuning fuzzy logic controllers through reinforcements

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1992-01-01

    This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

  7. A reinforcement learning-based architecture for fuzzy logic control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1992-01-01

    This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.
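
ARIC's learning is described only as related to Sutton's temporal difference (TD) method. For reference, the textbook tabular TD(0) state-value update looks like the sketch below; it is not ARIC's neural-network critic, and the step size `alpha` and discount `gamma` are illustrative choices.

```python
from typing import Dict, Hashable


def td0_update(values: Dict[Hashable, float], state: Hashable, reward: float,
               next_state: Hashable, terminal: bool,
               alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one TD(0) update in place and return the TD error.

    V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)], with V(s') = 0 at the
    end of an episode (e.g. when the pole falls in cart-pole balancing).
    """
    v_s = values.get(state, 0.0)
    v_next = 0.0 if terminal else values.get(next_state, 0.0)
    td_error = reward + gamma * v_next - v_s
    values[state] = v_s + alpha * td_error
    return td_error
```

A failure signal such as the pole falling would be passed as a negative `reward` with `terminal=True`.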

  8. Fuzzy Q-Learning for Generalization of Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1996-01-01

    Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents which are controlled at the top level by Fuzzy Q-Learning and at the local level, each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of input space by using fuzzy rules and bridges the gap between Q-Learning and rule based intelligent systems.

  9. Reciprocity Family Counseling: A Multi-Ethnic Model.

    ERIC Educational Resources Information Center

    Penrose, David M.

    The Reciprocity Family Counseling Method involves learning principles of behavior modification including selective reinforcement, behavioral contracting, self-correction, and over-correction. Selective reinforcement refers to the recognition and modification of parent/child responses and reinforcers. Parents and children are asked to identify…

  10. Time-Extended Policies in Multi-Agent Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Agogino, Adrian K.

    2004-01-01

    Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works towards maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall of having rewards aligned with the global reward leading to utilities not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result they outperform both global (e.g., team games) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
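
The abstract does not spell out its reward construction. One construction commonly studied in this research line is the difference reward D_i = G(z) - G(z_{-i}), which subtracts the global utility recomputed without agent i: it stays aligned with G while removing much of the noise from other agents' actions. The sketch below assumes a hypothetical Gridworld-style G that sums the values of distinct token cells visited by any agent; it illustrates the idea and is not the paper's exact utilities.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def global_utility(visits: Dict[str, List[Cell]], tokens: Dict[Cell, float]) -> float:
    """Hypothetical G: total value of distinct token cells visited by any agent."""
    covered = {cell for path in visits.values() for cell in path}
    return sum(value for cell, value in tokens.items() if cell in covered)


def difference_reward(agent: str, visits: Dict[str, List[Cell]],
                      tokens: Dict[Cell, float]) -> float:
    """D_i = G(z) - G(z_{-i}): G minus G recomputed without agent i's path."""
    without_i = {name: path for name, path in visits.items() if name != agent}
    return global_utility(visits, tokens) - global_utility(without_i, tokens)
```

For example, with tokens worth 1, 2, and 4 at (0, 0), (1, 1), and (2, 2), and both agents visiting (1, 1), neither agent is credited for that shared token: agent "a" receives 1 and agent "b" receives 4, while G is 7.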

  11. Effective Reinforcement Techniques in Elementary Physical Education: The Key to Behavior Management

    ERIC Educational Resources Information Center

    Downing, John; Keating, Tedd; Bennett, Carl

    2005-01-01

    The ability to shape appropriate behavior while extinguishing misbehavior is critical to teaching and learning in physical education. The scientific principles that affect student learning in the gymnasium also apply to the methods teachers use to influence social behaviors. Research indicates that reinforcement strategies are more effective than…

  12. Research progress of microbial corrosion of reinforced concrete structure

    NASA Astrophysics Data System (ADS)

    Li, Shengli; Li, Dawang; Jiang, Nan; Wang, Dongwei

    2011-04-01

    Microbial corrosion of reinforced concrete structures is a new branch of learning. It is an interdisciplinary area that draws on civil engineering, environmental engineering, biology, chemistry, materials science, and so on. Research progress on the causes, research methods, and contents of microbial corrosion of reinforced concrete structures is described. Research in the field is just beginning, and concerted effort is needed to investigate the corrosion mechanism of reinforced concrete structures further, assess their safety and service life under these special conditions, and put forward protective methods.

  13. Mobile robots exploration through CNN-based reinforcement learning.

    PubMed

    Tai, Lei; Liu, Ming

    2016-01-01

    Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outlined a reinforcement learning method aimed at solving the exploration problem in a corridor environment. The learning model took the depth image from an RGB-D sensor as its only input. The feature representation of the depth image was extracted through a pre-trained convolutional neural network model. Building on the recent success of the deep Q-network in artificial intelligence, the robot controller achieved exploration and obstacle-avoidance abilities in several different simulated environments. This is the first time reinforcement learning has been used to build an exploration strategy for mobile robots from raw sensor information.

  14. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    PubMed

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
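
The abstract does not give the HRL update rule. A generic reward-modulated Hebbian rule, assumed here for illustration, scales the usual input-output correlation by the binary evaluative feedback r (+1 or -1): delta_w_ij = eta * r * x_i * y_j. With r = +1 the active input-to-action pairing is strengthened; with r = -1 it is weakened. The learning rate `eta` and the winner-take-all output layer are assumptions, not details from the paper.

```python
from typing import List


def hebbian_rl_step(weights: List[List[float]], x: List[float], action: int,
                    feedback: int, eta: float = 0.1) -> None:
    """Reward-modulated Hebbian update for a winner-take-all output layer.

    weights[j][i] connects input i to output (action) j. Only the chosen
    action's output is active (y_j = 1), so only its row changes:
        w[action][i] += eta * feedback * x[i],  with feedback in {+1, -1}.
    """
    if feedback not in (1, -1):
        raise ValueError("feedback must be +1 or -1")
    row = weights[action]
    for i, xi in enumerate(x):
        row[i] += eta * feedback * xi


def choose_action(weights: List[List[float]], x: List[float]) -> int:
    """Pick the output unit with the largest weighted input (greedy)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because learning is driven only by the sign of the feedback, a user who supplies +1/-1 judgments is enough to steer the mapping, which matches the single-feedback-signal setting the abstract describes.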

  15. Hierarchical extreme learning machine based reinforcement learning for goal localization

    NASA Astrophysics Data System (ADS)

    AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini

    2017-03-01

    The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Only a few regions should be processed by the agent, to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning (RL) method was utilized to find an optimal series of actions to localize the goal region. The visual data, a set of images, is high-dimensional unstructured data and needs to be represented efficiently to get a robust detector. Different deep reinforcement learning models have already been used to localize a goal, but most of them take a long time to learn the model. This long learning time results from the weight fine-tuning stage that is applied iteratively to find an accurate model. Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that does not fine-tune the weights. In other words, hidden weights are generated randomly and output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed using MATLAB.
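
The key property of (H-)ELM named in the abstract is that hidden weights are random and only the output weights are computed analytically. A single-hidden-layer ELM regressor, a simplified non-hierarchical stand-in for H-ELM, can be sketched with NumPy as follows; the tanh activation, the hidden-layer size, and the pseudo-inverse solve are illustrative choices rather than the paper's configuration.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine (no backprop fine-tuning)."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Hidden weights and biases are drawn once at random and never trained.
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.beta = None  # output weights, solved in fit()

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        H = self._hidden(X)
        # Least-squares output weights via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta
```

Training is a single linear solve, which is why ELM-style models avoid the long iterative fine-tuning stage the abstract attributes to other deep models.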

  16. A Novel Clustering Method Curbing the Number of States in Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Kotani, Naoki; Nunobiki, Masayuki; Taniguchi, Kenji

    We propose an efficient state-space construction method for reinforcement learning. Our method controls the number of categories by improving the clustering method of Fuzzy ART, an autonomous state-space construction method. The proposed method represents each weight vector as the mean of its input vectors in order to curb the creation of new categories, and eliminates categories whose state values are low to curb the total number of categories. As the state values are updated, categories become smaller so that the policy is learned more precisely. We verified the effectiveness of the proposed method with simulations of a reaching problem for a two-link robot arm. We confirmed that the number of categories was reduced and the agent achieved the complex task quickly.
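
The two modifications described in the abstract, keeping each category's weight vector as the mean of its inputs and pruning low-value categories, can be sketched in isolation. This is a simplified stand-in: it uses a hypothetical Euclidean vigilance `radius` in place of Fuzzy ART's actual choice and match functions (complement coding and the fuzzy AND are omitted), and the `min_value` pruning threshold is likewise an assumption.

```python
import math
from typing import List


class MeanCategories:
    """Simplified state-space construction with running-mean category prototypes."""

    def __init__(self, radius: float):
        self.radius = radius
        self.centers: List[List[float]] = []
        self.counts: List[int] = []
        self.values: List[float] = []  # state value per category (set by the learner)

    def categorize(self, x: List[float]) -> int:
        """Return the category index for x, creating a category if none is close enough."""
        best, best_dist = -1, math.inf
        for k, c in enumerate(self.centers):
            d = math.dist(x, c)
            if d < best_dist:
                best, best_dist = k, d
        if best >= 0 and best_dist <= self.radius:
            # Incremental mean: the prototype stays the mean of all inputs assigned to it.
            self.counts[best] += 1
            n = self.counts[best]
            self.centers[best] = [c + (xi - c) / n for c, xi in zip(self.centers[best], x)]
            return best
        self.centers.append(list(x))
        self.counts.append(1)
        self.values.append(0.0)
        return len(self.centers) - 1

    def prune(self, min_value: float) -> None:
        """Drop categories whose state value fell below min_value."""
        keep = [k for k, v in enumerate(self.values) if v >= min_value]
        self.centers = [self.centers[k] for k in keep]
        self.counts = [self.counts[k] for k in keep]
        self.values = [self.values[k] for k in keep]
```

Averaging pulls a prototype toward the middle of its inputs instead of letting it drift, which is one way to limit how often new categories are spawned.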

  17. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    PubMed Central

    Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia

    2018-01-01

    Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822

  18. Optimal control in microgrid using multi-agent reinforcement learning.

    PubMed

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limit of units in a microgrid with grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for microgrid is proposed. Then, a state variable "Average Electricity Price Trend" which is used to express the most probable transitions of the system is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on change rate of key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial to handle the problem of "curse of dimensionality" and speed up learning in the unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid with grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  19. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    PubMed

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  20. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise.

    PubMed

    Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J

    2016-01-01

    Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.

  1. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise

    PubMed Central

    Therrien, Amanda S.; Wolpert, Daniel M.

    2016-01-01

    See Miall and Galea (doi: 10.1093/awv343) for a scientific commentary on this article. Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368

  2. Apprenticeship Learning: Learning to Schedule from Human Experts

    DTIC Science & Technology

    2016-06-09

    approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning (Busoniu, Babuska, and De...via inverse reinforcement learning. In ICML. Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete...of tasks with temporal constraints. In Proc. AAAI, 2110–2116. Odom, P., and Natarajan, S. 2015. Active advice seeking for inverse reinforcement

  3. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    PubMed

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.

  4. Characterizing Reinforcement Learning Methods through Parameterized Learning Problems

    DTIC Science & Technology

    2011-06-03

    extraneous. The agent could potentially adapt these representational aspects by applying methods from feature selection ( Kolter and Ng, 2009; Petrik et al...611–616. AAAI Press. Kolter , J. Z. and Ng, A. Y. (2009). Regularization and feature selection in least-squares temporal difference learning. In A. P

  5. Beyond adaptive-critic creative learning for intelligent mobile robots

    NASA Astrophysics Data System (ADS)

    Liao, Xiaoqun; Cao, Ming; Hall, Ernest L.

    2001-10-01

    Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that extending the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are potential candidates for these new tasks. The adaptive critic method has been shown to provide useful approximations of, or even optimal, control policies for non-linear systems. The purpose of this paper is to explore the use of new learning methods that go beyond the adaptive critic method for unstructured environments. The adaptive critic is a form of reinforcement learning. A critic element provides only high-level grading corrections to a cognition module that controls the action module. In the proposed system, the critic's grades are modeled and forecasted, so that an anticipated set of sub-grades is available to the cognition model. The forecasted grades are interpolated and are available on the time scale needed by the action model. The success of the system is highly dependent on the accuracy of the forecasted grades and the adaptability of the action module. Examples from the guidance of a mobile robot are provided to illustrate the method for simple line following and for the more complex navigation and control in an unstructured environment. The theory presented here, which goes beyond the adaptive critic, may be called creative theory. Creative theory is a form of learning that models the highest level of human learning: imagination. Creative theory appears to apply not only to mobile robots but also to many other forms of human endeavor, such as educational learning and business forecasting. Reinforcement learning such as the adaptive critic may be applied to known problems to aid in the discovery of their solutions. The significance of creative theory is that it permits the discovery of unknown problems, ones that are not yet recognized but may be critical to survival or success.

  6. Refining fuzzy logic controllers with machine learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1994-01-01

    In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.

  7. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    PubMed

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. For the testing of our method, we use a four-degree-of-freedom reaching problem in 3D space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.

  8. Utilising reinforcement learning to develop strategies for driving auditory neural implants.

    PubMed

    Lee, Geoffrey W; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G

    2016-08-01

    In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed-loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals, we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provide a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model's function. We show the ability to effectively learn stimulation patterns which mimic the cochlea's ability to convert acoustic frequencies to neural activity. Learning effective replication using neural stimulation takes less than 20 min under continuous testing. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient's current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
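
The abstract names a "modified n-Armed Bandit solution" without describing the modification. For orientation, the standard epsilon-greedy bandit with incremental sample-average value estimates is sketched below; `reward_fn` is a hypothetical stand-in for the simulator's scoring of a chosen stimulation pattern, and `epsilon` is an illustrative value.

```python
import random
from typing import Callable, List


def epsilon_greedy_bandit(reward_fn: Callable[[int], float], n_arms: int,
                          steps: int, epsilon: float = 0.1,
                          seed: int = 0) -> List[float]:
    """Run an epsilon-greedy n-armed bandit and return the value estimates.

    Q[a] is the running mean of rewards received from arm a:
        Q[a] <- Q[a] + (r - Q[a]) / N[a]
    """
    rng = random.Random(seed)
    q = [0.0] * n_arms
    n = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)                   # explore: random arm
        else:
            a = max(range(n_arms), key=q.__getitem__)   # exploit: best estimate
        r = reward_fn(a)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]
    return q
```

Each arm would correspond to one candidate stimulation pattern; the incremental mean avoids storing the full reward history.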

  9. Policy improvement by a model-free Dyna architecture.

    PubMed

    Hwang, Kao-Shing; Lo, Chia-Yue

    2013-05-01

    The objective of this paper is to accelerate the process of policy improvement in reinforcement learning. The proposed Dyna-style system combines two learning schemes, one of which utilizes a temporal difference method for direct learning; the other uses relative values for indirect learning in planning between two successive direct learning cycles. Instead of establishing a complicated world model, the approach introduces a simple predictor of average rewards to actor-critic architecture in the simulation (planning) mode. The relative value of a state, defined as the accumulated differences between immediate reward and average reward, is used to steer the improvement process in the right direction. The proposed learning scheme is applied to control a pendulum system for tracking a desired trajectory to demonstrate its adaptability and robustness. Through reinforcement signals from the environment, the system takes the appropriate action to drive an unknown dynamic to track desired outputs in few learning cycles. Comparisons are made between the proposed model-free method, a connectionist adaptive heuristic critic, and an advanced method of Dyna-Q learning in the experiments of labyrinth exploration. The proposed method outperforms its counterparts in terms of elapsed time and convergence rate.
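
Taking the abstract's definition literally, the relative value of a state is the accumulated difference between immediate reward and average reward from that state onward. The sketch below computes it over a recorded trajectory and pairs it with a hypothetical exponential-moving-average predictor standing in for the paper's "simple predictor of average rewards"; the predictor form and `rate` are assumptions.

```python
from typing import List


def update_average(avg: float, reward: float, rate: float = 0.1) -> float:
    """Hypothetical average-reward predictor: exponential moving average."""
    return avg + rate * (reward - avg)


def relative_values(rewards: List[float], avg_reward: float) -> List[float]:
    """Relative value from each time step t: sum over k >= t of (r_k - avg_reward)."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t] - avg_reward
        out[t] = running
    return out
```

A positive relative value marks a state from which the agent earns more than average, which is the signal used to steer policy improvement during planning.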

  10. The role of multisensor data fusion in neuromuscular control of a sagittal arm with a pair of muscles using actor-critic reinforcement learning method.

    PubMed

    Golkhou, V; Parnianpour, M; Lucas, C

    2004-01-01

    In this study, we consider the role of multisensor data fusion in neuromuscular control using an actor-critic reinforcement learning method. The model we use is a single-link system actuated by a pair of muscles that are excited with alpha and gamma signals. Information from various physiological sensors, such as proprioception, spindle sensors, and Golgi tendon organs, has been integrated to achieve an oscillatory movement with variable amplitude and frequency, while achieving a stable movement with minimum metabolic cost and coactivation. The system is highly nonlinear in all its physical and physiological attributes. Transmission delays are included in the afferent and efferent neural paths to account for a more accurate representation of the reflex loops. This paper proposes a reinforcement learning method with an Actor-Critic architecture in place of the middle and low levels of the central nervous system (CNS). The Actor in this structure is a two-layer feedforward neural network and the Critic is a model of the cerebellum. The Critic is trained by the State-Action-Reward-State-Action (SARSA) method. The Critic trains the Actor by supervised learning based on previous experiences. The reinforcement signal in SARSA is evaluated based on available alternatives concerning the concept of multisensor data fusion. The effectiveness and the biological plausibility of the present model are demonstrated by several simulations. The system showed excellent tracking capability when we integrated the available sensor information. Addition of a penalty for activation of muscles resulted in much lower muscle coactivation while keeping the movement stable.
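
The Critic in this model is trained with SARSA. For reference, the tabular SARSA update is Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)], where a' is the action actually taken next (which makes it on-policy). The sketch below shows only that update; the paper's Critic is a cerebellum model rather than a table, and `alpha` and `gamma` are illustrative values.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def new_q_table() -> QTable:
    """Q-table whose unseen (state, action) pairs default to 0.0."""
    return defaultdict(float)


def sarsa_update(q: QTable, s: Hashable, a: Hashable, r: float,
                 s_next: Hashable, a_next: Hashable,
                 alpha: float = 0.1, gamma: float = 0.95) -> float:
    """One on-policy SARSA step; updates q in place and returns the TD error."""
    td_error = r + gamma * q[(s_next, a_next)] - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return td_error
```

Unlike Q-learning, the target uses Q(s', a') for the next action the current policy actually chose, not the maximum over actions.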

  11. Reinforcement learning agents providing advice in complex video games

    NASA Astrophysics Data System (ADS)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
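
The abstract does not name the budgeting heuristics. One heuristic from Torrey and Taylor's budgeted-advice work, importance advising, has the teacher spend advice only where its own Q-values say the choice matters, i.e. where I(s) = max_a Q(s,a) - min_a Q(s,a) reaches a threshold. The sketch below assumes that rule; the dictionary-based teacher Q-table and the threshold value are hypothetical.

```python
from typing import Dict, Hashable, Optional


class BudgetedTeacher:
    """Teacher that gives at most `budget` pieces of advice (importance advising)."""

    def __init__(self, q: Dict[Hashable, Dict[Hashable, float]],
                 budget: int, threshold: float):
        self.q = q                # teacher's Q-values: q[state][action]
        self.budget = budget      # remaining advice budget
        self.threshold = threshold

    def importance(self, state: Hashable) -> float:
        """Spread between the best and worst action values in this state."""
        values = self.q[state].values()
        return max(values) - min(values)

    def advise(self, state: Hashable) -> Optional[Hashable]:
        """Return the teacher's best action if advice is warranted, else None."""
        if self.budget <= 0 or self.importance(state) < self.threshold:
            return None
        self.budget -= 1
        return max(self.q[state], key=self.q[state].get)
```

Advice is saved for states where a wrong action is costly, so the limited budget is not spent where every action is about equally good.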

  12. Rational and Mechanistic Perspectives on Reinforcement Learning

    ERIC Educational Resources Information Center

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  13. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    PubMed

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created with students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 questions in total. Student survey results, using a five-point Likert scale, showed that students found this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    PubMed

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
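
    The sketch below shows what a continuous-action actor-critic driven by a single scalar feedback signal can look like, assuming a linear critic and a Gaussian policy over one action dimension. It is a generic textbook-style construction, not the authors' implementation; the class name, feature vectors, and step sizes are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class GaussianActorCritic:
    """Minimal one-step actor-critic for a single continuous action (hypothetical sketch).

    Critic: linear state value v(x) = w . x
    Actor: Gaussian policy with mean theta_mu . x and standard deviation exp(theta_sigma . x)
    """

    def __init__(self, n_features, alpha_v=0.1, alpha_mu=0.01, alpha_sigma=0.01, gamma=0.9, seed=0):
        self.w = [0.0] * n_features
        self.theta_mu = [0.0] * n_features
        self.theta_sigma = [0.0] * n_features
        self.alpha_v, self.alpha_mu, self.alpha_sigma = alpha_v, alpha_mu, alpha_sigma
        self.gamma = gamma
        self.rng = random.Random(seed)

    def act(self, x):
        """Sample an action from the current Gaussian policy."""
        return self.rng.gauss(dot(self.theta_mu, x), math.exp(dot(self.theta_sigma, x)))

    def update(self, x, a, r, x_next):
        """Learn from one scalar reward r (e.g. a feedback signal from the user)."""
        delta = r + self.gamma * dot(self.w, x_next) - dot(self.w, x)  # TD error
        mu = dot(self.theta_mu, x)
        sigma = math.exp(dot(self.theta_sigma, x))
        for i, xi in enumerate(x):
            self.w[i] += self.alpha_v * delta * xi
            # Gradients of log N(a; mu, sigma) with respect to the actor parameters.
            self.theta_mu[i] += self.alpha_mu * delta * (a - mu) / sigma**2 * xi
            self.theta_sigma[i] += self.alpha_sigma * delta * ((a - mu) ** 2 / sigma**2 - 1.0) * xi
        return delta


agent = GaussianActorCritic(n_features=2)
x = [1.0, 0.0]  # hypothetical feature vector for the current state
action = agent.act(x)
delta = agent.update(x, action, r=1.0, x_next=[0.0, 0.0])
```

    A positive TD error shifts the policy mean toward the sampled action, and the `theta_sigma` update widens the policy when a rewarded action lay more than one standard deviation from the mean and narrows it otherwise.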

  15. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    PubMed

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

  16. An intelligent agent for optimal river-reservoir system management

    NASA Astrophysics Data System (ADS)

    Rieker, Jeffrey D.; Labadie, John W.

    2012-09-01

    A generalized software package is presented for developing an intelligent agent for stochastic optimization of complex river-reservoir system management and operations. Reinforcement learning is an approach to artificial intelligence for developing a decision-making agent that learns the best operational policies without the need for explicit probabilistic models of hydrologic system behavior. The agent learns these strategies experientially in a Markov decision process through observational interaction with the environment and simulation of the river-reservoir system using well-calibrated models. The graphical user interface for the reinforcement learning process controller includes numerous learning method options and dynamic displays for visualizing the adaptive behavior of the agent. As a case study, the generalized reinforcement learning software is applied to developing an intelligent agent for optimal management of water stored in the Truckee river-reservoir system of California and Nevada for the purpose of streamflow augmentation for water quality enhancement. The intelligent agent successfully learns long-term reservoir operational policies that specifically focus on mitigating water temperature extremes during persistent drought periods that jeopardize the survival of threatened and endangered fish species.

  17. Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer

    PubMed Central

    Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R.

    2010-01-01

    Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a “clinical reinforcement trial”) of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients. PMID:21385164
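
    Q-learning for a two-line regimen is fitted backward: the stage-2 Q-function is estimated first, and the predicted value of the best stage-2 treatment becomes the pseudo-outcome for stage 1. The sketch below illustrates only that recursion. It substitutes ordinary least squares (one line per treatment arm) for the paper's modified support vector regression, ignores censoring, uses a single covariate per stage that is assumed to summarise the patient's history, and runs on synthetic data; all function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ intercept + slope * x; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_q(xs, treatments, ys):
    """Fit a separate line per treatment arm, giving Q(x, a) = intercept_a + slope_a * x."""
    models = {}
    for arm in (-1, 1):
        idx = [i for i, t in enumerate(treatments) if t == arm]
        models[arm] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])

    def q(x, a):
        slope, intercept = models[a]
        return intercept + slope * x

    return q


def two_stage_q_learning(x1, a1, x2, a2, y):
    """Backward-induction Q-learning for two treatment stages (illustrative only).

    x1, x2: one covariate observed before stage 1 and before stage 2
    a1, a2: treatments given, coded -1 or +1
    y: final outcome, larger is better
    """
    q2 = fit_q(x2, a2, y)
    # Pseudo-outcome for stage 1: predicted value of the best stage-2 treatment.
    pseudo = [max(q2(x, 1), q2(x, -1)) for x in x2]
    q1 = fit_q(x1, a1, pseudo)

    def rule1(x):
        return 1 if q1(x, 1) >= q1(x, -1) else -1

    def rule2(x):
        return 1 if q2(x, 1) >= q2(x, -1) else -1

    return rule1, rule2


# Hypothetical synthetic trial: the stage-1 treatment shifts the stage-2 covariate,
# and stage-2 treatment +1 helps only when that covariate is positive.
rng = random.Random(0)
n = 2000
x1 = [rng.uniform(-1, 1) for _ in range(n)]
a1 = [rng.choice([-1, 1]) for _ in range(n)]
x2 = [u * t + rng.gauss(0, 0.1) for u, t in zip(x1, a1)]
a2 = [rng.choice([-1, 1]) for _ in range(n)]
y = [v + t * v + rng.gauss(0, 0.1) for v, t in zip(x2, a2)]
rule1, rule2 = two_stage_q_learning(x1, a1, x2, a2, y)
print([rule1(0.8), rule1(-0.8)], [rule2(0.8), rule2(-0.8)])  # [1, -1] [1, -1]
```

    The stage-1 rule is learned only through the pseudo-outcome, which is why the stage-2 model has to be fitted first.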

  18. Negative reinforcement learning is affected in substance dependence.

    PubMed

    Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody

    2012-06-01

    Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  19. An Upside to Reward Sensitivity: The Hippocampus Supports Enhanced Reinforcement Learning in Adolescence.

    PubMed

    Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna

    2016-10-05

    Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.
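
    The abstract does not specify its reinforcement learning model; a common choice for probabilistic tasks is a delta rule whose trial-by-trial prediction errors (outcome minus expected value) are then related to the BOLD signal. The sketch below computes those prediction errors, assuming a single learning rate `alpha` and binary rewards; the function name and data are hypothetical.

```python
def prediction_errors(choices, rewards, n_options=2, alpha=0.3):
    """Delta-rule (Rescorla-Wagner style) value learning.

    choices: index of the option chosen on each trial
    rewards: outcome on each trial, e.g. 1 for reward and 0 for no reward
    Returns the trial-by-trial prediction errors and the final option values.
    """
    values = [0.0] * n_options
    errors = []
    for c, r in zip(choices, rewards):
        pe = r - values[c]       # prediction error: outcome minus expectation
        values[c] += alpha * pe  # nudge the chosen option's value toward the outcome
        errors.append(pe)
    return errors, values


pes, values = prediction_errors([0, 0, 1], [1, 1, 0], alpha=0.5)
print(pes, values)  # [1.0, 0.5, 0.0] [0.75, 0.0]
```

    As the option becomes reliably rewarded, its prediction error shrinks, which is why prediction-error regressors track surprise rather than reward itself.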

  20. Reinforcement interval type-2 fuzzy controller design by online rule generation and q-value-aided ant colony optimization.

    PubMed

    Juang, Chia-Feng; Hsu, Chia-Hung

    2009-12-01

    This paper proposes a new reinforcement-learning method using online rule generation and Q-value-aided ant colony optimization (ORGQACO) for fuzzy controller design. The fuzzy controller is based on an interval type-2 fuzzy system (IT2FS). The antecedent part in the designed IT2FS uses interval type-2 fuzzy sets to improve controller robustness to noise. There are initially no fuzzy rules in the IT2FS. The ORGQACO concurrently designs both the structure and parameters of an IT2FS. We propose an online interval type-2 rule generation method for the evolution of system structure and flexible partitioning of the input space. Consequent part parameters in an IT2FS are designed using Q-values and the reinforcement local-global ant colony optimization algorithm. This algorithm selects the consequent part from a set of candidate actions according to ant pheromone trails and Q-values, both of which are updated using reinforcement signals. The ORGQACO design method is applied to the following three control problems: 1) truck-backing control; 2) magnetic-levitation control; and 3) chaotic-system control. The ORGQACO is compared with other reinforcement-learning methods to verify its efficiency and effectiveness. Comparisons with type-1 fuzzy systems verify the noise robustness property of using an IT2FS.

  1. Market Model for Resource Allocation in Emerging Sensor Networks with Reinforcement Learning

    PubMed Central

    Zhang, Yue; Song, Bin; Zhang, Ying; Du, Xiaojiang; Guizani, Mohsen

    2016-01-01

    Emerging sensor networks (ESNs) are an inevitable trend in the development of the Internet of Things (IoT) and are intended to connect almost every intelligent device. Studying resource allocation in such an environment is therefore critical for efficiency, especially when resources are limited. By viewing ESNs as multi-agent environments, we model them with an agent-based modelling (ABM) method and, after describing users’ patterns, address resource allocation problems with market models. Reinforcement learning methods are introduced to estimate users’ patterns and verify the outcomes in our market models. Experimental results show the efficiency of our methods, which are also capable of guiding topology management. PMID:27916841

  2. Using Optimal Combination of Teaching-Learning Methods (Open Book Assignment and Group Tutorials) as Revision Exercises to Improve Learning Outcome in Low Achievers in Biochemistry

    ERIC Educational Resources Information Center

    Rajappa, Medha; Bobby, Zachariah; Nandeesha, H.; Suryapriya, R.; Ragul, Anithasri; Yuvaraj, B.; Revathy, G.; Priyadarssini, M.

    2016-01-01

    Graduate medical students of India are taught Biochemistry by didactic lectures and they hardly get any opportunity to clarify their doubts and reinforce the concepts which they learn in these lectures. We used a combination of teaching-learning (T-L) methods (open book assignment followed by group tutorials) to study their efficacy in improving…

  3. Manual of Administration and Recording Methods for the Staats "Motivated Learning" Reading Procedure.

    ERIC Educational Resources Information Center

    Staats, Arthur W.; and others

    The Staats Motivated Learning Reading Procedure is an application of an integrated-functional approach to learning in the area of reading. The method involves a system of extrinsic reinforcement which employs tokens backed up by a monetary reward. The student reports to the program administrator some item for which he would like to work, such as a…

  4. Reinforcement Learning in Information Searching

    ERIC Educational Resources Information Center

    Cen, Yonghua; Gan, Liren; Bai, Chen

    2013-01-01

    Introduction: The study seeks to answer two questions: How do university students learn to use correct strategies to conduct scholarly information searches without instructions? and, What are the differences in learning mechanisms between users at different cognitive levels? Method: Two groups of users, thirteen first year undergraduate students…

  5. Cooperative Education Is a Superior Strategy for Using Basic Learning Processes.

    ERIC Educational Resources Information Center

    Reed, V. Gerald

    Cooperative education is a learning strategy that fits very well with basic laws of learning. In fact, several basic important learning processes are far better adapted to the cooperative education strategy than to methods that lean entirely on classroom instruction. For instance, cooperative education affords more opportunities for reinforcement,…

  6. Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies

    PubMed Central

    Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

    2016-01-01

    Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401

  7. Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies.

    PubMed

    Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

    2016-07-14

    Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects.

  8. Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

    PubMed

    Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

    2013-01-01

    Bio-robots based on brain-computer interfaces (BCI) suffer from a failure to consider the characteristics of the animals during navigation. This paper proposes a new method for bio-robots' automatic navigation that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of the animal. Given graded electrical rewards, the animal (e.g. a rat) seeks the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge to an optimal route by co-learning. This work offers significant insight for the practical development of bio-robot navigation with hybrid intelligence.

  9. Collaborating Fuzzy Reinforcement Learning Agents

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1997-01-01

    Earlier, we introduced GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents that are controlled at the top level by Fuzzy Q-Learning, while at the local level each agent learns and operates based on GARIC, a technique for fuzzy reinforcement learning. In this paper, we show that it is possible for these agents to compete in order to affect the selected control policy but at the same time, they can collaborate while investigating the state space. In this model, the evaluator or the critic learns by observing all the agents' behaviors, but the control policy changes only based on the behavior of the winning agent, also known as the super agent.

  10. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    PubMed

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  11. Scheduled power tracking control of the wind-storage hybrid system based on the reinforcement learning theory

    NASA Astrophysics Data System (ADS)

    Li, Ze

    2017-09-01

    To address the intermittency and uncertainty of wind power, energy storage and a wind generator are combined into a hybrid system to improve the controllability of the output power. A scheduled power tracking control method is proposed based on reinforcement learning theory and the Q-learning algorithm. In this method, the state space of the environment is formed from two key factors, i.e. the state of charge of the energy storage and the difference between the actual wind power and the scheduled power; the feasible action is the output power of the energy storage; and the immediate reward function is designed to reflect the rationality of the control action. By interacting with the environment and learning from the immediate reward, the agent gradually forms the optimal control strategy, which can then be applied to the scheduled power tracking control of the hybrid system. Finally, the rationality and validity of the method are verified through simulation examples.
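
    The state, action, and reward design described in this abstract maps directly onto tabular Q-learning. The sketch below is a toy version under stated assumptions: the state of charge and the wind-minus-schedule deviation are discretised into a handful of hypothetical levels, the storage can charge, idle, or discharge one unit, and the reward (negative tracking error plus a penalty for violating the SOC limits) is an illustrative choice, not the paper's reward function.

```python
import random

SOC_LEVELS = 5           # hypothetical state-of-charge levels 0..4
DEVIATIONS = [-1, 0, 1]  # hypothetical wind-minus-schedule deviation levels (per unit)
ACTIONS = [-1, 0, 1]     # storage output power: -1 charge, 0 idle, +1 discharge


def step(soc, dev_idx, a_idx, rng):
    """Toy environment; returns (reward, next_soc, next_dev_idx)."""
    p_storage = ACTIONS[a_idx]
    # Delivered power minus schedule = (wind - schedule) + storage output.
    tracking_error = abs(DEVIATIONS[dev_idx] + p_storage)
    next_soc = soc - p_storage  # discharging lowers the SOC, charging raises it
    penalty = 0.0
    if not 0 <= next_soc < SOC_LEVELS:
        # SOC limit violated: penalise and clamp (the toy ignores the power not delivered).
        penalty = 5.0
        next_soc = min(max(next_soc, 0), SOC_LEVELS - 1)
    next_dev_idx = rng.randrange(len(DEVIATIONS))  # next deviation drawn at random
    return -tracking_error - penalty, next_soc, next_dev_idx


def train(episodes=1000, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over (SOC, deviation) states."""
    rng = random.Random(seed)
    q = {(s, d): [0.0] * len(ACTIONS) for s in range(SOC_LEVELS) for d in range(len(DEVIATIONS))}
    for _ in range(episodes):
        soc, dev = SOC_LEVELS // 2, rng.randrange(len(DEVIATIONS))
        for _ in range(steps):
            values = q[(soc, dev)]
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = values.index(max(values))
            r, soc2, dev2 = step(soc, dev, a, rng)
            # Q-learning target: reward plus discounted greedy value of the next state.
            values[a] += alpha * (r + gamma * max(q[(soc2, dev2)]) - values[a])
            soc, dev = soc2, dev2
    return q


q = train()
# Greedy action at mid-charge (SOC level 2) for each deviation level.
greedy = {d: ACTIONS[q[(2, i)].index(max(q[(2, i)]))] for i, d in enumerate(DEVIATIONS)}
print(greedy)
```

    With this reward, the greedy policy at mid-charge should learn to charge on surplus wind, idle when the wind matches the schedule, and discharge on a shortfall; the SOC penalty is what discourages the agent from draining or overfilling the storage.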

  12. Stress affects instrumental learning based on positive or negative reinforcement in interaction with personality in domestic horses

    PubMed Central

    Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa

    2017-01-01

    The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities. PMID:28475581

  13. Gr-GDHP: A New Architecture for Globalized Dual Heuristic Dynamic Programming.

    PubMed

    Zhong, Xiangnan; Ni, Zhen; He, Haibo

    2017-10-01

    A goal representation globalized dual heuristic dynamic programming (Gr-GDHP) method is proposed in this paper. A goal neural network is integrated into the traditional GDHP method, providing an internal reinforcement signal and its derivatives to help the control and learning process. From the proposed architecture, it is shown that the obtained internal reinforcement signal and its derivatives can adjust themselves online over time, rather than being a fixed or predefined function as in the literature. Furthermore, the obtained derivatives can directly contribute to the objective function of the critic network, whose learning process is thus simplified. Numerical simulation studies are used to show the performance of the proposed Gr-GDHP method and to compare the results with other existing adaptive dynamic programming designs. We also investigate this method on a ball-and-beam balancing system. The statistical simulation results are presented for both the Gr-GDHP and the GDHP methods to demonstrate the improved learning and controlling performance.

  14. Reinforcement learning for resource allocation in LEO satellite networks.

    PubMed

    Usaha, Wipawee; Barria, Javier A

    2007-06-01

    In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms improve performance in terms of storage requirements, computational complexity and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue than existing routing methods used in LEO satellite networks, with reasonable storage and computational requirements.
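
    Because the objective here is a long-term average revenue rather than a discounted sum, a critic's TD error can be taken relative to an estimate of the average reward. The sketch below shows tabular differential TD(0) policy evaluation on a hypothetical two-state cycle; it is a standard average-reward critic, not necessarily the exact actor-critic or optimistic TD method of the paper, and the step sizes `alpha` and `beta` are assumptions.

```python
def differential_td0(transitions, n_states, alpha=0.1, beta=0.01):
    """Tabular differential TD(0) for the long-run average-reward criterion.

    transitions: iterable of (s, r, s_next) tuples generated by a fixed policy
    Returns the differential state values and the estimated average reward per step.
    """
    v = [0.0] * n_states
    rho = 0.0
    for s, r, s_next in transitions:
        delta = r - rho + v[s_next] - v[s]  # TD error measured against the average reward
        v[s] += alpha * delta
        rho += beta * delta
    return v, rho


# Hypothetical two-state cycle: the step out of state 0 earns revenue 1, the step back earns 0.
cycle = [(0, 1.0, 1), (1, 0.0, 0)] * 10000
v, rho = differential_td0(cycle, n_states=2)
print(round(rho, 3), round(v[0] - v[1], 3))  # 0.5 0.5
```

    Here the policy earns 1 every other step, so the average reward converges to 0.5, and the differential value of state 0 exceeds that of state 1 by 0.5 because state 0 is where the revenue is collected.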

  15. Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games.

    PubMed

    Song, Ruizhuo; Lewis, Frank L; Wei, Qinglai

    2017-03-01

    This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL is designed to perform policy evaluation and policy improvement in the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. Gradient descent is used to update the critic and action weights simultaneously. The convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of Nash equilibrium are proved. The simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.

  16. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    PubMed Central

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  17. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    PubMed

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  18. Racial bias shapes social reinforcement learning.

    PubMed

    Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas

    2014-03-01

    Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial in-group and out-group individuals.

  19. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    PubMed

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, many real-world decision and control problems exhibit multiple long-term objectives; therefore, there has recently been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. The basic architecture, research topics, and naive solutions of MORL are introduced first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and related research areas, including multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning, are also discussed. Finally, research challenges and open problems of MORL techniques are highlighted.
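    Among the naive solutions commonly discussed for MORL, the simplest is linear scalarization: weight each objective's value estimate, sum the weighted values, and act greedily on the result. The sketch below assumes that form; the action names, value vectors, and weights are hypothetical, and the code is an illustration of the general idea rather than a method taken from the overview.

```python
from typing import Sequence


def scalarize(q_vector: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of per-objective value estimates."""
    if len(q_vector) != len(weights):
        raise ValueError("one weight per objective is required")
    return sum(w * q for w, q in zip(weights, q_vector))


def greedy_action(q_vectors: dict[str, Sequence[float]],
                  weights: Sequence[float]) -> str:
    """Pick the action whose scalarized value is largest."""
    return max(q_vectors, key=lambda a: scalarize(q_vectors[a], weights))


# Hypothetical two-objective value vectors (objective 0, objective 1) for three actions.
q = {"fast": [1.0, 0.1], "balanced": [0.7, 0.7], "eco": [0.2, 1.0]}

print(greedy_action(q, [0.9, 0.1]))  # → fast
print(greedy_action(q, [0.5, 0.5]))  # → balanced
print(greedy_action(q, [0.1, 0.9]))  # → eco
```

    Changing only the weight vector moves the greedy choice from one objective to the other. A known limitation of a fixed linear weighting is that it can only recover solutions on the convex hull of the Pareto front, which is one reason MORL research goes beyond such naive approaches.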

  20. Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective

    PubMed Central

    Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.

    2009-01-01

    Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are composed of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527

  1. Progression of Cohort Learning Style during an Intensive Education Program

    ERIC Educational Resources Information Center

    Compton, David A.; Compton, Cynthia M.

    2017-01-01

    The authors describe an intensive graduate program involving compressed classroom preparation followed by a period of experiential activities designed to reinforce and enhance the knowledge base. Beginning with a brief review of the andragogical issues, they describe methods undertaken to track learning styles via the Kolb Learning Styles…

  2. Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis

    PubMed Central

    2013-01-01

    Background: Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods: Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results: MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion: Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
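    The separation between reward sensitivity and learning rate can be made concrete with a delta-rule (Rescorla-Wagner-style) update in which a reward sensitivity ρ scales the obtained reward and a learning rate α scales the prediction error: Q ← Q + α(ρr − Q). This is a minimal sketch assuming that common parameterization; the meta-analysis's full model, including how choices are generated from Q in the probabilistic reward task, is not reproduced here, and the function name and parameter values are hypothetical.

```python
def learn_value(rewards, alpha, rho, q0=0.0):
    """Delta-rule learning of a single stimulus value.

    alpha: learning rate, i.e. how far Q moves toward its target on each trial.
    rho:   reward sensitivity, i.e. how strongly each obtained reward is felt.
    Returns the value estimate after every trial.
    """
    q = q0
    trajectory = []
    for r in rewards:
        prediction_error = rho * r - q  # experienced reward minus expectation
        q += alpha * prediction_error
        trajectory.append(q)
    return trajectory


rewards = [1.0] * 50  # reward delivered on every trial
baseline = learn_value(rewards, alpha=0.2, rho=1.0)
low_rho = learn_value(rewards, alpha=0.2, rho=0.5)     # blunted reward sensitivity
low_alpha = learn_value(rewards, alpha=0.05, rho=1.0)  # slowed learning

print(round(baseline[-1], 3), round(low_rho[-1], 3), round(low_alpha[-1], 3))
# → 1.0 0.5 0.923
```

    With a constant reward, halving ρ halves the level at which Q settles (the asymptote ρr) without changing how quickly it gets there, whereas lowering α leaves the asymptote at r but slows the approach. That dissociation is what allows model fits to attribute an effect to one parameter rather than the other.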

  3. Statistical Mechanics of the Delayed Reward-Based Learning with Node Perturbation

    NASA Astrophysics Data System (ADS)

    Saito, Hiroshi; Katahira, Kentaro; Okanoya, Kazuo; Okada, Masato

    2010-06-01

    In reward-based learning, reward is typically given with some delay after the behavior that causes it. In the machine learning literature, the framework of the eligibility trace has been used as one solution for handling delayed reward in reinforcement learning. Recent studies suggest that the eligibility trace is important for a difficult neuroscience problem known as the “distal reward problem.” Node perturbation is one of many stochastic gradient methods for implementing reinforcement learning; it estimates the gradient approximately by introducing perturbations into a network. Because this stochastic gradient method does not require differentiating the objective function, it is expected to be able to account for the learning mechanism of a complex system such as the brain. We study node perturbation with the eligibility trace as a specific example of delayed reward-based learning and analyze it using a statistical mechanics approach. As a result, we show the optimal time constant of the eligibility trace with respect to the reward delay and the existence of unlearnable parameter configurations.
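    The mechanism described above, estimating a gradient by perturbing a node and bridging a reward delay with an eligibility trace, can be illustrated with a small online simulation. This is a sketch under stated assumptions, not the authors' statistical-mechanics analysis: the linear teacher-student setup, Gaussian inputs, the fixed delay, and every name and parameter value below are hypothetical choices made for illustration.

```python
import random
from collections import deque


def train_node_perturbation(delay, trace_decay, steps=10000, lr=0.005,
                            sigma=0.1, dim=5, seed=0):
    """Node perturbation on a linear student learning a linear teacher.

    The error signal caused by the perturbation applied at step t only
    arrives `delay` steps later; an exponentially decaying eligibility trace
    (decay factor `trace_decay` per step) bridges that gap. Returns the final
    squared distance between the student and teacher weights.
    """
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    w = [0.0] * dim
    trace = [0.0] * dim
    pending = deque()  # error signals still "in flight" toward the learner

    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(wk * xk for wk, xk in zip(w, x))
        target = sum(tk * xk for tk, xk in zip(teacher, x))
        noise = rng.gauss(0.0, sigma)  # perturbation injected at the output node

        # How much the perturbation changed the squared error (no derivative needed).
        pending.append((y + noise - target) ** 2 - (y - target) ** 2)

        # Eligibility trace: decaying memory of (perturbation * input).
        trace = [trace_decay * e + noise * xk / sigma ** 2
                 for e, xk in zip(trace, x)]

        # When a delayed error arrives, credit it to whatever the trace remembers.
        if len(pending) > delay:
            delayed_error = pending.popleft()
            w = [wk - lr * delayed_error * e for wk, e in zip(w, trace)]

    return sum((wk - tk) ** 2 for wk, tk in zip(w, teacher))


err_trace = train_node_perturbation(delay=2, trace_decay=0.8)
err_no_trace = train_node_perturbation(delay=2, trace_decay=0.0)
print(f"with trace: {err_trace:.4f}, without trace: {err_no_trace:.4f}")
```

    With `trace_decay=0.0` the trace has already forgotten the perturbation by the time its delayed error arrives, so the updates carry no gradient information and the student does not converge to the teacher. With a slower decay, the expected update points down the error gradient, scaled by `trace_decay ** delay`, while a longer trace also accumulates more unrelated noise; that tension is why the trace's time constant has to be matched to the reward delay.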

  4. Tuning fuzzy PD and PI controllers using reinforcement learning.

    PubMed

    Boubertakh, Hamid; Tadjine, Mohamed; Glorennec, Pierre-Yves; Labiod, Salim

    2010-10-01

    In this paper, we propose new auto-tuning fuzzy PD and PI controllers using the reinforcement Q-learning (QL) algorithm for SISO (single-input single-output) and TITO (two-input two-output) systems. First, we investigate the design parameters and settings of a typical class of Fuzzy PD (FPD) and Fuzzy PI (FPI) controllers: zero-order Takagi-Sugeno controllers with equidistant triangular membership functions for inputs, equidistant singleton membership functions for output, Larsen's implication method, and the average sum defuzzification method. Second, the analytical structures of these typical fuzzy PD and PI controllers are compared with their classical PD and PI counterparts. Finally, the effectiveness of the proposed method is demonstrated through simulation examples. Copyright © 2010 ISA. Published by Elsevier Ltd. All rights reserved.

  5. Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

    PubMed Central

    La Camera, Giancarlo; Richmond, Barry J.

    2008-01-01

    It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys. PMID:18688266
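    For comparison with the schedule length effect, here is standard tabular TD(0), V(s) ← V(s) + α[r + γV(s′) − V(s)], applied to a toy version of the task. The state encoding (trial i of an L-trial schedule, with reward delivered only after the last trial) and the parameter values are assumptions for illustration; this is the standard model the abstract argues against, not the authors' modified one.

```python
def td0_schedule_values(schedule_lengths, episodes=500, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a toy reinforcement-schedule task.

    State (L, i) is trial i of a schedule that is L trials long. A reward of 1
    follows the last trial (i == L), after which the schedule ends; every
    earlier trial yields 0 and leads to trial i + 1.
    """
    V = {(L, i): 0.0 for L in schedule_lengths for i in range(1, L + 1)}
    for _ in range(episodes):
        for L in schedule_lengths:
            for i in range(1, L + 1):
                if i == L:
                    target = 1.0  # rewarded, terminal transition
                else:
                    target = gamma * V[(L, i + 1)]  # no reward, bootstrap
                V[(L, i)] += alpha * (target - V[(L, i)])
    return V


V = td0_schedule_values([1, 2, 3])
for L in (1, 2, 3):
    print(L, [round(V[(L, i)], 3) for i in range(1, L + 1)])
```

    Trials the same distance from reward get the same learned value whatever the schedule length (here V(2,1) and V(3,2) both converge to γ = 0.9), so any behavior read out from these values is invariant to schedule length. That is why the observed schedule length effect lies outside what standard TD predicts.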

  6. Modeling the violation of reward maximization and invariance in reinforcement schedules.

    PubMed

    La Camera, Giancarlo; Richmond, Barry J

    2008-08-08

    It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as "schedule length effect"). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: "framing," wherein equivalent options are treated differently depending on the context in which they are presented, and the "sunk cost" effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.

  7. Navigating complex decision spaces: Problems and paradigms in sequential choice

    PubMed Central

    Walsh, Matthew M.; Anderson, John R.

    2015-01-01

    To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192

  8. Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents

    ERIC Educational Resources Information Center

    Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

    2009-01-01

    Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…

  9. 11.2 YIP Human In the Loop Statistical Relational Learners

    DTIC Science & Technology

    2017-10-23

    learning formalisms including inverse reinforcement learning [4] and statistical relational learning [7, 5, 8]. We have also applied our algorithms in ... one introduced for label preferences. Figure 2: Active Advice Seeking for Inverse Reinforcement Learning. ... active advice seeking is in selecting the ... learning tasks. 1.2.1 Sequential Decision-Making: Our previous work on advice for inverse reinforcement learning (IRL) defined advice as action

  10. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    PubMed

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  11. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    PubMed

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    PubMed Central

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  13. Participatory Evaluation and Learning: A Case Example Involving Ripple Effects Mapping of a Tourism Assessment Program

    ERIC Educational Resources Information Center

    Bhattacharyya, Rani; Templin, Elizabeth; Messer, Cynthia; Chazdon, Scott

    2017-01-01

    Engaging communities through research-based participatory evaluation and learning methods can be rewarding for both a community and Extension. A case study of a community tourism development program evaluation shows how participatory evaluation and learning can be mutually reinforcing activities. Many communities value the opportunity to reflect…

  14. Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.

    PubMed

    Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

    2017-07-10

    Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.

  15. An Online Social Networking Approach to Reinforce Learning of Rocks and Minerals

    ERIC Educational Resources Information Center

    Kennelly, Patrick

    2009-01-01

    Numerous and varied methods are used in introductory Earth science and geology classes to help students learn about rocks and minerals, such as classroom lectures, laboratory specimen identification, and field trips. This paper reports on a method using online social networking. The choice of this forum was based on two criteria. First, many…

  16. Enhanced Experience Replay for Deep Reinforcement Learning

    DTIC Science & Technology

    2015-11-01

    ARL-TR-7538, November 2015, US Army Research Laboratory. Enhanced Experience Replay for Deep Reinforcement Learning, by David Doria, Bryan Dawson, and Manuel Vindiola, Computational and Information Sciences Directorate...

  17. Prespeech motor learning in a neural network using reinforcement.

    PubMed

    Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough

    2013-02-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. Reconciling Reinforcement Learning Models with Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

    ERIC Educational Resources Information Center

    Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb

    2007-01-01

    Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…
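    The abstract's point, that TDRL treats extinction as the unlearning of associated value, can be seen in the value update that TD and Rescorla-Wagner models share: during extinction the prediction error stays negative until the stored value has been driven back toward zero. A minimal sketch, assuming a single cue, a fixed learning rate α, and no temporal structure (so it is a one-step special case of TDRL, not the authors' extended model); the function name and parameter values are hypothetical.

```python
def delta_rule(value, rewards, alpha=0.2):
    """Update one cue's value trial by trial; returns the final value and history."""
    history = []
    for r in rewards:
        value += alpha * (r - value)  # the prediction error drives the update
        history.append(value)
    return value, history


v_acq, _ = delta_rule(0.0, [1.0] * 30)    # acquisition: cue followed by reward
v_ext, _ = delta_rule(v_acq, [0.0] * 30)  # extinction: cue presented, no reward
print(round(v_acq, 3), round(v_ext, 3))   # → 0.999 0.001
```

    After extinction, the only thing this model retains about the cue is a near-zero value, so nothing in it can restore responding when the original context returns. That is the gap behind the abstract's argument that extinction must include processes other than unlearning.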

  19. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading.

    PubMed

    Deng, Yue; Bao, Feng; Kong, Youyong; Ren, Zhiquan; Dai, Qionghai

    2017-03-01

    Can we train a computer to beat experienced traders at financial asset trading? In this paper, we try to address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model is inspired by two biologically related learning concepts: deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. Then, the RL module interacts with deep representations and makes trading decisions to accumulate the ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both deep and recurrent structures. Hence, we propose a task-aware backpropagation through time method to cope with the gradient vanishing issue in deep training. The robustness of the neural system is verified on both the stock and the commodity futures markets under broad testing conditions.

  20. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    PubMed

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For natural social human-robot interaction, it is essential for a robot to learn human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent obtains intrinsic motivation-based rewards through an action-conditional predictive model. Using the proposed method, the robot learned social skills from human-robot interaction experiences gathered in real, uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also made more human-like decisions, on a test dataset, than a robot that received direct rewards for task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Behavioral and neural properties of social reinforcement learning

    PubMed Central

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ

    2011-01-01

    Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis - social preferences, response latencies and modeling neural responses – are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787

  2. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    PubMed

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation, (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  3. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    ERIC Educational Resources Information Center

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  4. The cerebellum: a neural system for the study of reinforcement learning.

    PubMed

    Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F

    2011-01-01

    In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connections with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated, and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed). It follows that any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.

  5. A Personalized Study Method for Learning University Physics

    ERIC Educational Resources Information Center

    Aravind, Vasudeva Rao; Croyle, Kevin

    2017-01-01

    Students learn scientific concepts and mathematical calculations relating to scientific principles by repetition and reinforcement. Teachers and instructors cannot practically spend the long time required during tutorials to patiently teach students the calculations. Usually, teachers assign homework to provide practice to students, hoping that…

  6. Genetic reinforcement learning through symbiotic evolution for fuzzy controller design.

    PubMed

    Juang, C F; Lin, J Y; Lin, C T

    2000-01-01

    An efficient genetic reinforcement learning algorithm for designing fuzzy controllers is proposed in this paper. The genetic algorithm (GA) adopted in this paper is based upon symbiotic evolution which, when applied to fuzzy controller design, complements the local mapping property of a fuzzy rule. Using this Symbiotic-Evolution-based Fuzzy Controller (SEFC) design method, both the number of control trials and the CPU time consumed are considerably reduced when compared to traditional GA-based fuzzy controller design methods and other types of genetic reinforcement learning schemes. Moreover, unlike traditional fuzzy controllers, which partition the input space into a grid, SEFC partitions the input space in a flexible way, thus creating fewer fuzzy rules. In SEFC, different types of fuzzy rules whose consequent parts are singletons, fuzzy sets, or linear equations (TSK-type fuzzy rules) are allowed. Further, the free parameters (e.g., centers and widths of membership functions) and fuzzy rules are all tuned automatically. For TSK-type fuzzy rules in particular, the proposed learning algorithm selects only the significant input variables to participate in the consequent of a rule. The proposed SEFC design method has been applied to different simulated control problems, including the cart-pole balancing system, a magnetic levitation system, and a water bath temperature control system. These control problems, together with comparisons against some traditional GA-based fuzzy systems, verify that the proposed SEFC is efficient and superior.
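
    The TSK-type rules mentioned above can be illustrated with a minimal sketch of first-order TSK inference, assuming Gaussian membership functions and a product of membership degrees as the rule firing strength. It shows only how tuned parameters (centers, widths, consequent coefficients) produce a controller output; the symbiotic GA that tunes them, and the singleton and fuzzy-set consequent types, are not shown. The rule values in the usage lines are hypothetical.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-((x - center) / width) ** 2)


def tsk_output(x, rules):
    """First-order TSK inference.

    Each rule is (antecedents, coeffs): antecedents holds one (center, width)
    pair per input, and coeffs = [a0, a1, ..., an] defines the linear
    consequent y = a0 + a1*x1 + ... + an*xn.
    """
    num = 0.0
    den = 0.0
    for antecedents, coeffs in rules:
        # Firing strength: product of the input membership degrees.
        w = 1.0
        for xi, (c, s) in zip(x, antecedents):
            w *= gaussian(xi, c, s)
        # Linear (TSK-type) consequent evaluated at the current input.
        y = coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x))
        num += w * y
        den += w
    # Output: firing-strength-weighted average of the rule consequents.
    return num / den if den > 0 else 0.0


# Two hypothetical one-input rules centred at 0 and 2.
rules = [([(0.0, 1.0)], [1.0, 0.0]), ([(2.0, 1.0)], [3.0, 0.0])]
print(tsk_output([1.0], rules))  # ≈ 2.0: both rules fire equally at x = 1
```

    Because each rule only contributes where its membership functions are large, the output blends local linear models, which is the local mapping property the abstract refers to.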

  7. Impaired Behavior Regulation under Conditions of Concurrent Variable Schedules of Reinforcement in Children with ADHD

    ERIC Educational Resources Information Center

    Taylor, David; Lincoln, Alan J.; Foster, Sharon L.

    2010-01-01

    Objective: To bridge theory of response inhibition and learning in children with ADHD. Method: Thirty ADHD and 30 non-ADHD children (ages 9-12) were compared under concurrent variable interval (VI-15 sec., VI-30 sec. and VI-45 sec.) reinforcement schedules that required the child to switch between the three schedules under conditions of…

  8. The role of GABAB receptors in human reinforcement learning.

    PubMed

    Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X

    2014-10-01

    Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and furthermore, test this effect in patients with addiction spectrum disorder. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.

  9. Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease

    PubMed Central

    Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

    2017-01-01

    Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson’s disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905

  10. Fear of losing money? Aversive conditioning with secondary reinforcers.

    PubMed

    Delgado, M R; Labouliere, C D; Phelps, E A

    2006-12-01

    Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.

  11. Reinforcement learning and Tourette syndrome.

    PubMed

    Palminteri, Stefano; Pessiglione, Mathias

    2013-01-01

    In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, carried out by our team in the last few years. This report will be preceded by an introduction aimed at providing the reader with the state of the art of the knowledge concerning the neural bases of reinforcement learning at the time of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that the dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that Tourette syndrome, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evident maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and the future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.

  12. Towards a genetics-based adaptive agent to support flight testing

    NASA Astrophysics Data System (ADS)

    Cribbs, Henry Brown, III

    Although the benefits of aircraft simulation have been known since the late 1960s, simulation almost always entails interaction with a human test pilot. This "pilot-in-the-loop" simulation process provides useful evaluative information to the aircraft designer and provides a training tool to the pilot. Emulation of a pilot during the early phases of the aircraft design process might provide designers a useful evaluative tool. Machine learning might emulate a pilot in a simulated aircraft/cockpit setting. Preliminary work in the application of machine learning techniques, such as reinforcement learning, to aircraft maneuvering has shown promise. These studies used simplified interfaces between the machine learning agent and the aircraft simulation. The simulations employed low-order equivalent system models. High-fidelity aircraft simulations exist, such as the simulations developed by NASA at its Dryden Flight Research Center. To expand the application domain of reinforcement learning to aircraft designs, this study presents a series of experiments that examine a reinforcement learning agent in the role of test pilot. The NASA X-31 and F-106 high-fidelity simulations provide realistic aircraft for the agent to maneuver. The approach of the study is to examine an agent possessing a genetic-based, artificial neural network to approximate long-term, expected cost (Bellman value) in a basic maneuvering task. The experiments evaluate different learning methods based on a common feedback function and an identical task. The learning methods evaluated are: Q-learning, Q(lambda)-learning, SARSA learning, and SARSA(lambda) learning. Experimental results indicate that, while prediction error remains quite high, similar, repeatable behaviors occur in both aircraft. This similarity indicates that the agent is portable between aircraft with different handling qualities (dynamics). Besides the adaptive behavior aspects of the study, the genetic algorithm used in the agent is shown to play an additive role in the shaping of the artificial neural network to the prediction task.
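
    The four methods compared differ mainly in the target each update moves toward. A minimal tabular sketch of the one-step Q-learning and SARSA updates follows; it assumes a lookup table rather than the genetic neural-network approximator used in the study, and it omits the eligibility traces that the (lambda) variants add. State and action names in the usage lines are hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma, done=False):
    """Off-policy target: best action value in the next state."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """On-policy target: value of the action actually taken next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Tabular action values default to 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)
Q[("s1", "L")] = 1.0
q_learning_update(Q, "s0", "R", 0.0, "s1", ["L", "R"], alpha=0.5, gamma=0.9)
print(Q[("s0", "R")])  # ≈ 0.45, bootstrapped from the greedy action "L"
```

    Had the agent's next action been "R", a SARSA update from the same transition would leave Q[("s0", "R")] at 0.0, because SARSA bootstraps from the action actually taken rather than the best one.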

  13. Intelligence moderates reinforcement learning: a mini-review of the neural evidence

    PubMed Central

    2014-01-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. PMID:25185818

  14. Intelligence moderates reinforcement learning: a mini-review of the neural evidence.

    PubMed

    Chen, Chong

    2015-06-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.

  15. Prespeech motor learning in a neural network using reinforcement

    PubMed Central

    Warlaumont, Anne S.; Westermann, Gert; Buder, Eugene H.; Oller, D. Kimbrough

    2012-01-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one’s language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e., voiced) sounds, the network’s post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network’s post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model’s post-learning productions were more likely to resemble the English vowels and vice versa. PMID:23275137

  16. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    PubMed

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
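
    The exploration/exploitation balance of a contextual bandit can be sketched with an epsilon-greedy learner that keeps a running mean reward per (context, arm) pair. This is a generic illustration, not the C-MAB algorithm from the paper; it assumes a small discrete set of contexts (for example, a classifier's estimated zipper state), and the class name, arm labels, and parameters are hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Per-context action-value estimates with epsilon-greedy selection."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)     # pulls per (context, arm)
        self.values = defaultdict(float)   # mean reward per (context, arm)

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental sample mean of the rewards seen for this pair.
        self.values[key] += (reward - self.values[key]) / self.counts[key]


bandit = EpsilonGreedyContextualBandit(["pull", "regrip"], epsilon=0.0, seed=0)
bandit.update("zipper_centered", "regrip", 1.0)
print(bandit.select("zipper_centered"))  # regrip
```

    Unlike a Q-learner, the bandit does not bootstrap from future states, which is one reason it can need fewer physical trials when each decision's reward is observed immediately.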

  17. Quantum reinforcement learning.

    PubMed

    Dong, Daoyi; Chen, Chunlin; Li, Hanxiong; Tarn, Tzyh-Jong

    2008-10-01

    The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, showing that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
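
    The amplitude-update idea can be sketched classically with real amplitudes and Grover-style amplitude amplification: the chosen action's amplitude is reflected, all amplitudes are then inverted about the uniform superposition, and a simulated observation samples an action with probability equal to its squared amplitude. This is a sketch under those assumptions; how many iterations to apply for a given reward is left to the caller, and the paper's exact rule is not reproduced here.

```python
import random


def grover_iteration(amps, a):
    """One Grover-style iteration on real amplitudes, amplifying action a."""
    # Reflection about action a: flip the sign of its amplitude.
    flipped = list(amps)
    flipped[a] = -flipped[a]
    # Inversion about the uniform superposition (inversion about the mean).
    mean = sum(flipped) / len(flipped)
    return [2.0 * mean - x for x in flipped]


def reinforce(amps, a, n_iter):
    """Apply n_iter iterations; mapping reward to n_iter is an assumption."""
    for _ in range(n_iter):
        amps = grover_iteration(amps, a)
    return amps


def observe(amps, rng):
    """Simulated measurement: sample an action with probability amplitude**2."""
    probs = [x * x for x in amps]
    return rng.choices(range(len(amps)), weights=probs, k=1)[0]


# Four actions in equal superposition (amplitude 1/2 each).
amps = reinforce([0.5] * 4, a=2, n_iter=1)
print(amps)  # [0.0, 0.0, 1.0, 0.0]: one iteration suffices for four actions
print(observe(amps, random.Random(0)))  # 2
```

    Each iteration preserves the norm of the amplitude vector, so the squared amplitudes remain a valid probability distribution. Too many iterations overshoot, because Grover iterations rotate the state past the target, which is why the iteration count must be bounded.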

  18. Fault-tolerant optimised tracking control for unknown discrete-time linear systems using a combined reinforcement learning and residual compensation methodology

    NASA Astrophysics Data System (ADS)

    Han, Ke-Zhen; Feng, Jian; Cui, Xiaohong

    2017-10-01

    This paper considers the fault-tolerant optimised tracking control (FTOTC) problem for unknown discrete-time linear systems. A research scheme is proposed on the basis of data-based parity space identification, reinforcement learning and residual compensation techniques. The main characteristic of this research scheme lies in the parity-space-identification-based simultaneous tracking control and residual compensation. The scheme comprises four main components: a subspace-aided method is applied to design an observer-based residual generator; a reinforcement Q-learning approach is used to solve for the optimised tracking control policy; robust H∞ theory is used to achieve noise attenuation; and fault estimation triggered by the residual generator is adopted to perform fault compensation. To clarify the design and implementation procedures, an integrated algorithm is further constructed to link up these four functional units. Detailed analysis and proofs are then given to establish the guaranteed FTOTC performance of the proposed scheme. Finally, a case simulation is provided to verify its effectiveness.

  19. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation

    PubMed Central

    Kong, Zehui; Liu, Teng

    2017-01-01

    To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. To use the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power-request is derived. RL is applied to calculate and update the control policy at regular intervals, adapting to the varying driving conditions. A forward-facing powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamical model. The robustness and adaptability of the real-time energy management strategy are validated in simulation through comparison with a stationary control strategy based on an initial TPM generated from a long naturalistic driving cycle. Results indicate that the proposed method achieves better fuel economy than the stationary strategy and is more effective in real-time control. PMID:28671967
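
    A recursive TPM estimate of the kind described can be sketched as a running transition frequency, assuming the power request is discretized into a finite set of levels. After each observed transition from level i to level j, row i moves toward the indicator vector of j with step 1/n_i, which keeps the row equal to the empirical transition frequencies. The function and argument names are hypothetical, and the paper's actual recursion may differ (for example, by using a forgetting factor).

```python
def update_tpm(P, counts, i, j):
    """Recursive update of row i of a transition probability matrix P after
    observing a transition from power-request level i to level j.

    With step 1/n_i, row i always equals the empirical frequency of the
    transitions observed so far from level i.
    """
    counts[i] += 1
    step = 1.0 / counts[i]
    for k in range(len(P[i])):
        target = 1.0 if k == j else 0.0
        P[i][k] += step * (target - P[i][k])


# Hypothetical 3-level discretization of power request, uniform initial rows.
P = [[1.0 / 3] * 3 for _ in range(3)]
counts = [0, 0, 0]
for i, j in [(0, 1), (0, 1), (0, 2)]:
    update_tpm(P, counts, i, j)
print(P[0])  # ≈ [0.0, 0.667, 0.333]
```

    Replacing the 1/n_i step with a constant step size would weight recent driving more heavily, which is one way such an estimate could track changing driving conditions.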

  20. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    PubMed

    Kong, Zehui; Zou, Yuan; Liu, Teng

    2017-01-01

    To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. To use the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power-request is derived. RL is applied to calculate and update the control policy at regular intervals, adapting to the varying driving conditions. A forward-facing powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamical model. The robustness and adaptability of the real-time energy management strategy are validated in simulation through comparison with a stationary control strategy based on an initial TPM generated from a long naturalistic driving cycle. Results indicate that the proposed method achieves better fuel economy than the stationary strategy and is more effective in real-time control.

  1. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations.

    PubMed

    Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho

    2015-05-01

    This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system that is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems with the policies generated by the integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) method. Based on these, three online I-RL algorithms, named explorized I-PI and integral Q-learning I and II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model-free, and can simultaneously explore the state space in a stable manner during the online learning processes. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and in connection with these, we present design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented in this paper. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
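
    For reference, the integral temporal-difference relation on which integral RL methods are built can be written as follows, assuming input-affine dynamics and a running cost of the common form r(x, u) = Q(x) + u^T R u with R positive definite. These are the standard integral policy-iteration relations, not the paper's exact equations; the exploration (probing) signal and the specific I-PI and IA-PI variants are not shown.

```latex
% Input-affine dynamics and assumed running cost
\dot{x} = f(x) + g(x)\,u, \qquad r(x,u) = Q(x) + u^{\top} R\, u

% Policy evaluation: integral Bellman equation over a window of length T
V_i\big(x(t)\big) = \int_{t}^{t+T} r\big(x(\tau), u_i(\tau)\big)\, d\tau + V_i\big(x(t+T)\big)

% Integral temporal-difference error for an approximation \hat{V}_i
e(t) = \int_{t}^{t+T} r\big(x(\tau), u_i(\tau)\big)\, d\tau + \hat{V}_i\big(x(t+T)\big) - \hat{V}_i\big(x(t)\big)

% Policy improvement
u_{i+1}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V_i(x)
```

    The evaluation step uses only measured trajectory data and never needs f(x), which is why such methods can be partially model-free; the improvement step as written still uses g(x).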

  2. Reinforcement learning in complementarity game and population dynamics

    NASA Astrophysics Data System (ADS)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
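
    Three of the schemes compared can be sketched as choice-probability rules. The basic Roth-Erev, Bush-Mosteller, and softmax rules below follow their common textbook forms; how the 1.5 power exponent enters the modified Roth-Erev scheme is our assumption (here, choice probability proportional to propensity raised to beta), not something the abstract specifies. Rewards are assumed non-negative for Roth-Erev and in [0, 1] for Bush-Mosteller.

```python
import math


def roth_erev_update(q, chosen, reward, forgetting=0.0):
    """Basic Roth-Erev: decay all propensities, add the reward to the chosen one."""
    q = [(1.0 - forgetting) * x for x in q]
    q[chosen] += reward
    return q


def roth_erev_probs(q, beta=1.0):
    """beta = 1 is the standard proportional rule; beta != 1 is the
    assumed form of the power-exponent modification."""
    w = [x ** beta for x in q]
    total = sum(w)
    return [x / total for x in w]


def bush_mosteller_update(p, chosen, reward, alpha):
    """Shift probability toward the chosen action in proportion to reward."""
    return [x + alpha * reward * ((1.0 if j == chosen else 0.0) - x)
            for j, x in enumerate(p)]


def softmax_probs(values, temperature):
    """Boltzmann choice probabilities (max subtracted for numerical stability)."""
    m = max(values)
    e = [math.exp((v - m) / temperature) for v in values]
    total = sum(e)
    return [x / total for x in e]


q = roth_erev_update([1.0, 1.0], chosen=0, reward=2.0)
print(roth_erev_probs(q))            # [0.75, 0.25]
print(roth_erev_probs(q, beta=1.5))  # about [0.84, 0.16]: sharper preference
```

    Under this reading, an exponent above 1 exaggerates differences in propensity, trading some exploration for faster exploitation.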

  3. The prefrontal cortex and hybrid learning during iterative competitive games.

    PubMed

    Abe, Hiroshi; Seo, Hyojung; Lee, Daeyeol

    2011-12-01

    Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning. © 2011 New York Academy of Sciences.
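
    The hybrid idea, in which the actual outcome updates the chosen action and hypothetical outcomes update the unchosen ones, can be sketched as two delta-rule updates with separate learning rates. In a competitive game, seeing the opponent's choice reveals what each alternative would have paid. This is an illustrative model under that assumption, not the fitted model from the study; function and parameter names are hypothetical.

```python
def hybrid_update(values, chosen, actual_reward, hypothetical_rewards,
                  alpha_actual, alpha_hypo):
    """Model-free update for the chosen action, model-based update for the rest.

    hypothetical_rewards[a] is the payoff action a would have earned;
    the entry for the chosen action is ignored.
    """
    new = list(values)
    for a in range(len(values)):
        if a == chosen:
            new[a] += alpha_actual * (actual_reward - values[a])
        else:
            new[a] += alpha_hypo * (hypothetical_rewards[a] - values[a])
    return new


print(hybrid_update([0.0, 0.0, 0.0], 0, 1.0, [0.0, 0.0, 2.0], 0.5, 0.25))
# [0.5, 0.0, 0.5]
```

    Setting alpha_hypo to zero recovers a purely model-free learner, so the relative size of the two learning rates indexes how much an agent relies on each kind of learning.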

  4. Examining Organizational Learning in Schools: The Role of Psychological Safety, Experimentation, and Leadership that Reinforces Learning

    ERIC Educational Resources Information Center

    Higgins, Monica; Ishimaru, Ann; Holcombe, Rebecca; Fowler, Amy

    2012-01-01

    This study draws upon theory and methods from the field of organizational behavior to examine organizational learning (OL) in the context of a large urban US school district. We build upon prior literature on OL from the field of organizational behavior to introduce and validate three subscales that assess key dimensions of organizational learning…

  5. Improving Robot Locomotion Through Learning Methods for Expensive Black-Box Systems

    DTIC Science & Technology

    2013-11-01

    development of a class of “gradient free” optimization techniques; these include local approaches, such as a Nelder-Mead simplex search (cf. [73]), and global… Note that this simple method differs from the Nelder-Mead constrained nonlinear optimization method [73]. … the Non-dominated Sorting Genetic Algorithm… Kober, and Jan Peters. Model-free inverse reinforcement learning. In International Conference on Artificial Intelligence and Statistics, 2011. [12] George

  6. Reinforcement Learning and Dopamine in Schizophrenia: Dimensions of Symptoms or Specific Features of a Disease Group?

    PubMed Central

    Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian

    2013-01-01

    Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, which encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. On this basis, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603
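
    The reward prediction error referred to above is, in its simplest form, the difference between the obtained and the expected reward. A minimal delta-rule (Rescorla-Wagner) sketch, assuming a single cue and a fixed learning rate, shows how positive and negative errors move the value estimate; the function name is illustrative.

```python
def rescorla_wagner(rewards, alpha, v0=0.0):
    """Delta-rule value learning; returns the final value and the sequence of
    reward prediction errors (delta = reward - expected value)."""
    v = v0
    deltas = []
    for r in rewards:
        delta = r - v          # positive: better than expected; negative: worse
        v += alpha * delta     # move the expectation toward the outcome
        deltas.append(delta)
    return v, deltas


value, errors = rescorla_wagner([1.0, 1.0, 0.0], alpha=0.5)
print(value, errors)  # 0.375 [1.0, 0.5, -0.75]
```

    As the reward becomes predicted, the positive error shrinks; omitting an expected reward produces a negative error, the signal whose aberrant coding is discussed above.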

  7. Generalization of value in reinforcement learning by humans.

    PubMed

    Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna

    2012-04-01

    Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. 
Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
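    The augmented model described above lets feedback about one option also update the value of its correlated partner. The abstract does not give the model's equations, so the following is a minimal sketch under stated assumptions: a simple delta-rule learner, a hypothetical fixed pairing `partner`, and a hypothetical `generalization` weight that scales the update passed to the paired option. Setting `generalization=0` recovers the basic, non-generalizing learner.

```python
def update_values(values, partner, chosen, reward, alpha=0.3, generalization=0.5):
    """Delta-rule value update that also generalizes to a correlated option.

    values: dict mapping option -> current value estimate (updated in place).
    partner: dict mapping option -> the option whose reward probability is
        assumed to be correlated with it (a hypothetical fixed pairing).
    generalization: fraction of the update passed on to the partner;
        0 gives the basic reinforcement learning model.
    """
    prediction_error = reward - values[chosen]
    values[chosen] += alpha * prediction_error
    # Pass a scaled copy of the same prediction error to the paired option.
    values[partner[chosen]] += generalization * alpha * prediction_error
    return prediction_error
```

    With four options paired as A/B and C/D, a reward for A raises A by `alpha` times the prediction error and raises B by half that amount, while C and D are untouched.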

  8. Instrumental learning and relearning in individuals with psychopathy and in patients with lesions involving the amygdala or orbitofrontal cortex.

    PubMed

    Mitchell, D G V; Fine, C; Richell, R A; Newman, C; Lumsden, J; Blair, K S; Blair, R J R

    2006-05-01

    Previous work has shown that individuals with psychopathy are impaired on some forms of associative learning, particularly stimulus-reinforcement learning (Blair et al., 2004; Newman & Kosson, 1986). Animal work suggests that the acquisition of stimulus-reinforcement associations requires the amygdala (Baxter & Murray, 2002). Individuals with psychopathy also show impoverished reversal learning (Mitchell, Colledge, Leonard, & Blair, 2002). Reversal learning is supported by the ventrolateral and orbitofrontal cortex (Rolls, 2004). In this paper we present experiments investigating stimulus-reinforcement learning and relearning in patients with lesions of the orbitofrontal cortex or amygdala, and individuals with developmental psychopathy without known trauma. The results are interpreted with reference to current neurocognitive models of stimulus-reinforcement learning, relearning, and developmental psychopathy. Copyright (c) 2006 APA, all rights reserved.

  9. Reinforcing In-Service Teachers Education via ICT

    ERIC Educational Resources Information Center

    Thorsteinsson, Gisli

    2012-01-01

    Earlier educational models have not managed to take into account novel contextual and mobile methods of learning with the advances in technology-mediated learning. The article firstly reports an educational approach, namely, future innovative in-service teacher education in Europe (ICE-ED). This project was supported by the European Union Comenius…

  10. Reinforcement of Science Learning through Local Culture: A Delphi Study

    ERIC Educational Resources Information Center

    Nuangchalerm, Prasart

    2008-01-01

    This study aims to explore ways to reinforce science learning through local culture by using the Delphi technique. Twenty-four participants from various fields of study were selected. The results of the study provide a framework for reinforcing science learning through local culture on the theme of life and environment. (Contains 1 table.)

  11. Efficient model learning methods for actor-critic control.

    PubMed

    Grondman, Ivo; Vaandrager, Maarten; Buşoniu, Lucian; Babuska, Robert; Schuitema, Erik

    2012-06-01

    We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
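    Local linear regression, as used above to approximate the functions involved, keeps a memory of samples and fits a linear model only to the nearest neighbours of each query. A minimal sketch for scalar inputs, assuming k-nearest-neighbour selection and an ordinary least-squares fit; the class and parameter names are hypothetical, and the paper's memory management and multi-dimensional fit are not reproduced:

```python
import heapq


class LocalLinearRegression:
    """Memory-based local linear regression for scalar inputs (hypothetical sketch)."""

    def __init__(self, k=5):
        self.k = k        # number of nearest neighbours used per query
        self.memory = []  # stored (x, y) samples

    def add(self, x, y):
        self.memory.append((x, y))

    def predict(self, x):
        if not self.memory:
            raise ValueError("memory is empty")
        # Select the k stored samples whose inputs are closest to the query.
        neighbours = heapq.nsmallest(self.k, self.memory, key=lambda s: abs(s[0] - x))
        n = len(neighbours)
        mean_x = sum(s[0] for s in neighbours) / n
        mean_y = sum(s[1] for s in neighbours) / n
        sxx = sum((s[0] - mean_x) ** 2 for s in neighbours)
        if sxx == 0.0:
            return mean_y  # all neighbours share one input: fall back to their mean
        sxy = sum((s[0] - mean_x) * (s[1] - mean_y) for s in neighbours)
        # Least-squares line through the neighbours, evaluated at the query.
        return mean_y + (sxy / sxx) * (x - mean_x)
```

    Because only nearby samples enter each fit, new data changes predictions locally, which is one reason memory-based approximators of this kind can adapt quickly.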

  12. An architecture for designing fuzzy logic controllers using neural networks

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1991-01-01

    Described here is an architecture for designing fuzzy controllers through a hierarchical process of control rule acquisition and by using special classes of neural network learning techniques. A new method for learning to refine a fuzzy logic controller is introduced. A reinforcement learning technique is used in conjunction with a multi-layer neural network model of a fuzzy controller. The model learns by updating its prediction of the plant's behavior and is related to Sutton's Temporal Difference (TD) method. The method proposed here has the advantage of using the control knowledge of an experienced operator and fine-tuning it through the process of learning. The approach is applied to a cart-pole balancing system.
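    The TD method mentioned above updates a prediction V(s) toward the bootstrapped target r + γV(s') rather than waiting for the final outcome. A minimal TD(0) prediction sketch, assuming a hypothetical bounded random walk (a standard textbook test problem, not the cart-pole system or the fuzzy-neural architecture of the paper). The exact value of non-terminal state i is i/(n_states + 1), which makes the estimates easy to check:

```python
import random


def td0_random_walk(n_states=5, episodes=2000, alpha=0.02, gamma=1.0, seed=0):
    """TD(0) prediction on a bounded random walk (hypothetical test problem).

    Non-terminal states are 1..n_states; 0 and n_states + 1 are terminal.
    Stepping off the right end yields reward 1, every other step yields 0.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * n_states + [0.0]  # terminal values stay at 0
    for _ in range(episodes):
        state = (n_states + 1) // 2  # start in the middle
        while 0 < state <= n_states:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # TD error: bootstrapped target minus the current prediction.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:-1]
```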

  13. Machine learning in cardiovascular medicine: are we there yet?

    PubMed

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strengths and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  14. Partial Planning Reinforcement Learning

    DTIC Science & Technology

    2012-08-31

    Subject terms: Reinforcement Learning, Bayesian Optimization, Active ... Learning, Action Model Learning, Decision Theoretic Assistance. Prasad Tadepalli, Alan Fern, Oregon State University.

  15. Reinforcement learning in supply chains.

    PubMed

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. Our findings show, for the first time, that the specific algorithm employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes an extremely long time to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.

  16. Neural Basis of Reinforcement Learning and Decision Making

    PubMed Central

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  17. Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output data.

    PubMed

    Lewis, F L; Vamvoudakis, Kyriakos G

    2011-02-01

    Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have shown their importance in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the system's internal states, which is usually not available in practical situations. In this paper, we show how to implement ADP methods using only measured input/output data from the system. Linear dynamical systems with deterministic behavior are considered herein, which are systems of great interest in the control system community. In control system theory, these types of methods are referred to as output feedback (OPFB). The stochastic equivalent of the systems dealt with in this paper is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller that requires only OPFB. It is shown that, similar to Q-learning, the new methods have the important advantage that knowledge of the system dynamics is not needed for the implementation of these learning algorithms or for the OPFB control. Only the order of the system, as well as an upper bound on its "observability index," must be known. The learned OPFB controller is in the form of a polynomial autoregressive moving-average controller that has equivalent performance with the optimal state variable feedback gain.
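    For orientation, the optimal state-feedback gain that the learned OPFB controller is said to match can be computed, when the model is known, by value iteration on a quadratic value function. The sketch below is a scalar, model-based illustration under stated assumptions (a system x' = a·x + b·u with hypothetical cost weights q and r); unlike the paper's method, it needs a and b, so it shows only the recursion the data-driven algorithm is meant to reproduce without a model:

```python
def lqr_value_iteration(a, b, q, r, iterations=500, tol=1e-12):
    """Value iteration for scalar discrete-time LQR (model-based illustration).

    System: x' = a*x + b*u; stage cost: q*x**2 + r*u**2 with q >= 0, r > 0.
    The value function is V(x) = p*x**2; each sweep applies the Riccati update.
    Returns (p, gain) with the optimal control u = -gain * x.
    """
    p = 0.0  # start from the zero value function
    for _ in range(iterations):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    gain = a * b * p / (r + b * b * p)
    return p, gain
```

    For an open-loop unstable system such as a = 1.2, b = 1, q = r = 1, the iteration converges to p = (1.44 + √6.0736)/2 ≈ 1.95, and the resulting gain makes the closed-loop factor a − b·gain smaller than 1 in magnitude.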

  18. Extreme Trust Region Policy Optimization for Active Object Recognition.

    PubMed

    Liu, Huaping; Wu, Yupei; Sun, Fuchun

    2018-06-01

    In this brief, we develop a deep reinforcement learning method to actively recognize objects by choosing a sequence of actions for an active camera that helps to discriminate between the objects. The method is realized using trust region policy optimization, in which the policy is realized by an extreme learning machine, which leads to an efficient optimization algorithm. The experimental results on a publicly available data set show the advantages of the developed extreme trust region optimization method.

  19. Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning.

    PubMed

    Iwata, Kazunori

    2016-05-11

    Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.
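    Standard softmax (Boltzmann) selection draws action a with probability proportional to exp(Q(a)/τ), where the temperature τ is the single parameter to be tuned. A minimal sketch of that standard rule follows; it does not implement the paper's AEP-based setting of the parameter, and the function names are hypothetical:

```python
import math
import random


def softmax_probabilities(q_values, temperature):
    """Boltzmann action probabilities for a list of action values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q_values, temperature, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

    A high temperature makes choices nearly uniform (exploration), while a low temperature makes them nearly greedy; the "bandwidth" discussed above refers to how wide the range of good settings of this one parameter is.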

  20. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.

    PubMed

    Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D

    2013-05-01

    A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
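    The abstract contrasts the two systems without giving equations. A common formalization in this literature, assumed here rather than taken from the paper, computes model-based values by planning through a known transition model and mixes them with cached model-free values using a weight w. In such a model, greater reliance on the model-free strategy under a secondary task would appear as a smaller fitted w. Function names and numbers are hypothetical:

```python
def model_based_values(transition_probs, stage2_values):
    """First-stage action values obtained by planning through a transition model.

    transition_probs[a][s]: probability that first-stage action a leads to state s.
    stage2_values[s]: learned values of the actions available in state s.
    """
    best_stage2 = [max(vals) for vals in stage2_values]
    return [sum(p * v for p, v in zip(row, best_stage2)) for row in transition_probs]


def hybrid_values(q_model_based, q_model_free, w):
    """Weighted mixture: w = 1 is purely model-based, w = 0 purely model-free."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_model_based, q_model_free)]
```

    The model-based term is the expensive part: it must be recomputed from the transition model whenever second-stage values change, whereas the model-free values are simply looked up, which is consistent with the idea that planning draws on limited executive resources.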

  1. The Curse of Planning: Dissecting multiple reinforcement learning systems by taxing the central executive

    PubMed Central

    Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.

    2013-01-01

    A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545

  2. Machine Learning and Infrared Thermography for Fiber Orientation Assessment on Randomly-Oriented Strands Parts.

    PubMed

    Fernandes, Henrique; Zhang, Hai; Figueiredo, Alisson; Malheiros, Fernando; Ignacio, Luis Henrique; Sfarra, Stefano; Ibarra-Castanedo, Clemente; Guimaraes, Gilmar; Maldague, Xavier

    2018-01-19

    The use of fiber reinforced materials such as randomly-oriented strands has grown in recent years, especially for manufacturing of aerospace composite structures. This growth is mainly due to their advantageous properties: they are lighter and more resistant to corrosion when compared to metals and are more easily shaped than continuous fiber composites. The resistance and stiffness of these materials are directly related to their fiber orientation. Thus, efficient approaches to assess their fiber orientation are in demand. In this paper, a non-destructive evaluation method is applied to assess the fiber orientation on laminates reinforced with randomly-oriented strands. More specifically, a method called pulsed thermal ellipsometry combined with an artificial neural network, a machine learning technique, is used in order to estimate the fiber orientation on the surface of inspected parts. Results showed that the method can be potentially used to inspect large areas with good accuracy and speed.

  3. Machine Learning and Infrared Thermography for Fiber Orientation Assessment on Randomly-Oriented Strands Parts

    PubMed Central

    Maldague, Xavier

    2018-01-01

    The use of fiber reinforced materials such as randomly-oriented strands has grown in recent years, especially for manufacturing of aerospace composite structures. This growth is mainly due to their advantageous properties: they are lighter and more resistant to corrosion when compared to metals and are more easily shaped than continuous fiber composites. The resistance and stiffness of these materials are directly related to their fiber orientation. Thus, efficient approaches to assess their fiber orientation are in demand. In this paper, a non-destructive evaluation method is applied to assess the fiber orientation on laminates reinforced with randomly-oriented strands. More specifically, a method called pulsed thermal ellipsometry combined with an artificial neural network, a machine learning technique, is used in order to estimate the fiber orientation on the surface of inspected parts. Results showed that the method can be potentially used to inspect large areas with good accuracy and speed. PMID:29351240

  4. Fuzzy and neural control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1992-01-01

    Fuzzy logic and neural networks provide new methods for designing control systems. Fuzzy logic controllers do not require a complete analytical model of a dynamic system and can provide knowledge-based heuristic controllers for ill-defined and complex systems. Neural networks can be used for learning control. In this chapter, we discuss hybrid methods using fuzzy logic and neural networks which can start with an approximate control knowledge base and refine it through reinforcement learning.

  5. Learning Sequences of Actions in Collectives of Autonomous Agents

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Agogino, Adrian K.; Wolpert, David H.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. We are particularly interested in instances of this problem where centralized control is either impossible or impractical. For single agent systems in similar domains, machine learning methods (e.g., reinforcement learners) have been successfully used. However, applying such solutions directly to multi-agent systems often proves problematic, as agents may work at cross-purposes, or have difficulty in evaluating their contribution to achievement of the global objective, or both. Accordingly, the crucial design step in multiagent systems centers on determining the private objectives of each agent so that as the agents strive for those objectives, the system reaches a good global solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence to design goals for the agents that are 'aligned' with the global goal, and are 'learnable' in that agents can readily see how their behavior affects their utility. We show that reinforcement learning agents using those goals outperform both 'natural' extensions of single agent algorithms and global reinforcement learning solutions based on 'team games'.
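    One construct from the collective intelligence literature that yields goals that are both aligned and learnable is a difference utility: each agent is rewarded with the global utility minus the global utility recomputed with that agent's action removed. The abstract does not state the exact form the authors use, so the sketch below is an assumption, with a hypothetical grid-coverage objective (the number of distinct cells occupied) standing in for the global goal:

```python
def difference_rewards(joint_action, global_utility, null_action=None):
    """Per-agent difference rewards D_i = G(z) - G(z with agent i removed).

    joint_action: list with one action per agent.
    global_utility: function mapping a joint action to a number.
    null_action: placeholder used to 'remove' an agent's contribution.
    """
    g = global_utility(joint_action)
    rewards = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = null_action  # agent i sits out
        rewards.append(g - global_utility(counterfactual))
    return rewards


def coverage(joint_action):
    """Hypothetical global objective: distinct grid cells occupied by agents."""
    return len({cell for cell in joint_action if cell is not None})
```

    If two agents occupy the same cell and a third occupies another, `difference_rewards([(0, 0), (0, 0), (1, 1)], coverage)` gives `[0, 0, 1]`: the redundant agents see that their presence adds nothing, a signal they can learn from more readily than from the shared global score.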

  6. Examining the Use of Web-Based Reusable Learning Objects by Animal and Veterinary Nursing Students

    ERIC Educational Resources Information Center

    Chapman-Waterhouse, Emily; Silva-Fletcher, Ayona; Whittlestone, Kim David

    2016-01-01

    This intervention study examined the interaction of animal and veterinary nursing students with reusable learning objects (RLO) in the context of preparing for summative assessment. Data was collected from 199 undergraduates using quantitative and qualitative methods. Students accessed RLO via personal devices in order to reinforce taught…

  7. English and the Learning-Disabled Student: A Survey of Research.

    ERIC Educational Resources Information Center

    Siegel, Gerald

    The author reviews literature on teaching the learning disabled (LD) in college English classrooms. He notes work by V. Davis which suggests the following methods and techniques: (1) reinforce coping techniques the students have already developed; (2) provide help with reading tasks through summaries of vocabulary; (3) allow taping of classes (to…

  8. Exploring Interprofessional Education through a High-Fidelity Human Patient Simulation Scenario: A Mixed Methods Study

    ERIC Educational Resources Information Center

    Rossler, Kelly Lynn

    2013-01-01

    High-fidelity human patient simulation has emerged as a valuable medium to reinforce educational content within programs of nursing. As simulation learning experiences have been identified as augmenting both didactic lecture content and clinical learning, these experiences have expanded to incorporate interprofessional education. Review of…

  9. Twenty Golden Opportunities To Enhance Student Learning: Use Them or Lose Them.

    ERIC Educational Resources Information Center

    Sponder, Barry

    In an average classroom period, a teacher has twenty or more opportunities to interact with students and thereby influence learning outcomes. As such, teachers should use these opportunities to reinforce instruction or give positive corrective feedback. Typical methods used in schools emphasize error correction at the expense of calling attention…

  10. Peer-led small groups: Are we on the right track?

    PubMed

    Moore, Fraser

    2017-10-01

    Peer tutor-led small group sessions are a valuable learning strategy but students may lack confidence in the absence of a content expert. This study examined whether faculty reinforcement of peer tutor-led small group content was beneficial. Two peer tutor-led small group sessions were compared with one faculty-led small group session using questionnaires sent to student participants and interviews with the peer tutors. One peer tutor-led session was followed by a lecture with revision of the small group content; after the second, students submitted a group report which was corrected and returned to them with comments. Student participants and peer tutors identified increased discussion and opportunity for personal reflection as major benefits of the peer tutor-led small group sessions, but students did express uncertainty about gaps in their learning following these sessions. Both methods of subsequent faculty reinforcement were perceived as valuable by student participants and peer tutors. Knowing in advance that the group report would be corrected reduced discussion in some groups, potentially negating one of the major benefits of the peer tutor-led sessions. Faculty reinforcement of peer tutor-led small group content benefits students, but close attention should be paid to the method of reinforcement.

  11. Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning.

    PubMed

    Morimura, Tetsuro; Uchibe, Eiji; Yoshimoto, Junichiro; Peters, Jan; Doya, Kenji

    2010-02-01

    Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution that corresponds to the sensitivity of its distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate gamma for the value functions close to 1, these algorithms do not permit gamma to be set exactly at gamma = 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD) as a useful form of the derivative of the stationary state distribution through backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with an LSD is also proposed, in which the average reward gradient can be estimated by setting gamma = 0, so it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms using simple benchmark tasks and show that these can improve the performances of existing PG methods.

  12. Spectrum Access In Cognitive Radio Using a Two-Stage Reinforcement Learning Approach

    NASA Astrophysics Data System (ADS)

    Raj, Vishnu; Dias, Irene; Tholeti, Thulasi; Kalyani, Sheetal

    2018-02-01

    With the advent of the 5th generation of wireless standards and an increasing demand for higher throughput, methods to improve the spectral efficiency of wireless systems have become very important. In the context of cognitive radio, a substantial increase in throughput is possible if the secondary user can make smart decisions regarding which channel to sense and when or how often to sense. Here, we propose an algorithm to not only select a channel for data transmission but also to predict how long the channel will remain unoccupied so that the time spent on channel sensing can be minimized. Our algorithm learns in two stages: a reinforcement learning approach for channel selection and a Bayesian approach to determine the optimal duration for which sensing can be skipped. Comparisons with other learning methods are provided through extensive simulations. We show that the number of sensing operations is minimized with a negligible increase in primary interference; this implies that less energy is spent by the secondary user in sensing and that higher throughput is achieved by saving on sensing.
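    The abstract does not specify which reinforcement learning rule drives the channel-selection stage. As a hedged illustration only, the sketch below substitutes a simple epsilon-greedy bandit that favours the channel most often observed idle; the idle probabilities, function names, and epsilon value are hypothetical, and the Bayesian skip-duration stage is not shown:

```python
import random


def select_channel(idle_counts, sense_counts, epsilon, rng):
    """Epsilon-greedy channel choice from empirical idle rates."""
    if rng.random() < epsilon:
        return rng.randrange(len(idle_counts))  # explore a random channel
    # Unsensed channels get an optimistic rate of 1.0 so each is tried once.
    rates = [i / s if s else 1.0 for i, s in zip(idle_counts, sense_counts)]
    return max(range(len(rates)), key=rates.__getitem__)


def simulate(idle_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate sensing with hypothetical per-channel idle probabilities."""
    rng = random.Random(seed)
    n = len(idle_probs)
    idle = [0] * n
    sensed = [0] * n
    for _ in range(steps):
        ch = select_channel(idle, sensed, epsilon, rng)
        sensed[ch] += 1
        if rng.random() < idle_probs[ch]:  # channel found unoccupied
            idle[ch] += 1
    return sensed
```

    Over time most sensing effort goes to the channel that is idle most often; the paper's second stage would then decide how long sensing of that channel can be skipped.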

  13. Classification of amyotrophic lateral sclerosis disease based on convolutional neural network and reinforcement sample learning algorithm.

    PubMed

    Sengur, Abdulkadir; Akbulut, Yaman; Guo, Yanhui; Bajaj, Varun

    2017-12-01

    Electromyogram (EMG) signals contain useful information about neuromuscular diseases such as amyotrophic lateral sclerosis (ALS). ALS is a well-known neurodegenerative disease that progressively degenerates the motor neurons. In this paper, we propose a deep learning based method for efficient classification of ALS and normal EMG signals. Spectrogram, continuous wavelet transform (CWT), and smoothed pseudo Wigner-Ville distribution (SPWVD) have been employed for time-frequency (T-F) representation of EMG signals. A convolutional neural network is employed to classify these features. The CNN architecture consists of two convolution layers, two pooling layers, a fully connected layer, and a loss function layer. The CNN architecture is trained with the reinforcement sample learning strategy. The efficiency of the proposed implementation is tested on a publicly available EMG dataset. The dataset contains 89 ALS and 133 normal EMG signals with a 24 kHz sampling frequency. Experimental results show 96.80% accuracy. The obtained results are also compared with other methods, and the comparison shows the superiority of the proposed method.

  14. B-tree search reinforcement learning for model based intelligent agent

    NASA Astrophysics Data System (ADS)

    Bhuvaneswari, S.; Vignashwaran, R.

    2013-03-01

    Agents trained by learning techniques provide a powerful approximation of active solutions for naive approaches. In this study, B-trees are combined with reinforcement learning to moderate the data search for information retrieval, achieving accuracy with minimal search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques achieve satisfactory baseline performance and act as finite agents based on the predetermined model against competitors from the course.

  15. Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap S.

    1992-01-01

    Current reinforcement learning algorithms require long training periods, which generally limit their applicability to small problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to the control of dynamic systems, and it is demonstrated that it is possible to start with approximate prior knowledge and learn to refine it through experiments using reinforcement learning.

  16. Reinforcement learning in multidimensional environments relies on attention mechanisms.

    PubMed

    Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C

    2015-05-27

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors.

  17. Changes in corticostriatal connectivity during reinforcement learning in humans.

    PubMed

    Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S

    2015-02-01

    Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.
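
    The abstract does not name the fitted model. A common generic choice in such studies (an assumption here, not necessarily the authors' model) is a delta-rule Q-learner with softmax choice, fitted by minimizing the negative log-likelihood of the observed choices; the per-trial prediction error `delta` is the kind of trial-by-trial learning variable that is then related to neural signals. This sketch treats each choice as a pick among a fixed set of options, which simplifies the maze task, and uses a coarse grid search in place of a proper optimizer.

```python
import math


def softmax_probs(q_values, beta):
    """Softmax (Boltzmann) choice probabilities with inverse temperature beta."""
    m = max(q_values)
    # Subtracting the max keeps exp() from overflowing for large beta * q
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(choices, rewards, alpha, beta, n_options):
    """NLL of observed choices under a delta-rule (Q-learning) model with softmax choice."""
    q = [0.0] * n_options
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        p = softmax_probs(q, beta)
        nll -= math.log(p[choice])
        # Reward prediction error: the trial-by-trial learning variable
        delta = reward - q[choice]
        q[choice] += alpha * delta
    return nll


def fit_grid(choices, rewards, n_options):
    """Coarse grid search; returns (best NLL, alpha, beta)."""
    best = None
    for alpha in [i / 20 for i in range(1, 20)]:
        for beta in [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]:
            nll = negative_log_likelihood(choices, rewards, alpha, beta, n_options)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best
```

    For example, `fit_grid([0, 0, 1, 0], [1, 1, 0, 1], 2)` returns the lowest NLL found together with the learning rate `alpha` and inverse temperature `beta` that produced it. Setting `beta = 0` gives chance-level choice, a useful baseline against which to compare the fit.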

  18. A common neural circuit mechanism for internally guided and externally reinforced forms of motor learning.

    PubMed

    Hisey, Erin; Kearney, Matthew Gene; Mooney, Richard

    2018-04-01

    The complex skills underlying verbal and musical expression can be learned without external punishment or reward, indicating that their learning is internally guided. The neural mechanisms that mediate internally guided learning are poorly understood, but a circuit comprising dopamine-releasing neurons in the midbrain ventral tegmental area (VTA) and their targets in the basal ganglia is important for externally reinforced learning. Juvenile zebra finches copy a tutor song in a process that is internally guided and, in adulthood, can learn to modify the fundamental frequency (pitch) of a target syllable in response to external reinforcement with white noise. Here we combined intersectional genetic ablation of VTA neurons, reversible blockade of dopamine receptors in the basal ganglia, and singing-triggered optogenetic stimulation of VTA terminals to establish that a common VTA-basal ganglia circuit enables internally guided song copying and externally reinforced syllable pitch learning.

  19. Knockout crickets for the study of learning and memory: Dopamine receptor Dop1 mediates aversive but not appetitive reinforcement in crickets.

    PubMed

    Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto

    2015-11-02

    Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit flies concluded that dopamine neurons mediate both reward and punishment via the type 1 dopamine receptor Dop1. To resolve this discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not in appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest that the neurotransmitters mediating appetitive reinforcement differ unexpectedly between crickets and fruit flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory.

  20. Applying reinforcement learning techniques to detect hepatocellular carcinoma under limited screening capacity.

    PubMed

    Lee, Elliot; Lavieri, Mariel S; Volk, Michael L; Xu, Yongcai

    2015-09-01

    We investigate the problem faced by a healthcare system wishing to allocate its constrained screening resources across a population at risk for developing a disease. A patient's risk of developing the disease depends on his/her biomedical dynamics. However, knowledge of these dynamics must be learned by the system over time. Three classes of reinforcement learning policies are designed to address this problem of simultaneously gathering and utilizing information across multiple patients. We investigate a case study based on screening for hepatocellular carcinoma (HCC), and optimize each of the three classes of policies using the indifference zone method. A simulation is built to gauge the performance of these policies, and their performance is compared to current practice. We then demonstrate how the benefits of learning-based screening policies differ across various levels of resource scarcity and provide metrics of policy performance.

  1. Identifying Cognitive Remediation Change Through Computational Modelling—Effects on Reinforcement Learning in Schizophrenia

    PubMed Central

    Cella, Matteo; Bishara, Anthony J.; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til

    2014-01-01

    Objective: Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention, cognitive remediation. Method: Using computational modelling, 3 reinforcement learning parameters were estimated from trial-by-trial performance on the Wisconsin Card Sorting Test (WCST): R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). Results: In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. Changes in R and P correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Conclusion: Reinforcement learning difficulties in schizophrenia negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. PMID:24214932

  2. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    PubMed

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference differs markedly: people often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in the ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and in regions associated with updating social impressions, including the TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  3. Human-level control through deep reinforcement learning.

    PubMed

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
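
    The abstract does not describe how the deep Q-network is trained. Two ingredients widely associated with DQN are experience replay and a separate, periodically synchronized target network used to compute bootstrapped Q-learning targets. The sketch below shows only those two pieces in plain Python; the convolutional network is replaced by any callable that returns a list of action values (a lookup table in the usage note), so it illustrates the target computation and is not the published implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # deque with maxlen silently discards the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatches break up correlations between consecutive steps
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma):
    """Q-learning targets y = r + gamma * max_a' Q_target(s', a') for a minibatch.

    `target_q` is any callable mapping a state to a list of action values; in DQN it
    would be a periodically synchronized copy of the online network. Terminal
    transitions (done=True) do not bootstrap.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets
```

    With `table = {"s1": [0.0, 1.0]}` standing in for the target network and a single non-terminal transition `("s0", 0, 1.0, "s1", False)`, `td_targets(batch, table.get, 0.9)` yields `[1.9]`. Keeping the target network frozen between synchronizations means the regression target does not shift with every gradient step.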

  4. Human-level control through deep reinforcement learning

    NASA Astrophysics Data System (ADS)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

  5. Switching Reinforcement Learning for Continuous Action Space

    NASA Astrophysics Data System (ADS)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty is designing a suitable action space for an agent, i.e., satisfying two requirements that trade off against each other: (i) to keep the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. To design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing and referring to the “entropy”. Computational experiments on robot navigation problems with one- and two-dimensional continuous action spaces confirm the validity of the proposed method.
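
    The abstract says only that controller switching refers to an “entropy”. One plausible reading, offered purely as an assumption, is the Shannon entropy of the coarse controller's action-selection distribution: while the coarse policy is still uncertain (high entropy) it keeps control, and once it has become confident the agent hands over to a finer discretization of the continuous action range, echoing the gross-before-fine development described above. The function names, the Boltzmann policy, and the 0.5 threshold ratio are all hypothetical.

```python
import math


def policy_entropy(q_values, temperature):
    """Shannon entropy (nats) of a Boltzmann policy over the given action values."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_switch_to_fine(q_values, temperature, threshold_ratio=0.5):
    """Hypothetical rule: switch once entropy falls below a fraction of its maximum, log(n)."""
    max_entropy = math.log(len(q_values))
    return policy_entropy(q_values, temperature) < threshold_ratio * max_entropy


def discretize(low, high, n):
    """n >= 2 evenly spaced actions covering the continuous range [low, high]."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]
```

    For instance, a coarse controller might act over `discretize(-1.0, 1.0, 3)` and a fine one over `discretize(-1.0, 1.0, 9)`. With equal action values the entropy is maximal and `should_switch_to_fine` returns `False`; once one action clearly dominates, it returns `True`.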

  6. Optimizing microstimulation using a reinforcement learning framework.

    PubMed

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. When natural somatosensory function is present, the natural response can be used as a template for the desired response elicited by electrical microstimulation. In the absence of initial training data, microstimulation parameters that produce responses close to the template must be selected online. We propose using reinforcement learning as a framework to balance exploration of the parameter space against continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response to stimulation. We explore a preliminary architecture, treating the task as a k-armed bandit, using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best-matching stimulation parameters, from k = 68 different forms, are consistently selected by the reinforcement learning algorithm after 334 realizations.
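
    The abstract frames parameter selection as a k-armed bandit but does not say which bandit algorithm was used. As an illustration only, here is a sample-average epsilon-greedy agent: each arm stands for one stimulation parameter form, and the reward function, which in this setting would score how closely an evoked response matches the natural-touch template, is a hypothetical placeholder supplied by the caller.

```python
import random


def epsilon_greedy_bandit(reward_fn, k, n_steps, epsilon=0.1, seed=0):
    """Sample-average epsilon-greedy agent for a k-armed bandit.

    `reward_fn(arm)` returns a scalar reward for pulling `arm` (here: trying one
    stimulation parameter form). Returns (estimated mean reward per arm, pull counts).
    """
    rng = random.Random(seed)
    counts = [0] * k
    values = [0.0] * k
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: any form, uniformly
        else:
            best = max(values)
            # exploit: current best estimate, ties broken at random
            arm = rng.choice([i for i, v in enumerate(values) if v == best])
        reward = reward_fn(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental sample mean
    return values, counts
```

    With a toy reward that pays 1 only for arm 5, `epsilon_greedy_bandit(lambda a: 1.0 if a == 5 else 0.0, k=68, n_steps=2000)` ends up pulling arm 5 far more often than any other. A larger `epsilon` explores more of the 68 forms at the cost of fewer pulls of the current best.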

  7. A Reinforcement Learning Model Equipped with Sensors for Generating Perception Patterns: Implementation of a Simulated Air Navigation System Using ADS-B (Automatic Dependent Surveillance-Broadcast) Technology.

    PubMed

    Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M; Lara, Juan A; Lizcano, David

    2017-01-19

    Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results of executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and the model outperformed traditional methods in the literature with respect to learning reliability and efficiency.
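
    The abstract describes perceptions mapped to patterns, and pattern-action pairs associated statistically by positive or negative outcomes. A minimal sketch of that idea follows, under two assumptions that are not from the paper: patterns are tuples of agent-relative sensor readings, and pairs are rated by a Laplace-smoothed success rate. The class and method names are hypothetical, and the ADS-B sensor layer is not modelled.

```python
from collections import defaultdict


class PatternActionAssociator:
    """Hypothetical sketch: tally positive/negative outcomes per
    (perception pattern, action) pair and prefer the best-rated action."""

    def __init__(self, actions):
        self.actions = list(actions)
        # (pattern, action) -> [positive count, negative count]
        self.tallies = defaultdict(lambda: [0, 0])

    @staticmethod
    def to_pattern(sensor_readings):
        # Assumes readings are already agent-relative (not absolute positions), so the
        # same situation met at different places in the environment maps to one pattern.
        return tuple(sensor_readings)

    def record(self, sensor_readings, action, positive):
        """Log whether executing `action` under these readings gave a positive result."""
        counts = self.tallies[(self.to_pattern(sensor_readings), action)]
        counts[0 if positive else 1] += 1

    def score(self, pattern, action):
        """Laplace-smoothed success rate; unseen pairs score 0.5."""
        pos, neg = self.tallies.get((pattern, action), (0, 0))
        return (pos + 1) / (pos + neg + 2)

    def choose(self, sensor_readings):
        """Pick the action with the highest success rate for the current pattern."""
        pattern = self.to_pattern(sensor_readings)
        return max(self.actions, key=lambda a: self.score(pattern, a))
```

    Smoothing keeps a single lucky or unlucky outcome from dominating a pair's rating, and it lets untried actions sit at a neutral 0.5 rather than at 0 or 1.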

  8. A Reinforcement Learning Model Equipped with Sensors for Generating Perception Patterns: Implementation of a Simulated Air Navigation System Using ADS-B (Automatic Dependent Surveillance-Broadcast) Technology

    PubMed Central

    Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M.; Lara, Juan A.; Lizcano, David

    2017-01-01

    PMID:28106849

  9. Attitude of medical students towards Early Clinical Exposure in learning endocrine physiology

    PubMed Central

    Sathishkumar, Solomon; Thomas, Nihal; Tharion, Elizabeth; Neelakantan, Nithya; Vyas, Rashmi

    2007-01-01

    Background Different teaching-learning methods have been used in teaching endocrine physiology to medical students, so as to increase their interest and enhance their learning. This paper describes the pros and cons of the various approaches used to reinforce didactic instruction in endocrine physiology and goes on to describe the value of adding an Early Clinical Exposure (ECE) program to didactic instruction in endocrine physiology, as well as student reactions to it as an alternative approach. Discussion Various methods have been used to reinforce didactic instruction in endocrine physiology, such as case-stimulated learning, problem-based learning, patient-centred learning and multiple-format sessions. We devised a teaching-learning intervention in endocrine physiology that consisted of traditional didactic lectures supplemented with an ECE program of case-based lectures and a hospital visit to see patients. A focus group discussion was conducted with the medical students and, based on the themes that emerged from it, a questionnaire was developed and administered to further enquire into the attitude of all the students towards ECE in learning endocrine physiology. In their feedback, the students commented that ECE increased their interest in the subject and motivated them to read more. They also felt that ECE enhanced their understanding of endocrine physiology, enabled them to remember the subject better, contributed to their knowledge of the subject and helped them to integrate their knowledge. Many students said that ECE increased their sensitivity toward patient problems and needs. They expressed a desire and a need for ECE to be continued in teaching endocrine physiology for future groups of students and to be extended to teaching other systems as well. The majority of the students (96.4%) gave the program an overall rating of good to excellent on a 5-point Likert scale.
    Summary The ECE program was introduced as an alternative approach to reinforce didactic instruction in endocrine physiology for first-year medical students. The study demonstrated that students clearly enjoyed the experience and perceived it as valuable. This method could potentially be used for other basic science topics as well. PMID:17784967

  10. Role of dopamine D2 receptors in human reinforcement learning.

    PubMed

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-09-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug and by those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date of a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.

  11. Role of Dopamine D2 Receptors in Human Reinforcement Learning

    PubMed Central

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-01-01

    PMID:24713613

  12. Reinforcement learning in computer vision

    NASA Astrophysics Data System (ADS)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is therefore natural that, when solving computer vision tasks, we should take into account the special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of the modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for applied tasks such as processing and analysis of visual information, and for specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning technology and its use for solving computer vision problems.

  13. Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms

    PubMed Central

    Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.

    2015-01-01

    PMID:26019331

  14. Seizure Control in a Computational Model Using a Reinforcement Learning Stimulation Paradigm.

    PubMed

    Nagaraj, Vivek; Lamperski, Andrew; Netoff, Theoden I

    2017-11-01

    Neuromodulation technologies such as vagus nerve stimulation and deep brain stimulation have shown some efficacy in controlling seizures in patients with medically intractable epilepsy. However, inherent patient-to-patient variability of seizure disorders leads to a wide range of therapeutic efficacy. A patient-specific approach to determining stimulation parameters may increase therapeutic efficacy while minimizing stimulation energy and side effects. This paper presents a reinforcement learning algorithm that optimizes stimulation frequency for controlling seizures with minimum stimulation energy. We apply our method to a computational model called the Epileptor, which simulates inter-ictal and ictal local field potential data. To apply reinforcement learning to the Epileptor, we introduce a specialized reward function and state-space discretization. With the reward function and discretization fixed, we test the effectiveness of the temporal difference reinforcement learning algorithm TD(0). For periodic pulsatile stimulation, we derive a relation that describes, for any stimulation frequency, the minimal pulse amplitude required to suppress seizures. The TD(0) algorithm is able to identify parameters that control seizures quickly. Additionally, our results show that the TD(0) algorithm refines the stimulation frequency to minimize stimulation energy, thereby converging reliably to optimal parameters. An advantage of the TD(0) algorithm is that it is adaptive, so it can track changes over time in the parameters needed to control seizures. We show that the algorithm can converge on the optimal solution in simulation with both slow and fast inter-seizure intervals.
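
    The abstract names TD(0) but not the state features, the reward, or how learned values are turned into frequency choices. The sketch below shows only the standard tabular TD(0) update, `V(s) += alpha * (r + gamma * V(s') - V(s))`, together with two hypothetical helpers: a binning function standing in for the paper's state-space discretization and a reward that penalizes seizure activity and stimulation energy. Neither helper is taken from the paper.

```python
def discretize_state(x, edges):
    """Hypothetical state discretization: index of the first bin edge that x falls below.

    With edges [e0, e1, ...] sorted ascending, x < e0 -> 0, e0 <= x < e1 -> 1, ...,
    and x >= the last edge -> len(edges).
    """
    for i, edge in enumerate(edges):
        if x < edge:
            return i
    return len(edges)


def stimulation_reward(seizing, pulse_energy, energy_weight=0.1):
    """Hypothetical reward: penalize seizure activity and, more lightly, stimulation energy."""
    return -(1.0 if seizing else 0.0) - energy_weight * pulse_energy


def td0_update(values, state, reward, next_state, alpha, gamma, terminal=False):
    """One tabular TD(0) step: V(s) += alpha * (r + gamma * V(s') - V(s)). Returns the TD error."""
    bootstrap = 0.0 if terminal else gamma * values[next_state]
    td_error = reward + bootstrap - values[state]
    values[state] += alpha * td_error
    return td_error
```

    On a two-step chain where only the final transition pays 1, repeated updates with `alpha=0.1` and `gamma=1.0` drive both non-terminal values toward 1. Because every update uses only the latest transition, the same rule keeps tracking the values if the underlying dynamics drift, which is the adaptivity the abstract highlights.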

  15. Action-Driven Visual Object Tracking With Deep Reinforcement Learning.

    PubMed

    Yun, Sangdoo; Choi, Jongwon; Yoo, Youngjoon; Yun, Kimin; Choi, Jin Young

    2018-06-01

    In this paper, we propose an efficient visual tracker that directly captures a bounding box containing the target object in a video by means of sequential actions learned using deep neural networks. The deep neural network that controls tracking actions is pretrained on various training video sequences and fine-tuned during actual tracking for online adaptation to changes in the target and background. Pretraining uses deep reinforcement learning (RL) as well as supervised learning. The use of RL enables even partially labeled data to be used successfully for semisupervised learning. Evaluation on an object tracking benchmark data set shows that the proposed tracker achieves competitive performance at three times the speed of existing deep network-based trackers. The fast version of the proposed method, which operates in real time on a graphics processing unit (GPU), outperforms state-of-the-art real-time trackers with an accuracy improvement of more than 8%.

  16. Network congestion control algorithm based on Actor-Critic reinforcement learning model

    NASA Astrophysics Data System (ADS)

    Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

    2018-04-01

    Aiming at the network congestion control problem, a congestion control algorithm based on an Actor-Critic reinforcement learning model is designed. By incorporating a genetic algorithm into the congestion control strategy, network congestion problems can be better detected and prevented. A simulation experiment of the Actor-Critic-based congestion control algorithm is designed. The simulation experiments verify that the AQM (active queue management) controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is used to optimize network performance: the packet-dropping probability is adjusted adaptively to improve network performance and avoid congestion. Based on these findings, it is concluded that the network congestion control algorithm based on the Actor-Critic reinforcement learning model can effectively avoid TCP network congestion.
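
    An Actor-Critic learner pairs a critic, which estimates state values from temporal-difference (TD) errors, with an actor, which shifts its action preferences along those errors. The tabular sketch below is a generic one-step actor-critic, not the paper's controller. Reading the states as queue-length bins and the actions as candidate packet-dropping probabilities is a hypothetical AQM framing, and the toy reward used in the check is invented for illustration.

```python
import math
import random


class TabularActorCritic:
    """One-step actor-critic: softmax actor plus TD(0) critic over discrete states/actions."""

    def __init__(self, n_states, n_actions, alpha_actor=0.1,
                 alpha_critic=0.1, gamma=0.9, seed=0):
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor
        self.V = [0.0] * n_states                                   # critic
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma
        self.rng = random.Random(seed)

    def policy(self, s):
        """Softmax over the action preferences in state s."""
        m = max(self.prefs[s])  # subtract the max for numerical stability
        exps = [math.exp(p - m) for p in self.prefs[s]]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, s):
        probs = self.policy(s)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, s, a, r, s_next):
        """Critic computes the TD error; actor moves along grad log pi scaled by it."""
        td_error = r + self.gamma * self.V[s_next] - self.V[s]
        self.V[s] += self.alpha_critic * td_error
        probs = self.policy(s)
        for b, p in enumerate(probs):
            grad_log_pi = (1.0 if b == a else 0.0) - p
            self.prefs[s][b] += self.alpha_actor * td_error * grad_log_pi
        return td_error
```

    The reward definition carries the control objective (for AQM, some trade-off between queueing delay and throughput); the abstract does not specify the one used.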

  17. From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model

    ERIC Educational Resources Information Center

    Fu, Wai-Tat; Anderson, John R.

    2006-01-01

    The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…

  18. The Effects of Interspersal and Reinforcement on Math Fact Accuracy and Learning Rate

    ERIC Educational Resources Information Center

    Rumberger, Jessica L.

    2013-01-01

    Mathematics skill acquisition is a crucial component of education and ongoing research is needed to determine quality instructional techniques. A ubiquitous instructional question is how to manage time. This study investigated several flashcard presentation methods to determine the one that would provide the most learning in a set amount of time.…

  19. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    PubMed Central

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Models of trial-by-trial learning showed that children and adults had higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior. PMID:24550063
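
    A "positive learning rate" in trial-by-trial models of this kind typically refers to a Rescorla-Wagner-style update whose step size depends on the sign of the prediction error. The sketch below shows that generic asymmetric update; the parameter names `alpha_pos` and `alpha_neg` and the example values are hypothetical, and the study's exact model is not specified in the abstract.

```python
def asymmetric_update(expectation, outcome, alpha_pos, alpha_neg):
    """Update an expectation with separate learning rates for positive and
    negative prediction errors (a generic Rescorla-Wagner variant)."""
    delta = outcome - expectation  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return expectation + alpha * delta


def learn_expectation(outcomes, alpha_pos, alpha_neg, start=0.5):
    """Run the update over a sequence of feedback outcomes (1 = positive, 0 = not)."""
    v = start
    for o in outcomes:
        v = asymmetric_update(v, o, alpha_pos, alpha_neg)
    return v
```

    With a larger `alpha_pos`, the expectation for a peer who often gives positive feedback rises faster, which is one way differences in positive learning rates can surface in behaviour.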

  20. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    PubMed

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Models of trial-by-trial learning showed that children and adults had higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  1. Stochastic Reinforcement Benefits Skill Acquisition

    ERIC Educational Resources Information Center

    Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.

    2014-01-01

    Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when the mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…

  2. Multi-agent Reinforcement Learning Model for Effective Action Selection

    NASA Astrophysics Data System (ADS)

    Youk, Sang Jo; Lee, Bong Keun

    Reinforcement learning is a subarea of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the multi-agent case especially, the state space and action space become far larger than in the single-agent case, so the most effective available measures are needed to select the action strategy for effective reinforcement learning. This paper proposes a multi-agent reinforcement learning model based on a fuzzy inference system in order to improve learning speed and select effective actions in multi-agent settings. This paper verifies an effective action selection strategy through evaluation tests based on RoboCup Keepaway, a useful test-bed for multi-agent systems. Our proposed model can be applied to evaluating the efficiency of various intelligent multi-agent systems, and also to the strategy and tactics of robot soccer systems.

  3. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models

    PubMed Central

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement in early word learning is investigated using an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. Joint attention and prosodic cues in the caregiver's speech are considered as social cues. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using the CHILDES dataset of mother-infant interactions for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models outperform Attentional ones on the word-learning task. The Attentional-prosodic DQN outperforms existing word-learning models on the same task. PMID:29441027
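
    The table-based algorithms named here differ mainly in their bootstrap target: Q-learning uses the greedy value of the next state, while SARSA uses the value of the action actually taken next. A minimal tabular sketch of the two one-step updates follows; the dictionary-based Q-table, step size and discount are illustrative assumptions, and SARSA-λ and the neural variants are not shown.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action the agent actually takes next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

    For example, with Q(s1, left) = 1, Q(s1, right) = 0, alpha = 0.5 and gamma = 1, a transition s0 -> s1 with reward 0 moves Q(s0, go) to 0.5 under Q-learning, but leaves it at 0 under SARSA if the next action taken is `right`.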

  4. Reinforcement learning improves behaviour from evaluative feedback

    NASA Astrophysics Data System (ADS)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  5. Reinforcement learning improves behaviour from evaluative feedback.

    PubMed

    Littman, Michael L

    2015-05-28

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  6. A game theory-reinforcement learning (GT-RL) method to develop optimal operation policies for multi-operator reservoir systems

    NASA Astrophysics Data System (ADS)

    Madani, Kaveh; Hooshyar, Milad

    2014-11-01

    Reservoir systems with multiple operators can benefit from coordination of operation policies. To maximize the total benefit of these systems, the literature has typically used the social planner's approach, in which operation decisions are optimized using a multi-objective optimization model with a compound system objective. While the utility of the system can be increased this way, fair allocation of benefits among the operators remains challenging for the social planner, who has to assign controversial weights to the system's beneficiaries and their objectives. Cooperative game theory provides an alternative framework for fair and efficient allocation of the incremental benefits of cooperation. To determine the fair and efficient utility shares of the beneficiaries, cooperative game theory solution methods consider the gains of each party in the status quo (non-cooperation) as well as what can be gained through the grand coalition (social planner's solution or full cooperation) and partial coalitions. Nevertheless, estimating the benefits of different coalitions can be challenging in complex multi-beneficiary systems. Reinforcement learning can be used to address this challenge and determine the gains of the beneficiaries for different levels of cooperation, i.e., non-cooperation, partial cooperation, and full cooperation, providing the essential input for allocation based on cooperative game theory. This paper develops a game theory-reinforcement learning (GT-RL) method for determining the optimal operation policies in multi-operator multi-reservoir systems with respect to fairness and efficiency criteria. As a first step toward demonstrating the utility of the GT-RL method in solving complex multi-agent multi-reservoir problems without the need to develop compound objectives and assign weights, the proposed method is applied to a hypothetical three-agent three-reservoir system.
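
    To illustrate how coalition gains feed a cooperative-game allocation, the sketch below computes the Shapley value, one standard solution concept that weighs each party's marginal contribution to every coalition. The abstract does not say which solution method the paper uses, and the three-operator coalition benefits are invented numbers; in a GT-RL workflow those values would come from the reinforcement-learning step.

```python
from itertools import combinations
from math import factorial

# Hypothetical benefits of each coalition of operators A, B and C.
EXAMPLE_BENEFITS = {
    frozenset(): 0.0,
    frozenset({"A"}): 10.0,
    frozenset({"B"}): 20.0,
    frozenset({"C"}): 30.0,
    frozenset({"A", "B"}): 40.0,
    frozenset({"A", "C"}): 50.0,
    frozenset({"B", "C"}): 60.0,
    frozenset({"A", "B", "C"}): 90.0,
}


def shapley_values(players, v):
    """Shapley value of each player; v maps frozenset coalitions to benefits."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # size of the coalition S that excludes player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                S = frozenset(members)
                total += weight * (v[S | {i}] - v[S])  # i's marginal gain
        phi[i] = total
    return phi
```

    With these numbers the allocation is A = 20, B = 30, C = 40, which sums to the grand-coalition benefit of 90 and gives each operator more than its stand-alone benefit (10, 20 and 30).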

  7. Quantum-Enhanced Machine Learning

    NASA Astrophysics Data System (ADS)

    Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

    2016-09-01

    The emerging field of quantum machine learning has the potential to substantially aid in addressing the problems, and broadening the scope, of artificial intelligence. This potential is only enhanced by recent successes in classical machine learning. In this work we propose an approach to the systematic treatment of machine learning from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we also tackle the problem of quantum enhancements in reinforcement learning and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  8. The drift diffusion model as the choice rule in reinforcement learning.

    PubMed

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The modeling approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
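
    In such models the learned values set the drift rate of a diffusion process, so each simulated trial yields both a choice and a response time. The Euler-Maruyama sketch below assumes a drift proportional to the difference between two options' Q-values (a common linking choice) and uses hypothetical parameter names `scale`, `threshold`, `ndt` (non-decision time) and `dt`; it is not the authors' fitted model.

```python
import math
import random


def simulate_ddm_trial(q_upper, q_lower, scale=2.0, threshold=2.0, ndt=0.3,
                       dt=0.001, noise_sd=1.0, max_time=10.0, rng=None):
    """Simulate one drift-diffusion trial whose drift is set by learned values.

    Returns (choice, response_time), where choice is "upper", "lower", or None
    if neither bound is reached within max_time seconds.
    """
    rng = rng or random.Random()
    drift = scale * (q_upper - q_lower)  # value difference drives evidence
    x, t = threshold / 2.0, 0.0          # unbiased starting point
    while 0.0 < x < threshold and t < max_time:
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        x += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    if x >= threshold:
        choice = "upper"
    elif x <= 0.0:
        choice = "lower"
    else:
        choice = None
    return choice, t + ndt  # add non-decision time (encoding, motor response)
```

    Larger value differences give a larger drift and hence faster, more consistent choices, which is how the combined model links learning to response-time distributions.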

  9. Can model-free reinforcement learning explain deontological moral judgments?

    PubMed

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently, Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive as a result of model-free learning are not judged to be morally wrong, so moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account (e.g., that people with different reinforcement histories will, all else equal, make different moral judgments). Finally, accounting for the effect of intention within the framework requires certain assumptions that lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. The drift diffusion model as the choice rule in reinforcement learning

    PubMed Central

    Frank, Michael J.

    2017-01-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The modeling approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups. PMID:27966103

  11. General functioning predicts reward and punishment learning in schizophrenia.

    PubMed

    Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A

    2011-04-01

    Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performance. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.

  12. Agent-based traffic management and reinforcement learning in congested intersection network.

    DOT National Transportation Integrated Search

    2012-08-01

    This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programmi...

  13. Operant conditioning of enhanced pain sensitivity by heat-pain titration.

    PubMed

    Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert

    2008-11-15

    Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus has remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior, and not verbal pain report, was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1 h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.

  14. Effects of D-cycloserine on the extinction of appetitive operant learning

    PubMed Central

    Vurbic, Drina; Gold, Benjamin; Bouton, Mark E.

    2011-01-01

    Four experiments with rat subjects examined whether D-cycloserine (DCS), a partial NMDA agonist, facilitates the extinction of operant lever-pressing reinforced by food. Previous research has demonstrated that DCS facilitates extinction learning with methods that involve Pavlovian extinction. In the current experiments, operant conditioning occurred in Context A, extinction in Context B, and then testing occurred in both the extinction and conditioning contexts. Experiments 1a and 1b tested the effects of three doses of DCS (5, 15, and 30 mg/kg) on the extinction of lever pressing trained as a free operant. Experiment 2 examined their effects when extinction of the free operant was conducted in the presence of non-response-contingent deliveries of the reinforcer (which theoretically reduced the role of generalization decrement in suppressing responding). Experiment 3 examined their effects on extinction of a discriminated operant, i.e., one that had been reinforced in the presence of a discriminative stimulus, but not in its absence. A strong ABA renewal effect was observed in all four experiments during testing. However, despite the use of DCS doses and a drug administration procedure that facilitates the extinction of Pavlovian learning, there was no evidence in any experiment that DCS facilitated operant extinction learning assessed in either the extinction or the conditioning context. DCS may primarily facilitate learning processes that underlie Pavlovian, rather than purely operant, extinction. PMID:21688894

  15. Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals

    PubMed Central

    Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan

    2017-01-01

    Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976

  16. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    PubMed

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL), agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown whether animals adopt this strategy. Here we use songbirds to test whether navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus, distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  17. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    ERIC Educational Resources Information Center

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption is that dopamine signals reward prediction errors, which update action values and so ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  18. Machine Learning Control For Highly Reconfigurable High-Order Systems

    DTIC Science & Technology

    2015-01-02

    develop and flight test a Reinforcement Learning-based approach for autonomous tracking of ground targets using a fixed-wing Unmanned... Reinforcement Learning-based algorithms are developed for learning agents’ time-dependent dynamics while also learning to control them. Three algorithms... to a wide range of engineering-based problems. Implementation of these solutions, however, is often complicated by the hysteretic, non-linear,

  19. The Andragogy, the Social Change and the Transformative Learning Educational Approaches in Adult Education

    ERIC Educational Resources Information Center

    Giannoukos, Georgios; Besas, Georgios; Galiropoulos, Christos; Hioctour, Vasilios

    2015-01-01

    This paper is concerned with the methods and techniques used in adult education that allow the educator to respond successfully to suitable learning experiences on the part of the learner, as well as to reinforce interaction between the learners. The strategies adopted, teaching aids and the choice of suitable teaching material also are…

  20. Cross-language opinion lexicon extraction using mutual-reinforcement label propagation.

    PubMed

    Lin, Zheng; Tan, Songbo; Liu, Yue; Cheng, Xueqi; Xu, Xueke

    2013-01-01

    There is growing interest in automatically building opinion lexicons from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits their applicability. Unsupervised or semi-supervised learning provides an alternative solution for multilingual opinion lexicon extraction. However, the available datasets are imbalanced across languages: for some languages, high-quality corpora are scarce or hard to obtain, which limits research progress. To solve these problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph; then a bilingual dictionary is used as a bridge to transfer information between the two languages. A key advantage of this model is that it lets the two languages learn from and boost each other. The experimental results show that the proposed approach significantly outperforms the baseline.
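
    Label propagation spreads polarity scores from a few seed words to their neighbours in a word relation graph. The sketch below is a generic single-language version with clamped seeds; the toy graph and seed words in the check are hypothetical, and the paper's bilingual mutual-reinforcement step (transferring scores through a dictionary) is not reproduced.

```python
def propagate_labels(graph, seeds, iterations=50):
    """Iterative label propagation on an undirected weighted graph.

    graph: dict word -> dict neighbour -> edge weight (every neighbour must
           also appear as a key of graph)
    seeds: dict word -> fixed polarity score (+1.0 positive, -1.0 negative)
    Returns a dict word -> propagated score in [-1, 1].
    """
    scores = {w: seeds.get(w, 0.0) for w in graph}
    for _ in range(iterations):
        new_scores = {}
        for w, nbrs in graph.items():
            if w in seeds:
                new_scores[w] = seeds[w]  # seed labels stay clamped
                continue
            total = sum(nbrs.values())
            # Weighted average of the neighbours' current scores.
            new_scores[w] = (sum(wt * scores[n] for n, wt in nbrs.items()) / total
                             if total > 0 else 0.0)
        scores = new_scores
    return scores
```

    On a four-word chain linking good, great, awful and bad (a weak 0.2 edge between great and awful) with seeds good = +1 and bad = -1, the unlabeled words settle at about +0.71 and -0.71.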

  1. Cross-Language Opinion Lexicon Extraction Using Mutual-Reinforcement Label Propagation

    PubMed Central

    Lin, Zheng; Tan, Songbo; Liu, Yue; Cheng, Xueqi; Xu, Xueke

    2013-01-01

    There is growing interest in automatically building opinion lexicons from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits their applicability. Unsupervised or semi-supervised learning provides an alternative solution for multilingual opinion lexicon extraction. However, the available datasets are imbalanced across languages: for some languages, high-quality corpora are scarce or hard to obtain, which limits research progress. To solve these problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph; then a bilingual dictionary is used as a bridge to transfer information between the two languages. A key advantage of this model is that it lets the two languages learn from and boost each other. The experimental results show that the proposed approach significantly outperforms the baseline. PMID:24260190

  2. Reinforcement and inference in cross-situational word learning.

    PubMed

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
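
    The reinforcement side of this trade-off can be sketched as follows, under stated assumptions: every word heard in a scene has its association with every co-present referent increased by a fixed `reinforcement` amount, and an optional `decay` rate shrinks older associations. Both parameter names are hypothetical, and the inference mechanisms in the abstract (mutual exclusivity and use of past learning events) are deliberately omitted, so this is not the Tilles-Fontanari model itself.

```python
from collections import defaultdict


def cross_situational_learner(scenes, reinforcement=1.0, decay=0.0):
    """Accumulate word-referent associations across learning episodes (sketch).

    scenes: iterable of (words, referents) pairs observed together.
    reinforcement: amount added to every co-present word-referent association
        (a stand-in for a reinforcement-strength parameter; hypothetical form).
    decay: fraction by which all existing associations shrink before each scene.
    Returns a dict mapping each word to its most strongly associated referent.
    """
    assoc = defaultdict(lambda: defaultdict(float))
    for words, referents in scenes:
        # Forgetting: shrink every association learned so far.
        for w in assoc:
            for r in assoc[w]:
                assoc[w][r] *= 1.0 - decay
        # Reinforcement: strengthen all co-present word-referent pairs.
        for w in words:
            for r in referents:
                assoc[w][r] += reinforcement
    return {w: max(refs, key=refs.get) for w, refs in assoc.items()}
```

    With three scenes that each pair two words with two referents, every word co-occurs twice with its true referent and once with each distractor, so the accumulated associations alone recover the correct mapping.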

  3. Acquisition of Robotic Giant-swing Motion Using Reinforcement Learning and Its Consideration of Motion Forms

    NASA Astrophysics Data System (ADS)

    Sakai, Naoki; Kawabe, Naoto; Hara, Masayuki; Toyoda, Nozomi; Yabuta, Tetsuro

    This paper describes how a compact humanoid robot can acquire a giant-swing motion without any robot model by using the Q-learning method. It is widely said that Q-learning is not appropriate for learning dynamic motions because the Markov property is not necessarily guaranteed during a dynamic task. However, we tried to solve this problem by embedding the angular velocity into the state definition and by using an averaging Q-learning method to reduce dynamic effects, although some non-Markov effects remain in the learning results. The results show how the robot can acquire a giant-swing motion by using the Q-learning algorithm. The successfully acquired motions are analyzed from the viewpoint of dynamics in order to realize a functional giant-swing motion. Finally, the results show how this method can avoid the stagnant action loop around the bottom of the horizontal bar during the early stage of the giant-swing motion.
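
    To make the state-definition idea concrete, here is a hypothetical sketch of plain tabular Q-learning in which the discrete state combines an angle bin with an angular-velocity bin. The bin counts, velocity limit, learning rate and discount factor are illustrative assumptions, and the averaging variant mentioned in the abstract is not reproduced.

```python
import math
import random
from collections import defaultdict


def discretize(angle, angular_velocity, n_angle_bins=12, n_vel_bins=6, max_vel=10.0):
    """Map a continuous (angle, angular velocity) pair to a discrete state.

    Including the velocity bin lets the state carry some information about
    the swing's momentum, not just its position (hypothetical binning).
    """
    a = int((angle % (2 * math.pi)) / (2 * math.pi) * n_angle_bins) % n_angle_bins
    v = max(-max_vel, min(max_vel, angular_velocity))  # clip to the assumed range
    vb = min(int((v + max_vel) / (2 * max_vel) * n_vel_bins), n_vel_bins - 1)
    return (a, vb)


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step; q is a defaultdict(float) keyed by (state, action)."""
    best_next = max(q[(next_state, b)] for b in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])
```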

  4. Evolution with Reinforcement Learning in Negotiation

    PubMed Central

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term than in the short term. However, the extant literature on population dynamics for behavior adjustment has examined only the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stability than their counterparts using the classic evolutionary algorithm. PMID:25048108

  5. Evolution with reinforcement learning in negotiation.

    PubMed

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term than in the short term. However, the extant literature on population dynamics for behavior adjustment has examined only the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stability than their counterparts using the classic evolutionary algorithm.

  6. Overcoming Learned Helplessness in Community College Students.

    ERIC Educational Resources Information Center

    Roueche, John E.; Mink, Oscar G.

    1982-01-01

    Reviews research on the effects of repeated experiences of helplessness and on locus of control. Identifies conditions necessary for overcoming learned helplessness; i.e., the potential for learning to occur; consistent reinforcement; relevant, valued reinforcers; and favorable psychological situation. Recommends eight ways for teachers to…

  7. Dopamine D2 Receptor Signaling in the Nucleus Accumbens Comprises a Metabolic-Cognitive Brain Interface Regulating Metabolic Components of Glucose Reinforcement.

    PubMed

    Michaelides, Michael; Miller, Michael L; DiNieri, Jennifer A; Gomez, Juan L; Schwartz, Elizabeth; Egervari, Gabor; Wang, Gene Jack; Mobbs, Charles V; Volkow, Nora D; Hurd, Yasmin L

    2017-11-01

    Appetitive drive is influenced by coordinated interactions between brain circuits that regulate reinforcement and homeostatic signals that control metabolism. Glucose modulates striatal dopamine (DA) and regulates appetitive drive and reinforcement learning. Striatal DA D2 receptors (D2Rs) also regulate reinforcement learning and are implicated in glucose-related metabolic disorders. Nevertheless, interactions between striatal D2R and peripheral glucose have not been previously described. Here we show that manipulations involving striatal D2R signaling coincide with perseverative and impulsive-like responding for sucrose, a disaccharide consisting of fructose and glucose. Fructose conveys orosensory (ie, taste) reinforcement but does not convey metabolic (ie, nutrient-derived) reinforcement. Glucose, however, conveys orosensory reinforcement but, unlike fructose, it is a major metabolic energy source, underlies sustained reinforcement, and activates striatal circuitry. We found that mice with deletion of dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) exclusively in D2R-expressing cells exhibited preferential D2R changes in the nucleus accumbens (NAc), a striatal region that critically regulates sucrose reinforcement. These changes coincided with perseverative and impulsive-like responding for sucrose pellets and sustained reinforcement learning of glucose-paired flavors. These mice were also characterized by significant glucose intolerance (ie, impaired glucose utilization). Systemic glucose administration significantly attenuated sucrose operant responding, and D2R activation or blockade in the NAc bidirectionally modulated blood glucose levels and glucose tolerance. Collectively, these results implicate NAc D2R in regulating both peripheral glucose levels and glucose-dependent reinforcement learning behaviors and highlight the notion that glucose metabolic impairments arising from disrupted NAc D2R signaling are involved in compulsive and perseverative feeding behaviors.

  8. A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

    DOE PAGES

    Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

    2015-01-31

    Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.

  9. Optimisation of cognitive performance in rodent operant (touchscreen) testing: Evaluation and effects of reinforcer strength.

    PubMed

    Phillips, Benjamin U; Heath, Christopher J; Ossowska, Zofia; Bussey, Timothy J; Saksida, Lisa M

    2017-09-01

    Operant testing is a widely used and highly effective method of studying cognition in rodents. Performance on such tasks is sensitive to reinforcer strength. It is therefore advantageous to select effective reinforcers to minimize training times and maximize experimental throughput. To quantitatively investigate the control of behavior by different reinforcers, the performance of mice was tested with either strawberry milkshake or a known powerful reinforcer, super saccharin (1.5% or 2% (w/v) saccharin/1.5% (w/v) glucose/water mixture). Mice were tested on fixed (FR)- and progressive-ratio (PR) schedules in the touchscreen-operant testing system. Under an FR schedule, both the rate of responding and the number of trials completed were higher in animals responding for strawberry milkshake than in those responding for super saccharin. Under a PR schedule, mice were willing to emit similar numbers of responses for strawberry milkshake and super saccharin; however, analysis of the rate of responding revealed a significantly higher rate of responding by animals reinforced with milkshake than by those reinforced with super saccharin. To determine the impact of reinforcer strength on cognitive performance, strawberry milkshake and super saccharin-reinforced animals were compared on a touchscreen visual discrimination task. Animals reinforced by strawberry milkshake were significantly faster to acquire the discrimination than animals reinforced by super saccharin. Taken together, these results suggest that strawberry milkshake is superior to super saccharin for operant behavioral testing and further confirm that the application of response rate analysis to multiple ratio tasks is a highly sensitive method for the detection of behavioral differences relevant to learning and motivation.

  10. Theoretical assumptions of Maffesoli's sensitivity and Problem-Based Learning in Nursing Education

    PubMed Central

    Rodríguez-Borrego, María-Aurora; Nitschke, Rosane Gonçalves; do Prado, Marta Lenise; Martini, Jussara Gue; Guerra-Martín, María-Dolores; González-Galán, Carmen

    2014-01-01

    Objective: to understand the everyday and the imaginary of Nursing students in their knowledge socialization process through the Problem-Based Learning (PBL) strategy. Method: action research involving 86 students from the second year of an undergraduate Nursing program in Spain. A Critical Incident Questionnaire and a group interview were used, followed by thematic/categorical analysis and triangulation of researchers, subjects and techniques. Results: the students signal the need to have a view from within, reinforcing the criticism of schematic dualism; PBL allows one to learn how to be with the other, with his mechanical and organic solidarity; and the feeling together, with its emphasis on learning to work in groups and wanting to be close to the person taking care. Conclusions: the great contradictions experienced by the protagonists of the process, that is, the students, seem to express that group learning is not a form of gaining knowledge, as it makes them lose time to study. The daily life, the execution time and the imaginary of how learning should be do not seem to have a point of intersection in the use of Problem-Based Learning. The importance of focusing on the daily and the imaginary should be reinforced when we consider nursing education. PMID:25029064

  11. The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.

    ERIC Educational Resources Information Center

    Dockstader, Steven L.

    The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…

  12. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning.

    PubMed

    Yang, Xiong; Liu, Derong; Wang, Ding; Wei, Qinglai

    2014-07-01

    In this paper, a reinforcement-learning-based direct adaptive control is developed to deliver a desired tracking performance for a class of discrete-time (DT) nonlinear systems with unknown bounded disturbances. We investigate multi-input-multi-output unknown nonaffine nonlinear DT systems and employ two neural networks (NNs). By using the Implicit Function Theorem, an action NN is used to generate the control signal, and it is also designed to cancel the nonlinearity of the unknown DT systems for the purpose of utilizing feedback linearization methods. On the other hand, a critic NN is applied to estimate the cost function, which satisfies the recursive equations derived from heuristic dynamic programming. The weights of both the action NN and the critic NN are updated directly online rather than through offline training. By utilizing Lyapunov's direct method, the closed-loop tracking errors and the NN estimated weights are demonstrated to be uniformly ultimately bounded. Two numerical examples are provided to show the effectiveness of the present approach. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning

    PubMed Central

    Ramayya, Ashwin G.; Misra, Amrit

    2014-01-01

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643

  14. An Investigation of Ways to Reduce the Failure Rate of Student Pilots during Flying Training in the Royal Australian Air Force.

    DTIC Science & Technology

    1987-09-01

    Luthans (28) expanded the concept of learning as follows: 1. Learning involves a change, though not necessarily an improvement, in behaviour. Learning...that results in an unpleasant outcome is not likely to be repeated (36:244). Luthans and Kreitner (27) described the various forms of reinforcement as...four alternatives (defined previously on page 24 and taken from Luthans) of positive reinforcement, negative reinforcement, extinction and punishment.

  15. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    PubMed

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on an actor-critic neural network structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Better Care Teams: A Stepwise Skill Reinforcement Model.

    PubMed

    Christopher, Beth-Anne; Grantner, Mary; Coke, Lola A; Wideman, Marilyn; Kwakwa, Francis

    2016-06-01

    The Building Healthy Urban Communities initiative presents a path for organizations partnering to improve patient outcomes with continuing education (CE) as a key component. Components of the CE initiative included traditional CE delivery formats with an essential element of adaptability and new methods, with rigorous evaluation over time that included evaluation prior to the course, immediately following the CE session, 6 to 8 weeks after the CE session, and then subsequent monthly "testlets." Outcome measures were designed to allow for ongoing adaptation of content, reinforcement of key learning objectives, and use of innovative concordant testing and retrieval practice techniques. The results after 1 year of programming suggest the stepwise skill reinforcement model is effective for learning and is an efficient use of financial and human resources. More important, its design is one that could be adopted at low cost by organizations willing to work in close partnership. J Contin Educ Nurs. 2016;47(6):283-288. Copyright 2016, SLACK Incorporated.

  17. Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

    PubMed Central

    Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.

    2011-01-01

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
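
    The "confirmation bias" described in the last sentence can be illustrated with a simple value update. The sketch below is a hypothetical parameterization, not the model fitted in the paper: for the instructed stimulus, positive prediction errors are multiplied by an `amplify` factor and negative ones by a `diminish` factor, while all other stimuli use an ordinary delta-rule (Q-learning-style) update. The function name, parameter names and default values are assumptions.

```python
def biased_q_update(q, stimulus, reward, alpha=0.1, instructed=None,
                    amplify=2.0, diminish=0.5):
    """Update the value of a chosen stimulus from a 0/1 outcome (hypothetical sketch).

    q: dict mapping stimulus -> current value estimate.
    If `stimulus` is the instructed one (assumed to have been described as best),
    instruction-consistent outcomes (positive prediction errors) are scaled up by
    `amplify` and inconsistent ones (negative prediction errors) scaled down by
    `diminish`. Returns the updated value.
    """
    delta = reward - q[stimulus]  # reward prediction error
    if stimulus == instructed:
        delta *= amplify if delta > 0 else diminish
    q[stimulus] += alpha * delta
    return q[stimulus]
```

    With equal starting values and the same rewarded outcome, the instructed stimulus gains twice as much value as an uninstructed one, while a loss on the instructed stimulus costs only half as much as it otherwise would.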

  18. How partial reinforcement of food cues affects the extinction and reacquisition of appetitive responses. A new model for dieting success?

    PubMed

    van den Akker, Karolien; Havermans, Remco C; Bouton, Mark E; Jansen, Anita

    2014-10-01

    Animals and humans can easily learn to associate an initially neutral cue with food intake through classical conditioning, but extinction of learned appetitive responses can be more difficult. Intermittent or partial reinforcement of food cues causes especially persistent behaviour in animals: after exposure to such learning schedules, the decline in responding that occurs during extinction is slow. After extinction, increases in responding with renewed reinforcement of food cues (reacquisition) might be less rapid after acquisition with partial reinforcement. In humans, it may be that the eating behaviour of some individuals resembles partial reinforcement schedules to a greater extent, possibly affecting dieting success by interacting with extinction and reacquisition. Furthermore, impulsivity has been associated with less successful dieting, and this association might be explained by impulsivity affecting the learning and extinction of appetitive responses. In the present two studies, the effects of different reinforcement schedules and impulsivity on the acquisition, extinction, and reacquisition of appetitive responses were investigated in a conditioning paradigm involving food rewards in healthy humans. Overall, the results indicate both partial reinforcement schedules and, possibly, impulsivity to be associated with worse extinction performance. A new model of dieting success is proposed: learning histories and, perhaps, certain personality traits (impulsivity) can interfere with the extinction and reacquisition of appetitive responses to food cues and they may be causally related to unsuccessful dieting. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Radical Conversations: Part Two--Cultivating Social-Constructivist Learning Methods in ABE Classrooms

    ERIC Educational Resources Information Center

    Muth, Bill; Kiser, Madeline

    2008-01-01

    In many U.S. prisons an overuse of individualized instruction silences literacy learners and reinforces oppressive notions about what knowledge is and whose knowledge counts. In these classrooms, methods that invite learners to tap their background knowledge, reflect on their worlds, and dialogue with others to construct meaning--commonplace in…

  20. The World of Teaching Machines, Programed Learning and Self-Instructional Devices.

    ERIC Educational Resources Information Center

    Foltz, Charles I.

    Teaching machines have several advantages: time is saved, the correct response is reinforced immediately, and students are not publicly confronted with failure. There are two approaches to the programed instruction intrinsic to their use: one (the Skinner method) is based on filling in blank spaces, the other (the Crowder method) employs…

  1. Teaching English through Action: Total Physical Response (T.P.R.). A Right-Brain/Left-Brain Approach to Language Acquisition. A Workshop.

    ERIC Educational Resources Information Center

    Segal, Bertha E.

    Materials from a teacher workshop on the Total Physical Response method for teaching English as a second language are presented. The technique describes the process of first language acquisition, uses physical activities in the classroom to reinforce learning, and allows a long period of receptive language learning before requiring production. The…

  2. Regulating recognition decisions through incremental reinforcement learning.

    PubMed

    Han, Sanghoon; Dobbins, Ian G

    2009-06-01

    Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback had been altogether removed. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.

  3. Infant Contingency Learning in Different Cultural Contexts

    ERIC Educational Resources Information Center

    Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika

    2012-01-01

    Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…

  4. Adaptive Educational Software by Applying Reinforcement Learning

    ERIC Educational Resources Information Center

    Bennane, Abdellah

    2013-01-01

    The introduction of intelligence into teaching software is the object of this paper. In the software development process, learning techniques are used to adapt the teaching software to the characteristics of the student. Generally, artificial intelligence techniques such as reinforcement learning and Bayesian networks are used in order to adapt…

  5. Blended learning for reinforcing dental pharmacology in the clinical years: A qualitative analysis

    PubMed Central

    Eachempati, Prashanti; Kiran Kumar, K. S.; Sumanth, K. N.

    2016-01-01

    Objectives: Blended learning has become the method of choice in educational institutions because of its systematic integration of traditional classroom teaching and online components. This study aims to analyze students' reflections on blended learning in dental pharmacology. Subjects and Methods: A cross-sectional study was conducted in the Faculty of Dentistry, Melaka-Manipal Medical College, among 3rd- and 4th-year BDS students. A total of 145 dental students who consented participated in the study. Students were divided into 14 groups. Nine online sessions followed by nine face-to-face discussions were held. Each session addressed topics related to oral lesions and orofacial pain with pharmacological applications. After each week, students were asked to reflect on blended learning. On completion of 9 weeks, the reflections were collected and analyzed. Statistical Analysis: Qualitative analysis was done using the thematic analysis model suggested by Braun and Clarke. Results: Four main themes were identified, namely, merits of blended learning, skill in writing prescriptions for oral diseases, dosages of drugs, and identification of strengths and weaknesses. In general, the participants gave positive feedback regarding blended learning. Students felt more confident in drug selection and prescription writing. They could recollect the doses better after the online and face-to-face sessions. Most interestingly, the students reflected that they were able to identify their strengths and weaknesses after the blended learning sessions. Conclusions: The blended learning module was successfully implemented for reinforcing dental pharmacology. The results obtained in this study enable us to plan future comparative studies to assess the effectiveness of blended learning in dental pharmacology. PMID:28031603

  6. Impedance learning for robotic contact tasks using natural actor-critic algorithm.

    PubMed

    Kim, Byungchan; Park, Jooyoung; Park, Shinsuk; Kang, Sungchul

    2010-04-01

    Compared with their robotic counterparts, humans excel at various tasks by using their ability to adaptively modulate arm impedance parameters. This ability allows us to successfully perform contact tasks even in uncertain environments. This paper considers a motor-skill learning strategy for robotic contact tasks based on a human motor control theory and machine learning schemes. Our robot learning method employs impedance control based on the equilibrium point control theory and reinforcement learning to determine the impedance parameters for contact tasks. A recursive least-squares filter-based episodic natural actor-critic algorithm is used to find the optimal impedance parameters. The effectiveness of the proposed method was tested through dynamic simulations of various contact tasks. The simulation results demonstrated that the proposed method optimizes the performance of the contact tasks under uncertain environmental conditions.

  7. Online Behavior Acquisition of an Agent based on Coaching as Learning Assistance

    NASA Astrophysics Data System (ADS)

    Hirokawa, Masakazu; Suzuki, Kenji

    This paper describes a novel methodology, namely "Coaching", which allows humans to give a subjective evaluation to an agent in an iterative manner. This is an interactive learning method that improves reinforcement learning by modifying a reward function dynamically according to the evaluations given by a trainer and the learning situation of the agent. Through several experiments, we demonstrate that the agent can learn different reward functions from instructions such as "good" or "bad" given by a human observer, and can also acquire a set of behaviors based on the learned reward functions.
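
    One way to picture a reward function that a trainer reshapes online is sketched below. This is a hypothetical illustration, not the authors' Coaching algorithm: each "good" or "bad" evaluation moves a learned bonus for the evaluated state-action pair toward +1 or -1, and the reward handed to the learner is the environment reward plus a weighted bonus. The class name, `step_size` and `weight` are assumptions.

```python
from collections import defaultdict


class CoachedReward:
    """Reward function reshaped online by a trainer's evaluations (hypothetical sketch)."""

    def __init__(self, step_size=0.5, weight=1.0):
        self.step_size = step_size  # how far each evaluation moves the bonus
        self.weight = weight  # how strongly the learned bonus affects the reward
        self.bonus = defaultdict(float)  # learned bonus per (state, action)

    def give_evaluation(self, state, action, evaluation):
        """Move the bonus for (state, action) toward +1 ("good") or -1 ("bad")."""
        target = {"good": 1.0, "bad": -1.0}[evaluation]
        key = (state, action)
        self.bonus[key] += self.step_size * (target - self.bonus[key])

    def reward(self, state, action, env_reward=0.0):
        """Reward seen by the learner: environment reward plus the weighted bonus."""
        return env_reward + self.weight * self.bonus[(state, action)]
```

    Repeated "good" evaluations push the bonus for a state-action pair toward +1 with diminishing increments (0.5, then 0.75 with the default step size), so a trainer's feedback reshapes the reward gradually rather than overwriting it.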

  8. A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity.

    PubMed

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though these are inevitable and constraining features of learning in real environments. Problems of this class are formally known as partially observable reinforcement learning (PORL) problems, a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.

  9. Punishment insensitivity and impaired reinforcement learning in preschoolers.

    PubMed

    Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S

    2014-01-01

    Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.

  10. A Spiking Neural Network Model of Model-Free Reinforcement Learning with High-Dimensional Sensory Input and Perceptual Ambiguity

    PubMed Central

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though these are inevitable and constraining features of learning in real environments. Problems of this class are formally known as partially observable reinforcement learning (PORL) problems, which generalize reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662

  11. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    PubMed

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-04

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  12. Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia.

    PubMed

    Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E

    2014-03-01

    Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.

  13. Effects of D-cycloserine on the extinction of appetitive operant learning.

    PubMed

    Vurbic, Drina; Gold, Benjamin; Bouton, Mark E

    2011-08-01

    Four experiments with rat subjects examined whether D-cycloserine (DCS), a partial NMDA agonist, facilitates the extinction of operant lever-pressing reinforced by food. Previous research has demonstrated that DCS facilitates extinction learning with methods that involve Pavlovian extinction. In the current experiments, operant conditioning occurred in Context A, extinction in Context B, and then testing occurred in both the extinction and conditioning contexts. Experiments 1A and 1B tested the effects of three doses of DCS (5, 15, and 30 mg/kg) on the extinction of lever pressing trained as a free operant. Experiment 2 examined their effects when extinction of the free operant was conducted in the presence of nonresponse-contingent deliveries of the reinforcer (that theoretically reduced the role of generalization decrement in suppressing responding). Experiment 3 examined their effects on extinction of a discriminated operant, that is, one that had been reinforced in the presence of a discriminative stimulus, but not in its absence. A strong ABA renewal effect was observed in all four experiments during testing. However, despite the use of DCS doses and a drug administration procedure that facilitates the extinction of Pavlovian learning, there was no evidence in any experiment that DCS facilitated operant extinction learning assessed in either the extinction or the conditioning context. DCS may primarily facilitate learning processes that underlie Pavlovian, rather than purely operant, extinction. (PsycINFO Database Record (c) 2011 APA, all rights reserved).

  14. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    PubMed

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors, that is, discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the ERP technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography to the feedback error-related negativity and that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
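    The propagation pattern described above (a prediction error at feedback that shrinks with learning, and a growing response at choice presentation) is what temporal-difference learning predicts. The following is a minimal TD(0) sketch, not the authors' model, whose details the abstract does not give. It assumes a single, always-rewarded option, a trial with two events (choice presentation, then feedback), no discounting, an inter-trial state whose value stays at 0, and an illustrative learning rate alpha = 0.3.

```python
from typing import List, Tuple


def simulate_td(n_trials: int = 20, alpha: float = 0.3,
                reward: float = 1.0) -> List[Tuple[float, float]]:
    """TD(0), no discounting, on a trial with two events: choice presentation, then feedback.

    Returns the (choice-presentation, feedback) prediction errors for each trial.
    """
    v_cue = 0.0  # learned value of the state entered at choice presentation
    history = []
    for _ in range(n_trials):
        # Error at choice presentation: the inter-trial state is assumed to keep value 0
        # (cue onset is unpredictable), so delta = 0 + V(cue) - 0.
        delta_cue = v_cue
        # Error at feedback: reward plus the value of the terminal state (0) minus V(cue).
        delta_feedback = reward - v_cue
        v_cue += alpha * delta_feedback
        history.append((delta_cue, delta_feedback))
    return history


for trial, (d_cue, d_fb) in enumerate(simulate_td(), start=1):
    if trial in (1, 2, 5, 20):
        print(f"trial {trial:2d}: choice PE = {d_cue:.3f}, feedback PE = {d_fb:.3f}")
```

    With these settings the feedback error starts at 1 and shrinks by a factor of (1 - alpha) per trial, while the error at choice presentation grows by the same amount. This is only a qualitative analogue of the ERP amplitude changes reported in the abstract.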

  15. Agent-based real-time signal coordination in congested networks.

    DOT National Transportation Integrated Search

    2014-01-01

    This study is the continuation of a previous NEXTRANS study on agent-based reinforcement learning methods for signal coordination in congested networks. In the previous study, the formulation of a real-time agent-based traffic signal control in o...

  16. Microstimulation of the human substantia nigra alters reinforcement learning.

    PubMed

    Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J

    2014-05-14

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors.

  17. Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories

    PubMed Central

    Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien

    2013-01-01

    In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244
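    One way to build "artificial trajectories" is to chain one-step transitions taken from the batch sample. At each step, you pick the not-yet-used transition whose state-action pair is closest to where the artificial trajectory currently is, then average the returns of several such trajectories to evaluate a policy. The sketch below shows that nearest-neighbour chaining for a one-dimensional deterministic problem. It is a simplified illustration, not the paper's algorithm: the L1 distance, the toy batch and the policy are all assumptions.

```python
from typing import Callable, List, Tuple

# (state x, action u, reward r, next state y): one-step transition from the batch sample
Transition = Tuple[float, float, float, float]


def artificial_trajectory_estimate(
    sample: List[Transition],
    policy: Callable[[float], float],
    x0: float,
    horizon: int,
    n_traj: int,
) -> float:
    """Average return of n_traj artificial trajectories of length `horizon` starting at x0."""
    if n_traj * horizon > len(sample):
        raise ValueError("each sampled transition may be used at most once")
    used = [False] * len(sample)
    total = 0.0
    for _ in range(n_traj):
        x, ret = x0, 0.0
        for _ in range(horizon):
            u = policy(x)
            # Pick the unused transition whose (state, action) pair is closest to (x, u);
            # the L1 distance here is an illustrative choice.
            best = min(
                (i for i, taken in enumerate(used) if not taken),
                key=lambda i: abs(sample[i][0] - x) + abs(sample[i][1] - u),
            )
            used[best] = True
            _, _, r, y = sample[best]
            ret += r
            x = y  # continue from the *observed* successor state, not a model prediction
        total += ret
    return total / n_traj


# Toy batch from a system x' = x + u with reward r = x (hypothetical data).
batch = [(float(x), 1.0, float(x), float(x + 1)) for x in range(6)]
print(artificial_trajectory_estimate(batch, policy=lambda x: 1.0, x0=0.0, horizon=3, n_traj=2))
```

    In this tiny batch, the second trajectory must start from a transition far from x0 because the closest ones are already used. This shows why the quality of such estimates depends on how densely the sample covers the state-action space.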

  18. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.

    PubMed

    Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji

    2016-12-01

    Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only really works well with binary, or close to binary, state input, where the number of active states is fewer than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and the robustness of the FERL method can be improved by scaling the free energy by a constant that is related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL) instead of the negative free energy, an approach that can also handle continuous state input. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
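    The two approximators differ only in how the RBM's hidden units are summarized. Take an RBM with visible vector v (state and action inputs), visible biases b, hidden biases c, weights W and z_j = c_j + (vᵀW)_j. Its free energy is F(v) = -bᵀv - Σ_j log(1 + exp(z_j)), while the expected energy under p(h|v) is ⟨E⟩ = -bᵀv - Σ_j σ(z_j) z_j. FERL uses Q ≈ -F and EERL uses Q ≈ -⟨E⟩. The NumPy sketch below computes both. The random weights are placeholders, and the network-size scaling constant mentioned in the abstract is omitted.

```python
import numpy as np


def _hidden_input(v: np.ndarray, W: np.ndarray, c: np.ndarray) -> np.ndarray:
    # z_j = c_j + sum_i v_i W_ij, one entry per hidden unit
    return c + v @ W


def neg_free_energy(v: np.ndarray, W: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """FERL-style value: -F(v) = b.v + sum_j log(1 + exp(z_j))."""
    z = _hidden_input(v, W, c)
    # np.logaddexp(0, z) = log(exp(0) + exp(z)) = log(1 + exp(z)), computed stably
    return float(b @ v + np.sum(np.logaddexp(0.0, z)))


def neg_expected_energy(v: np.ndarray, W: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """EERL-style value: -<E(v, h)> under p(h | v) = b.v + sum_j sigmoid(z_j) * z_j."""
    z = _hidden_input(v, W, c)
    p_h = 1.0 / (1.0 + np.exp(-z))  # p(h_j = 1 | v)
    return float(b @ v + np.sum(p_h * z))


rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))  # placeholder weights
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
v = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])  # e.g. state features followed by a one-hot action
print(neg_free_energy(v, W, b, c), neg_expected_energy(v, W, b, c))
```

    Because F = ⟨E⟩ - H(h|v), the two values differ by the entropy of the hidden units, so -F is never smaller than -⟨E⟩. Neither formula requires v to be binary.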

  19. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    PubMed Central

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and BOLD activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces. PMID:19864565

  20. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    PubMed

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning, such as prediction error signals for action valuation associated with dopamine and the striatum, can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
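    The contrast between the two model classes can be made concrete with delta-rule updates. In the unitary model, each bimanual action (a_L, a_R) has one value trained on the trial's total reward. In the decomposed model, Q(a_L, a_R) = Q_L(a_L) + Q_R(a_R), and each hand's value is trained on that hand's own feedback. This is a minimal sketch under assumed settings (two targets per hand, learning rate 0.2), not the fitted models from the study.

```python
import itertools
from typing import Dict, List, Tuple

ACTIONS = (0, 1)  # two targets per hand (hypothetical task layout)


def update_unitary(q_joint: Dict[Tuple[int, int], float], a_left: int, a_right: int,
                   r_left: float, r_right: float, alpha: float = 0.2) -> None:
    # One value per bimanual action, trained on the total reward of the trial.
    key = (a_left, a_right)
    q_joint[key] += alpha * ((r_left + r_right) - q_joint[key])


def update_decomposed(q_left: List[float], q_right: List[float], a_left: int, a_right: int,
                      r_left: float, r_right: float, alpha: float = 0.2) -> None:
    # Each effector has its own value table and its own prediction error.
    q_left[a_left] += alpha * (r_left - q_left[a_left])
    q_right[a_right] += alpha * (r_right - q_right[a_right])


def joint_value_decomposed(q_left: List[float], q_right: List[float],
                           a_left: int, a_right: int) -> float:
    # Value of a bimanual action = sum of the effector-specific values.
    return q_left[a_left] + q_right[a_right]


q_joint = {pair: 0.0 for pair in itertools.product(ACTIONS, ACTIONS)}
q_left, q_right = [0.0, 0.0], [0.0, 0.0]
# Train only on the bimanual action (1, 0): the left hand is rewarded, the right is not.
for _ in range(10):
    update_unitary(q_joint, 1, 0, 1.0, 0.0)
    update_decomposed(q_left, q_right, 1, 0, 1.0, 0.0)
# The untried combination (1, 1) inherits the left hand's value only in the decomposed model.
print(q_joint[(1, 1)], joint_value_decomposed(q_left, q_right, 1, 1))
```

    With four bimanual actions the saving is small. However, the unitary table grows as the product of the per-effector action counts, while the decomposed tables grow only as their sum. This is the dimensionality argument the abstract makes.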

  1. Construction of multi-agent mobile robots control system in the problem of persecution with using a modified reinforcement learning method based on neural networks

    NASA Astrophysics Data System (ADS)

    Patkin, M. L.; Rogachev, G. N.

    2018-02-01

    A method for constructing a multi-agent control system for mobile robots based on reinforcement learning with deep neural networks is considered. The control system is synthesized using reinforcement learning and a modified Actor-Critic method in which the Actor module is divided into an Action Actor and a Communication Actor, so that each robot can simultaneously control its own movement and communicate with its partners. At each step, a robot communicates by sending its partners a vector of real numbers that is appended to their observation vectors and affects their behaviour. The Actor and Critic functions are approximated by deep neural networks. The Critic's value function is trained with the TD-error method and the Actor's function with DDPG. The Communication Actor's neural network is trained through gradients received from partner agents. An environment with cooperative multi-agent interaction was developed, and the method was tested in computer simulation on a control problem in which two robots pursue two goals.

  2. Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.

    PubMed

    Bouton, Mark E; Woods, Amanda M; Todd, Travis P

    2014-01-01

    Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. Learning to Predict Consequences as a Method of Knowledge Transfer in Reinforcement Learning.

    PubMed

    Chalmers, Eric; Contreras, Edgar Bermudez; Robertson, Brandon; Luczak, Artur; Gruber, Aaron

    2017-04-17

    The reinforcement learning (RL) paradigm allows agents to solve tasks through trial-and-error learning. To be capable of efficient, long-term learning, RL agents should be able to apply knowledge gained in the past to new tasks they may encounter in the future. The ability to predict actions' consequences may facilitate such knowledge transfer. We consider here domains where an RL agent has access to two kinds of information: agent-centric information with constant semantics across tasks, and environment-centric information, which is necessary to solve the task, but with semantics that differ between tasks. For example, in robot navigation, environment-centric information may include the robot's geographic location, while agent-centric information may include sensor readings of various nearby obstacles. We propose that these situations provide an opportunity for a very natural style of knowledge transfer, in which the agent learns to predict actions' environmental consequences using agent-centric information. These predictions contain important information about the affordances and dangers present in a novel environment, and can effectively transfer knowledge from agent-centric to environment-centric learning systems. Using several example problems including spatial navigation and network routing, we show that our knowledge transfer approach can allow faster and lower cost learning than existing alternatives.

  4. Autonomous reinforcement learning with experience replay.

    PubMed

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
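    The data structure behind "repeatedly adjusts the control policy with the use of previously collected samples" is a replay memory. Below is a minimal fixed-capacity buffer with uniform sampling, offered as a sketch. The class and field names are hypothetical, and the actor-critic updates and online step-size estimation described in the abstract are not shown.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-capacity store of past transitions; the oldest are evicted first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)  # deque drops the oldest item when full
        self._rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        self._data.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement within one batch.
        return self._rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        return len(self._data)


buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add(Transition((i,), 0, float(i), (i + 1,), False))
# Only the three most recent transitions remain available for replay.
print(len(buf), sorted(t.reward for t in buf.sample(3)))
```

    Reusing each stored transition in many updates is what lets an algorithm of this kind improve the policy between new interactions with the real system.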

  5. Electrophysiological correlates of reinforcement learning in young people with Tourette syndrome with and without co-occurring ADHD symptoms.

    PubMed

    Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J

    2016-06-01

    Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (n=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.

  6. Embedded Incremental Feature Selection for Reinforcement Learning

    DTIC Science & Technology

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al. ... In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature

  7. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    ERIC Educational Resources Information Center

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  8. Novelty and Inductive Generalization in Human Reinforcement Learning

    PubMed Central

    Gershman, Samuel J.; Niv, Yael

    2015-01-01

    In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: what is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: how can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. PMID:25808176
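    The inductive idea (inferring the value of an untried option from what other options in the same environment have paid) can be illustrated with a small conjugate Gaussian model. Here each option's mean reward is drawn from a shared environment-level distribution whose mean is itself uncertain. This is a simplified stand-in, not the authors' model. The variances are assumed known and the prior values are arbitrary, and the equivalence with temporal difference learning described in the abstract is not reproduced.

```python
from typing import List, Tuple


def novel_option_value(
    observed: List[List[float]],
    m0: float = 0.0,        # prior mean of the environment-level mean mu (assumed)
    s0_sq: float = 10.0,    # prior variance of mu (assumed)
    tau_sq: float = 1.0,    # spread of option means around mu (assumed known)
    sigma_sq: float = 1.0,  # reward noise within an option (assumed known)
) -> Tuple[float, float]:
    """Predictive mean and variance for the mean reward of a never-tried option.

    Model: mu ~ N(m0, s0_sq); theta_k ~ N(mu, tau_sq); each reward ~ N(theta_k, sigma_sq).
    """
    precision, weighted = 1.0 / s0_sq, m0 / s0_sq
    for rewards in observed:
        n = len(rewards)
        if n == 0:
            continue
        ybar = sum(rewards) / n
        # Integrating out theta_k: ybar ~ N(mu, tau_sq + sigma_sq / n).
        var_k = tau_sq + sigma_sq / n
        precision += 1.0 / var_k
        weighted += ybar / var_k
    mu_mean, mu_var = weighted / precision, 1.0 / precision
    # A new option's mean is theta_new ~ N(mu, tau_sq), so add tau_sq to the uncertainty.
    return mu_mean, mu_var + tau_sq


rich = [[5.0, 6.0, 5.0], [6.0, 6.0]]
poor = [[1.0, 0.0], [1.0, 1.0, 0.0]]
print(novel_option_value(rich)[0], novel_option_value(poor)[0])
```

    In a "rich" environment, the untried option is expected to pay well, and in a "poor" one it is expected to pay little, even though nothing about the option itself has been observed. With little data, the estimate is also shrunk toward the prior mean m0.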

  9. Learning with incomplete information and the mathematical structure behind it.

    PubMed

    Kühn, Reimer; Stamatescu, Ion-Olimpiu

    2007-07-01

    We investigate the problem of learning with incomplete information as exemplified by learning with delayed reinforcement. We study a two-phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase depending on a graded reinforcement signal. The reinforcement signal quantifies the success rate globally for a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to associations learnt in that phase. Learning according to this model is studied via simulations and analytically within a student-teacher scenario for both single-layer networks and a committee machine. Success and speed of learning depend on the ratio λ of the learning rates used for the associative Hebbian learning phase and for the unlearning correction in response to the reinforcement signal, respectively. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value λ_c, in which case the generalization error exhibits a power-law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter λ. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications, a robot learning to navigate a field containing obstacles and the problem of identifying a specific component in a collection of stimuli, are also provided.

  10. Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance

    DTIC Science & Technology

    2014-09-29

    Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance W. Bradley Knox... positive a trainer’s reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task... learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent’s performance on the task the trainer

  11. Applications of operant learning theory to the management of challenging behavior after traumatic brain injury.

    PubMed

    Wood, Rodger Ll; Alderman, Nick

    2011-01-01

    For more than 3 decades, interventions derived from learning theory have been delivered within a neurobehavioral framework to manage challenging behavior after traumatic brain injury (TBI), with the aim of promoting engagement in the rehabilitation process and ameliorating social handicap. Learning theory provides a conceptual structure that helps us understand the relationship between challenging behavior and environmental contingencies, while accommodating the constraints on learning imposed by impaired cognition. Interventions derived from operant learning theory have been described most frequently in the literature, because this method of associational learning provides good evidence for the effectiveness of differential reinforcement methods. This article therefore examines the efficacy of applying operant learning theory to manage challenging behavior after TBI, as well as some of the limitations of this approach. Future developments in the application of learning theory are also considered.

  12. Development of Critical Thinking through Aesthetic Experience: The Case of Students of an Educational Department

    ERIC Educational Resources Information Center

    Raikou, Natassa

    2016-01-01

    This article addresses an application performed in tertiary education--a department of pedagogical and educational sciences--of a contemporary method, Transformative Learning through Aesthetic Experience. The method is based on the use of art and aims to reinforce and promote the development of critical thinking within educational settings.…

  13. Reinforcement learning algorithms for robotic navigation in dynamic environments.

    PubMed

    Yen, Gary G; Hickey, Travis W

    2004-04-01

    The purpose of this study was to examine improvements to reinforcement learning (RL) algorithms that allow them to interact successfully with dynamic environments. The scope of the research was RL algorithms as applied to robotic navigation. Proposed improvements include the addition of a forgetting mechanism, the use of feature-based state inputs, and hierarchical structuring of an RL agent. Simulations were performed to evaluate the individual merits and flaws of each proposal, to compare the proposed methods with established methods, and to compare them with theoretically optimal solutions. Incorporating a forgetting mechanism considerably improved the learning times of RL agents in a dynamic environment. However, a direct implementation of a feature-based RL agent did not improve performance, because purely feature-based navigation lacks positional awareness and leaves the agent unable to determine the location of the goal state. Including a hierarchical structure in an RL agent significantly improved performance, specifically when the hierarchy combined a feature-based agent for obstacle avoidance with a standard RL agent for global navigation. In summary, a forgetting mechanism and a hierarchically structured RL agent both offer substantially better performance than traditional RL agents navigating in a dynamic environment.
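    The abstract does not say how the forgetting mechanism works, so the following is only a minimal sketch under an assumed rule: after each ordinary tabular Q-learning update, every Q-value is shrunk by a small factor `forget`, so estimates that stop being reinforced (for example, after an obstacle moves) gradually fade. The function name and parameter values are hypothetical.

```python
def q_update_with_forgetting(q, s, a, r, s_next, alpha=0.1, gamma=0.9, forget=0.01):
    """One tabular Q-learning step, then a uniform decay of every Q-value.

    q is a list of lists indexed as q[state][action]. The decay toward zero
    is an assumed forgetting rule, not the mechanism from the paper.
    """
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    # Forgetting: shrink all estimates so values learned under an old
    # layout of the environment fade unless they keep being reinforced.
    for row in q:
        for j in range(len(row)):
            row[j] *= 1.0 - forget
    return q

q = [[0.0, 0.0], [0.0, 0.0]]
q_update_with_forgetting(q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, forget=0.1)
print(q[0][1])  # 0.5 after the Q-learning update, then scaled by 0.9
```

A uniform decay is the simplest choice; a per-entry decay based on time since last visit would be an equally plausible reading of "forgetting".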

  14. Proactivity and Reinforcement: The Contingency of Social Behavior

    ERIC Educational Resources Information Center

    Williams, J. Sherwood; And Others

    1976-01-01

    This paper analyzes the development of group structure from the stimulus-sampling perspective. Learning is treated as the continual sampling of possibilities, with reinforced possibilities increasing in probability of occurrence. This contingency learning approach is tested experimentally. (NG)

  15. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    PubMed

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  16. Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.

    PubMed

    Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M

    2018-05-12

    Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. Twenty-nine adult females (15 with PTSD and exposure to assaultive violence, 14 controls) performed a neutral reinforcement-learning task (i.e., a two-armed bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify timecourses for large-scale a priori brain networks. We found that an anticorrelated and risk-sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group showed significantly weaker neural encoding of prediction errors in both a ventral striatum/mPFC network and an anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside the context of fear stimuli, suggests that learning differences in PTSD may be broader than current neurocircuitry models of PTSD propose. Copyright © 2018 Elsevier Ltd. All rights reserved.
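    The prediction errors discussed above come from the Rescorla-Wagner rule, V <- V + alpha * (r - V). The sketch below shows only the plain RW update on a two-armed bandit, paired with a softmax choice rule (a common pairing, assumed here). The anticorrelated and risk-sensitive variants that best fit the data are not implemented, and the parameter values are hypothetical.

```python
import math

def rw_update(values, choice, reward, alpha):
    """Rescorla-Wagner update for the chosen arm; returns the prediction error."""
    delta = reward - values[choice]  # prediction error
    values[choice] += alpha * delta
    return delta

def softmax(values, beta):
    """Choice probabilities; beta (inverse temperature) sets how greedy choices are."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

values = [0.0, 0.0]  # expected reward for arm 0 and arm 1
d1 = rw_update(values, choice=0, reward=1.0, alpha=0.2)  # error 1.0; values[0] -> 0.2
d2 = rw_update(values, choice=0, reward=1.0, alpha=0.2)  # error 0.8; values[0] -> about 0.36
probs = softmax(values, beta=3.0)  # arm 0 is now the more likely choice
```

The shrinking prediction error on the second rewarded trial is the quantity whose neural encoding the study compares across groups.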

  17. Automated Inattention and Fatigue Detection System in Distance Education for Elementary School Students

    ERIC Educational Resources Information Center

    Hwang, Kuo-An; Yang, Chia-Hao

    2009-01-01

    Most courses based on distance learning focus on the cognitive domain of learning. Because students are sometimes inattentive or tired, they may neglect the attention goal of learning. This study proposes an auto-detection and reinforcement mechanism for the distance-education system based on the reinforcement teaching strategy. If a student is…

  18. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    ERIC Educational Resources Information Center

    Janssen, Christian P.; Gray, Wayne D.

    2012-01-01

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…

  19. Parallel Online Temporal Difference Learning for Motor Control.

    PubMed

    Caarls, Wouter; Schuitema, Erik

    2016-07-01

    Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. In real systems, however, it is often avoided in favor of policy search methods because of its long learning time. Policy search, in turn, has its own drawbacks, such as the need for an informed policy parameterization and initialization. In this paper, we show that TD learning can also work effectively in real robotic systems by using parallel model learning and planning. Using locally weighted linear regression and trajectory-sampled planning with 14 concurrent threads, we achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60×, with real-time learning taking less than half a minute. The results are competitive with state-of-the-art policy search.
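    The combination of TD control with model learning and planning can be illustrated with tabular Dyna-Q, swapped in here as a simple, serial stand-in. It is not the paper's method, which uses locally weighted linear regression as the model and trajectory-sampled planning across 14 threads. Each real transition updates Q directly, is stored in a learned model, and is then replayed `n_planning` times; the function names and parameters are illustrative assumptions.

```python
import random

def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Standard one-step TD (Q-learning) update on a table q[state][action]."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])

def dyna_q_step(q, model, s, a, r, s_next, alpha=0.5, gamma=0.9, n_planning=5, rng=None):
    """Tabular Dyna-Q: learn from the real transition, record it in a
    deterministic model, then replay n_planning simulated transitions."""
    rng = rng or random.Random(0)
    q_learning_update(q, s, a, r, s_next, alpha, gamma)  # direct RL from real data
    model[(s, a)] = (r, s_next)                          # model learning
    for _ in range(n_planning):                          # planning with the model
        ps, pa = rng.choice(list(model))
        pr, ps_next = model[(ps, pa)]
        q_learning_update(q, ps, pa, pr, ps_next, alpha, gamma)

q = [[0.0, 0.0], [0.0, 0.0]]
model = {}
dyna_q_step(q, model, s=0, a=1, r=1.0, s_next=1, gamma=0.0, n_planning=2)
print(q[0][1])  # one real update plus two planning updates on the same entry
```

Every real interaction is amplified by several simulated updates, which is the general idea behind reducing the amount of real-world experience TD learning needs; running planning concurrently, as the paper does, adds wall-clock speedup on top of that.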

  20. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning

    PubMed Central

    McGregor, Heather R.; Mohatarem, Ayman

    2017-01-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634

  1. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.

    PubMed

    Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L

    2017-07-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
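    The dissociation described above rests on a simple fact: for a skewed distribution of lateral shifts, aiming to cancel the mean minimizes squared error, while aiming to cancel the mode maximizes the chance of hitting a small target. The sketch below checks this on a hypothetical discrete distribution (values in arbitrary units, not the study's data); the target half-width is also an assumption.

```python
shifts = [0, 1, 2, 3, 4, 5]                   # hypothetical lateral shifts (arbitrary units)
probs = [0.05, 0.40, 0.20, 0.15, 0.12, 0.08]  # right-skewed: mode 1, mean 2.13

def expected_squared_error(aim, shifts, probs):
    """Error-based loss: mean squared cursor error when aiming at `aim`."""
    return sum(p * (aim + s) ** 2 for s, p in zip(shifts, probs))

def hit_probability(aim, shifts, probs, half_width=0.1):
    """Reinforcement-based objective: chance the cursor lands on a target centred at 0."""
    return sum(p for s, p in zip(shifts, probs) if abs(aim + s) <= half_width)

mean_shift = sum(s * p for s, p in zip(shifts, probs))
mode_shift = max(zip(shifts, probs), key=lambda sp: sp[1])[0]

# Compensating for the mean minimizes squared error ...
print(expected_squared_error(-mean_shift, shifts, probs) < expected_squared_error(-mode_shift, shifts, probs))
# ... while compensating for the mode maximizes the hit rate.
print(hit_probability(-mode_shift, shifts, probs) > hit_probability(-mean_shift, shifts, probs))
```

With this distribution, the mean-based aim never hits the narrow target, loosely mirroring participants who compensated for the mean while repeatedly missing.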

  2. Trading Rules on Stock Markets Using Genetic Network Programming with Reinforcement Learning and Importance Index

    NASA Astrophysics Data System (ADS)

    Mabu, Shingo; Hirasawa, Kotaro; Furuzuki, Takayuki

    Genetic Network Programming (GNP) is an evolutionary computation method that represents its solutions using graph structures. Because GNP can create quite compact programs and has an implicit memory function, it has been shown to work well, especially in dynamic environments. In addition, trading rules for stock markets have previously been created using GNP with an Importance Index (GNP-IMX), where IMX is a new element that serves as a criterion for decision making. In this paper, we combine GNP-IMX with Actor-Critic (GNP-IMX&AC) to create trading rules for stock markets. Evolution-based methods can update their programs only after a sufficiently long period, because they must calculate fitness values; reinforcement learning, however, can change programs during that period, so trading rules can be created efficiently. In the simulation, the proposed method is trained using the prices of 10 stocks in 2002 and 2003, and its generalization ability is then tested using the stock prices in 2004. The simulation results show that the proposed method obtains larger profits than both GNP-IMX without AC and the Buy&Hold strategy.

  3. Altered Risk-Based Decision Making following Adolescent Alcohol Use Results from an Imbalance in Reinforcement Learning in Rats

    PubMed Central

    Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.

    2012-01-01

    Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989
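    The learning imbalance described above can be sketched as a prediction-error update with separate learning rates for better-than-expected (positive) and worse-than-expected (negative) outcomes. This is a minimal illustration, not the authors' model: the outcome sequence and learning rates are hypothetical. Even though the risky option pays off exactly half the time, a larger positive learning rate pushes its learned value above the true mean of 0.5.

```python
def asymmetric_update(value, reward, alpha_pos, alpha_neg):
    """Prediction-error update with separate learning rates for
    better-than-expected and worse-than-expected outcomes."""
    delta = reward - value
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta

def learned_value(outcomes, alpha_pos, alpha_neg):
    value = 0.0
    for r in outcomes:
        value = asymmetric_update(value, r, alpha_pos, alpha_neg)
    return value

outcomes = [1.0, 0.0] * 200  # hypothetical risky option that pays off half the time
balanced = learned_value(outcomes, alpha_pos=0.1, alpha_neg=0.1)
imbalanced = learned_value(outcomes, alpha_pos=0.5, alpha_neg=0.1)
print(balanced, imbalanced)  # the imbalanced learner overvalues the risky option
```

Because the subjective reward (1.0 or 0.0) is the same for both learners, the difference comes purely from learning rates, consistent with the abstract's point that valuation itself was unaltered.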

  4. "You Be the Judge."

    ERIC Educational Resources Information Center

    Black, Susan

    1995-01-01

    Although teachers at all levels are encouraged to use role-playing and simulation, they usually overestimate role-playing's learning value. Teachers use these methods mainly to change behavior (and values), not reinforce curriculum content. Sociodramas (scenes based on typical situations facing children) are more effective role-playing activities…

  5. Artificial neural networks and approximate reasoning for intelligent control in space

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1991-01-01

    A method is introduced for learning to refine the control rules of approximate reasoning-based controllers. A reinforcement-learning technique is used in conjunction with a multi-layer neural network model of an approximate reasoning-based controller. The model learns by updating its prediction of the physical system's behavior. The model can use the control knowledge of an experienced operator and fine-tune it through the process of learning. Some of the space domains suitable for applications of the model such as rendezvous and docking, camera tracking, and tethered systems control are discussed.

  6. A graph-based evolutionary algorithm: Genetic Network Programming (GNP) and its extension using reinforcement learning.

    PubMed

    Mabu, Shingo; Hirasawa, Kotaro; Hu, Jinglu

    2007-01-01

    This paper proposes a graph-based evolutionary algorithm called Genetic Network Programming (GNP). Our goal is to develop GNP so that it can deal with dynamic environments efficiently and effectively, based on the distinctive expressive power of the graph (network) structure. The characteristics of GNP are as follows. 1) GNP programs are composed of a number of nodes that execute simple judgment/processing functions, and these nodes are connected to each other by directed links. 2) The graph structure enables GNP to reuse nodes, so the structure can be very compact. 3) Node transitions in GNP follow the node connections without any terminal nodes, so the history of past node transitions affects which node is currently used; this characteristic works as an implicit memory function. These structural characteristics are useful for dealing with dynamic environments. Furthermore, we propose an extended algorithm, "GNP with Reinforcement Learning (GNPRL)", which combines evolution and reinforcement learning in order to create effective graph structures and obtain better results in dynamic environments. In this paper, we applied GNP to the problem of determining agents' behavior in order to evaluate its effectiveness, using Tileworld as the simulation environment. The results show some advantages of GNP over conventional methods.
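    The three structural properties listed above (judgment and processing nodes, directed links, and no terminal nodes) can be illustrated with a hypothetical toy interpreter. Node types, sensor names, action names, and the `max_judgments` limit are invented for illustration; there is no evolution or reinforcement learning here, and GNP's actual node functions are not given in the abstract. The key point is that `current` persists between calls, so where the previous step ended determines where the next one begins.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                      # "judge" or "process"
    label: str                                     # sensor name (judge) or action name (process)
    branches: dict = field(default_factory=dict)   # judge: sensor value -> next node id
    next_id: int = -1                              # process: next node id

class GraphProgram:
    def __init__(self, nodes, start=0):
        self.nodes = nodes
        self.current = start  # persists between steps: the implicit memory

    def step(self, sensors, max_judgments=10):
        """Follow judge nodes until a process node fires; return its action."""
        for _ in range(max_judgments):
            node = self.nodes[self.current]
            if node.kind == "judge":
                self.current = node.branches[sensors[node.label]]
            else:
                self.current = node.next_id
                return node.label
        return None  # hypothetical cap on judgments per step

nodes = {
    0: Node("judge", "obstacle", {True: 1, False: 2}),
    1: Node("process", "turn_left", next_id=0),
    2: Node("process", "move_forward", next_id=3),
    3: Node("judge", "goal_visible", {True: 4, False: 0}),
    4: Node("process", "pick_up", next_id=0),
}
program = GraphProgram(nodes)
a1 = program.step({"obstacle": False, "goal_visible": True})  # starts at node 0
a2 = program.step({"obstacle": False, "goal_visible": True})  # resumes at node 3
```

Identical sensor inputs produce different actions on the two calls, purely because of where the previous transition left off.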

  7. A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI

    NASA Astrophysics Data System (ADS)

    Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro

    Recently, brain-computer interfaces (BCIs), which provide a direct pathway between a human brain and an external device such as a computer or a robot, have attracted a lot of attention. Because a BCI can control machines such as robots using brain activity without voluntary muscle movement, it may become a useful communication tool for people with disabilities, for instance patients with amyotrophic lateral sclerosis. However, to realize a BCI system that can perform precise tasks in various environments, control rules must be designed to adapt to dynamic environments. Reinforcement learning is one approach to designing such control rules. If reinforcement learning can be driven by brain activity, it could lead to a BCI with general versatility. In this research, we focused on the P300 component of the event-related potential as a substitute for the reward signal in reinforcement learning. We discriminated successful from failed trials using the single-trial P300 of the EEG and a proposed discrimination algorithm based on a support vector machine. The feasibility of reinforcement learning was examined in terms of the number of discriminated trials. The results suggest that learning is possible for most subjects.

  8. Comparative learning theory and its application in the training of horses.

    PubMed

    Cooper, J J

    1998-11-01

    Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by removal of, or a decrease in the intensity of, an unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where, once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.

  9. The nature of sexual reinforcement.

    PubMed Central

    Crawford, L L; Holloway, K S; Domjan, M

    1993-01-01

    Sexual reinforcers are not part of a regulatory system involved in the maintenance of critical metabolic processes, they differ for males and females, they differ as a function of species and mating system, and they show ontogenetic and seasonal changes related to endocrine conditions. Exposure to a member of the opposite sex without copulation can be sufficient for sexual reinforcement. However, copulatory access is a stronger reinforcer, and copulatory opportunity can serve to enhance the reinforcing efficacy of stimulus features of a sexual partner. Conversely, under certain conditions, noncopulatory exposure serves to decrease reinforcer efficacy. Many common learning phenomena such as acquisition, extinction, discrimination learning, second-order conditioning, and latent inhibition have been demonstrated in sexual conditioning. These observations extend the generality of findings obtained with more conventional reinforcers, but the mechanisms of these effects and their gender and species specificity remain to be explored. PMID:8354970

  10. Pointwise probability reinforcements for robust statistical inference.

    PubMed

    Frénay, Benoît; Verleysen, Michel

    2014-02-01

    Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be with respect to their theoretical probability; they include, e.g., outliers. Parameter estimates tend to be biased towards models that support such data. This paper proposes pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation term controls the amount of reinforcement used to compensate for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method that can be formulated as likelihood maximisation. Experiments show that PPRs can easily be used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually, since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  11. Investigation of a reinforcement-based toilet training procedure for children with autism.

    PubMed

    Cicero, Frank R; Pfadt, Al

    2002-01-01

    Independent toileting is an important developmental skill that individuals with developmental disabilities often find challenging to master. Effective toilet training interventions have been designed that rely on a combination of the basic operant principles of positive reinforcement and punishment. In the present study, the effectiveness of a reinforcement-based toilet training intervention was investigated with three children with a diagnosis of autism. Procedures included a combination of positive reinforcement, graduated guidance, scheduled practice trials and forward prompting, and were implemented in response to urination accidents. Results indicated that all three participants reduced urination accidents to zero and learned to spontaneously request use of the bathroom within 7-11 days of training. Gains were maintained at 6-month and 1-year follow-ups. Findings suggest that the proposed procedure is an effective and rapid method of toilet training that can be implemented within a structured school setting, with generalization to the home environment.

  12. Mesolimbic confidence signals guide perceptual learning in the absence of external feedback

    PubMed Central

    Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp

    2016-01-01

    It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283

  13. Efficiency Improvement of Action Acquisition in Two-Link Robot Arm Using Fuzzy ART with Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Kotani, Naoki; Taniguchi, Kenji

    An efficient learning method using Fuzzy ART with a genetic algorithm is proposed. Because reinforcement learning requires many trials before an agent acquires appropriate actions, the proposed method reduces the number of trials by reusing a policy acquired in other tasks. Fuzzy ART is an incremental unsupervised learning algorithm that responds to arbitrary sequences of analog or binary input vectors. When an agent observes an unknown state, the proposed method generates a policy for it by crossover or mutation. Selection is used to control the category proliferation problem of Fuzzy ART. The effectiveness of the proposed method was verified in simulations of a reaching task for a two-link robot arm. The proposed method reduces both the number of trials and the number of states.
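    For readers unfamiliar with Fuzzy ART, the sketch below implements the standard algorithm (Carpenter, Grossberg and Rosen): complement coding, the choice function |I ∧ w| / (alpha + |w|), the vigilance test |I ∧ w| / |I| >= rho, and the learning rule w <- beta (I ∧ w) + (1 - beta) w, where ∧ is the component-wise minimum and |·| the L1 norm. The genetic-algorithm policy generation and the reinforcement learner from the paper are not included, and the parameter values are assumptions. Raising `rho` produces more, finer categories, which is the proliferation the paper's selection step is meant to control.

```python
def complement_code(a):
    """Complement coding: input a in [0, 1]^M becomes (a, 1 - a), so |I| = M."""
    return list(a) + [1.0 - x for x in a]

def fuzzy_and(x, y):
    """Fuzzy AND: component-wise minimum."""
    return [min(p, q) for p, q in zip(x, y)]

class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: higher values create finer categories
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.weights = []   # one weight vector per committed category

    def _choice(self, i, w):
        return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

    def train(self, a):
        """Present one input; return the index of the category it is assigned to."""
        i = complement_code(a)
        norm_i = sum(i)
        # Search categories from the highest choice value downwards.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(i, self.weights[j]),
                       reverse=True)
        for j in order:
            w = self.weights[j]
            overlap = fuzzy_and(i, w)
            if sum(overlap) / norm_i >= self.rho:  # vigilance passed: resonance
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * wk
                                   for o, wk in zip(overlap, w)]
                return j
            # Otherwise: mismatch reset, try the next category.
        # No existing category matched closely enough: commit a new one.
        self.weights.append(i)
        return len(self.weights) - 1

art = FuzzyART(rho=0.9)
c0 = art.train([0.10, 0.10])  # first input commits category 0
c1 = art.train([0.12, 0.10])  # similar input resonates with category 0
c2 = art.train([0.90, 0.90])  # dissimilar input fails vigilance and commits category 1
```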

  14. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.

    PubMed

    Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca

    2013-03-01

    An important issue in recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view, phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
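    The hypothesis above can be sketched as a TD error whose reward term is the sum of an extrinsic reward and an intrinsic one. In this sketch the intrinsic term is assumed to be the unsigned error of a simple predictor of an environmental event, so it is large when a change is unexpected and fades as the change becomes predictable, matching the "temporary" character described in the abstract. The predictor, names, and parameters are hypothetical, not the paper's architecture.

```python
def td_error(v_s, v_next, r_extrinsic, r_intrinsic, gamma=0.9):
    """TD-like prediction error driven by the sum of both reinforcement sources."""
    return r_extrinsic + r_intrinsic + gamma * v_next - v_s

class SurpriseSignal:
    """Hypothetical intrinsic reinforcement: the unsigned error of a simple
    predictor of an environmental event, which fades as the event becomes predictable."""
    def __init__(self, lr=0.5):
        self.prediction = 0.0
        self.lr = lr

    def __call__(self, event):
        error = event - self.prediction
        self.prediction += self.lr * error
        return abs(error)

surprise = SurpriseSignal()
# An action repeatedly causes the same environmental change (event = 1.0)
# but no biological reward, so only the intrinsic term drives learning.
intrinsic = [surprise(1.0) for _ in range(4)]
deltas = [td_error(0.0, 0.0, 0.0, r_int) for r_int in intrinsic]
print(intrinsic)  # shrinks as the change becomes predictable
```

A permanent extrinsic reward would keep contributing to the error at every occurrence, whereas the intrinsic contribution vanishes once the action's effect is learned.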

  15. Deficits in Positive Reinforcement Learning and Uncertainty-Driven Exploration are Associated with Distinct Aspects of Negative Symptoms in Schizophrenia

    PubMed Central

    Strauss, Gregory P.; Frank, Michael J.; Waltz, James A.; Kasanova, Zuzana; Herbener, Ellen S.; Gold, James M.

    2011-01-01

    Background: Negative symptoms are core features of schizophrenia; however, the cognitive and neural basis for individual negative symptom domains remains unclear. Converging evidence suggests a role for striatal and prefrontal dopamine in reward learning and in the exploration of actions that might produce outcomes better than the status quo. The current study examines whether deficits in reinforcement learning and uncertainty-driven exploration predict specific negative symptom domains. Methods: We administered a temporal decision-making task, which required trial-by-trial adjustment of reaction time (RT) to maximize reward receipt, to 51 patients with schizophrenia and 39 age-matched healthy controls. Task conditions were designed such that expected value (probability * magnitude) increased (IEV), decreased (DEV), or remained constant (CEV) with increasing response times. Computational analyses were applied to estimate the degree to which trial-by-trial responses are influenced by reinforcement history. Results: Individuals with schizophrenia showed impaired Go learning but intact NoGo learning relative to controls. These effects were pronounced as a function of global measures of negative symptoms. Uncertainty-based exploration was substantially reduced in individuals with schizophrenia and selectively correlated with clinical ratings of anhedonia. Conclusions: Schizophrenia patients, particularly those with high negative symptoms, failed to speed RTs to increase positive outcomes and showed a reduced tendency to explore when alternative actions could lead to better outcomes than the status quo. Results are interpreted in the context of current computational, genetic, and pharmacological data supporting the roles of striatal and prefrontal dopamine in these processes. PMID:21168124

  16. Reinforced dynamics for enhanced sampling in large atomic and molecular systems

    NASA Astrophysics Data System (ADS)

    Zhang, Linfeng; Wang, Han; E, Weinan

    2018-03-01

    A new approach for efficiently exploring the configuration space and computing the free energy of large atomic and molecular systems is proposed, motivated by an analogy with reinforcement learning. There are two major components in this new approach. Like metadynamics, it allows for an efficient exploration of the configuration space by adding an adaptively computed biasing potential to the original dynamics. Like deep reinforcement learning, this biasing potential is trained on the fly using deep neural networks, with data collected judiciously from the exploration and an uncertainty indicator from the neural network model playing the role of the reward function. Parameterization using neural networks makes it feasible to handle cases with a large set of collective variables. This has the potential advantage that selecting precisely the right set of collective variables has now become less critical for capturing the structural transformations of the system. The method is illustrated by studying the full-atom explicit solvent models of alanine dipeptide and tripeptide, as well as the system of a polyalanine-10 molecule with 20 collective variables.

  17. Impairments in action-outcome learning in schizophrenia.

    PubMed

    Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W

    2018-03-03

    Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned that one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.
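
    Contingency degradation is often quantified with ΔP, the probability of the outcome given the action minus its probability in the absence of the action. This is an assumption about the measure, since the abstract does not say which index the authors used, and the counts below are hypothetical:

```python
def delta_p(outcomes_with_action, n_with_action, outcomes_without_action, n_without_action):
    """Action-outcome contingency: P(outcome | action) - P(outcome | no action)."""
    if n_with_action <= 0 or n_without_action <= 0:
        raise ValueError("both sample counts must be positive")
    p_with = outcomes_with_action / n_with_action
    p_without = outcomes_without_action / n_without_action
    return p_with - p_without


intact = delta_p(8, 10, 0, 10)    # outcome only follows the action: 0.8
degraded = delta_p(8, 10, 8, 10)  # free outcomes equally likely without the action: 0.0
```

    Free (noncontingent) outcomes raise P(outcome | no action), so ΔP for that action falls toward zero even though the action still earns outcomes; a learner sensitive to causal structure should then prefer the other action.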

  18. The Effects of a Token Reinforcement System on the Reading and Arithmetic Skills Learnings of Migrant Primary School Pupils.

    ERIC Educational Resources Information Center

    Heitzman, Andrew J.

    The New York State Center for Migrant Studies conducted this 1968 study which investigated effects of token reinforcers on reading and arithmetic skills learnings of migrant primary school students during a 6-week summer school session. Students (Negro and Caucasian) received plastic tokens to reward skills learning responses. Tokens were traded…

  19. The Effects of Observation of Learn Units during Reinforcement and Correction Conditions on the Rate of Learning Math Algorithms by Fifth Grade Students

    ERIC Educational Resources Information Center

    Neu, Jessica Adele

    2013-01-01

    I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…

  20. An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System: A Reinforcement Learning Approach

    ERIC Educational Resources Information Center

    Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela

    2011-01-01

    Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…

  1. The Identification and Establishment of Reinforcement for Collaboration in Elementary Students

    ERIC Educational Resources Information Center

    Darcy, Laura

    2017-01-01

    In Experiment 1, I conducted a functional analysis of student rate of learning with and without a peer-yoked contingency for 12 students in Kindergarten through 2nd grade in order to determine if they had conditioned reinforcement for collaboration. Using an ABAB reversal design, I compared rate of learning as measured by learn units to criterion…

  2. Stress enhances model-free reinforcement learning only after negative outcome

    PubMed Central

    Lee, Daeyeol

    2017-01-01

    Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e., model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcomes. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one’s ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts. PMID:28723943

  3. Stress enhances model-free reinforcement learning only after negative outcome.

    PubMed

    Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung

    2017-01-01

    Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e., model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcomes. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one's ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts.
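
    Model-free and model-based contributions in two-stage tasks are commonly estimated with a hybrid model that mixes the two sets of first-stage action values through a weight `w`. A minimal sketch, assuming two first-stage actions, a hypothetical fixed common/rare transition structure (0.7/0.3), softmax choice with inverse temperature `beta`, and learning rate `alpha`; the valence-dependent analysis reported in the study is not reproduced here:

```python
import math

# Hypothetical transition structure: action 0 usually leads to state 0, action 1 to state 1.
TRANSITIONS = [[0.7, 0.3], [0.3, 0.7]]


def model_based_values(q_stage2):
    """Q_MB(a) = sum over second-stage states s of P(s | a) * max_b Q2(s, b)."""
    best = [max(q) for q in q_stage2]
    return [sum(p * v for p, v in zip(row, best)) for row in TRANSITIONS]


def hybrid_values(q_mf, q_stage2, w):
    """Weighted mixture: w = 1 is purely model-based, w = 0 purely model-free."""
    q_mb = model_based_values(q_stage2)
    return [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax(values, beta):
    """Choice probabilities; subtracting the max keeps exp() numerically stable."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def td_update(q, action, target, alpha):
    """Model-free update of q[action] toward the observed target (in place)."""
    q[action] += alpha * (target - q[action])
```

    Raising `w` shifts choices toward the model-based estimate, while a lower `alpha` corresponds to slower incorporation of new information into choices.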

  4. Implicit chaining in cotton-top tamarins (Saguinus oedipus) with elements equated for probability of reinforcement

    PubMed Central

    Dillon, Laura; Collins, Meaghan; Conway, Maura; Cunningham, Kate

    2013-01-01

    Three experiments examined the implicit learning of sequences under conditions in which the elements comprising a sequence were equated in terms of reinforcement probability. In Experiment 1, cotton-top tamarins (Saguinus oedipus) experienced a five-element sequence displayed serially on a touch screen in which reinforcement probability was equated across elements at .16 per element. Tamarins demonstrated learning of this sequence, with higher latencies during a random test as compared to baseline sequence training. In Experiments 2 and 3, manipulations of the procedure used in the first experiment were undertaken to rule out a confound owing to the fact that the elements in Experiment 1 bore different temporal relations to the intertrial interval (ITI), an inhibitory period. The results of Experiments 2 and 3 indicated that the implicit learning observed in Experiment 1 was not due to temporal proximity between some elements and the inhibitory ITI. Taken together, the results support two conclusions: first, that tamarins engaged in sequence learning whether or not there was contingent reinforcement for learning the sequence, and second, that this learning was not due to subtle differences in associative strength between the elements of the sequence. PMID:23344718

  5. Improving the Science Excursion: An Educational Technologist's View

    ERIC Educational Resources Information Center

    Balson, M.

    1973-01-01

    Analyzes the nature of the learning process and attempts to show how the three components of a reinforcement contingency, the stimulus, the response and the reinforcement can be utilized to increase the efficiency of a typical science learning experience, the excursion. (JR)

  6. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

    PubMed

    Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

    2014-12-01

    In order to deal with sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximate policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
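
    A rough sketch of the feature-construction pipeline described above, for a one-dimensional continuous state: K-means picks representative states, a Gaussian-weighted graph is built on them, and the smoothest eigenvectors of its Laplacian L = D - W serve as basis functions. The kernel width `sigma`, the numbers of clusters and basis functions, the nearest-centre feature lookup, and the small Jacobi eigensolver (used only to stay dependency-free) are all our assumptions, not details from the paper; the RPI learning step itself is omitted.

```python
import math
import random


def kmeans_1d(xs, k, iters=50, seed=0):
    """Lloyd's K-means on scalar states; returns sorted cluster centres."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


def graph_laplacian(centers, sigma):
    """Combinatorial Laplacian L = D - W of a Gaussian-weighted graph on the centres."""
    n = len(centers)
    w = [[0.0 if i == j else math.exp(-(centers[i] - centers[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    return [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def jacobi_eigh(a, tol=1e-14, max_sweeps=100):
    """Eigenvalues (ascending) and eigenvectors of a small symmetric matrix via Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p, q of A and of the eigenvector matrix
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p, q of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def basis_functions(states, k=8, m=3, sigma=0.15):
    """Cluster the sampled states, then keep the m smoothest Laplacian eigenvectors."""
    centers = kmeans_1d(states, k)
    eigvals, eigvecs = jacobi_eigh(graph_laplacian(centers, sigma))
    return centers, eigvals[:m], eigvecs[:m]


def features(x, centers, basis):
    """Hypothetical lookup: features of state x are the basis values at its nearest centre."""
    nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
    return [phi[nearest] for phi in basis]


states = [i / 199 for i in range(200)]   # samples of a continuous state in [0, 1]
centers, eigvals, basis = basis_functions(states)
phi = features(0.42, centers, basis)     # 3 features for one state
```

    The first basis function is constant (eigenvalue 0 for a connected graph), and higher ones vary progressively faster across the state space, which is what makes them useful as smooth value-function features.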

  7. Off-policy reinforcement learning for H∞ control design.

    PubMed

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen

    2015-01-01

    The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that generally cannot be solved analytically. Even worse, model-based approaches cannot be used to approximately solve the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the policy being evaluated, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.

  8. PROGRAMMED INSTRUCTION AND LANGUAGE LEARNING.

    ERIC Educational Resources Information Center

    LUELSDORFF, PHILIP A.

    PROGRAMED INSTRUCTION, A TEACHING METHOD WHICH INCORPORATES (1) A DETAILED SPECIFICATION OF TERMINAL BEHAVIOR, (2) A CAREFUL SEQUENCING OF THE MATERIAL INTO GRADED STEPS, AND (3) THE REINFORCEMENT OF STUDENT RESPONSE, WORKS MORE FAVORABLY IN CERTAIN INSTRUCTIONAL MEDIA THAN IN OTHERS. CARROLL AND SKINNER BELIEVE THAT SUCCESS IN PROGRAMED…

  9. Quiet Quincy Quarter. Teacher's Guide [and] Student Materials.

    ERIC Educational Resources Information Center

    Zishka, Phyllis

    This document suggests learning activities, teaching methods, objectives, and evaluation measures for a second grade consumer education unit on quarters. The unit, which requires approximately six hours of class time, reinforces basic social studies and mathematics skills including following sequences of numbers, distinguishing left from right,…

  10. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults

    PubMed Central

    Smith, Tim J.; Senju, Atsushi

    2017-01-01

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze-contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue–reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue–reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that socially engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. PMID:28250186

  11. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    PubMed

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze-contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that socially engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  12. Learning and altering behaviours by reinforcement: neurocognitive differences between children and adults.

    PubMed

    Shephard, E; Jackson, G M; Groom, M J

    2014-01-01

    This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age = 10.2 years) and 15 healthy adults (mean age = 25.5 years) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults' performance, and FRN amplitudes were significantly modulated by the reversal phase in children but not in adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.

  13. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles that are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert control over their preferred steering direction. We show that these swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  14. Frontal Theta Links Prediction Errors to Behavioral Adaptation in Reinforcement Learning

    PubMed Central

    Cavanagh, James F.; Frank, Michael J.; Klein, Theresa J.; Allen, John J.B.

    2009-01-01

    Investigations into action monitoring have consistently detailed a fronto-central voltage deflection in the Event-Related Potential (ERP) following the presentation of negatively valenced feedback, sometimes termed the Feedback Related Negativity (FRN). The FRN has been proposed to reflect a neural response to prediction errors during reinforcement learning, yet the single-trial relationship between neural activity and the magnitude of expectation violation remains untested. Although ERP methods are not well suited to single-trial analyses, the FRN has been associated with theta band oscillatory perturbations in the medial prefrontal cortex. Medio-frontal theta oscillations have been previously associated with expectation violation and behavioral adaptation and are well suited to single-trial analysis. Here, we recorded EEG activity during a probabilistic reinforcement learning task and fit the performance data to an abstract computational model (Q-learning) for calculation of single-trial reward prediction errors. Single-trial theta oscillatory activities following feedback were investigated within the context of expectation (prediction error) and adaptation (subsequent reaction time change). Results indicate that interactive medial and lateral frontal theta activities reflect the degree of negative and positive reward prediction error in the service of behavioral adaptation. These different brain areas use prediction error calculations for different behavioral adaptations: medial frontal theta reflects the use of prediction errors for reaction time slowing (specifically following errors), whereas lateral frontal theta reflects prediction errors that lead to working memory-related reaction time speeding for the correct choice. PMID:19969093
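
    The single-trial prediction errors mentioned above come from a Q-learning model. A minimal generative sketch, assuming a two-choice task with hypothetical reward probabilities, softmax action selection, initial values of 0.5, and hypothetical parameters `alpha` (learning rate) and `beta` (inverse temperature). When fitting real data, the same recursion would instead be run over the participant's observed choices and rewards to obtain one prediction error per trial:

```python
import math
import random


def softmax_choice(q, beta, rng):
    """Sample an action with probability proportional to exp(beta * Q)."""
    m = max(q)
    weights = [math.exp(beta * (v - m)) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def simulate(reward_probs, n_trials, alpha, beta, seed=0):
    """Return per-trial (choice, reward, prediction error) from a stateless Q-learner."""
    rng = random.Random(seed)
    q = [0.5] * len(reward_probs)
    trials = []
    for _ in range(n_trials):
        a = softmax_choice(q, beta, rng)
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        delta = r - q[a]            # single-trial reward prediction error
        q[a] += alpha * delta       # Q-learning update for the chosen option
        trials.append((a, r, delta))
    return trials


trials = simulate([0.8, 0.2], n_trials=200, alpha=0.1, beta=3.0)
```

    Positive prediction errors follow rewarded trials and negative ones follow unrewarded trials; their trial-wise magnitudes are what would be related to the theta measures.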

  15. A neural model of hierarchical reinforcement learning.

    PubMed

    Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris

    2017-01-01

    We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.

  16. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    PubMed

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  17. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

    PubMed

    Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

    2016-02-01

    Attention includes processes that evaluate stimulus relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors at asymptotic performance. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
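
    The three components listed above can be sketched as follows, assuming (our choice, not the paper's) a softmax over values of the task-relevant feature dimension only, a hypothetical stickiness bonus `kappa` added to whichever feature was selected on the previous trial regardless of its value, and a delta-rule update with learning rate `alpha`:

```python
import math


def choice_probs(values, prev_choice, beta, kappa):
    """Softmax over feature values; kappa adds value-independent stickiness to the last choice."""
    logits = [beta * v + (kappa if i == prev_choice else 0.0) for i, v in enumerate(values)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update(values, chosen, reward, alpha):
    """Delta-rule update of the selected feature's value only (in place)."""
    values[chosen] += alpha * (reward - values[chosen])
```

    Because `kappa` does not depend on value, it can keep the animal selecting the previously chosen feature even after values have converged, which is one way to produce the residual selection errors at asymptote.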

  18. Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.

    PubMed

    Chan, C K J; Harris, Justin A

    2017-08-01

    Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction. Copyright © 2017 Elsevier B.V. All rights reserved.
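
    The contrast between trial-based and time-based extinction can be made concrete. The sketch assumes the Rescorla-Wagner rule as an example of a trial-based learner (the abstract does not name a specific model) and a hypothetical exponential decay as the time-based alternative; `alpha` and `rate` are illustrative values. Note that this plain Rescorla-Wagner sketch does not by itself reproduce the PREE.

```python
import math


def rescorla_wagner(v, outcomes, alpha=0.2):
    """Trial-based update: one prediction-error step per trial; CS duration plays no role."""
    for us in outcomes:            # us = 1.0 if the US followed the CS on this trial, else 0.0
        v += alpha * (us - v)
    return v


def time_based_extinction(v, n_trials, trial_duration, rate=0.2):
    """Hypothetical continuous-time alternative: loss scales with total CS-alone exposure."""
    return v * math.exp(-rate * n_trials * trial_duration)


v_acq = rescorla_wagner(0.0, [1.0] * 20)      # acquisition with consistent reinforcement
v_ext = rescorla_wagner(v_acq, [0.0] * 3)     # three nonreinforced trials, any duration
```

    In the trial-based rule, lengthening each extinction trial changes nothing, whereas in the time-based rule doubling trial duration has the same effect as doubling the number of trials; the experiments described above favour the former.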

  19. Can Service Learning Reinforce Social and Cultural Bias? Exploring a Popular Model of Family Involvement for Early Childhood Teacher Candidates

    ERIC Educational Resources Information Center

    Dunn-Kenney, Maylan

    2010-01-01

    Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…

  20. Deep Gate Recurrent Neural Network

    DTIC Science & Technology

    2016-11-22

    Schmidhuber. A system for robotic heart surgery that learns to tie knots using recurrent neural networks. In IEEE International Conference on...tasks, such as Machine Translation (Bahdanau et al. (2015)) or Robot Reinforcement Learning (Bakker (2001)). The main idea behind these networks is to...and J. Peters. Reinforcement learning in robotics : A survey. The International Journal of Robotics Research, 32:1238–1274, 2013. ISSN 0278-3649. doi

  1. Managing Student Study Time.

    ERIC Educational Resources Information Center

    Hudson, H. T.

    1981-01-01

    Large lecture sections of science staffed by a single instructor make it impossible to provide individual student attention and lead to high dropout and failure rates. Daily assignments are used to encourage students to identify areas that need remediation, maintain a steady study pace, receive positive reinforcement, and learn methods of solving…

  2. Pre-testing Orientation for the Disadvantaged.

    ERIC Educational Resources Information Center

    Mihalka, Joseph A.

    A pre-testing orientation was incorporated into the Work Incentives Program, a pre-vocational program for disadvantaged youth. Test-taking skills were taught in seven and one half hours of instruction and a variety of methods were used to provide a sequential experience with distributed learning, positive reinforcement, and immediate feedback of…

  3. Learning Gains and Response to Digital Lessons on Soil Genesis and Development

    USDA-ARS's Scientific Manuscript database

    Evolving computer technology offers opportunities for new online approaches in teaching methods and delivery. Well-designed online lessons should reinforce the critical need of the soil science discipline in today’s food, energy, and environmental issues, as well as meet the needs of the diverse cli...

  4. A Comparative Analysis of Reinforcement Learning Methods

    DTIC Science & Technology

    1991-10-01

    Technology. Support for this research was provided in part by the Mazda Corporation, in part by the University Research Initiative under Office of Naval...results in an update rule (e.g. [Goldberg 89]Goldberg85), genetic algorithms which disregards all history accumulated in the current will not be addressed

  5. Conceptualizing withdrawal-induced escalation of alcohol self-administration as a learned, plasticity-dependent process

    PubMed Central

    Walker, Brendan M.

    2013-01-01

    This article represents one of five contributions focusing on the topic “Plasticity and neuroadaptive responses within the extended amygdala in response to chronic or excessive alcohol exposure” that were developed by awardees participating in the Young Investigator Award Symposium at the “Alcoholism and Stress: A Framework for Future Treatment Strategies” conference in Volterra, Italy on May 3–6, 2011 that was organized/chaired by Drs. Antonio Noronha and Fulton Crews and sponsored by the National Institute on Alcohol Abuse and Alcoholism. This review discusses the dependence-induced neuroadaptations in affective systems that provide a basis for negative reinforcement learning and presents evidence demonstrating that escalated alcohol consumption during withdrawal is a learned, plasticity-dependent process. The review concludes by identifying changes within extended amygdala dynorphin/kappa-opioid receptor systems that could serve as the foundation for the occurrence of negative reinforcement processes. While some evidence contained herein may be specific to alcohol dependence-related learning and plasticity, much of the information will be of relevance to any addictive disorder involving negative reinforcement mechanisms. Collectively, the information presented within this review provides a framework to assess the negative reinforcing effects of alcohol in a manner that distinguishes neuroadaptations produced by chronic alcohol exposure from the actual plasticity that is associated with negative reinforcement learning in dependent organisms. PMID:22459874

  6. A new computational account of cognitive control over reinforcement-based decision-making: Modeling of a probabilistic learning task.

    PubMed

    Zendehrouh, Sareh

    2015-11-01

    Recent work in the decision-making field offers a dual-system account of the decision-making process. This theory holds that the process is conducted by two main controllers: a goal-directed system and a habitual system. In the reinforcement learning (RL) domain, habitual behaviors are associated with model-free methods, in which appropriate actions are learned through trial-and-error experience. Goal-directed behaviors, by contrast, are associated with model-based methods of RL, in which actions are selected using a model of the environment. Studies on cognitive control also suggest that during processes such as decision-making, certain cortical and subcortical structures work in concert to monitor the consequences of decisions and to adjust control according to current task demands. Here, a computational model is presented that is based on dual-system theory and on the cognitive-control perspective of decision-making. The proposed model is used to simulate human performance on a variant of a probabilistic learning task. The basic proposal is that the brain implements a dual controller, while an accompanying monitoring system detects certain kinds of conflict, including a hypothetical cost conflict. The simulation results address existing theories about two event-related potentials, namely the error-related negativity (ERN) and the feedback-related negativity (FRN), and explore which account best explains them. Based on the results, some testable predictions are also presented. Copyright © 2015 Elsevier Ltd. All rights reserved.
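
    The model-free/model-based distinction above can be illustrated with a minimal sketch, assuming a tabular Q-learning step for the habitual controller and a one-step lookahead through a known transition model for the goal-directed controller. The function names, states and parameter values are hypothetical and are not taken from the paper's model.

```python
from collections import defaultdict

def model_free_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Habitual (model-free) controller: one Q-learning step from a single experience."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error

def model_based_value(transitions, rewards, values, state, action, gamma=0.9):
    """Goal-directed (model-based) controller: evaluate an action by looking one step ahead
    through a model. transitions[(s, a)] maps next states to probabilities and
    rewards[(s, a)] is the expected immediate reward."""
    expected_next = sum(p * values[s_next] for s_next, p in transitions[(state, action)].items())
    return rewards[(state, action)] + gamma * expected_next

# Usage with hypothetical states and actions
q = defaultdict(float)
td = model_free_update(q, "s0", "left", 1.0, "s1", ["left", "right"])
model = {("s0", "left"): {"s1": 0.8, "s2": 0.2}}
v = model_based_value(model, {("s0", "left"): 0.5}, {"s1": 1.0, "s2": 0.0}, "s0", "left")
```

    The habitual value changes only through experienced rewards, whereas the goal-directed value can be recomputed immediately whenever the model changes.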

  7. Cueing, demand fading, and positive reinforcement to establish self-feeding and oral consumption in a child with chronic food refusal.

    PubMed

    Luiselli, J K

    2000-07-01

    A 3-year-old child with multiple medical disorders and chronic food refusal was treated successfully using a program that incorporated antecedent control procedures combined with positive reinforcement. The antecedent manipulations included visual cueing of a criterion number of self-feeding responses that were required during meals to receive reinforcement and a gradual increase in the imposed criterion (demand fading) that was based on improved frequency of oral consumption. As evaluated in a changing criterion design, the child learned to feed himself as an outcome of treatment. One year following intervention, he was consuming a variety of foods and had gained weight. Advantages of antecedent control methods for the treatment of chronic food refusal are discussed.

  8. Depression, Activity, and Evaluation of Reinforcement

    ERIC Educational Resources Information Center

    Hammen, Constance L.; Glass, David R., Jr.

    1975-01-01

    This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)

  9. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    NASA Astrophysics Data System (ADS)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will explore what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning (one can think of Reinforcement Learning as the study of how an agent, e.g. a robot, can learn and perfect a given policy through interactions with an environment). The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities that easily defy intuition based on equilibrium physics.

  10. Kinesthetic Reinforcement-Is It a Boon to Learning?

    ERIC Educational Resources Information Center

    Bohrer, Roxilu K.

    1970-01-01

    Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…

  11. Reinforcing Basic Skills Through Social Studies. Grades 4-7.

    ERIC Educational Resources Information Center

    Lewis, Teresa Marie

    Arranged into seven parts, this document provides a variety of games and activities, bulletin board ideas, overhead transparencies, student handouts, and learning station ideas to help reinforce basic social studies skills in the intermediate grades. In part 1, students learn about timelines, first constructing their own life timeline, then a…

  12. Effects of Reinforcement on Peer Imitation in a Small Group Play Context

    ERIC Educational Resources Information Center

    Barton, Erin E.; Ledford, Jennifer R.

    2018-01-01

    Children with disabilities often have deficits in imitation skills, particularly in imitating peers. Imitation is considered a behavioral cusp--which, once learned, allows a child to access additional and previously unavailable learning opportunities. In the current study, researchers examined the efficacy of contingent reinforcement delivered…

  13. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    PubMed

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activity. Neurofeedback (NFB) using an auditory stimulus as the reinforcer has proven to be a useful tool for treating LD children by positively reinforcing decreases in the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, in which a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer whenever the theta/alpha ratio was reduced. Both groups showed signs consistent with EEG maturation, but only the Auditory group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved cognitive abilities more than the visual reinforcer did.
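
    The reinforcement contingency described above (deliver the reinforcer when the theta/alpha ratio drops) can be sketched as follows. This assumes FFT band power over a single EEG window, a theta band of 4 to 8 Hz, an alpha band of 8 to 13 Hz and a fixed threshold; the abstract specifies none of these, so all are illustrative choices rather than the study's protocol.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Sum of FFT power in the [low, high) Hz band of a real-valued signal."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= low) & (freqs < high)
    return power[mask].sum()

def theta_alpha_ratio(signal, fs, theta=(4.0, 8.0), alpha=(8.0, 13.0)):
    """Theta/alpha power ratio; band edges are assumed, not taken from the study."""
    return band_power(signal, fs, *theta) / band_power(signal, fs, *alpha)

def give_reinforcer(signal, fs, threshold):
    """True when the tone (or white square) should be delivered for this window."""
    return bool(theta_alpha_ratio(signal, fs) < threshold)

# Hypothetical 2 s window at 256 Hz dominated by 10 Hz (alpha) activity
fs = 256
t = np.arange(2 * fs) / fs
eeg = 0.5 * np.sin(2 * np.pi * 6 * t) + 2.0 * np.sin(2 * np.pi * 10 * t)
```

    For this synthetic window the ratio is (0.5 / 2.0)^2 = 0.0625, so a threshold of 1.0 would trigger the reinforcer.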

  14. Neuromuscular control of the point to point and oscillatory movements of a sagittal arm with the actor-critic reinforcement learning method.

    PubMed

    Golkhou, Vahid; Parnianpour, Mohamad; Lucas, Caro

    2005-04-01

    In this study, we used a single-link system with a pair of muscles excited by alpha and gamma signals to achieve both point-to-point and oscillatory movements with variable amplitude and frequency. The system is highly nonlinear in all its physical and physiological attributes. Its major physiological characteristics are the simultaneous activation of a pair of nonlinear muscle-like actuators for control purposes, the existence of nonlinear spindle-like sensors and a Golgi tendon organ-like sensor, and the actions of gravity and external loading. Transmission delays are included in the afferent and efferent neural paths to give a more accurate representation of the reflex loops. A reinforcement learning method with an actor-critic (AC) architecture, standing in for the middle and low levels of the central nervous system (CNS), is used to track a desired trajectory. The actor in this structure is a two-layer feedforward neural network, and the critic is a model of the cerebellum. The critic is trained by the state-action-reward-state-action (SARSA) method, and it in turn trains the actor through supervised learning based on prior experience. Simulation studies of oscillatory movements based on the proposed algorithm demonstrate excellent tracking capability: after 280 epochs, the RMS errors for the position and velocity profiles were 0.02 rad and 0.04 rad/s, respectively.
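
    The SARSA rule used to train the critic can be sketched in tabular form. The paper's critic is a cerebellum model rather than a lookup table, so the dictionary-based Q function, the state and action names, and the values of alpha and gamma here are assumptions made for illustration.

```python
from collections import defaultdict

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One on-policy SARSA step: move Q(s, a) toward r + gamma * Q(s', a'),
    where a' is the action the agent actually takes next (not the greedy one)."""
    td_error = r + gamma * q[(s_next, a_next)] - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return td_error

# Hypothetical discretized arm states and muscle commands
q = defaultdict(float)
q[("s1", "flex")] = 2.0
err = sarsa_update(q, "s0", "extend", 1.0, "s1", "flex")
```

    Using the action actually taken next (rather than the maximizing one, as in Q-learning) is what makes SARSA on-policy.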

  15. The probability of reinforcement per trial affects posttrial responding and subsequent extinction but not within-trial responding.

    PubMed

    Harris, Justin A; Kwok, Dorothy W S

    2018-01-01

    During magazine approach conditioning, rats do not discriminate between a conditional stimulus (CS) that is consistently reinforced with food and a CS that is occasionally (partially) reinforced, as long as the CSs have the same overall reinforcement rate per second. This implies that rats are indifferent to the probability of reinforcement per trial. However, in the same rats, the per-trial reinforcement rate will affect subsequent extinction: responding extinguishes more rapidly for a CS that was consistently reinforced than for a partially reinforced CS. Here, we trained rats with consistently and partially reinforced CSs that were matched for overall reinforcement rate per second. We measured conditioned responding both during and immediately after the CSs. Differences in the per-trial probability of reinforcement did not affect the acquisition of responding during the CS but did affect subsequent extinction of that responding, and also affected the post-CS response rates during conditioning. Indeed, CSs with the same probability of reinforcement per trial evoked the same amount of post-CS responding even when they differed in overall reinforcement rate and thus evoked different amounts of responding during the CS. We conclude that reinforcement rate per second controls rats' acquisition of responding during the CS, but at the same time, rats also learn specifically about the probability of reinforcement per trial. The latter learning affects the rats' expectation of reinforcement as an outcome of the trial, which influences their ability to detect retrospectively that an opportunity for reinforcement was missed, and, in turn, drives extinction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
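
    The difference between per-trial probability and per-second rate is easy to see with a little arithmetic. The CS durations and probabilities below are hypothetical, not the study's parameters: a partially reinforced CS matches a consistently reinforced one on rate per second when its duration is scaled by its reinforcement probability.

```python
def per_second_rate(p_reinforced, cs_duration_s):
    """Expected reinforcers per second of CS presentation."""
    return p_reinforced / cs_duration_s

def matched_duration(p_partial, consistent_duration_s):
    """CS duration at which a partially reinforced CS matches the per-second rate
    of a consistently (p = 1) reinforced CS of the given duration."""
    return p_partial * consistent_duration_s

# Hypothetical design: a 10 s CS reinforced on every trial vs. a CS reinforced on 25% of trials
consistent = per_second_rate(1.0, 10.0)
partial_duration = matched_duration(0.25, 10.0)
partial = per_second_rate(0.25, partial_duration)
```

    Both CSs deliver 0.1 reinforcers per second, yet their per-trial probabilities (1.0 vs 0.25) differ by a factor of four, which is the variable the study links to extinction and post-CS responding.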

  16. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    PubMed

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and that this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment, analyze the agent's behavior throughout the learning process, and assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and the experiments show that the use of a generalized policy produces better results by guiding the agent more effectively when learning a target task.
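
    To make the idea of a stochastic abstract policy concrete, here is a minimal sketch that pools (abstract state, abstract action) pairs from solved source tasks into per-state action probabilities and samples from them. This simple frequency-count construction is an assumption for illustration only; it is not the AbsProb-PI-multiple algorithm, and the state and action names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_abstract_policy(source_experience):
    """source_experience: iterable of (abstract_state, abstract_action) pairs pooled over
    solved source tasks. Returns {abstract_state: {abstract_action: probability}}."""
    counts = defaultdict(Counter)
    for abs_state, abs_action in source_experience:
        counts[abs_state][abs_action] += 1
    return {s: {a: n / sum(c.values()) for a, n in c.items()} for s, c in counts.items()}

def act(policy, abs_state, rng=random):
    """Sample an abstract action according to the stochastic abstract policy."""
    actions, probs = zip(*policy[abs_state].items())
    return rng.choices(actions, weights=probs, k=1)[0]

# Hypothetical navigation experience from several source tasks
experience = [("door_ahead", "forward")] * 3 + [("door_ahead", "turn_left")]
policy = build_abstract_policy(experience)
```

    Keeping a distribution instead of a single best action lets the transferred policy express how consistently the source tasks agreed in each abstract state.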

  17. Establishment and Maintenance of Socially Learned Conditioned Reinforcement in Young Children: Elimination of the Role of Adults and View of Peers' Faces

    ERIC Educational Resources Information Center

    Zrinzo, Michelle; Greer, R. Douglas

    2013-01-01

    Prior research has demonstrated the establishment of reinforcers for learning and maintenance with young children as a function of social learning where a peer and an adult experimenter were present. The presence of an adult experimenter was eliminated in the present study to test if the effect produced in the prior studies would occur with only…

  18. Structure identification in fuzzy inference using reinforcement learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels that are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.
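
    Label splitting and merging can be sketched with triangular membership functions. The abstract does not say which membership shapes or split/merge rules GARIC uses, so the triangular form and the specific rules below are assumptions; in GARIC each added or removed label would correspond to adding or removing a hidden-layer node.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy label with left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def split_label(left, peak, right):
    """Split one label into two more granular, overlapping labels on the same support."""
    a = (left + peak) / 2
    b = (peak + right) / 2
    return (left, a, b), (a, b, right)

def merge_labels(first, second):
    """Merge two labels into one more general label covering both supports."""
    return (min(first[0], second[0]), (first[1] + second[1]) / 2, max(first[2], second[2]))

# Hypothetical label "ZERO" on an error variable, split into two finer labels
neg_small, pos_small = split_label(-1.0, 0.0, 1.0)
```

    With this rule the two new labels cross at the old peak with membership 0.5 each, and merging them recovers the original label.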

  19. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    PubMed

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) has debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was assessed using behavioral, computational modeling, and event-related brain potential (ERP) data. MDD patients showed learning rates comparable to those of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of the error-related negativity, ERN) but abnormal external (at the level of the feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional effort had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.
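
    Two of the quantities mentioned above, the reward prediction error and the lose-shift rate, can be computed as in the sketch below. It assumes a simple delta-rule (Rescorla-Wagner style) learner, which is a common choice but is not stated in the abstract; the learning rate and the choice/reward sequences are hypothetical.

```python
def delta_rule(values, choice, reward, alpha=0.3):
    """Delta-rule update of the chosen option's value; returns the reward prediction error."""
    rpe = reward - values[choice]
    values[choice] += alpha * rpe
    return rpe

def lose_shift_rate(choices, rewards):
    """Fraction of unrewarded trials that are followed by a switch to a different option."""
    losses = [t for t in range(len(choices) - 1) if rewards[t] == 0]
    if not losses:
        return float("nan")
    return sum(choices[t + 1] != choices[t] for t in losses) / len(losses)

# Hypothetical two-option task
values = {"A": 0.5, "B": 0.5}
rpe = delta_rule(values, "A", 1.0)
shift = lose_shift_rate(["A", "A", "B", "B", "A"], [0, 1, 0, 0, 1])
```

    In this toy sequence there are three unrewarded trials with a following choice, and only one of them is followed by a switch, giving a lose-shift rate of 1/3.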

  20. Relationship between Reinforcement and Eye Movements during Ocular Motor Training with Learning Disabled Children.

    ERIC Educational Resources Information Center

    Punnett, Audrey F.; Steinhauer, Gene D.

    1984-01-01

    Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…

  1. Learning the specific quality of taste reinforcement in larval Drosophila.

    PubMed

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-27

    The only property of reinforcement that insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with only 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing in any brain.

  2. The evolution of continuous learning of the structure of the environment

    PubMed Central

    Kolodny, Oren; Edelman, Shimon; Lotem, Arnon

    2014-01-01

    Continuous, ‘always on’, learning of structure from a stream of data is studied mainly in the fields of machine learning or language acquisition, but its evolutionary roots may go back to the first organisms that were internally motivated to learn and represent their environment. Here, we study under what conditions such continuous learning (CL) may be more adaptive than simple reinforcement learning and examine how it could have evolved from the same basic associative elements. We use agent-based computer simulations to compare three learning strategies: simple reinforcement learning; reinforcement learning with chaining (RL-chain) and CL that applies the same associative mechanisms used by the other strategies, but also seeks statistical regularities in the relations among all items in the environment, regardless of the initial association with food. We show that a sufficiently structured environment favours the evolution of both RL-chain and CL and that CL outperforms the other strategies when food is relatively rare and the time for learning is limited. This advantage of internally motivated CL stems from its ability to capture statistical patterns in the environment even before they are associated with food, at which point they immediately become useful for planning. PMID:24402920

  3. The partial-reinforcement extinction effect and the contingent-sampling hypothesis.

    PubMed

    Hochman, Guy; Erev, Ido

    2013-12-01

    The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
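
    The contingent-sampling idea can be sketched as follows: find past trials whose preceding k outcomes match the k most recent outcomes, and choose the option with the best mean payoff on those trials (the abstract reports the best fit with k = 4). Full feedback on all options, outcome vectors as the similarity key, and a random fallback when no similar trial exists are assumptions of this sketch, not details from the paper.

```python
import random

def contingent_sampling_choice(history, options, k=4, rng=random):
    """history: list of past trials, each a dict {option: payoff} (full feedback assumed).
    A past trial t is 'similar' when the k trials before it produced the same outcomes
    as the k most recent trials. Returns the option with the best mean payoff on
    similar trials, or a random option if there are none."""
    def context(t):
        return tuple(tuple(history[i][o] for o in options) for i in range(t - k, t))
    if len(history) <= k:
        return rng.choice(options)
    current = context(len(history))
    similar = [t for t in range(k, len(history)) if context(t) == current]
    if not similar:
        return rng.choice(options)
    means = {o: sum(history[t][o] for t in similar) / len(similar) for o in options}
    return max(options, key=means.get)

# Hypothetical history: "p" is the promoted option, "a" the alternative
history = [{"p": 1, "a": 0}, {"p": 0, "a": 0}, {"p": 5, "a": 1},
           {"p": 1, "a": 0}, {"p": 0, "a": 0}]
choice = contingent_sampling_choice(history, ["p", "a"], k=2)
```

    Here the last two outcomes match the context that preceded trial 2, where "p" paid more, so the sketch chooses "p".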

  4. The effects of aging on the interaction between reinforcement learning and attention.

    PubMed

    Radulescu, Angela; Daniel, Reka; Niv, Yael

    2016-11-01

    Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2, we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results as suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
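
    A reinforcement-learning model that "uses selective attention to constrain learning" can be sketched as a feature-based delta rule in which attention weights scale both the value computation and the update. This generic formulation, the dimension and feature names, and the learning rate are assumptions; the abstract does not give the authors' exact model.

```python
def stimulus_value(weights, features, attention):
    """features: {dimension: feature}; attention: {dimension: weight}, summing to 1."""
    return sum(attention[d] * weights.get(f, 0.0) for d, f in features.items())

def attention_update(weights, features, attention, reward, eta=0.2):
    """Attention-weighted delta rule: features on attended dimensions absorb more of
    the prediction error. Returns the prediction error."""
    rpe = reward - stimulus_value(weights, features, attention)
    for d, f in features.items():
        weights[f] = weights.get(f, 0.0) + eta * attention[d] * rpe
    return rpe

# Hypothetical stimulus with three dimensions and attention focused on color
weights = {}
stim = {"color": "red", "shape": "circle", "texture": "dots"}
focus = {"color": 0.8, "shape": 0.1, "texture": 0.1}
err = attention_update(weights, stim, focus, reward=1.0)
```

    A more focused attention vector, as the model suggested for older adults, concentrates learning on fewer features.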

  5. Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.

    PubMed

    Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne

    2016-05-01

    We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to stimulus foreground features over background features. Overall, these results suggest that salamanders preferentially learn to execute responses rather than to use visual cues, but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.

  6. Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.

    PubMed

    Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla

    2014-12-01

    This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new models of coordination for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and the robot soccer simulation. The results obtained demonstrated that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.

  7. Instructed knowledge shapes feedback-driven aversive learning in striatum and orbitofrontal cortex, but not the amygdala

    PubMed Central

    Atlas, Lauren Y; Doll, Bradley B; Li, Jian; Daw, Nathaniel D; Phelps, Elizabeth A

    2016-01-01

    Socially-conveyed rules and instructions strongly shape expectations and emotions. Yet most neuroscientific studies of learning consider reinforcement history alone, irrespective of knowledge acquired through other means. We examined fear conditioning and reversal in humans to test whether instructed knowledge modulates the neural mechanisms of feedback-driven learning. One group was informed about contingencies and reversals. A second group learned only from reinforcement. We combined quantitative models with functional magnetic resonance imaging and found that instructions induced dissociations in the neural systems of aversive learning. Responses in striatum and orbitofrontal cortex updated with instructions and correlated with prefrontal responses to instructions. Amygdala responses were influenced by reinforcement similarly in both groups and did not update with instructions. Results extend work on instructed reward learning and reveal novel dissociations that have not been observed with punishments or rewards. Findings support theories of specialized threat-detection and may have implications for fear maintenance in anxiety. DOI: http://dx.doi.org/10.7554/eLife.15192.001 PMID:27171199

  8. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca

    2017-04-01

    Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert control over their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.

  9. A Policy Representation Using Weighted Multiple Normal Distribution

    NASA Astrophysics Data System (ADS)

    Kimura, Hajime; Aramaki, Takeshi; Kobayashi, Shigenobu

    In this paper, we attempt to solve a reinforcement learning problem for a 5-link ring robot in real time, so that the real robot can withstand trial-and-error learning. On this robot, incomplete perception problems are caused by noisy sensors and cheap position-control motor systems. This incomplete perception also causes the optimal actions to vary as learning progresses. To cope with this problem, we adopt an actor-critic method and propose a new hierarchical policy representation scheme that consists of discrete action selection at the top level and continuous action selection at the lower level of the hierarchy. The proposed hierarchical scheme accelerates learning in a continuous action space, and it can track the optimal actions as they vary with the progress of learning on our robotics problem. This paper compares and discusses several learning algorithms through simulations, and demonstrates the proposed method by applying it to the real robot.
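
    The two-level policy described above (a discrete choice at the top, a continuous action below) can be sketched as a weighted mixture of normal distributions: first pick a component with probability proportional to its weight, then sample from that component's normal. One-dimensional actions and fixed parameters are assumptions of this sketch; the actor-critic updates of the weights, means and standard deviations are omitted.

```python
import math
import random

def sample_action(weights, means, stds, rng=random):
    """Top level: pick a component (discrete choice) in proportion to its weight.
    Low level: sample a continuous action from that component's normal distribution."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return k, rng.gauss(means[k], stds[k])

def log_prob(action, weights, means, stds):
    """Log-density of a continuous action under the weighted mixture
    (the quantity a policy-gradient actor would differentiate)."""
    total = sum(weights)
    density = sum(w / total * math.exp(-0.5 * ((action - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                  for w, m, s in zip(weights, means, stds))
    return math.log(density)

# Hypothetical two-component policy for a single joint command
component, torque = sample_action([0.0, 1.0], [-1.0, 1.0], [0.1, 0.1], rng=random.Random(0))
```

    With the first weight set to zero, the top level always selects component 1, so the sampled torque is drawn around 1.0.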

  10. Analysis of the Pricing Process in Electricity Market using Multi-Agent Model

    NASA Astrophysics Data System (ADS)

    Shimomura, Takahiro; Saisho, Yuichi; Fujii, Yasumasa; Yamaji, Kenji

    Many electric utilities worldwide have been forced to change their way of doing business, from vertically integrated mechanisms to open market systems. We face urgent issues regarding how to design the structures of power market systems. To resolve these issues, many studies have been conducted with market models of various characteristics and regulations. The goal of modeling analysis is to enrich our understanding of the fundamental processes that may appear. However, there are many kinds of modeling methods, each with advantages and drawbacks in terms of validity and versatility. This paper presents two methods for constructing multi-agent market models: one based on game theory and the other based on reinforcement learning. Comparing the results of the two methods improves their validity and helps us identify potential problems in electricity markets with oligopolistic generators, demand fluctuation, and inelastic demand. Moreover, the model based on reinforcement learning enables us to consider characteristics peculiar to electricity markets, such as plant unit characteristics, seasonal and hourly demand fluctuation, a real-time regulation market, and an operating reserve market. The model highlights the importance of the share of peak-load plants and of the design of the operating reserve market.

  11. A statistical learning strategy for closed-loop control of fluid flows

    NASA Astrophysics Data System (ADS)

    Guéniat, Florimond; Mathelin, Lionel; Hussaini, M. Yousuff

    2016-12-01

    This work discusses a closed-loop control strategy for complex systems utilizing scarce and streaming data. A discrete embedding space is first built using hash functions applied to the sensor measurements, from which a Markov process model is derived that approximates the complex system's dynamics. A control strategy is then learned using reinforcement learning once rewards relevant to the control objective are identified. This method is designed for experimental configurations, requiring neither computations nor prior knowledge of the system, and enjoys intrinsic robustness. It is illustrated on two systems: control of the transitions of a Lorenz'63 dynamical system, and control of the drag of a cylinder flow. The method is shown to perform well.
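
    The first two steps (hash the sensor measurements into discrete symbols, then estimate a Markov transition model from the stream) can be sketched as below. The abstract does not specify the hash functions; quantizing each reading to a fixed resolution and hashing the resulting integer tuple is an assumption, and the subsequent reinforcement learning step is omitted.

```python
from collections import Counter, defaultdict

def hash_state(measurement, resolution=0.5):
    """Map a vector of sensor readings to a discrete symbol by quantizing and hashing it."""
    return hash(tuple(round(x / resolution) for x in measurement))

def estimate_markov_model(measurements, resolution=0.5):
    """Empirical transition probabilities P(next symbol | current symbol) from a time series."""
    states = [hash_state(m, resolution) for m in measurements]
    counts = defaultdict(Counter)
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    return {s: {t: n / sum(c.values()) for t, n in c.items()} for s, c in counts.items()}

# Hypothetical two-sensor stream alternating between two regimes
stream = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.1)]
model = estimate_markov_model(stream)
```

    Readings that fall in the same quantization cell, such as (0.0, 0.0) and (0.0, 0.1) here, share a symbol, which is what keeps the model small enough to estimate from scarce data.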

  12. Kernel-based least squares policy iteration for reinforcement learning.

    PubMed

    Xu, Xin; Hu, Dewen; Lu, Xicheng

    2007-07-01

    In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing results. One is the better convergence and (near) optimality guarantee obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection using the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization ability and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
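    The ALD sparsification step mentioned above can be illustrated in isolation. A sample x is added to the kernel dictionary only if its feature vector is not approximately a linear combination of the current dictionary, measured by delta = k(x, x) - k^T K^{-1} k. The sketch below assumes a Gaussian (RBF) kernel, a threshold nu, and raw input points as samples; these are illustrative choices, and it covers only this step, not KLSTD-Q or the full policy iteration loop.

```python
import math


def rbf(x, y, width=1.0):
    """Gaussian kernel (an assumed Mercer kernel choice)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * width ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ald_dictionary(samples, nu=1e-3, kernel=rbf):
    """Keep only samples that are not approximately linearly dependent in feature space."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        K = [[kernel(a, b) for b in dictionary] for a in dictionary]
        k = [kernel(a, x) for a in dictionary]
        c = solve(K, k)  # best coefficients for approximating phi(x) from the dictionary
        delta = kernel(x, x) - sum(ci * ki for ci, ki in zip(c, k))  # squared residual
        if delta > nu:
            dictionary.append(x)
    return dictionary
```

    A repeated point, or one very close to an existing dictionary element, has a residual near zero and is skipped, which is what keeps the kernel expansion sparse.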

  13. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints

    NASA Astrophysics Data System (ADS)

    Yang, Xiong; Liu, Derong; Wang, Ding

    2014-03-01

    In this paper, an adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem of constrained-input continuous-time nonlinear systems in the presence of nonlinearities with unknown structures. Two different types of neural networks (NNs) are employed to approximate the Hamilton-Jacobi-Bellman equation: a recurrent NN is constructed to identify the unknown dynamical system, and two feedforward NNs are used as the actor and the critic to approximate the optimal control and the optimal cost, respectively. Based on this framework, the action NN and the critic NN are tuned simultaneously, without requiring knowledge of the system drift dynamics. Moreover, by using Lyapunov's direct method, the weights of the action NN and the critic NN are guaranteed to be uniformly ultimately bounded, while the closed-loop system is kept stable. Simulation results are presented to demonstrate the effectiveness of the proposed approach.
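    For readers unfamiliar with the constrained-input setting, the following sketches a formulation that is common in this line of adaptive dynamic programming work; that this paper uses exactly this cost and control law is an assumption, not something the abstract states. For an input-affine system with inputs bounded by lambda, a nonquadratic input cost W(u) keeps the optimal control inside the bounds, and setting the derivative of W(u) + ∇V*ᵀ g u with respect to u to zero gives a saturated (tanh) control law in terms of the optimal cost gradient:

```latex
\begin{aligned}
&\dot{x} = f(x) + g(x)\,u, \qquad |u_i| \le \lambda, \\
&V(x(t)) = \int_t^{\infty} \big( Q(x(\tau)) + W(u(\tau)) \big)\, d\tau, \qquad
 W(u) = 2\int_0^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, dv, \\
&0 = Q(x) + W(u^*) + \nabla V^*(x)^{\top} \big( f(x) + g(x)\,u^* \big), \\
&u^*(x) = -\lambda \tanh\!\Big( \tfrac{1}{2\lambda} R^{-1} g(x)^{\top} \nabla V^*(x) \Big).
\end{aligned}
```

    Here f is the drift term the abstract treats as unknown, R is assumed diagonal and positive definite, the critic NN would approximate V*, and the actor NN would approximate u*.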

  14. Life Span Differences in Electrophysiological Correlates of Monitoring Gains and Losses during Probabilistic Reinforcement Learning

    ERIC Educational Resources Information Center

    Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman

    2011-01-01

    By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…

  15. Reinforcement Learning in Young Adults with Developmental Language Impairment

    ERIC Educational Resources Information Center

    Lee, Joanna C.; Tomblin, J. Bruce

    2012-01-01

    The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…

  16. A neural model of hierarchical reinforcement learning

    PubMed Central

    Rasmussen, Daniel; Eliasmith, Chris

    2017-01-01

    We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions. PMID:28683111

  17. Reinforcement learning: Solving two case studies

    NASA Astrophysics Data System (ADS)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. This work describes RL algorithms applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.
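    The abstract does not name the specific algorithms used, so the following is only a representative sketch of learning from a scalar reward: tabular Q-learning with epsilon-greedy exploration on a hypothetical 1-D corridor that stands in for the Crawler and pendulum tasks. The corridor, reward, and all parameter values are assumptions chosen for illustration.

```python
import random


def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor; reward 1 on reaching the last cell."""
    rng = random.Random(seed)
    actions = (-1, +1)  # step left / step right
    goal = n_states - 1
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])  # greedy, random tie-break
            s_next = min(max(s + a, 0), goal)
            r = 1.0 if s_next == goal else 0.0
            # the goal is terminal, so nothing is bootstrapped beyond it
            target = r if s_next == goal else r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
            if s == goal:
                break
    return Q
```

    The agent never sees input-output pairs, only the reward on reaching the goal; after training, the greedy action in every non-goal cell is "step right".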

  18. Safe Exploration Algorithms for Reinforcement Learning Controllers.

    PubMed

    Mannucci, Tommaso; van Kampen, Erik-Jan; de Visser, Cornelis; Chu, Qiping

    2018-04-01

    Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in otherwise unacceptable behavior. Compared with existing methods, the main contribution of this paper is a new approach that requires neither global safety functions nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the agent's dynamics during the exploration phase, assuming the agent has a limited capability to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by identifying temporary safety functions, called backups. SHERPA is demonstrated on a simulated, simplified quadrotor task, in which dangerous states are avoided. The second algorithm, called OptiSHERPA, uses safety metrics to safely handle dynamically more complex systems for which SHERPA is not sufficient. An application of OptiSHERPA is simulated on an aircraft altitude control task.
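    The core idea of combining interval estimates of the dynamics with backups can be illustrated on a toy problem. The sketch below is not SHERPA: it assumes a hypothetical 1-D system x' = x + u + w with a known disturbance bound w_max (standing in for the interval estimate) and a known safe interval, and admits an exploratory action only if its worst-case successor interval stays safe and some short backup sequence also keeps every reachable state safe.

```python
def reachable(interval, u, w_max):
    """Worst-case successor interval of x' = x + u + w with |w| <= w_max."""
    lo, hi = interval
    return (lo + u - w_max, hi + u + w_max)


def inside(interval, safe):
    return safe[0] <= interval[0] and interval[1] <= safe[1]


def has_backup(interval, actions, safe, w_max, depth):
    """True if some action sequence of length `depth` keeps every reachable state safe."""
    if depth == 0:
        return True
    for u in actions:
        nxt = reachable(interval, u, w_max)
        if inside(nxt, safe) and has_backup(nxt, actions, safe, w_max, depth - 1):
            return True
    return False


def safe_actions(x, actions, safe, w_max, depth=2):
    """Admit an exploratory action only if its worst case stays safe and a backup exists."""
    admitted = []
    for u in actions:
        nxt = reachable((x, x), u, w_max)
        if inside(nxt, safe) and has_backup(nxt, actions, safe, w_max, depth):
            admitted.append(u)
    return admitted
```

    Near the upper edge of the safe set (x = 9.5 with safe set [0, 10]), the "+1" action is rejected because its worst case leaves the set, while "0" and "-1" are admitted. The fixed backup depth is a simplification; the growing uncertainty interval is what makes long backups progressively harder to certify.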

  19. Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation

    DTIC Science & Technology

    2012-09-01

    following 500 trials with 1000 replications with single reward upon attainment of the goal state by algorithm and policy. DQ-C with ε-greedy obtained... aspects of the civilian population rather than combat forces. These agents represent not a single human, but a population segment. Similar... TD(λ) combines elements of MC and TD methods into a single framework to estimate the value of each state, V(s), through the use of eligibility traces
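    The TD(λ) idea in the excerpt, estimating V(s) with eligibility traces so that a single temporal-difference error updates all recently visited states, can be sketched in tabular form. This minimal version assumes accumulating traces, episodes given as (state, reward, next_state, done) tuples, and illustrative parameter values; it is not taken from the report.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) prediction with accumulating eligibility traces."""
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states  # traces are reset at the start of each episode
        for s, r, s_next, done in episode:
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]  # one-step TD error
            e[s] += 1.0            # mark the current state as eligible
            for i in range(n_states):
                V[i] += alpha * delta * e[i]  # credit all eligible states
                e[i] *= gamma * lam           # decay eligibility
    return V
```

    With lam = 0 this reduces to one-step TD(0); with lam = 1 and offline updates it behaves like a Monte Carlo (MC) estimate, which is the sense in which TD(λ) combines the two.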

  20. Reinforcement active learning in the vibrissae system: optimal object localization.

    PubMed

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. A comparison of differential reinforcement procedures with children with autism.

    PubMed

    Boudreau, Brittany A; Vladescu, Jason C; Kodak, Tiffany M; Argott, Paul J; Kisamore, April N

    2015-12-01

    The current evaluation compared the effects of 2 differential reinforcement arrangements and a nondifferential reinforcement arrangement on the acquisition of tacts for 3 children with autism. Participants learned in all reinforcement-based conditions, and we discuss areas for future research in light of these findings and potential limitations. © Society for the Experimental Analysis of Behavior.

  2. Learning Gains and Response to Digital Lessons on Soil Genesis and Development

    ERIC Educational Resources Information Center

    Mamo, Martha; Ippolito, James A.; Kettler, Timothy A.; Reuter, Ronald; McCallister, Dennis; Morner, Patricia; Husmann, Dann; Blankenship, Erin

    2011-01-01

    Evolving computer technology is offering opportunities for new online approaches in teaching methods and delivery. Well-designed web-based (online) lessons should reinforce the critical need of the soil science discipline in today's food, energy, and environmental issues, as well as meet the needs of the diverse clientele with interest in…

  3. Overcoming Mutism in Adults with Learning Disabilities: A Case Study.

    ERIC Educational Resources Information Center

    Bell, Dorothy M.; Espie, Colin A.

    2003-01-01

    A woman with Down syndrome, who had shown selective mutism for more than 14 years, successfully participated in a program designed to reinforce communication and gradually increase the number of words spoken to one person and then to others. Nonaversive behavior methods were used and response initiative procedures were developed. (Contains…

  4. Caracteristicas de la Instruccion Programada como Tecnica de Ensenanza (Characteristics of Programed Instruction as a Teaching Technique).

    ERIC Educational Resources Information Center

    Dorrego, Maria Elena

    This discussion of programed instruction begins with the fundamental psychological aspects and learning theories behind this teaching method. Negative and positive reinforcement, conditioning, and their relationship to programed instruction are considered. Different types of programs, both linear and branching, are discussed; criticism of the…

  5. A Heuristic Tool for Teaching Business Writing: Self-Assessment, Knowledge Transfer, and Writing Exercises

    ERIC Educational Resources Information Center

    Ortiz, Lorelei A.

    2013-01-01

    To teach effective business communication, instructors must target students’ current weaknesses in writing. One method for doing so is by assigning writing exercises. When used heuristically, writing exercises encourage students to practice self-assessment, self-evaluation, active learning, and knowledge transfer, all while reinforcing the basics…

  6. Project "School:" A Handbook of a Validated Developmental Dropout Prevention Program's Methods and Procedures.

    ERIC Educational Resources Information Center

    Hooban, Louis; Pugsley, Robert

    Project SCHOOL (School Concerned with Helping Others' Objectives and Learning) is a program developed to prevent dropping out through group counseling, parent counseling, and positive reinforcement. Project SCHOOL operates as a dropout prevention program in grades 1-12. Teachers and counselors, using a locally developed check list and other…

  7. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    PubMed

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to advance our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
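    Model-based analyses of bandit tasks like this one typically compute a trial-by-trial reward prediction error from a simple learning model. The sketch below assumes a delta-rule value update with softmax choice, a common choice in this literature; the learning rate, inverse temperature, and payoff probabilities are hypothetical, and the abstract does not specify the exact model used.

```python
import math
import random


def softmax_choice(q, beta, rng):
    """Sample an arm with probability proportional to exp(beta * value)."""
    m = max(q)
    w = [math.exp(beta * (v - m)) for v in q]  # subtract max for numerical stability
    r = rng.random() * sum(w)
    for a, wa in enumerate(w):
        r -= wa
        if r <= 0:
            return a
    return len(q) - 1


def run_bandit(payoffs, trials=500, alpha=0.2, beta=5.0, seed=1):
    """Simulate a learner; returns final values and the prediction error on each trial."""
    rng = random.Random(seed)
    q = [0.0] * len(payoffs)
    prediction_errors = []
    for _ in range(trials):
        a = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < payoffs[a] else 0.0
        delta = reward - q[a]  # reward prediction error
        q[a] += alpha * delta
        prediction_errors.append(delta)
    return q, prediction_errors
```

    In an fMRI analysis, the sequence of prediction errors (fit to each subject's choices) would serve as a parametric regressor; a "nonlearner" in this framework would correspond to parameters under which values, and hence choices, barely track the rewards.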

  8. Convergence of the standard RLS method and UDU^T factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming

    NASA Astrophysics Data System (ADS)

    Moraes Rêgo, Patrícia Helena; Viana da Fonseca Neto, João; Ferreira, Ernesto M.

    2015-08-01

    The main focus of this article is to present a proposal to solve, via UDU^T factorisation, the convergence and numerical stability problems related to ill-conditioning of the covariance matrix in the recursive least squares (RLS) approach for online approximation of the algebraic Riccati equation (ARE) solution associated with the discrete linear quadratic regulator (DLQR) problem, formulated in the actor-critic reinforcement learning and approximate dynamic programming context. The parameterisations of the Bellman equation, utility function and dynamic system, together with the algebra of the Kronecker product, form a framework for solving the DLQR problem. The condition number and the positivity parameter of the covariance matrix are associated with statistical metrics for evaluating the approximation performance of the ARE solution via RLS-based estimators. The performance of RLS approximators is also evaluated in terms of consistency and polarisation when associated with reinforcement learning methods. The methodology comprises online designs of DLQR controllers, which are evaluated on a multivariable dynamic system model.
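    For reference, the standard RLS estimator whose covariance matrix P is at issue here can be written in a few lines. This sketch shows only the plain update with forgetting factor lam; it does not implement the UDU^T factorisation, which propagates P in factored form precisely to avoid the loss of symmetry and positive definiteness that this direct update can suffer. In the DLQR setting the regressor phi would be built from Kronecker products of state and control; here it is a generic, hypothetical feature vector.

```python
def rls(samples, n, lam=1.0, p0=1000.0):
    """Standard recursive least squares; samples are (phi, y) pairs with len(phi) == n."""
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # initial covariance
    for phi, y in samples:
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
        K = [v / denom for v in Pphi]  # gain vector
        err = y - sum(theta[i] * phi[i] for i in range(n))  # a priori prediction error
        theta = [theta[i] + K[i] * err for i in range(n)]
        # P <- (P - K phi^T P) / lam, using phi^T P = (P phi)^T for symmetric P
        P = [[(P[i][j] - K[i] * Pphi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P
```

    Fitting noiseless data from y = 2 x1 - 3 x2 recovers parameters close to (2, -3); the small residual bias comes from the finite initial covariance p0.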

  9. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    PubMed

    Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors remain largely unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Unlike in previous theory, individuals are assumed to have no access to information about what other individuals are doing, so they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity to cooperate is modulated at every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This differs from Pavlov, a reinforcement learning strategy that promotes mutual cooperation in two-player situations.
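    The satisfaction rule described above can be made concrete with a Bush-Mosteller style update of the cooperation probability. This is a sketch under assumptions: the tanh stimulus and its sensitivity beta are a common choice for aspiration learning but are not given in the abstract; only the fixed aspiration level and the reinforce/anti-reinforce logic come from the text.

```python
import math


def bm_update(p_coop, cooperated, payoff, aspiration, beta=1.0):
    """Aspiration learning: reinforce the last action if satisfied, anti-reinforce it if not."""
    s = math.tanh(beta * (payoff - aspiration))  # satisfaction in (-1, 1); s > 0 means satisfied
    if cooperated:
        # satisfied: move p toward 1; dissatisfied: shrink p toward 0
        return p_coop + (1 - p_coop) * s if s >= 0 else p_coop + p_coop * s
    # after defecting, the same logic is applied to the defection probability 1 - p
    return p_coop - p_coop * s if s >= 0 else p_coop - (1 - p_coop) * s
```

    The learner only needs its own action and payoff, which matches the abstract's point that no information about other individuals' actions is used; any apparent conditionality emerges because payoffs depend on what the others did.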

  10. Social stress reactivity alters reward and punishment learning

    PubMed Central

    Frank, Michael J.; Allen, John J. B.

    2011-01-01

    To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems. PMID:20453038

  11. Social stress reactivity alters reward and punishment learning.

    PubMed

    Cavanagh, James F; Frank, Michael J; Allen, John J B

    2011-06-01

    To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.

  12. Predicting psychosis across diagnostic boundaries: Behavioral and computational modeling evidence for impaired reinforcement learning in schizophrenia and bipolar disorder with a history of psychosis.

    PubMed

    Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N

    2015-08-01

    There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis. (c) 2015 APA, all rights reserved.
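    The three model families named above can be related with a single hypothetical learner. In the sketch below, a critic learns the expected reward of the context, an actor adjusts action propensities using only the critic's prediction error, and a Q-learner tracks action-specific expected reward; choice weights mix actor and Q values with a parameter c, so c = 0 is a pure actor-critic and c = 1 is pure Q-learning. Mixing at the level of choice weights is one simple way to combine the components and is an assumption; the paper's hybrid may combine them differently, and all parameter values are illustrative.

```python
import math


def softmax(xs, beta):
    m = max(xs)
    w = [math.exp(beta * (x - m)) for x in xs]
    z = sum(w)
    return [v / z for v in w]


class HybridLearner:
    """Hypothetical single-context hybrid: c = 0 is pure actor-critic, c = 1 is pure Q-learning."""

    def __init__(self, n_actions, alpha_critic=0.1, alpha_actor=0.1, alpha_q=0.1, c=0.5, beta=3.0):
        self.v = 0.0                # critic: expected reward in this context
        self.w = [0.0] * n_actions  # actor: action propensities
        self.q = [0.0] * n_actions  # Q-learner: expected reward per action
        self.alpha_critic, self.alpha_actor, self.alpha_q = alpha_critic, alpha_actor, alpha_q
        self.c, self.beta = c, beta

    def policy(self):
        h = [(1 - self.c) * w + self.c * q for w, q in zip(self.w, self.q)]
        return softmax(h, self.beta)

    def update(self, action, reward):
        delta = reward - self.v                     # critic prediction error
        self.v += self.alpha_critic * delta
        self.w[action] += self.alpha_actor * delta  # actor learns from the critic's error only
        self.q[action] += self.alpha_q * (reward - self.q[action])  # action-specific error
```

    Fitting such a model to trial-by-trial choices and comparing fits across values of c is one way to ask whether behavior is driven by prediction errors alone or also by learned expected values.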

  13. Motor Learning Enhances Use-Dependent Plasticity

    PubMed Central

    2017-01-01

    Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961

  14. Specific effect of a dopamine partial agonist on counterfactual learning: evidence from Gilles de la Tourette syndrome.

    PubMed

    Salvador, Alexandre; Worbe, Yulia; Delorme, Cécile; Coricelli, Giorgio; Gaillard, Raphaël; Robbins, Trevor W; Hartmann, Andreas; Palminteri, Stefano

    2017-07-24

    The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
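    Learning from obtained versus forgone outcomes is often modeled by updating the chosen option with a factual prediction error and, when the forgone outcome is shown, the unchosen option with a counterfactual prediction error. The sketch below assumes a two-option task and separate learning rates for the two errors; these names and values are hypothetical, and the abstract does not specify the model used in the study.

```python
def update_values(q, chosen, outcome, forgone=None, alpha_factual=0.3, alpha_counterfactual=0.3):
    """Two-option value update; `forgone` is the unchosen option's outcome, if it was shown."""
    q = list(q)  # leave the caller's values unchanged
    q[chosen] += alpha_factual * (outcome - q[chosen])  # factual prediction error
    if forgone is not None:
        unchosen = 1 - chosen
        q[unchosen] += alpha_counterfactual * (forgone - q[unchosen])  # counterfactual error
    return q
```

    For illustration only: in such a model, a learner that gains nothing from counterfactual feedback would be described by a counterfactual learning rate near zero, while the factual learning rate could remain intact.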

  15. Somatosensory Contribution to the Initial Stages of Human Motor Learning

    PubMed Central

    Bernardi, Nicolò F.; Darainy, Mohammad

    2015-01-01

    The early stages of motor skill acquisition are often marked by uncertainty about the sensory and motor goals of the task, as is the case in learning to speak or learning the feel of a good tennis serve. Here we present an experimental model of this early learning process, in which targets are acquired by exploration and reinforcement rather than sensory error. We use this model to investigate the relative contribution of motor and sensory factors to human motor learning. Participants make active reaching movements or matched passive movements to an unseen target using a robot arm. We find that learning through passive movements paired with reinforcement is comparable with learning associated with active movement, both in terms of magnitude and durability, with improvements due to training still observable at a 1 week retest. Motor learning is also accompanied by changes in somatosensory perceptual acuity. No stable changes in motor performance are observed for participants that train, actively or passively, in the absence of reinforcement, or for participants who are given explicit information about target position in the absence of somatosensory experience. These findings indicate that the somatosensory system dominates learning in the early stages of motor skill acquisition. SIGNIFICANCE STATEMENT The research focuses on the initial stages of human motor learning, introducing a new experimental model that closely approximates the key features of motor learning outside of the laboratory. The finding indicates that it is the somatosensory system rather than the motor system that dominates learning in the early stages of motor skill acquisition. This is important given that most of our computational models of motor learning are based on the idea that learning is motoric in origin. This is also a valuable finding for rehabilitation of patients with limited mobility as it shows that reinforcement in conjunction with passive movement results in benefits to motor learning that are as great as those observed for active movement training. PMID:26490869

  16. Adaptive optimal training of animal behavior

    NASA Astrophysics Data System (ADS)

    Bak, Ji Hyun; Choi, Jung Yoon; Akrami, Athena; Witten, Ilana; Pillow, Jonathan

    Neuroscience experiments often require training animals to perform tasks designed to elicit various sensory, cognitive, and motor behaviors. Training typically involves a series of gradual adjustments of stimulus conditions and rewards in order to bring about learning. However, training protocols are usually hand-designed, and often require weeks or months to achieve a desired level of task performance. Here we combine ideas from reinforcement learning and adaptive optimal experimental design to formulate methods for efficient training of animal behavior. Our work addresses two intriguing problems at once: first, it seeks to infer the learning rules underlying an animal's behavioral changes during training; second, it seeks to exploit these rules to select stimuli that will maximize the rate of learning toward a desired objective. We develop and test these methods using data collected from rats during training on a two-interval sensory discrimination task. We show that we can accurately infer the parameters of a learning algorithm that describes how the animal's internal model of the task evolves over the course of training. We also demonstrate by simulation that our method can provide a substantial speedup over standard training methods.

  17. Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.

    PubMed

    Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J

    2016-11-16

    As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. SIGNIFICANCE STATEMENT In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error.
In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. Copyright © 2016 the authors 0270-6474/16/3611682-11$15.00/0.

  18. Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning

    PubMed Central

    Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F.

    2016-01-01

    As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. SIGNIFICANCE STATEMENT In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. 
In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. PMID:27852776

  19. Pavlovian to instrumental transfer of control in a human learning task.

    PubMed

    Nadler, Natasha; Delgado, Mauricio R; Delamater, Andrew R

    2011-10-01

    Pavlovian learning tasks have been widely used as tools to understand basic cognitive and emotional processes in humans. The present studies investigated one particular task, Pavlovian-to-instrumental transfer (PIT), with human participants in an effort to examine potential cognitive and emotional effects of Pavlovian cues upon instrumentally trained performance. In two experiments, subjects first learned two separate instrumental response-outcome relationships (i.e., R1-O1 and R2-O2) and then were exposed to various stimulus-outcome relationships (i.e., S1-O1, S2-O2, S3-O3, and S4-) before the effects of the Pavlovian stimuli on instrumental responding were assessed during a non-reinforced test. In Experiment 1, instrumental responding was established using a positive-reinforcement procedure, whereas in Experiment 2, a quasi-avoidance learning task was used. In both cases, the Pavlovian stimuli exerted selective control over instrumental responding, whereby S1 and S2 each selectively elevated the instrumental response with which it shared an outcome. In addition, in Experiment 2, S3 exerted a nonselective transfer of control effect, whereby both responses were elevated over baseline levels. These data identify two ways, one specific and one general, in which Pavlovian processes can exert control over instrumental responding in human learning paradigms, suggesting that this method may serve as a useful tool in the study of basic cognitive and emotional processes in human learning.

  20. What is the optimal task difficulty for reinforcement learning of brain self-regulation?

    PubMed

    Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza

    2016-09-01

    The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent on β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  1. The Computational Development of Reinforcement Learning during Adolescence

    PubMed Central

    Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-01-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574
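    The contrast between a basic learner and one with a counterfactual module can be sketched with a delta-rule (Rescorla-Wagner-style) update. This is a minimal illustration, not the authors' fitted model: the function names, the learning rates `alpha_f` and `alpha_c`, the softmax temperature and the outcome coding (+1 reward, -1 punishment) are assumptions, and the value contextualisation module is not shown.

```python
import numpy as np

def update_values(q, chosen, r_chosen, r_unchosen=None, alpha_f=0.3, alpha_c=0.3):
    """Delta-rule update of the two option values attached to one cue.

    Partial feedback (r_unchosen is None): only the chosen option is updated
    (factual learning). Complete feedback: the unchosen option is also updated
    toward its displayed outcome (counterfactual learning, rate alpha_c).
    """
    q = np.array(q, dtype=float)                  # work on a copy
    q[chosen] += alpha_f * (r_chosen - q[chosen])
    if r_unchosen is not None:
        other = 1 - chosen
        q[other] += alpha_c * (r_unchosen - q[other])
    return q

def choice_probs(q, beta=3.0):
    """Softmax choice rule with inverse temperature beta."""
    z = np.exp(beta * (q - np.max(q)))
    return z / z.sum()

# A basic learner ignores the forgone outcome; a counterfactual learner uses it.
q_basic = update_values([0.0, 0.0], chosen=0, r_chosen=1.0)
q_cf = update_values([0.0, 0.0], chosen=0, r_chosen=1.0, r_unchosen=-1.0)
```

    With complete feedback the forgone punishment also lowers the unchosen option's value, so after a single trial the counterfactual learner prefers the chosen option more strongly than the basic learner does.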

  2. Robust reinforcement learning.

    PubMed

    Morimoto, Jun; Doya, Kenji

    2005-02-01

    This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
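    The min-max idea can be illustrated with a discrete toy analog: a controller on a one-dimensional grid maximizes, a disturber minimizes, and a penalty on the disturbance magnitude enters the payoff, playing the role of the disturbance-norm term in the value function. This is a sketch under stated assumptions (the grid, the payoff, the penalty weight `c` and the pure-strategy max-min are all hypothetical), not the continuous-time H∞ derivation or the online algorithms of the letter.

```python
import numpy as np

def minimax_value_iteration(n=5, gamma=0.9, c=0.5, iters=500, tol=1e-10):
    """Max-min value iteration for a toy zero-sum game on states -n..n.

    Controller: u in {-1, 0, 1}, maximizes. Disturber: w in {-1, 0, 1}, minimizes.
    Next state x' = clip(x + u + w, -n, n). Per-step payoff to the controller is
    -(x')**2 (stay near 0) plus c * w**2, so large disturbances cost the disturber.
    """
    states = np.arange(-n, n + 1)
    moves = (-1, 0, 1)
    V = np.zeros(len(states))
    for _ in range(iters):
        V_new = np.empty_like(V)
        for i, x in enumerate(states):
            best = -np.inf
            for u in moves:
                worst = np.inf
                for w in moves:
                    x_next = int(np.clip(x + u + w, -n, n))
                    q = -(x_next ** 2) + c * w ** 2 + gamma * V[x_next + n]
                    worst = min(worst, q)   # disturber's best response to u
                best = max(best, worst)     # controller's max-min choice
            V_new[i] = best
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return states, V

states, v_robust = minimax_value_iteration(c=0.5)   # cheap disturbances
_, v_easy = minimax_value_iteration(c=100.0)        # disturbances too costly to use
```

    Making disturbances cheap (small `c`) lowers the worst-case value at x = 0 to at most -0.5, while a prohibitive `c` leaves the disturber inactive and recovers the undisturbed value of 0.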

  3. Coexistence of Reward and Unsupervised Learning During the Operant Conditioning of Neural Firing Rates

    PubMed Central

    Kerr, Robert R.; Grayden, David B.; Thomas, Doreen A.; Gilson, Matthieu; Burkitt, Anthony N.

    2014-01-01

    A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well-studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments. PMID:24475240
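    The requirement that reward strengthen LTP more than LTD can be illustrated with a pair-based STDP rule whose LTP and LTD amplitudes are each scaled by the reward signal. The exponential windows, the gains `k_ltp` and `k_ltd` and all parameter values are assumptions for illustration; the authors' model additionally involves weight dependence and irregular postsynaptic firing, which are not modeled here.

```python
import numpy as np

def rstdp_dw(dt, reward, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0,
             k_ltp=2.0, k_ltd=0.5):
    """Weight change for pre/post spike pairs under a reward-modulated STDP rule.

    dt = t_post - t_pre in ms (scalar or array). dt > 0 gives LTP, dt <= 0 LTD.
    The reward signal scales LTP by (1 + k_ltp * reward) and LTD by
    (1 + k_ltd * reward); with k_ltp > k_ltd, reward tips the balance
    toward potentiation.
    """
    dt = np.asarray(dt, dtype=float)
    ltp = a_plus * (1.0 + k_ltp * reward) * np.exp(-np.abs(dt) / tau_plus)
    ltd = -a_minus * (1.0 + k_ltd * reward) * np.exp(-np.abs(dt) / tau_minus)
    return np.where(dt > 0, ltp, ltd)

# Symmetric set of pairings: one causal (+10 ms), one anti-causal (-10 ms)
pairs = np.array([10.0, -10.0])
net_no_reward = rstdp_dw(pairs, reward=0.0).sum()
net_reward = rstdp_dw(pairs, reward=1.0).sum()
```

    For this symmetric pair of timings the rule is balanced without reward (net change 0) and net potentiating with reward, because `k_ltp > k_ltd`.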

  4. Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions

    PubMed Central

    Fee, Michale S.

    2012-01-01

    In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources. PMID:22754501
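    The proposed gating rule can be written as a three-factor update in which context activity, efference-copy activity and reward must coincide for a context synapse to be potentiated. This minimal rate-based sketch assumes binary efference copies, a scalar reward and a learning rate `lr` (hypothetical names and shapes); it does not model the saccade circuit or the thalamic and cortical input sources.

```python
import numpy as np

def gated_update(w_ctx, context, efference, reward, lr=0.1):
    """Three-factor update of context-input weights onto medium spiny neurons.

    w_ctx:     (n_context, n_msn) context-input weights.
    context:   (n_context,) context activity (e.g. current "time" in a sequence).
    efference: (n_msn,) efference-copy activity; nonzero for MSNs whose
               action was just produced. It gates plasticity, but its own
               synapses are not updated here.
    reward:    scalar reinforcement for the action just produced.
    """
    return w_ctx + lr * reward * np.outer(context, efference)

def select_action(w_ctx, context):
    """Choose the action whose MSN receives the strongest context drive."""
    return int(np.argmax(context @ w_ctx))

# Two contexts, three actions: reward action 2 when produced in context 0.
w = np.zeros((2, 3))
ctx0 = np.array([1.0, 0.0])
w = gated_update(w, ctx0, efference=np.array([0.0, 0.0, 1.0]), reward=1.0)
```

    After one rewarded trial, context 0 drives the MSN for action 2, while the weights from context 1 are untouched.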

  5. Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions.

    PubMed

    Fee, Michale S

    2012-01-01

    In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current "time" in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.

  6. Histidine-decarboxylase knockout mice show deficient nonreinforced episodic object memory, improved negatively reinforced water-maze performance, and increased neo- and ventro-striatal dopamine turnover.

    PubMed

    Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P

    2003-01-01

    The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement or reward system reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning dependent on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydroxyphenylacetic acid concentrations and a higher dihydroxyphenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydroxyphenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory promoting and suppressive effects via distinct and independent mechanisms and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.

  7. Dorsal Striatal-Midbrain Connectivity in Humans Predicts How Reinforcements Are Used to Guide Decisions

    ERIC Educational Resources Information Center

    Kahnt, Thorsten; Park, Soyoung Q.; Cohen, Michael X.; Beck, Anne; Heinz, Andreas; Wrase, Jana

    2009-01-01

    It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to…
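    The actor/critic division of labor described above can be illustrated with a textbook tabular actor-critic on a hypothetical two-armed bandit. The reward probabilities, learning rates and softmax policy are assumptions; the critic's prediction error `delta` trains both the critic's reward prediction and the actor's action preferences. This is not the model fitted in the study.

```python
import numpy as np

def actor_critic_bandit(reward_probs, n_trials=2000, alpha_critic=0.1,
                        alpha_actor=0.1, seed=0):
    """Tabular actor-critic on a one-state task with Bernoulli rewards."""
    rng = np.random.default_rng(seed)
    n_actions = len(reward_probs)
    v = 0.0                          # critic: predicted reward
    prefs = np.zeros(n_actions)      # actor: action preferences
    for _ in range(n_trials):
        p = np.exp(prefs - prefs.max())
        p /= p.sum()                 # softmax policy over preferences
        a = rng.choice(n_actions, p=p)
        r = float(rng.random() < reward_probs[a])
        delta = r - v                # prediction error (single step, no next state)
        v += alpha_critic * delta    # critic learns to predict reward
        prefs[a] += alpha_actor * delta  # actor reinforces actions with delta > 0
    return v, prefs

v, prefs = actor_critic_bandit([0.2, 0.8])
```

    The critic only predicts how much reward to expect; which action to take is carried entirely by the actor's preferences, which end up favoring the richer arm.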

  8. Autonomous Inter-Task Transfer in Reinforcement Learning Domains

    DTIC Science & Technology

    2008-08-01

    Twentieth International Joint Conference on Artificial Intelligence, 2007. Fumihide Tanaka and Masayuki Yamamura. Multitask reinforcement learning...Functions 2.2.3 Artificial Neural Networks 2.2.4 Instance-based...tures [Laird et al., 1986, Choi et al., 2007]. However, TL for RL tasks has only recently been gaining attention in the artificial intelligence

  9. A look at Behaviourism and Perceptual Control Theory in Interface Design

    DTIC Science & Technology

    1998-02-01

    behaviours such as response variability, instinctive drift, autoshaping, etc. Perceptual Control Theory (PCT) postulates that behaviours result from the...internal variables. Behaviourism, on the other hand, cannot account for variability in responses, instinctive drift, autoshaping, etc. Researchers... Autoshaping. Animals appear to learn without reinforcement. However, conditioning theory speculates that learning results only when reinforcement

  10. BEHAVIORAL MECHANISMS UNDERLYING NICOTINE REINFORCEMENT

    PubMed Central

    Rupprecht, Laura E.; Smith, Tracy T.; Schassburger, Rachel L.; Buffalari, Deanne M.; Sved, Alan F.; Donny, Eric C.

    2015-01-01

    Cigarette smoking is the leading cause of preventable deaths worldwide and nicotine, the primary psychoactive constituent in tobacco, drives sustained use. The behavioral actions of nicotine are complex and extend well beyond the actions of the drug as a primary reinforcer. Stimuli that are consistently paired with nicotine can, through associative learning, take on reinforcing properties as conditioned stimuli. These conditioned stimuli can then impact the rate and probability of behavior and even function as conditioned reinforcers that maintain behavior in the absence of nicotine. Nicotine can also act as a conditioned stimulus, predicting the delivery of other reinforcers, which may allow nicotine to acquire value as a conditioned reinforcer. These associative effects, establishing non-nicotine stimuli as conditioned stimuli with discriminative stimulus and conditioned reinforcing properties as well as establishing nicotine as a conditioned stimulus, are predicted by basic conditioning principles. However, nicotine can also act non-associatively. Nicotine directly enhances the reinforcing efficacy of other reinforcing stimuli in the environment, an effect that does not require a temporal or predictive relationship between nicotine and either the stimulus or the behavior. Hence, the reinforcing actions of nicotine stem both from the primary reinforcing actions of the drug (and the subsequent associative learning effects) as well as the reinforcement enhancement action of nicotine which is non-associative in nature. Gaining a better understanding of how nicotine impacts behavior will allow for maximally effective tobacco control efforts aimed at reducing the harm associated with tobacco use by reducing and/or treating its addictiveness. PMID:25638333

  11. Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.

    PubMed

    Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E

    2017-03-15

    Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. 
Copyright © 2017 the authors 0270-6474/17/373009-09$15.00/0.

  12. Feature Reinforcement Learning: Part I. Unstructured MDPs

    NASA Astrophysics Data System (ADS)

    Hutter, Marcus

    2009-12-01

    General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II (Hutter, 2009c). The role of POMDPs is also considered there.
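    The structure of a cost criterion for choosing a feature map Φ can be sketched as a sum of code lengths: the states produced by Φ coded given the previous state and action, plus the rewards coded given the transition. The code-length formula used here (empirical entropy plus a (m'-1)/2·log2 n parameter term, with m' the number of symbols that occur), the indexing convention and the function names are assumptions for illustration, not a transcription of the article's definitions.

```python
import math
from collections import Counter, defaultdict

def code_length(counts):
    """MDL-style code length (bits) of a sequence summarized by symbol counts:
    n * H(empirical frequencies) + (m' - 1) / 2 * log2(n), m' = symbols that occur."""
    n = sum(counts)
    if n == 0:
        return 0.0
    entropy_bits = -sum(c * math.log2(c / n) for c in counts if c > 0)
    m_used = sum(1 for c in counts if c > 0)
    return entropy_bits + 0.5 * (m_used - 1) * math.log2(n)

def phi_cost(observations, actions, rewards, phi):
    """Cost of feature map phi: code length of states given (previous state,
    action) plus code length of rewards given (previous state, action, state).
    Convention (assumed): actions[t-1] leads from observation t-1 to t, which
    arrives with rewards[t]."""
    states = [phi(o) for o in observations]
    next_state = defaultdict(Counter)
    reward = defaultdict(Counter)
    for t in range(1, len(states)):
        s, a, s2 = states[t - 1], actions[t - 1], states[t]
        next_state[(s, a)][s2] += 1
        reward[(s, a, s2)][rewards[t]] += 1
    cl_states = sum(code_length(list(c.values())) for c in next_state.values())
    cl_rewards = sum(code_length(list(c.values())) for c in reward.values())
    return cl_states + cl_rewards

obs = [0, 1] * 10          # alternating observations
acts = [0] * 20            # a single dummy action
rews = list(obs)           # reward equals the current observation
cost_identity = phi_cost(obs, acts, rews, lambda o: o)
cost_constant = phi_cost(obs, acts, rews, lambda o: 0)
```

    On this alternating sequence the identity map predicts both streams exactly (cost 0), while the constant map pays for the rewards it cannot predict.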

  13. The role of within-compound associations in learning about absent cues.

    PubMed

    Witnauer, James E; Miller, Ralph R

    2011-05-01

    When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue-outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue-outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127-151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association.
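    Retrospective revaluation through within-compound retrieval can be sketched as a Rescorla-Wagner-style update in which present cues have positive associability and absent cues have negative associability scaled by their retrieved activation, in the spirit of Van Hamme and Wasserman's model. The learning rates, the linear retrieval through `W`, the saturating rule for within-compound learning and the backward-blocking design are assumptions; this is not the authors' simulation code.

```python
import numpy as np

def vhw_trial(V, W, present, lam, beta=0.3, alpha_present=0.3,
              alpha_absent=-0.15, w_rate=0.2):
    """One trial of a Van Hamme & Wasserman-style update with retrieval.

    V: (n,) cue-outcome strengths; W: (n, n) within-compound associations.
    present: boolean mask of cues on this trial; lam: outcome (1 = reinforced).
    Absent cues get a negative associability scaled by how strongly they are
    retrieved through W from the cues that are present.
    """
    present = np.asarray(present, dtype=bool)
    x = present.astype(float)
    error = lam - V[present].sum()                      # shared prediction error
    retrieved = np.clip(W.T @ x, 0.0, 1.0) * ~present   # activation of absent cues
    alpha = alpha_present * x + alpha_absent * retrieved
    V = V + alpha * beta * error
    co = np.outer(x, x)
    np.fill_diagonal(co, 0.0)                           # no self-association
    W = W + w_rate * co * (1.0 - W)                     # saturating within-compound learning
    return V, W

def backward_blocking(w_rate, n_trials=20):
    """Phase 1: AX+ trials. Phase 2: A+ trials. Returns V_X after each phase."""
    V, W = np.zeros(2), np.zeros((2, 2))
    for _ in range(n_trials):
        V, W = vhw_trial(V, W, [True, True], 1.0, w_rate=w_rate)
    vx_phase1 = V[1]
    for _ in range(n_trials):
        V, W = vhw_trial(V, W, [True, False], 1.0, w_rate=w_rate)
    return vx_phase1, V[1]

vx1, vx2 = backward_blocking(w_rate=0.2)   # with within-compound retrieval
nx1, nx2 = backward_blocking(w_rate=0.0)   # retrieval path absent
```

    With within-compound learning, A+ trials after AX+ training lower the strength of the absent cue X (backward blocking); with `w_rate=0` the retrieval path is missing and X is unchanged, mirroring the comparison model described above.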

  14. The role of within-compound associations in learning about absent cues

    PubMed Central

    Witnauer, James E.

    2011-01-01

    When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue–outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue–outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127–151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association. PMID:21264569

  15. Pleasurable music affects reinforcement learning according to the listener

    PubMed Central

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875
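    One common way to formalize learning better from rewards versus from punishments (assumed here, not the analysis used in this study) is a delta rule with separate learning rates for positive and negative prediction errors. The rates, the outcome coding (1 = reward, 0 = no reward) and the two hypothetical learners are illustrative.

```python
def asymmetric_q(outcomes, alpha_gain, alpha_loss, q0=0.5):
    """Track one option's value with separate rates for positive and negative errors.

    A larger alpha_gain captures better learning from rewards ("approach");
    a larger alpha_loss captures better learning from their absence ("avoid").
    """
    q = q0
    for r in outcomes:
        delta = r - q
        q += (alpha_gain if delta > 0 else alpha_loss) * delta
    return q

# Hypothetical learners: gain-weighted vs loss-weighted
gain_learner = dict(alpha_gain=0.4, alpha_loss=0.1)
loss_learner = dict(alpha_gain=0.1, alpha_loss=0.4)
```

    After three rewarded trials the gain-weighted learner's value is 0.892 versus about 0.636 for the loss-weighted learner; after three unrewarded trials the ordering reverses.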

  16. Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.

    PubMed

    Blake, David T

    2017-06-18

    The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, in the strength of response within that activated area, and in the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long-term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests that other factors also initiate changes. Studies that control learning find support for reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to stronger cortical responses and an enlarged response area, with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning. © 2017 American Physiological Society. Compr Physiol 7:977-1008, 2017. Copyright © 2017 John Wiley & Sons, Inc.

  17. Feedback from the heart: Emotional learning and memory is controlled by cardiac cycle, interoceptive accuracy and personality.

    PubMed

    Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D

    2017-05-01

    Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlie poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning.

    PubMed

    Franklin, Nicholas T; Frank, Michael J

    2015-12-25

    Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
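    The central quantity in this abstract, an effective learning rate applied to reward prediction errors, can be illustrated with a plain delta-rule value update. This is a minimal sketch, not the authors' neural model: there are no TANs or spiny neurons here, only a scalar learning rate `alpha`, and the reward sequence with a single change-point is hypothetical. It shows the trade-off the abstract describes: a high learning rate tracks a change-point quickly, while a low one is slower but less perturbed by any single outcome.

```python
from typing import Iterable, List


def update_value(q: float, reward: float, alpha: float) -> float:
    """Move the value estimate q toward the observed reward by a fraction alpha."""
    rpe = reward - q  # reward prediction error
    return q + alpha * rpe


def run(rewards: Iterable[float], alpha: float, q0: float = 0.0) -> List[float]:
    """Return the trial-by-trial value estimates for a fixed learning rate."""
    q, trace = q0, []
    for r in rewards:
        q = update_value(q, r, alpha)
        trace.append(q)
    return trace


# Hypothetical contingency: rewarded for 20 trials, then unrewarded for 5 (a change-point).
rewards = [1.0] * 20 + [0.0] * 5
fast = run(rewards, alpha=0.5)   # responsive to change-points, but noisy
slow = run(rewards, alpha=0.05)  # robust to spurious outcomes, but sluggish
print(round(fast[-1], 3), round(slow[-1], 3))
```

    A fixed `alpha` forces a choice between these two regimes; the point of an uncertainty-driven controller such as the one proposed here is to move between them as conditions change.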

  19. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.

    PubMed

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  20. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    PubMed Central

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  1. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Translational controller results

    NASA Technical Reports Server (NTRS)

    Jani, Yashvant

    1992-01-01

    The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is the deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. This report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The shuttle translational controller and its implementation in fuzzy learning architecture are described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and a summary for the project.

  2. Blended learning for reinforcing dental pharmacology in the clinical years: A qualitative analysis.

    PubMed

    Eachempati, Prashanti; Kiran Kumar, K S; Sumanth, K N

    2016-10-01

    Blended learning has become the method of choice in educational institutions because of its systematic integration of traditional classroom teaching and online components. This study aims to analyze students' reflections on blended learning in dental pharmacology. A cross-sectional study was conducted in the Faculty of Dentistry, Melaka-Manipal Medical College, among 3rd- and 4th-year BDS students. A total of 145 dental students who consented participated in the study. Students were divided into 14 groups. Nine online sessions followed by nine face-to-face discussions were held. Each session addressed topics related to oral lesions and orofacial pain with pharmacological applications. After each week, students were asked to reflect on blended learning. On completion of 9 weeks, reflections were collected and analyzed. Qualitative analysis was done using the thematic analysis model suggested by Braun and Clarke. Four main themes were identified, namely, merits of blended learning, skill in writing prescriptions for oral diseases, dosages of drugs, and identification of strengths and weaknesses. In general, the participants gave positive feedback on blended learning. Students felt more confident in drug selection and prescription writing. They could recollect the doses better after the online and face-to-face sessions. Most interestingly, the students reflected that they were able to identify their strengths and weaknesses after the blended learning sessions. The blended learning module was successfully implemented for reinforcing dental pharmacology. The results obtained in this study enable us to plan future comparative studies to assess the effectiveness of blended learning in dental pharmacology.

  3. Learning the specific quality of taste reinforcement in larval Drosophila

    PubMed Central

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-01

    The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing—in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533

  4. Evidence for a neural law of effect.

    PubMed

    Athalye, Vivek R; Santos, Fernando J; Carmena, Jose M; Costa, Rui M

    2018-03-02

    Thorndike's law of effect states that actions that lead to reinforcements tend to be repeated more often. Accordingly, neural activity patterns leading to reinforcement are also reentered more frequently. Reinforcement relies on dopaminergic activity in the ventral tegmental area (VTA), and animals shape their behavior to receive dopaminergic stimulation. Seeking evidence for a neural law of effect, we found that mice learn to reenter, with increasing frequency, motor cortical activity patterns that trigger optogenetic VTA self-stimulation. Learning was accompanied by gradual shaping of these patterns, with participating neurons progressively increasing and aligning their covariance to that of the target pattern. Motor cortex patterns that lead to phasic dopaminergic VTA activity are progressively reinforced and shaped, suggesting a mechanism by which animals select and shape actions to reliably achieve reinforcement. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  5. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    PubMed

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) and model-based reward-related input. Using the concept of the reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames when computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.
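    The reward prediction error that this abstract assigns to the model-free system is conventionally formalised as the temporal-difference (TD) error, delta = r + gamma * V(s') - V(s), with the value function V playing the role the authors propose for the VS. The sketch below runs tabular TD(0) on a hypothetical two-state chain; the chain, `alpha`, `gamma` and the episode count are assumptions chosen for illustration, and nothing here models the model-based (OFC, amygdala, hippocampus) side.

```python
from typing import Dict, List, Optional, Tuple


def td0_chain(episodes: int, alpha: float = 0.1, gamma: float = 0.9) -> Dict[str, float]:
    """Tabular TD(0) on a hypothetical deterministic chain A -> B -> end."""
    V: Dict[str, float] = {"A": 0.0, "B": 0.0}
    # (state, reward received on leaving it, next state or None if the episode ends)
    transitions: List[Tuple[str, float, Optional[str]]] = [("A", 0.0, "B"), ("B", 1.0, None)]
    for _ in range(episodes):
        for s, r, s_next in transitions:
            v_next = V[s_next] if s_next is not None else 0.0
            delta = r + gamma * v_next - V[s]  # TD (reward prediction) error
            V[s] += alpha * delta              # value moves toward the TD target
    return V


values = td0_chain(episodes=500)
print({s: round(v, 3) for s, v in values.items()})
```

    After training, V(B) approaches the terminal reward of 1 and V(A) approaches gamma times that value, so the prediction of reward has propagated backward from the rewarded transition to the earlier state without any explicit model of the environment.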

  6. Incorporating Dispositional Traits into the Treatment of Anorexia Nervosa

    PubMed Central

    Herzog, David; Moskovich, Ashley; Merwin, Rhonda; Lin, Tammy

    2014-01-01

    We provide a general framework to guide the development of interventions that aim to address persistent features in eating disorders that may preclude effective treatment. Using perfectionism as an exemplar, we draw from research in cognitive neuroscience regarding attention and reinforcement learning, from learning theory and social psychology regarding vicarious learning and implications for the role modeling of significant others, and from clinical psychology on the importance of verbal narratives as barriers that may influence expectations and shape reinforcement schedules. PMID:21243482

  7. Hybrid learning in signalling games

    NASA Astrophysics Data System (ADS)

    Barrett, Jeffrey A.; Cochran, Calvin T.; Huttegger, Simon; Fujiwara, Naoki

    2017-09-01

    Lewis-Skyrms signalling games have been studied under a variety of low-rationality learning dynamics. Reinforcement dynamics are stable but slow and prone to evolving suboptimal signalling conventions. Low-inertia trial-and-error dynamics such as win-stay/lose-randomise are fast and reliable at finding perfect signalling conventions but unstable in the presence of noise or agent error. Here we consider a low-rationality hybrid of reinforcement and win-stay/lose-randomise learning that exhibits the virtues of both. This hybrid dynamics is reliable, stable and exceptionally fast.
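    To make one of the two ingredients concrete, here is a sketch of win-stay/lose-randomise on its own in a Lewis signalling game. It is not the authors' hybrid dynamics, whose exact update rule the abstract does not give; the game size, `rounds`, `seed` and the function name are illustrative assumptions. Because a perfect signalling system never produces a failure, it is absorbing under this rule, which is why the dynamics reliably finds one in the noise-free case.

```python
import random
from typing import Dict


def wslr_signalling(rounds: int, n: int = 2, seed: int = 0) -> float:
    """Win-stay/lose-randomise in an n-state, n-signal, n-act Lewis signalling game.

    Returns the fraction of states communicated correctly by the final conventions.
    """
    rng = random.Random(seed)
    sender: Dict[int, int] = {s: rng.randrange(n) for s in range(n)}    # state -> signal
    receiver: Dict[int, int] = {m: rng.randrange(n) for m in range(n)}  # signal -> act
    for _ in range(rounds):
        state = rng.randrange(n)  # nature picks a state uniformly at random
        signal = sender[state]
        act = receiver[signal]
        if act != state:
            # Lose: both players re-draw their choice for this situation at random.
            sender[state] = rng.randrange(n)
            receiver[signal] = rng.randrange(n)
        # Win: stay, i.e. keep the current choices unchanged.
    return sum(receiver[sender[s]] == s for s in range(n)) / n


print(wslr_signalling(rounds=10_000))
```

    If noise were added (for example, the receiver occasionally acting at random), a single error would trigger re-randomisation and could break an established convention; that is the instability the abstract attributes to win-stay/lose-randomise.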

  8. Discrete Serotonin Systems Mediate Memory Enhancement and Escape Latencies after Unpredicted Aversive Experience in Drosophila Place Memory

    PubMed Central

    Sitaraman, Divya; Kramer, Elizabeth F.; Kahsai, Lily; Ostrowski, Daniela; Zars, Troy

    2017-01-01

    Feedback mechanisms in operant learning are critical for animals to increase reward or reduce punishment. However, not all conditions have a behavior that can readily resolve an event. Animals must then try out different behaviors to better their situation through outcome learning. This form of learning allows for novel solutions and with positive experience can lead to unexpected behavioral routines. Learned helplessness, as a type of outcome learning, manifests in part as increases in escape latency in the face of repeated unpredicted shocks. Little is known about the mechanisms of outcome learning. When fruit fly Drosophila melanogaster are exposed to unpredicted high temperatures in a place learning paradigm, flies both increase escape latencies and show enhanced memory when given control of a place/temperature contingency. Here we describe discrete serotonin neuronal circuits that mediate aversive reinforcement, escape latencies, and memory levels after place learning in the presence and absence of unexpected aversive events. The results show that two features of learned helplessness depend on the same modulatory system as aversive reinforcement. Moreover, changes in aversive reinforcement and escape latency depend on local neural circuit modulation, while memory enhancement requires larger modulation of multiple behavioral control circuits. PMID:29321732

  9. Historia Oral, Experiencias de Aprendizagem e Enraizamento Sociocultural--Um Projeto em Curso (Oral History, Learning Experiences, and Sociocultural Setting--A Project in Process).

    ERIC Educational Resources Information Center

    Vidigal, Luis

    1995-01-01

    Examines education and childhood in Portugal. Uses oral history methods in an educational context, exploring oral statements pedagogically. Considers these statements especially suitable to maintaining aspects of collective memory and social identity, reinforcing students' national and regional identities. Suggests this is very important in…

  10. Homework in the 21st Century: The Antiquated and Ineffectual Implementation of a Time Honored Educational Strategy

    ERIC Educational Resources Information Center

    Simplicio, Joseph S. C.

    2005-01-01

    Homework has been a mainstay teacher strategy since education first began. When used properly, study after study shows that homework, from the elementary through the university level, is an effective method for reinforcing educational learning goals. Studies that include original research, surveys, interviews, and literature reviews, conducted by…

  11. Competent or Not?: Exploring Adaptions to the Neo-Behaviorist Paradigm in a Sport Marketing Course

    ERIC Educational Resources Information Center

    Tyler, B. David; Cruz, Laura E.

    2016-01-01

    Educators and administrators are exploring competency-based education as an effective and efficient method to facilitate student learning. This reinforces a burgeoning neo-behaviorist movement in higher education which seeks to synthesize such behaviorist approaches with the cognitive focus of the last 20 years. The current research examines the…

  12. Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies

    PubMed Central

    Ellwood, Ian T.; Patel, Tosha; Wadia, Varun; Lee, Anthony T.; Liptak, Alayna T.

    2017-01-01

    Dopamine neurons in the ventral tegmental area (VTA) encode reward prediction errors and can drive reinforcement learning through their projections to striatum, but much less is known about their projections to prefrontal cortex (PFC). Here, we studied these projections and observed phasic VTA–PFC fiber photometry signals after the delivery of rewards. Next, we studied how optogenetic stimulation of these projections affects behavior using conditioned place preference and a task in which mice learn associations between cues and food rewards and then use those associations to make choices. Neither phasic nor tonic stimulation of dopaminergic VTA–PFC projections elicited place preference. Furthermore, substituting phasic VTA–PFC stimulation for food rewards was not sufficient to reinforce new cue–reward associations nor maintain previously learned ones. However, the same patterns of stimulation that failed to reinforce place preference or cue–reward associations were able to modify behavior in other ways. First, continuous tonic stimulation maintained previously learned cue–reward associations even after they ceased being valid. Second, delivering phasic stimulation either continuously or after choices not previously associated with reward induced mice to make choices that deviated from previously learned associations. In summary, despite the fact that dopaminergic VTA–PFC projections exhibit phasic increases in activity that are time locked to the delivery of rewards, phasic activation of these projections does not necessarily reinforce specific actions. Rather, dopaminergic VTA–PFC activity can control whether mice maintain or deviate from previously learned cue–reward associations. SIGNIFICANCE STATEMENT Dopaminergic inputs from ventral tegmental area (VTA) to striatum encode reward prediction errors and reinforce specific actions; however, it is currently unknown whether dopaminergic inputs to prefrontal cortex (PFC) play similar or distinct roles. 
Here, we used bulk Ca2+ imaging to show that unexpected rewards or reward-predicting cues elicit phasic increases in the activity of dopaminergic VTA–PFC fibers. However, in multiple behavioral paradigms, we failed to observe reinforcing effects after stimulation of these fibers. In these same experiments, we did find that tonic or phasic patterns of stimulation caused mice to maintain or deviate from previously learned cue–reward associations, respectively. Therefore, although they may exhibit similar patterns of activity, dopaminergic inputs to striatum and PFC can elicit divergent behavioral effects. PMID:28739583

  13. The Effect of a Token Reinforcement Program on the Reading Comprehension of a Learning Disabled Student.

    ERIC Educational Resources Information Center

    Galbreath, Joy; Feldman, David

    The relationship of reading comprehension accuracy and a contingently administered token reinforcement program used with an elementary level learning disabled student in the classroom was examined. The S earned points for each correct answer made after oral reading sessions. At the conclusion of the class he could exchange his points for rewards.…

  14. Cocaine addiction as a homeostatic reinforcement learning disorder.

    PubMed

    Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H

    2017-03-01

    Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
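    In homeostatic reinforcement learning, as we understand the framework, reward is defined as the reduction of a "drive" that measures how far the internal state is from its setpoint. The sketch below assumes a drive of the form D(H) = (sum over i of |h*_i - h_i|^n)^(1/m), with illustrative exponents `n = 4` and `m = 3`; the function names, exponents and numbers are hypothetical and are not taken from the paper's cocaine simulations.

```python
from typing import Sequence


def drive(h: Sequence[float], setpoint: Sequence[float], n: float = 4.0, m: float = 3.0) -> float:
    """Distance of internal state h from its setpoint (illustrative exponents n, m)."""
    return sum(abs(s - x) ** n for x, s in zip(h, setpoint)) ** (1.0 / m)


def homeostatic_reward(h_before: Sequence[float], h_after: Sequence[float],
                       setpoint: Sequence[float], n: float = 4.0, m: float = 3.0) -> float:
    """Reward = drive reduction produced by moving from h_before to h_after."""
    return drive(h_before, setpoint, n, m) - drive(h_after, setpoint, n, m)


setpoint = [10.0]  # hypothetical one-dimensional internal variable
print(homeostatic_reward([5.0], [8.0], setpoint))    # moving toward the setpoint: positive
print(homeostatic_reward([10.0], [14.0], setpoint))  # moving away from it: negative
```

    Because reward depends on the setpoint, shifting the setpoint changes how rewarding the same outcome is, which is the lever the theory uses when it attributes escalation to slow, drug-induced setpoint changes.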

  15. Challenges in the Verification of Reinforcement Learning Algorithms

    NASA Technical Reports Server (NTRS)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  16. Investigation of a Reinforcement-Based Toilet Training Procedure for Children with Autism.

    ERIC Educational Resources Information Center

    Cicero, Frank R.; Pfadt, Al

    2002-01-01

    This study evaluated the effectiveness of a reinforcement-based toilet training intervention with three children with autism. Procedures included positive reinforcement, graduated guidance, scheduled practice trials, and forward prompting. All three children reduced urination accidents to zero and learned to request bathroom use spontaneously…

  17. Sex Differences in Reinforcement and Punishment on Prime-Time Television.

    ERIC Educational Resources Information Center

    Downs, A. Chris; Gowan, Darryl C.

    1980-01-01

    Television programs were analyzed for frequencies of positive reinforcement and punishment exchanged among performers varying in age and sex. Females were found to more often exhibit and receive reinforcement, whereas males more often exhibited and received punishment. These findings have implications for children's learning of positive and…

  18. Strengths and weaknesses of Problem Based Learning from the professional perspective of registered nurses

    PubMed Central

    Cónsul-Giribet, María; Medina-Moya, José Luis

    2014-01-01

    OBJECTIVE: to identify competency strengths and weaknesses as perceived by nursing professionals who graduated from an integrated, competency-based curriculum delivered through Problem Based Learning in small groups. METHOD: an intrinsic case study method was used, which analyzes this innovation through former students (from the first class) with three years of professional experience. The data were collected through a questionnaire and discussion groups. RESULTS: the results show that their competency level is rated very satisfactorily. This level paradoxically contrasts with the lack of theoretical knowledge they perceived at the end of their education, when they started working in clinical practice. CONCLUSIONS: the teaching strategy was key to motivating in-depth study and arousing the desire to know. In addition, Problem Based Learning favors and reinforces the decision to learn, which is necessary in the course of professional life. PMID:25493666

  19. Fuzzy Sarsa with Focussed Replacing Eligibility Traces for Robust and Accurate Control

    NASA Astrophysics Data System (ADS)

    Kamdem, Sylvain; Ohki, Hidehiro; Sueda, Naomichi

    Several methods of reinforcement learning in continuous state and action spaces that utilize fuzzy logic have been proposed in recent years. This paper introduces Fuzzy Sarsa(λ), an on-policy algorithm for fuzzy learning that relies on a novel way of computing replacing eligibility traces to accelerate policy evaluation. It is tested against several temporal difference learning algorithms: Sarsa(λ), Fuzzy Q(λ), an earlier fuzzy version of Sarsa and an actor-critic algorithm. We perform detailed evaluations on two benchmark problems: a maze domain and the cart pole. Results of various tests highlight the strengths and weaknesses of these algorithms and show that Fuzzy Sarsa(λ) outperforms all other algorithms tested for a larger granularity of design and under noisy conditions. It is a highly competitive method of learning in realistic noisy domains where a denser fuzzy design over the state space is needed for more precise control.
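    For readers unfamiliar with the baseline, the sketch below shows ordinary tabular Sarsa(λ) with replacing eligibility traces, the non-fuzzy algorithm the paper compares against. It is not the authors' Fuzzy Sarsa(λ) and does not reproduce their "focussed" trace computation. The five-state corridor task, its reward, and all parameter values (`alpha`, `gamma`, `lam`, `eps`, `seed`) are hypothetical choices for illustration.

```python
import random
from typing import List, Tuple

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(s: int, a: int) -> Tuple[int, float, bool]:
    """Deterministic corridor dynamics; reward 1.0 only on reaching the goal."""
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def choose(q: List[List[float]], s: int, eps: float, rng: random.Random) -> int:
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    best = max(q[s])
    return rng.choice([a for a, v in enumerate(q[s]) if v == best])


def sarsa_lambda(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
                 lam: float = 0.8, eps: float = 0.1, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        e = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]  # eligibility traces
        s, done = 0, False
        a = choose(q, s, eps, rng)
        while not done:
            s2, r, done = step(s, a)
            a2 = choose(q, s2, eps, rng)
            target = r if done else r + gamma * q[s2][a2]  # on-policy (Sarsa) target
            delta = target - q[s][a]
            e[s][a] = 1.0  # replacing trace: reset to 1 rather than incrementing
            for i in range(N_STATES):
                for j in range(len(ACTIONS)):
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam  # decay every trace
            s, a = s2, a2
    return q


q = sarsa_lambda()
greedy = [max(range(len(ACTIONS)), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)  # 1 = move right
```

    Replacing traces cap each state-action trace at 1, so a pair revisited many times (here, bumping against the left wall) does not accumulate outsized credit; the fuzzy variant in the paper extends this idea to fuzzy rule activations rather than table entries.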

  20. The touchscreen operant platform for testing learning and memory in rats and mice

    PubMed Central

    Horner, Alexa E.; Heath, Christopher J.; Hvoslef-Eide, Martha; Kent, Brianne A.; Kim, Chi Hun; Nilsson, Simon R. O.; Alsiö, Johan; Oomen, Charlotte A.; Holmes, Andrew; Saksida, Lisa M.; Bussey, Timothy J.

    2014-01-01

    An increasingly popular method of assessing cognitive functions in rodents is the automated touchscreen platform, on which a number of different cognitive tests can be run in a manner very similar to touchscreen methods currently used to test human subjects. This methodology is low stress (using appetitive, rather than aversive reinforcement), has high translational potential, and lends itself to a high degree of standardisation and throughput. Applications include the study of cognition in rodent models of psychiatric and neurodegenerative diseases (e.g., Alzheimer’s disease, schizophrenia, Huntington’s disease, frontotemporal dementia), and characterisation of the role of select brain regions, neurotransmitter systems and genes in rodents. This protocol describes how to perform four touchscreen assays of learning and memory: Visual Discrimination, Object-Location Paired-Associates Learning, Visuomotor Conditional Learning and Autoshaping. It is accompanied by two further protocols using the touchscreen platform to assess executive function, working memory and pattern separation. PMID:24051959

  1. Separation of Time-Based and Trial-Based Accounts of the Partial Reinforcement Extinction Effect

    PubMed Central

    Bouton, Mark E.; Woods, Amanda M.; Todd, Travis P.

    2013-01-01

    Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or “time-accumulation” view (e.g., Gallistel & Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially- and continuously-reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. PMID:23962669

  2. Novel reinforcement learning paradigm based on response patterning under interval schedules of reinforcement.

    PubMed

    Schifani, Christin; Sukhanov, Ilya; Dorofeikova, Mariia; Bespalov, Anton

    2017-07-28

    There is a need to develop cognitive tasks that address valid neuropsychological constructs implicated in disease mechanisms and that can be used in animals and humans to guide novel drug discovery. The present experiments aimed to characterize a novel reinforcement learning task based on a classical operant behavioral phenomenon observed in multiple species: differences in response patterning under variable (VI) vs fixed interval (FI) schedules of reinforcement. Wistar rats were trained to press a lever for food under VI30s, and weekly test sessions were later introduced in which the reinforcement schedule was switched to FI30s. During the FI30s test session, post-reinforcement pauses (PRPs) gradually grew towards the end of the session reaching 22-43% of the initial values. Animals could be retrained under VI30s conditions, and FI30s test sessions were repeated over a period of several months without appreciable signs of a practice effect. Administration of the non-competitive N-methyl-d-aspartate (NMDA) receptor antagonist MK-801 ((5S,10R)-(+)-5-Methyl-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5,10-imine maleate) prior to FI30s sessions prevented adjustment of PRPs associated with the change from VI to FI schedule. This effect was most pronounced at the highest tested dose of MK-801 and appeared to be independent of the effects of this dose on response rates. These results provide initial evidence for the possibility of using different response patterning under VI and FI schedules with equivalent reinforcement density to study effects of drug treatment on reinforcement learning. Copyright © 2017 Elsevier B.V. All rights reserved.
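
    To make the schedule contrast concrete, here is a minimal sketch that generates the intervals after which the next reinforcer becomes available under FI30s and VI30s. It assumes that VI intervals are drawn from an exponential distribution with a 30 s mean. Laboratories often use other progressions (for example Fleshler-Hoffman), and the paper's actual interval lists are not given in the abstract.

```python
import random
import statistics

def fixed_interval(mean_s, n):
    """FI: the first response after exactly mean_s seconds is reinforced."""
    return [float(mean_s)] * n

def variable_interval(mean_s, n, seed=0):
    """VI (assumed exponential): intervals vary unpredictably around mean_s seconds."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_s) for _ in range(n)]

fi = fixed_interval(30, 1000)
vi = variable_interval(30, 1000)
print(statistics.mean(fi), statistics.pstdev(fi))                 # 30.0 0.0
print(round(statistics.mean(vi)), round(statistics.pstdev(vi)))   # both close to 30
```

    Both schedules have the same mean interval, so reinforcement density is comparable. Only the predictability of reinforcer timing differs, and that predictability is what allows pauses after each reinforcer to develop under FI.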

  3. [Efficacy of the keyword mnemonic method in adults].

    PubMed

    Campos, Alfredo; Pérez-Fabello, María José; Camino, Estefanía

    2010-11-01

    Two experiments were used to assess the efficacy of the keyword mnemonic method in adults. In Experiment 1, immediate and delayed recall (at a one-day interval) were assessed by comparing the results obtained by a group of adults using the keyword mnemonic method with those of a group using the repetition method. The mean age of the sample under study was 59.35 years. Subjects were required to learn a list of 16 words translated from Latin into Spanish. Participants who used keyword mnemonics that had been devised by other experimental participants with the same characteristics obtained significantly higher immediate and delayed recall scores than participants in the repetition group. In Experiment 2, other participants had to learn a list of 24 Latin words translated into Spanish by using the keyword mnemonic method reinforced with pictures. Immediate and delayed recall were significantly greater in the keyword mnemonic group than in the repetition group.

  4. Preliminary Work for Examining the Scalability of Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Clouse, Jeff

    1998-01-01

    Researchers began studying automated agents that learn to perform multiple-step tasks early in the history of artificial intelligence (Samuel, 1963; Samuel, 1967; Waterman, 1970; Fikes, Hart & Nilsson, 1972). Multiple-step tasks are tasks that can only be solved via a sequence of decisions, such as control problems, robotics problems, classic problem-solving, and game-playing. The objective of agents attempting to learn such tasks is to use the resources they have available in order to become more proficient at the tasks. In particular, each agent attempts to develop a good policy, a mapping from states to actions, that allows it to select actions that optimize a measure of its performance on the task; for example, reducing the number of steps necessary to complete the task successfully. Our study focuses on reinforcement learning, a set of learning techniques where the learner performs trial-and-error experiments in the task and adapts its policy based on the outcome of those experiments. Much of the work in reinforcement learning has focused on a particular, simple representation, where every problem state is represented explicitly in a table, and associated with each state are the actions that can be chosen in that state. A major advantage of this table lookup representation is that one can prove that certain reinforcement learning techniques will develop an optimal policy for the current task. The drawback is that the representation limits the application of reinforcement learning to multiple-step tasks with relatively small state-spaces. There has been some theoretical work proving that convergence to optimal solutions can be obtained when using generalization structures, but the structures are quite simple.
The theory says little about complex structures, such as multi-layer, feedforward artificial neural networks (Rumelhart & McClelland, 1986), but empirical results indicate that the use of reinforcement learning with such structures is promising. These empirical results make no theoretical claims, nor do they compare the policies produced to optimal policies. A goal of our work is to be able to compare an optimal policy with one stored in an artificial neural network. One difficulty in performing such a study is finding a multiple-step task that is small enough that one can find an optimal policy using table lookup, yet large enough that, for practical purposes, an artificial neural network is really required. We have identified a limited form of the game OTHELLO as satisfying these requirements. The work we report here is in the very preliminary stages of research, but this paper provides background for the problem being studied and a description of our initial approach to examining the problem. In the remainder of this paper, we first describe reinforcement learning in more detail. Next, we present the game OTHELLO. Finally, we argue that a restricted form of the game meets the requirements of our study, and describe our preliminary approach to finding an optimal solution to the problem.
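
    The table lookup representation can be illustrated with a minimal sketch on a tiny hypothetical task. An optimal policy is computed exactly by value iteration over an explicit table, and a tabular Q-learner is then checked against it. The deterministic chain, its rewards, and the parameters are assumptions. This is not the restricted OTHELLO setting the paper studies, only the kind of optimal-versus-learned comparison it describes.

```python
import random

N = 6                 # hypothetical chain: states 0..5, state 5 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9

def step(s, a):
    """Deterministic transition; reward 1 only on entering the terminal state."""
    s2 = max(s - 1, 0) if a == 0 else s + 1
    return s2, (1.0 if s2 == N - 1 else 0.0)

def q_value(V, s, a):
    s2, r = step(s, a)
    return r + GAMMA * V[s2]

def value_iteration(tol=1e-10):
    """Exact optimal state values, one explicit table entry per state."""
    V = [0.0] * N                      # V[terminal] stays 0
    while True:
        change = 0.0
        for s in range(N - 1):
            best = max(q_value(V, s, a) for a in ACTIONS)
            change = max(change, abs(best - V[s]))
            V[s] = best
        if change < tol:
            return V

def optimal_policy(V):
    return [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in range(N - 1)]

def q_learning(episodes=300, alpha=0.5, eps=0.2, seed=1):
    """Tabular Q-learning: Q is literally a table indexed by (state, action)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s = 0
        while s != N - 1:
            best = max(Q[s])
            greedy = [a for a in ACTIONS if Q[s][a] == best]   # random tie-breaking
            a = rng.choice(ACTIONS) if rng.random() < eps else rng.choice(greedy)
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N - 1)]

V = value_iteration()
print(optimal_policy(V))             # [1, 1, 1, 1, 1]
print(q_learning() == optimal_policy(V))
```

    The same comparison becomes the hard part once the table is too large to enumerate and the learned policy lives in a neural network instead of a list.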

  5. Brain Research: Implications for Learning.

    ERIC Educational Resources Information Center

    Soares, Louise M.; Soares, Anthony T.

    Brain research has illuminated several areas of the learning process: (1) learning as association; (2) learning as reinforcement; (3) learning as perception; (4) learning as imitation; (5) learning as organization; (6) learning as individual style; and (7) learning as brain activity. The classical conditioning model developed by Pavlov advanced…

  6. A computational psychiatry approach identifies how alpha-2A noradrenergic agonist Guanfacine affects feature-based reinforcement learning in the macaque

    PubMed Central

    Hassani, S. A.; Oemisch, M.; Balcarras, M.; Westendorff, S.; Ardid, S.; van der Meer, M. A.; Tiesinga, P.; Womelsdorf, T.

    2017-01-01

    Noradrenaline is believed to support cognitive flexibility through the alpha 2A noradrenergic receptor (a2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the a2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task, evaluating performance using Bayesian and reinforcement learning models. In an initial dose-testing phase, we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose, we used single-subject computational modeling to examine the faster feature-based reversal learning under Guanfacine. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework. PMID:28091572
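
    The parameter-estimation result (higher learning rates plus stronger suppression of non-chosen feature information) can be made concrete with a minimal sketch of a feature-based value update. The specific form is an assumption for illustration and not necessarily the model fitted in the study. Chosen features move toward the outcome at rate alpha, and non-chosen features decay toward zero at rate omega. The feature names are hypothetical.

```python
def update_feature_values(values, chosen, reward, alpha, omega):
    """One trial of a hypothetical feature-based value update.

    values: dict mapping feature name -> current value estimate
    chosen: set of features belonging to the chosen stimulus
    alpha:  learning rate for chosen features (assumed parameter)
    omega:  decay applied to non-chosen features (assumed parameter)
    """
    # Assumption: the chosen stimulus' value is the sum of its feature values.
    prediction = sum(values[f] for f in chosen)
    rpe = reward - prediction
    updated = {}
    for feature, v in values.items():
        if feature in chosen:
            updated[feature] = v + alpha * rpe      # credit the chosen features
        else:
            updated[feature] = (1.0 - omega) * v    # suppress non-chosen features
    return updated

# Hypothetical stimulus features and parameter values.
values = {"red": 0.5, "blue": 0.5, "stripes": 0.2, "dots": 0.2}
after = update_feature_values(values, {"red", "stripes"}, reward=1.0, alpha=0.3, omega=0.5)
print({f: round(v, 3) for f, v in after.items()})
# {'red': 0.59, 'blue': 0.25, 'stripes': 0.29, 'dots': 0.1}
```

    In a model of this form, the reported drug effect would correspond to larger alpha and omega: faster credit assignment to chosen features and faster discounting of non-chosen ones.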

  7. Electrophysiological correlates of observational learning in children.

    PubMed

    Rodriguez Buritica, Julia M; Eppinger, Ben; Schuck, Nicolas W; Heekeren, Hauke R; Li, Shu-Chen

    2016-09-01

    Observational learning is an important mechanism for cognitive and social development. However, the neurophysiological mechanisms underlying observational learning in children are not well understood. In this study, we used a probabilistic reward-based observational learning paradigm to compare behavioral and electrophysiological markers of individual and observational reinforcement learning in 8- to 10-year-old children. Specifically, we manipulated the amount of observable information as well as children's similarity in age to the observed person (same-aged child vs. adult) to examine the effects of similarity in age on the integration of observed information in children. We show that the feedback-related negativity (FRN) during individual reinforcement learning reflects the valence of outcomes of own actions. Furthermore, we found that the feedback-related negativity during observational reinforcement learning (oFRN) showed a similar distinction between outcome valences of observed actions. This suggests that the oFRN can serve as a measure of observational learning in middle childhood. Moreover, during observational learning children profited from the additional social information and imitated the choices of their own peers more than those of adults, indicating that children have a tendency to conform more with similar others (e.g. their own peers) compared to dissimilar others (adults). Taken together, our results show that children can benefit from integrating observable information and that oFRN may serve as a measure of observational learning in children. © 2015 John Wiley & Sons Ltd.

  8. Mechanisms and time course of vocal learning and consolidation in the adult songbird.

    PubMed

    Warren, Timothy L; Tumer, Evren C; Charlesworth, Jonathan D; Brainard, Michael S

    2011-10-01

    In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. 
The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry.

  9. Mechanisms and time course of vocal learning and consolidation in the adult songbird

    PubMed Central

    Tumer, Evren C.; Charlesworth, Jonathan D.; Brainard, Michael S.

    2011-01-01

    In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. 
The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry. PMID:21734110

  10. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    PubMed

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interaction to complete a task without desired signals, which is promising for clinical applications. Previous studies used Q-learning techniques to classify neural states into simple directional actions, given the trial's initial timing. However, movements in BMI applications can be quite complicated, and the timing of an action explicitly reflects the user's intention of when to move. The rich set of actions and the corresponding neural states form a large state-action space, which makes generalization difficult for Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activity into seven distinct movements (directional moves, holding and resting), owing to its efficient weight updating. We apply AGREL to neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% on average, with faster convergence and more stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
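
    For readers unfamiliar with reward-gated decoding, the following is a deliberately simplified sketch in the spirit of the scheme described. A linear layer maps a neural feature vector to seven action scores, and one action is sampled with a softmax. Only the weights of the selected action are then updated, in proportion to the reward prediction error. Full AGREL also has a hidden layer whose plasticity is gated by feedback from the selected output, and that part is omitted here. The dimensions, the temperature, and the synthetic "neural" data are all assumptions.

```python
import math
import random

N_ACTIONS = 7      # e.g. directional moves, holding, resting (labels assumed)
N_FEATURES = 7     # hypothetical size of the neural feature vector

def softmax(scores, temperature):
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_trial(rng, noise=0.2):
    """Synthetic neural state: a noisy one-hot pattern for the intended action."""
    target = rng.randrange(N_ACTIONS)
    x = [rng.gauss(0.0, noise) for _ in range(N_FEATURES)]
    x[target] += 1.0
    return x, target

def scores_for(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train(trials=3000, beta=0.1, temperature=0.2, seed=0):
    rng = random.Random(seed)
    W = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]
    for _ in range(trials):
        x, target = make_trial(rng)
        scores = scores_for(W, x)
        a = rng.choices(range(N_ACTIONS), weights=softmax(scores, temperature))[0]
        reward = 1.0 if a == target else 0.0
        delta = reward - scores[a]           # reward prediction error of the chosen action
        # gating: only the selected action's weights are changed
        W[a] = [w + beta * delta * xi for w, xi in zip(W[a], x)]
    return W

def accuracy(W, trials=1000, seed=1):
    """Fraction of fresh synthetic trials decoded correctly by the greedy action."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, target = make_trial(rng)
        scores = scores_for(W, x)
        hits += max(range(N_ACTIONS), key=scores.__getitem__) == target
    return hits / trials

W = train()
print(accuracy(W))
```

    The accuracy printed here reflects only the synthetic data and says nothing about the 90.16% reported for recorded M1 activity.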

  11. The Effects of Short Interval Delay of Reinforcement Upon Human Discrimination Learning. IMRID Papers and Reports Vol. 4 No. 12.

    ERIC Educational Resources Information Center

    Kral, Paul A.; And Others

    Investigates the effect of delay of reinforcement upon human discrimination learning, with particular emphasis on the form of the gradient within the first few seconds of delay. In previous studies, subjects are usually required to make an instrumental response to a stimulus; this is followed by the delay interval and, finally, the reinforcement…

  12. Clipping in neurocontrol by adaptive dynamic programming.

    PubMed

    Fairbank, Michael; Prokhorov, Danil; Alonso, Eduardo

    2014-10-01

    In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to clip the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is truncated such that the agent stops exactly at the first terminal state reached, and no further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal difference learning, or policy-gradient learning algorithms.
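
    The clipping idea can be sketched in one dimension. The agent below moves at a constant speed toward a terminal boundary and pays a cost per unit time. Without clipping, the last time step is charged in full even though the boundary is crossed part-way through it. With clipping, the final step is shortened so the agent stops exactly on the boundary. The dynamics, the cost, and the numbers are hypothetical and only illustrate the bookkeeping, not the paper's benchmarks.

```python
def trajectory_cost(speed, distance=1.0, dt=0.3, cost_rate=1.0, clip=True):
    """Time cost for a hypothetical 1-D agent moving at `speed` until x >= distance."""
    x, total = 0.0, 0.0
    while x < distance:
        if clip and x + speed * dt >= distance:
            # clipped final step: charge only the time needed to reach the boundary
            total += cost_rate * (distance - x) / speed
            x = distance                     # stop exactly at the first terminal state
        else:
            x += speed * dt                  # full time step (may overshoot if unclipped)
            total += cost_rate * dt
    return total

for speed in (1.0, 1.1, 1.2):
    print(speed, round(trajectory_cost(speed, clip=False), 3),
          round(trajectory_cost(speed, clip=True), 3))
# 1.0 1.2 1.0
# 1.1 1.2 0.909
# 1.2 0.9 0.833
```

    Without clipping, the total cost is piecewise constant in the action (speed). It is identical at 1.0 and 1.1 and then jumps, so its derivative through the model is zero almost everywhere. With clipping, the cost equals distance/speed and changes smoothly. This is one way to see why the paper reports that derivative-based methods such as backpropagation through time and dual heuristic programming are the ones affected.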

  13. Learning Theory and the Typewriter Teacher

    ERIC Educational Resources Information Center

    Wakin, B. Bertha

    1974-01-01

    Eight basic principles of learning are described and discussed in terms of practical learning strategies for typewriting. Described are goal setting, preassessment, active participation, individual differences, reinforcement, practice, transfer of learning, and evaluation. (SC)

  14. Disrupted Reinforcement Learning and Maladaptive Behavior in Women with a History of Childhood Sexual Abuse: A High-Density Event-Related Potential Study

    PubMed Central

    Pechtel, Pia; Pizzagalli, Diego A.

    2013-01-01

    Context: Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite grave epidemiological data, the mechanisms underlying these maladaptive outcomes remain poorly understood. Objective: We examined whether CSA history, particularly in conjunction with past MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Design: Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Setting: Academic setting; participants recruited from the community. Participants: Fifteen remitted depressed females with CSA history (CSA+rMDD), 16 remitted depressed females without CSA history (rMDD), and 18 healthy females. Main Outcome Measures: Participants’ preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity (FRN) and error-related negativity (ERN), hypothesized to reflect activation in the anterior cingulate cortex, were used as electrophysiological indices of reinforcement learning. Results: No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA+rMDD group showed (1) lower accuracy (relative to both controls and rMDD), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to rMDD). CSA history was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance on previously rewarded, but not previously punished, trials.
Conclusions: Irrespective of past MDD, women with CSA histories showed neural and behavioral deficits in using previous reinforcement to optimize decision-making in the absence of feedback (blunted “Go learning”). While the current study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine whether disrupted positive reinforcement learning predicts high-risk behavior following CSA. PMID:23487253

  15. Reinforcement Learning Deficits in People with Schizophrenia Persist after Extended Trials

    PubMed Central

    Cicero, David C.; Martin, Elizabeth A.; Becker, Theresa M.; Kerns, John G.

    2014-01-01

    Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited number of acquisition trials to learn the reward contingencies and are then tested on what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingencies. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive and negative feedback. These reward learning deficits persisted even when people with schizophrenia were given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. PMID:25172610
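
    The Probabilistic Selection Task is often analysed with a Q-learning model that has separate learning rates for positive and negative prediction errors. The sketch below simulates such a learner on the standard training pairs (A/B 80/20, C/D 70/30, E/F 60/40) for 10 blocks of 60 trials. It then reads out "choose A" and "avoid B" accuracy from novel test pairings. The model form, the softmax inverse temperature, and the learning rates are assumptions for illustration. The study above reports behavioural data and is not claimed to have used this model.

```python
import math
import random

# Training pairs: probability that the first stimulus of each pair is rewarded.
PAIRS = {("A", "B"): 0.8, ("C", "D"): 0.7, ("E", "F"): 0.6}

def p_first(q1, q2, beta):
    """Softmax (logistic) probability of choosing the first of two options."""
    return 1.0 / (1.0 + math.exp(-beta * (q1 - q2)))

def train(alpha_gain, alpha_loss, blocks=10, trials_per_block=60, beta=5.0, seed=0):
    rng = random.Random(seed)
    Q = {s: 0.5 for pair in PAIRS for s in pair}
    pairs = list(PAIRS.items())
    for _ in range(blocks * trials_per_block):
        (s1, s2), p = rng.choice(pairs)
        chosen = s1 if rng.random() < p_first(Q[s1], Q[s2], beta) else s2
        p_reward = p if chosen == s1 else 1.0 - p
        reward = 1.0 if rng.random() < p_reward else 0.0
        delta = reward - Q[chosen]
        # separate rates for learning from positive vs negative prediction errors
        alpha = alpha_gain if delta > 0 else alpha_loss
        Q[chosen] += alpha * delta
    return Q

def evaluate_test_phase(Q, beta=5.0):
    """Expected accuracy on novel pairings: choose A over C-F, avoid B against C-F."""
    others = ["C", "D", "E", "F"]
    choose_a = sum(p_first(Q["A"], Q[o], beta) for o in others) / len(others)
    avoid_b = sum(p_first(Q[o], Q["B"], beta) for o in others) / len(others)
    return choose_a, avoid_b

choose_a, avoid_b = evaluate_test_phase(train(alpha_gain=0.1, alpha_loss=0.1))
print(round(choose_a, 2), round(avoid_b, 2))
```

    Lowering alpha_gain relative to alpha_loss is one way such a model can express weaker learning from positive feedback. Lowering both rates expresses slower learning in general, which is the distinction the extended-trials design is meant to probe.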

  16. Reinforcement learning deficits in people with schizophrenia persist after extended trials.

    PubMed

    Cicero, David C; Martin, Elizabeth A; Becker, Theresa M; Kerns, John G

    2014-12-30

    Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited number of acquisition trials to learn the reward contingencies and are then tested on what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingencies. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive feedback and negative feedback. These reward learning deficits persisted even when people with schizophrenia were given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  17. Reinforcement learning of periodical gaits in locomotion robots

    NASA Astrophysics Data System (ADS)

    Svinin, Mikhail; Yamada, Kazuyaki; Ushio, S.; Ueda, Kanji

    1999-08-01

    Emergence of stable gaits in locomotion robots is studied in this paper. A classifier system, implementing an instance-based reinforcement learning scheme, is used for sensory-motor control of an eight-legged mobile robot. An important feature of the classifier system is its ability to work with the continuous sensor space. The robot does not have prior knowledge of the environment, its own internal model, or the goal coordinates. It is only assumed that the robot can acquire stable gaits by learning how to reach a light source. During the learning process, the control system is self-organized by reinforcement signals. Reaching the light source defines a global reward. Forward motion gets a local reward, while stepping back and falling down get a local punishment. Feasibility of the proposed self-organized system is tested in simulation and experiment. The control actions are specified at the leg level. It is shown that, as learning progresses, the number of action rules in the classifier system stabilizes at a certain level, corresponding to the acquired gait patterns.
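
    The reinforcement structure described above can be written as a small function. It has a global reward for reaching the light, a local reward for forward motion, and local punishments for stepping back or falling. The observation fields and all magnitudes below are hypothetical, because the abstract does not specify them.

```python
from dataclasses import dataclass

@dataclass
class StepOutcome:
    """Hypothetical per-step observations for a legged robot."""
    reached_light: bool
    forward_progress: float   # displacement toward the light this step (assumed units)
    fell_down: bool

# Assumed magnitudes, chosen only for illustration.
GLOBAL_REWARD = 10.0
FORWARD_REWARD = 1.0
BACKWARD_PENALTY = -1.0
FALL_PENALTY = -5.0

def reinforcement(outcome: StepOutcome) -> float:
    """Combine the global reward with local rewards and punishments for one step."""
    if outcome.reached_light:
        return GLOBAL_REWARD                  # global reward for reaching the goal
    r = 0.0
    if outcome.forward_progress > 0:
        r += FORWARD_REWARD                   # local reward for moving forward
    elif outcome.forward_progress < 0:
        r += BACKWARD_PENALTY                 # local punishment for stepping back
    if outcome.fell_down:
        r += FALL_PENALTY                     # local punishment for falling down
    return r

print(reinforcement(StepOutcome(False, 0.02, False)))   # 1.0
print(reinforcement(StepOutcome(False, -0.01, True)))   # -6.0
```

    The local terms give the learner frequent feedback between the rare global rewards. This dense signal is what lets gait-level rules be reinforced long before the goal is reached.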

  18. Biases in probabilistic category learning in relation to social anxiety

    PubMed Central

    Abraham, Anna; Hermann, Christiane

    2015-01-01

    Instrumental learning paradigms are rarely employed to investigate the mechanisms underlying acquired fear responses in social anxiety. Here, we adapted a probabilistic category learning paradigm to assess information processing biases as a function of the degree of social anxiety traits in a sample of healthy individuals without a diagnosis of social phobia. Participants were presented with three pairs of neutral faces with differing probabilistic accuracy contingencies (A/B: 80/20, C/D: 70/30, E/F: 60/40). Upon making their choice, negative and positive feedback were conveyed using angry and happy faces, respectively. The highly socially anxious group showed a strong tendency to be more accurate at learning the probability contingency associated with the most ambiguous stimulus pair (E/F: 60/40). Moreover, when pairing the most positively reinforced stimulus or the most negatively reinforced stimulus with all the other stimuli in a test phase, the highly socially anxious group avoided the most negatively reinforced stimulus significantly more than the control group. The results are discussed with reference to avoidance learning and hypersensitivity to negative socially evaluative information associated with social anxiety. PMID:26347685

  19. A Comparison of Two Methods of Assessing Representation-Mediated Food Aversions Based on Shock or Illness

    ERIC Educational Resources Information Center

    Holland, Peter C.

    2008-01-01

    In experiments that measured food consumption, Holland (1981; "Learning and Motivation," 12, 1-18) found that food aversions were formed when an exteroceptive associate of food was paired with illness, but not when such an associate was paired with shock. By contrast, measuring the ability of food to reinforce instrumental responding,…

  20. The Application of Behavior Analysis to the Learning of Reading in a Retarded Child. Working Paper Series.

    ERIC Educational Resources Information Center

    Yamaguchi, Kaoru

    The study undertook to teach and fix the reading of meaningful symbols (hiragana, the Japanese syllabary alphabet, and kanji, Chinese characters) to a mildly retarded 7-year-old Japanese boy. The phonological method was used to teach hiragana and pictures were used to teach kanji. Behavior modification using reinforcement and time out was…

  1. The 4-H Club Meeting: An Essential Youth Development Strategy

    ERIC Educational Resources Information Center

    Cassels, Alicia; Post, Liz; Nestor, Patrick I.

    2015-01-01

    The club meeting has served as a key delivery method for 4-H programming across the United States throughout its history. A survey of WV 4-H community club members reinforces the body of evidence that the 4-H club meeting is an effective vehicle for delivering positive youth learning opportunities within the umbrella of the Essential Elements of…

  2. Distribution majorization of corner points by reinforcement learning for moving object detection

    NASA Astrophysics Data System (ADS)

    Wu, Hao; Yu, Hao; Zhou, Dongxiang; Cheng, Yongqiang

    2018-04-01

    Corner points play an important role in moving object detection, especially in the case of a free-moving camera. Corner points provide more accurate information than other pixels and reduce unnecessary computation. Previous work uses only intensity information to locate corner points; however, the information provided by the previous and subsequent frames can also be used. We use this information to focus on more valuable areas and ignore less valuable ones. The proposed algorithm is based on reinforcement learning and regards the detection of corner points as a Markov process. In the Markov model, the video to be analyzed is regarded as the environment, the selection of blocks for a corner point as the actions, and the detection performance as the state. Corner points are assigned to the blocks into which the original image is divided. Experimentally, we select a conventional method, which uses matching and the Random Sample Consensus algorithm to obtain objects, as the main framework and use our algorithm to improve its result. A comparison between the conventional method and the same method with our algorithm shows that our algorithm reduces false detections by 70%.

  3. Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints.

    PubMed

    Liu, Derong; Yang, Xiong; Wang, Ding; Wei, Qinglai

    2015-07-01

    Designing a stabilizing controller for uncertain nonlinear systems with control constraints is a challenging problem. Input constraints, coupled with the difficulty of accurately identifying the uncertainties, motivate the design of stabilizing controllers based on reinforcement-learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted into a constrained optimal control problem by appropriately selecting value functions for the nominal system. Unlike the actor-critic dual networks typically employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Moreover, unlike the initial stabilizing control that is often indispensable in RL, no special requirement is imposed on the initial control. Using Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is shown to keep the uncertain nonlinear system stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the present approach.
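    For readers unfamiliar with input-constrained optimal control, the sketch below shows the bounded control law commonly derived in this line of work for an input-affine system dx/dt = f(x) + g(x)u with |u| <= lam, where a nonquadratic input penalty gives u*(x) = -lam * tanh(g(x)^T grad V(x) / (2 * lam * r)) and grad V comes from the critic. The abstract does not state this formula, so treat it as an assumption; the quadratic critic basis and both function names are hypothetical, and only a scalar input is handled.

```python
import math


def critic_gradient(w, x):
    """Gradient of a hypothetical quadratic critic V(x) = w0*x0^2 + w1*x0*x1 + w2*x1^2."""
    x0, x1 = x
    return [2.0 * w[0] * x0 + w[1] * x1, w[1] * x0 + 2.0 * w[2] * x1]


def saturated_control(grad_v, g, r, lam):
    """Bounded control u = -lam * tanh(g^T grad_v / (2 * lam * r)) for a scalar input."""
    s = sum(gi * dvi for gi, dvi in zip(g, grad_v))  # g(x)^T grad V(x)
    return -lam * math.tanh(s / (2.0 * lam * r))


# Example: a large critic gradient still yields |u| <= lam.
u = saturated_control(critic_gradient([1.0, 0.0, 1.0], [10.0, 10.0]), g=[0.0, 1.0], r=1.0, lam=1.0)
```

    Because tanh saturates, the returned control never exceeds lam in magnitude however large the critic gradient becomes, which is how the input constraint is respected by construction.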

  4. Surprise beyond prediction error

    PubMed Central

    Chumbley, Justin R; Burke, Christopher J; Stephan, Klaas E; Friston, Karl J; Tobler, Philippe N; Fehr, Ernst

    2014-01-01

    Surprise drives learning. Various neural “prediction error” signals are believed to underpin surprise-based reinforcement learning. Here, we report a surprise signal that reflects reinforcement learning but is neither un/signed reward prediction error (RPE) nor un/signed state prediction error (SPE). To exclude these alternatives, we measured surprise responses in the absence of RPE and accounted for a host of potential SPE confounds. This new surprise signal was evident in ventral striatum, primary sensory cortex, frontal poles, and amygdala. We interpret these findings via a normative model of surprise. PMID:24700400
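    The study's argument rests on ruling out two familiar quantities. As a rough illustration (the definitions below are common textbook forms assumed here, not taken from the paper, and the function names are hypothetical), a reward prediction error compares received with expected reward, while one common unsigned state prediction error is one minus the probability assigned to the state actually observed. The paper's own normative surprise model is not reproduced.

```python
def reward_prediction_error(reward, expected_reward):
    """Return (signed, unsigned) reward prediction error."""
    signed = reward - expected_reward
    return signed, abs(signed)


def state_prediction_error(transition_probs, observed_next_state):
    """Unsigned state prediction error, defined here as 1 - P(observed next state)."""
    return 1.0 - transition_probs.get(observed_next_state, 0.0)


# A rare transition with no reward expected or received: RPE is zero, SPE is large.
rpe = reward_prediction_error(0.0, 0.0)
spe = state_prediction_error({"common": 0.7, "rare": 0.3}, "rare")
```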

  5. Emergence of Relations and the Essence of Learning: A Review of Sidman's Equivalence Relations and Behavior: A Research Story

    NASA Technical Reports Server (NTRS)

    Rumbaugh, Duane M.

    1995-01-01

    Sidman addresses two very important questions in Equivalence Relations and Behavior: A Research Story: What are the bases of behavioral competence? And how do units of learning become related? The book recounts the story of how an understanding of emergent relations and competencies was achieved through studies in his teaching-research program with mentally retarded subjects. Although children normally accrue vast networks of relations between stimuli and events, those with mental retardation typically do not. Consequently, by learning how to establish those networks, Sidman and his students contribute richly both to the cultivation of competencies by their subjects and, more generally, to an understanding of real-world human behavior. The basic equivalence paradigm affords the subject feedback and reinforcement for very specific choices during training, but the test is not for those choices! Rather, tests for equivalence look for new choices, ones seemingly quite foreign to the training regimen. The tests for equivalence relations entail presentations of stimuli that were the options for conditional choice during reinforced training. In tests of equivalence, correct choices are novel; hence, they have never been reinforced during training. The study of equivalence relations can encourage the emergence of new perspectives that are more symbiotic than competitive. In full acknowledgment of the important role and contributions made by those who identify themselves as experimental analysts of behavior, it is timely that rapprochements be worked toward, as indeed they are, to meld that perspective with others of our time. Both our research methods and our expectations about the nature of the learning process and the abilities of our subjects can delimit what they might learn and what we, in turn, learn about their learning. The text will be of great value for instruction at the upper-division and graduate levels. Its impact will be substantial, for it defines an important advance in our efforts to understand the richness of behavior in both humans and nonhuman animals. Although not presented to that end, the book might also serve to bridge communications with other groups of animal researchers whose interests lie more in a comparative or ethological framework.

  6. Autonomous learning based on cost assumptions: theoretical studies and experiments in robot control.

    PubMed

    Ribeiro, C H; Hemerly, E M

    2000-02-01

    Autonomous learning techniques are based on experience acquisition. In most realistic applications, experience is time-consuming: it implies sensor reading, actuator control and algorithmic update, constrained by the learning system dynamics. The information crudeness upon which classical learning algorithms operate makes such problems too difficult and unrealistic. Nonetheless, additional information for facilitating the learning process ideally should be embedded in such a way that the structural, well-studied characteristics of these fundamental algorithms are maintained. We investigate in this article a more general formulation of the Q-learning method that allows for a spreading of information derived from single updates towards a neighbourhood of the instantly visited state and converges to optimality. We show how this new formulation can be used as a mechanism to safely embed prior knowledge about the structure of the state space, and demonstrate it in a modified implementation of a reinforcement learning algorithm in a real robot navigation task.
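    The idea of spreading a single Q-learning update to a neighbourhood of the visited state can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact rule: a hypothetical similarity(s, t) in [0, 1] weights how strongly each state s shares the update made at the visited state t, and every weighted state is moved toward the same one-step target. With a similarity that is 1 only when s == t, it reduces to standard Q-learning.

```python
from collections import defaultdict


def spreading_q_update(q, state, action, reward, next_state, actions, states,
                       similarity, alpha=0.1, gamma=0.9):
    """Q-learning step whose update is shared with neighbouring states."""
    # Standard one-step Q-learning target from the visited transition.
    target = reward + gamma * max(q[(next_state, b)] for b in actions)
    for s in states:
        weight = similarity(s, state)  # hypothetical neighbourhood kernel
        if weight > 0.0:
            q[(s, action)] += alpha * weight * (target - q[(s, action)])
    return target


def line_similarity(s, t):
    # States on a line; immediate neighbours receive half of the update.
    return max(0.0, 1.0 - abs(s - t) / 2.0)


q = defaultdict(float)
spreading_q_update(q, state=2, action="right", reward=1.0, next_state=3,
                   actions=["right"], states=range(5), similarity=line_similarity,
                   alpha=0.5)
```

    The neighbourhood kernel is where prior knowledge about state-space structure enters: a wider kernel generalizes faster but risks spreading updates to states that do not actually behave alike.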

  7. Suppression of Striatal Prediction Errors by the Prefrontal Cortex in Placebo Hypoalgesia.

    PubMed

    Schenk, Lieven A; Sprenger, Christian; Onat, Selim; Colloca, Luana; Büchel, Christian

    2017-10-04

    Classical learning theories predict extinction after the discontinuation of reinforcement through prediction errors. However, placebo hypoalgesia, although mediated by associative learning, has been shown to be resistant to extinction. We tested the hypothesis that this is mediated by the suppression of prediction error processing through the prefrontal cortex (PFC). We compared pain modulation through treatment cues (placebo hypoalgesia, treatment context) with pain modulation through stimulus intensity cues (stimulus context) during functional magnetic resonance imaging in 48 male and female healthy volunteers. During acquisition, our data show that expectations are correctly learned and that this is associated with prediction error signals in the ventral striatum (VS) in both contexts. However, in the nonreinforced test phase, pain modulation and expectations of pain relief persisted to a larger degree in the treatment context, indicating that the expectations were not correctly updated in the treatment context. Consistently, we observed significantly stronger neural prediction error signals in the VS in the stimulus context compared with the treatment context. A connectivity analysis revealed negative coupling between the anterior PFC and the VS in the treatment context, suggesting that the PFC can suppress the expression of prediction errors in the VS. Consistent with this, a participant's conceptual views and beliefs about treatments influenced the pain modulation only in the treatment context. Our results indicate that in placebo hypoalgesia contextual treatment information engages prefrontal conceptual processes, which can suppress prediction error processing in the VS and lead to reduced updating of treatment expectancies, resulting in less extinction of placebo hypoalgesia. SIGNIFICANCE STATEMENT: In aversive and appetitive reinforcement learning, learned effects show extinction when reinforcement is discontinued. This is thought to be mediated by prediction errors (i.e., the difference between expectations and outcome). Although reinforcement learning has been central in explaining placebo hypoalgesia, placebo hypoalgesic effects show little extinction and persist after the discontinuation of reinforcement. Our results support the idea that conceptual treatment beliefs bias the neural processing of expectations in a treatment context compared with a more stimulus-driven processing of expectations with stimulus intensity cues. We provide evidence that this is associated with the suppression of prediction error processing in the ventral striatum by the prefrontal cortex. This provides a neural basis for persisting effects in reinforcement learning and placebo hypoalgesia. Copyright © 2017 the authors
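    Classical accounts treat extinction as delta-rule (Rescorla-Wagner) learning driven by prediction errors. The toy sketch below is only a caricature of the study's interpretation, not its model: a hypothetical pe_gain below 1 stands in for prefrontal suppression of striatal prediction errors, and the learned expectation then extinguishes more slowly once reinforcement stops.

```python
def rescorla_wagner(outcomes, alpha=0.3, pe_gain=1.0, v0=0.0):
    """Delta-rule updates V <- V + alpha * pe_gain * (outcome - V); returns the V trace."""
    v = v0
    trace = []
    for outcome in outcomes:
        prediction_error = outcome - v
        v += alpha * pe_gain * prediction_error  # pe_gain < 1: attenuated prediction error
        trace.append(v)
    return trace


acquired = rescorla_wagner([1.0] * 10)[-1]                               # reinforced acquisition
normal = rescorla_wagner([0.0] * 10, v0=acquired)[-1]                    # extinction, full PEs
suppressed = rescorla_wagner([0.0] * 10, pe_gain=0.3, v0=acquired)[-1]   # suppressed PEs
```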

  8. Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia

    PubMed Central

    Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.

    2014-01-01

    Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for each stimulus. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
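    The separation of RL and WM contributions can be illustrated with a mixture policy of the kind used in RL-plus-working-memory models. In this sketch (an assumption about the model's form; function names and parameter values are hypothetical, and WM decay is omitted) the WM weight is rho * min(1, capacity / set_size), so WM dominates when the number of stimuli fits within capacity and RL carries more of the load as set size grows.

```python
import math


def softmax(values, beta):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rlwm_policy(q_rl, q_wm, set_size, capacity, rho, beta=8.0):
    """Mix a slow RL policy with a fast, capacity-limited WM policy."""
    w_wm = rho * min(1.0, capacity / set_size)  # WM reliance shrinks beyond capacity
    p_rl = softmax(q_rl, beta)
    p_wm = softmax(q_wm, beta)
    return [w_wm * pw + (1.0 - w_wm) * pr for pw, pr in zip(p_wm, p_rl)]


def rl_update(q, action, reward, alpha):
    # Incremental delta-rule learning for the RL component.
    q[action] += alpha * (reward - q[action])


def wm_update(q, action, reward):
    # One-shot storage for the WM component (learning rate of 1).
    q[action] = reward
```

    In this form, "capacity" and "rho" correspond loosely to the WM capacity and reliability parameters the abstract mentions, while the RL learning rate is kept separate.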

  9. Reconsideration of Serial Visual Reversal Learning in Octopus (Octopus vulgaris) from a Methodological Perspective

    PubMed Central

    Bublitz, Alexander; Weinhold, Severine R.; Strobel, Sophia; Dehnhardt, Guido; Hanke, Frederike D.

    2017-01-01

    Octopuses (Octopus vulgaris) are generally considered to possess extraordinary cognitive abilities including the ability to successfully perform in a serial reversal learning task. During reversal learning, an animal is presented with a discrimination problem and after reaching a learning criterion, the signs of the stimuli are reversed: the former positive becomes the negative stimulus and vice versa. If an animal improves its performance over reversals, it is ascribed advanced cognitive abilities. Reversal learning has been tested in octopus in a number of studies. However, the experimental procedures adopted in these studies involved pre-training on the new positive stimulus after a reversal, strong negative reinforcement or might have enabled secondary cueing by the experimenter. These procedures could have all affected the outcome of reversal learning. Thus, in this study, serial visual reversal learning was revisited in octopus. We trained four common octopuses (O. vulgaris) to discriminate between 2-dimensional stimuli presented on a monitor in a simultaneous visual discrimination task and reversed the signs of the stimuli each time the animals reached the learning criterion of ≥80% in two consecutive sessions. The animals were trained using operant conditioning techniques including a secondary reinforcer, a rod that was pushed up and down the feeding tube, which signaled the correctness of a response and preceded the subsequent primary reinforcement of food. The experimental protocol did not involve negative reinforcement. One animal completed four reversals and showed progressive improvement, i.e., it decreased its errors to criterion the more reversals it experienced. This animal developed a generalized response strategy. In contrast, another animal completed only one reversal, whereas two animals did not learn to reverse during the first reversal. In conclusion, some octopus individuals can learn to reverse in a visual task demonstrating behavioral flexibility even with a refined methodology. PMID:28223940
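    The reversal procedure described above (reverse the stimulus signs once performance reaches at least 80% correct in two consecutive sessions) can be written as a small criterion check. The sketch assumes each session is summarised as a proportion correct; the function names are hypothetical.

```python
def sessions_to_criterion(session_scores, threshold=0.8, consecutive=2):
    """Return the 1-based session at which the criterion is first met, or None."""
    run = 0
    for i, score in enumerate(session_scores, start=1):
        run = run + 1 if score >= threshold else 0  # count consecutive sessions at criterion
        if run >= consecutive:
            return i
    return None


def reverse(positive, negative):
    # After criterion, the former positive stimulus becomes negative and vice versa.
    return negative, positive
```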

  10. Learning in Mental Retardation: A Comprehensive Bibliography.

    ERIC Educational Resources Information Center

    Gardner, James M.; And Others

    The bibliography on learning in mentally handicapped persons is divided into the following topic categories: applied behavior change, classical conditioning, discrimination, generalization, motor learning, reinforcement, verbal learning, and miscellaneous. An author index is included. (KW)

  11. Asynchronous Gossip for Averaging and Spectral Ranking

    NASA Astrophysics Data System (ADS)

    Borkar, Vivek S.; Makhijani, Rahul; Sundaresan, Rajesh

    2014-08-01

    We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
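    The second gossip variant targets the Perron-Frobenius eigenvector of a nonnegative matrix. As a centralized reference point (not the distributed gossip or reinforcement learning scheme of the paper), plain power iteration computes the same vector; the sketch assumes the matrix is irreducible and aperiodic so that the iteration converges, and the function name is hypothetical.

```python
def perron_vector(matrix, iterations=1000, tol=1e-12):
    """Power iteration for the Perron-Frobenius eigenvector (normalised to sum 1)."""
    n = len(matrix)
    x = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(y)  # equals lambda * sum(x) = lambda once x has converged
        y = [v / eigenvalue for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x, eigenvalue


vec, lam = perron_vector([[1.0, 2.0], [3.0, 2.0]])
```

    For [[1, 2], [3, 2]] the dominant eigenvalue is 4 with eigenvector proportional to (2, 3), so the call above should return approximately [0.4, 0.6] and 4.0.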

  12. Reinforcement learning in professional basketball players

    PubMed Central

    Neiman, Tal; Loewenstein, Yonatan

    2011-01-01

    Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3 point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. PMID:22146388

  13. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    NASA Astrophysics Data System (ADS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-06-01

    One of the prototypical mechanisms for understanding the ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model (a reinforcement learning model with a dynamic aspiration level) is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and the public goods game, we find most notably that moderate greediness (i.e., a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is also tested against different regulation strengths and different types of interaction networks. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations.
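    A win-stay, lose-shift rule with an adaptive aspiration level is often written in Bush-Mosteller form. The sketch below is one such variant, assumed for illustration rather than taken from the paper: the stimulus tanh(beta * (payoff - aspiration)) reinforces the last action when the payoff beats the aspiration and pushes the player away from it otherwise, and the aspiration then drifts toward recent payoffs at a hypothetical rate h.

```python
import math


def bush_mosteller_step(p_coop, cooperated, payoff, aspiration, beta=1.0, h=0.2):
    """Return (new cooperation probability, new aspiration level)."""
    stimulus = math.tanh(beta * (payoff - aspiration))  # in (-1, 1)
    if cooperated:
        if stimulus >= 0:
            p_coop += (1.0 - p_coop) * stimulus   # win-stay: cooperate more
        else:
            p_coop += p_coop * stimulus           # lose-shift: cooperate less
    else:
        if stimulus >= 0:
            p_coop -= p_coop * stimulus           # win-stay: keep defecting
        else:
            p_coop -= (1.0 - p_coop) * stimulus   # lose-shift: move toward cooperation
    aspiration = (1.0 - h) * aspiration + h * payoff  # dynamic aspiration level
    return p_coop, aspiration
```

    Each branch scales the change by the remaining headroom (1 - p or p), so the probability always stays within [0, 1]; "greediness" in the abstract's sense corresponds to how high the aspiration sits relative to typical payoffs.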

  14. From creatures of habit to goal-directed learners: Tracking the developmental emergence of model-based reinforcement learning

    PubMed Central

    Decker, Johannes H.; Otto, A. Ross; Daw, Nathaniel D.; Hartley, Catherine A.

    2016-01-01

    Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enables estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was evident in choice behavior across all age groups, evidence of a model-based strategy only emerged during adolescence and continued to increase into adulthood. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. PMID:27084852
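    The model-based/model-free distinction in sequential tasks is commonly formalized as a weighted mix of two value estimates. In the sketch below (assumptions: a two-stage task whose first-stage actions lead to second-stage states with fixed probabilities, and a hypothetical weight w), model-based values look ahead through the transition matrix, model-free values are cached, and w = 1 is purely model-based while w = 0 is purely model-free. In these terms, the abstract's finding would appear as a model-based weight that is small in children and grows through adolescence into adulthood.

```python
def model_based_values(transition, q_stage2):
    """Q_MB(a) = sum over s of P(s | a) * max_b Q2(s, b)."""
    return [sum(p * max(q_stage2[s]) for s, p in enumerate(row)) for row in transition]


def hybrid_values(q_mf, transition, q_stage2, w):
    """Weighted mix of model-based and model-free first-stage values."""
    q_mb = model_based_values(transition, q_stage2)
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


transition = [[0.7, 0.3], [0.3, 0.7]]   # common/rare transitions (assumed task structure)
q_stage2 = [[1.0, 0.0], [0.0, 0.0]]     # only second-stage state 0 currently pays off
q_mf = [0.2, 0.6]                       # cached model-free values
```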

  15. Reinforcement learning or active inference?

    PubMed

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  16. A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning

    PubMed Central

    Franklin, Nicholas T; Frank, Michael J

    2015-01-01

    Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning all three of Marr's levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments. DOI: http://dx.doi.org/10.7554/eLife.12029.001 PMID:26705698

  17. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    PubMed

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporating distributed generation (DG) units and energy storage (ES) devices are expected to play increasingly important roles in future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is challenging due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Using the learning algorithm avoids the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in the microgrid only needs to communicate with its local neighbors, without relying on any centralized controller. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
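    The abstract says each node coordinates through a diffusion strategy using only its local neighbours. A generic adapt-then-combine (ATC) diffusion step, shown here on a toy problem of agreeing on an average rather than on the paper's dispatch or RL formulation, alternates a local gradient step with a weighted average over neighbours; the combination weights, step size, targets, and function name are hypothetical.

```python
def atc_diffusion(targets, weights, mu=0.1, iterations=200):
    """Adapt-then-combine diffusion toward agreement on the mean of local targets.

    weights[k][l] is the weight node k gives to neighbour l (0 if not neighbours);
    each row is assumed to sum to 1.
    """
    n = len(targets)
    w = [0.0] * n
    for _ in range(iterations):
        # Adapt: local gradient step on (w_k - d_k)^2 / 2, using only node k's data.
        psi = [w[k] - mu * (w[k] - targets[k]) for k in range(n)]
        # Combine: each node averages the intermediate estimates of its neighbours.
        w = [sum(weights[k][l] * psi[l] for l in range(n)) for k in range(n)]
    return w


estimates = atc_diffusion([1.0, 2.0, 6.0], [[1.0 / 3] * 3] * 3)
```

    With the fully connected, uniform weights used in this example every node reaches the network-wide mean (3.0); sparser neighbourhoods still communicate only locally but, with a constant step size, settle near rather than exactly at the mean.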

  18. Reinforcement learning with Marr.

    PubMed

    Niv, Yael; Langdon, Angela

    2016-10-01

    To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization, which readily prescribes algorithmic solutions that bear a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.

  19. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    PubMed

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
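    A Pavlovian bias toward withholding responses can be expressed in the kind of go/no-go model common in this literature. The sketch below is an assumed parameterisation, not necessarily the one fitted in the study, and the function name is hypothetical: the "go" action weight adds a Pavlovian term pav_weight * state_value, so in states with negative expected value a larger pav_weight lowers the probability of responding.

```python
import math


def p_go(q_go, q_nogo, state_value, go_bias, pav_weight):
    """Probability of responding under a two-action softmax with a Pavlovian term."""
    w_go = q_go + go_bias + pav_weight * state_value  # negative state_value suppresses 'go'
    w_nogo = q_nogo
    return 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))
```

    In this form, an "increased reliance" on the Pavlovian parameter shows up as a larger pav_weight, which makes responding less likely in threatening states even when instrumental values favor acting.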

  20. Variable Behavior and Repeated Learning in Two Mouse Strains: Developmental and Genetic Contributions.

    PubMed

    Arnold, Megan A; Newland, M Christopher

    2018-06-16

    Behavioral inflexibility is often assessed using reversal learning tasks, which require a relatively low degree of response variability. No studies have assessed sensitivity to reinforcement contingencies that specifically select highly variable response patterns in mice, let alone in models of neurodevelopmental disorders involving limited response variation. Operant variability and incremental repeated acquisition (IRA) were used to assess unique aspects of behavioral variability of two mouse strains: BALB/c, a model of some deficits in ASD, and C57Bl/6. On the operant variability task, BALB/c mice responded more repetitively during adolescence than C57Bl/6 mice when reinforcement did not require variability but responded more variably when reinforcement required variability. During IRA testing in adulthood, both strains acquired an unchanging performance sequence equally well. Strain differences emerged, however, after novel learning sequences began alternating with the performance sequence: BALB/c mice substantially outperformed C57Bl/6 mice. Using litter-mate controls, it was found that adolescent experience with variability did not affect either learning or performance on the IRA task in adulthood. These findings constrain the use of BALB/c mice as a model of ASD, but once again reveal that this strain is highly sensitive to reinforcement contingencies and learns quickly and robustly. Copyright © 2018. Published by Elsevier B.V.

  1. Preventing Learned Helplessness.

    ERIC Educational Resources Information Center

    Hoy, Cheri

    1986-01-01

    To prevent learned helplessness in learning disabled students, teachers can share responsibilities with the students, train students to reinforce themselves for effort and self control, and introduce opportunities for changing counterproductive attitudes. (CL)

  2. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    NASA Astrophysics Data System (ADS)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to keep the CO pollutant concentration and the VI (visibility index) below adequate levels so as to provide drivers with a comfortable and safe driving environment. It is also necessary to minimize the power consumed in operating the ventilation system. To achieve these objectives, this research uses a reinforcement learning (RL) control algorithm. RL is goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward, which is evaluative feedback from the environment. The reward for the tunnel ventilation system is constructed to include both objectives listed above: maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is applied to the tunnel ventilation system. Simulation results obtained with real data collected from an existing tunnel ventilation system, together with experimental verification, are provided in this paper. With the suggested controller, the pollutant level inside the tunnel was kept under the allowable limit, and energy-consumption performance improved compared with a conventional control scheme.
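    The abstract describes a reward that combines two objectives: keeping pollutant levels acceptable and minimizing power consumption. One hypothetical way to encode that as a scalar RL reward is sketched below; the penalty form, weights, limits, and function name are illustrative assumptions, not the values or design used in the paper.

```python
def ventilation_reward(co_ppm, vi, power_kw, co_limit, vi_limit,
                       w_pollution=1.0, w_power=0.01):
    """Hypothetical reward: penalize pollutant-limit violations and power use."""
    # Only the amount by which each pollutant measure exceeds its limit is penalized.
    pollution_penalty = max(0.0, co_ppm - co_limit) + max(0.0, vi - vi_limit)
    return -(w_pollution * pollution_penalty + w_power * power_kw)
```

    Because the reward is the negative of a cost, an actor-critic agent maximizing it trades off fan power against limit violations through the ratio of w_pollution to w_power.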

  3. Relapse processes after the extinction of instrumental learning: Renewal, resurgence, and reacquisition

    PubMed Central

    Bouton, Mark E.; Winterbauer, Neil E.; Todd, Travis P.

    2012-01-01

    It is widely recognized that extinction (the procedure in which a Pavlovian conditioned stimulus or an instrumental action is repeatedly presented without its reinforcer) weakens behavior without erasing the original learning. Most of the experiments that support this claim have focused on several “relapse” effects that occur after Pavlovian extinction, which collectively suggest that the original learning is saved through extinction. However, although such effects do occur after instrumental extinction, they have not been explored there in as much detail. This article reviews recent research in our laboratory that has investigated three relapse effects that occur after the extinction of instrumental (operant) learning. In renewal, responding returns after extinction when the behavior is tested in a different context; in resurgence, responding recovers when a second response that has been reinforced during extinction of the first is itself put on extinction; and in rapid reacquisition, extinguished responding returns rapidly when the response is reinforced again. The results provide new insights into extinction and relapse, and are consistent with principles that have been developed to explain extinction and relapse as they occur after Pavlovian conditioning. Extinction of instrumental learning, like Pavlovian learning, involves new learning that is relatively dependent on the context for expression. PMID:22450305

  4. Enhanced appetitive learning and reversal learning in a mouse model for Prader-Willi syndrome.

    PubMed

    Relkovic, Dinko; Humby, Trevor; Hagan, Jim J; Wilkinson, Lawrence S; Isles, Anthony R

    2012-06-01

    Prader-Willi syndrome (PWS) is caused by lack of paternally derived gene expression from the imprinted gene cluster on human chromosome 15q11-q13. PWS is characterized by severe hypotonia, a failure to thrive in infancy and, on emerging from infancy, evidence of learning disabilities and overeating behavior due to an abnormal satiety response and increased motivation by food. We have previously shown that an imprinting center deletion mouse model (PWS-IC) is quicker to acquire a preference for, and consumes more of, a palatable food. Here we examined how the use of this palatable food as a reinforcer influences learning in PWS-IC mice performing a simple appetitive learning task. On a nonspatial maze-based task, PWS-IC mice reached criterion much more quickly, making fewer errors during initial acquisition and also during reversal learning. A manipulation in which the reinforcer was devalued impaired wild-type performance but had no effect on PWS-IC mice. This suggests that increased motivation for the reinforcer in PWS-IC mice may underlie their enhanced learning. This supports previous findings in PWS patients and is the first behavioral study of an animal model of PWS in which the motivation of behavior by food rewards has been examined. © 2012 American Psychological Association

  5. Economic decision-making in the ultimatum game by smokers.

    PubMed

    Takahashi, Taiki

    2007-10-01

    No study to date has compared the degree of inequity aversion in economic decision-making in the ultimatum game between non-addictive and addictive reinforcers. The comparison is potentially important for neuroeconomics and for the reinforcement learning theory of addiction. We compared the degree of inequity aversion in the ultimatum game between money and cigarettes in habitual smokers. Smokers avoided inequity in the ultimatum game more dramatically for money than for cigarettes; i.e., there was a "domain effect" in decision-making in the ultimatum game. Reward-processing neural activity in the brain may be distinct for non-addictive and addictive reinforcers, and insula activation due to cue-induced craving may conflict with insula activation induced by unfair offers. Future studies in the neuroeconomics of addiction should employ game-theoretic decision tasks to elucidate reinforcement learning processes in dopaminergic neural circuits.

  6. Reinforced two-step-ahead weight adjustment technique for online training of recurrent neural networks.

    PubMed

    Chang, Li-Chiu; Chen, Pin-An; Chang, Fi-John

    2012-08-01

    A reliable forecast of future events is of great value. The main purpose of this paper is to propose an innovative learning technique for improving the accuracy of two-step-ahead (2SA) forecasts. The real-time recurrent learning (RTRL) algorithm for recurrent neural networks (RNNs) can effectively model the dynamics of complex processes and has been used successfully in one-step-ahead forecasts for various time series. This paper proposes a reinforced RTRL algorithm for 2SA forecasts using RNNs and investigates its performance on two well-known benchmark time series and on streamflow during flood events in Taiwan. Results demonstrate that the proposed reinforced 2SA RTRL algorithm for RNNs can adequately forecast the benchmark (theoretical) time series, significantly improve the accuracy of flood forecasts, and effectively reduce time-lag effects.

  7. Fun While Learning and Earning. A Look Into Chattanooga Public Schools' Token Reinforcement Program.

    ERIC Educational Resources Information Center

    Smith, William F.; Sanders, Frank J.

    A token reinforcement program was used by the Piney Woods Research and Demonstration Center in Chattanooga, Tennessee. Children who were from economically deprived homes received tokens for positive behavior. The tokens were redeemable for recess privileges, ice cream, candy, and other such reinforcers. All tokens were spent on the day earned so…

  8. The Curriculum-Faculty-Reinforcement Alignment and Its Effect on Learning Retention of Core Marketing Concepts of Marketing Capstone Students

    ERIC Educational Resources Information Center

    Raska, David; Keller, Eileen Weisenbach; Shaw, Doris

    2014-01-01

    Curriculum-Faculty-Reinforcement (CFR) alignment is an alignment between fundamental marketing concepts that are integral to the mastery of knowledge expected of our marketing graduates, their perceived importance by the faculty, and their level of reinforcement throughout core marketing courses required to obtain a marketing degree. This research…

  9. Genetic Dissociation of Acquisition and Memory Strength in the Heat-Box Spatial Learning Paradigm in "Drosophila"

    ERIC Educational Resources Information Center

    Diegelmann, Soeren; Zars, Melissa; Zars, Troy

    2006-01-01

    Memories can have different strengths, largely dependent on the intensity of reinforcers encountered. The relationship between reinforcement and memory strength is evident in asymptotic memory curves, with the level of the asymptote related to the intensity of the reinforcer. Although this is likely a fundamental property of memory formation,…

  10. Context-Outcome Associations Underlie Context-Switch Effects after Partial Reinforcement in Human Predictive Learning

    ERIC Educational Resources Information Center

    Moreno-Fernandez, Maria M.; Abad, Maria J. F.; Ramos-Alvarez, Manuel M.; Rosas, Juan M.

    2011-01-01

    Predictive value for continuously reinforced cues is affected by context changes when they are trained within a context in which a different cue undergoes partial reinforcement. An experiment was conducted with the goal of exploring the mechanisms underlying this context-switch effect. Human participants were trained in a predictive learning…

  11. The Use of Reinforcement Procedures in Teaching Reading to Rural Culturally Deprived Children.

    ERIC Educational Resources Information Center

    Egeland, Byron

    A group of culturally deprived children with severe reading and behavior problems was systematically given tangible reinforcers while learning to read. Twelve second-grade and 12 third-grade boys from a rural and lower socioeconomic background were taught reading with the use of tangible reinforcers (E group). Four similar control groups (C group)…

  12. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    PubMed

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  13. Temporally Coordinated Deep Brain Stimulation in the Dorsal and Ventral Striatum Synergistically Enhances Associative Learning.

    PubMed

    Katnani, Husam A; Patel, Shaun R; Kwon, Churl-Su; Abdel-Aziz, Samer; Gale, John T; Eskandar, Emad N

    2016-01-04

    The primate brain has the remarkable ability to map sensory stimuli onto motor behaviors that can lead to positive outcomes. We have previously shown that during the reinforcement of visual-motor behavior, activity in the caudate nucleus is correlated with the rate of learning. Moreover, phasic microstimulation in the caudate during the reinforcement period was shown to enhance associative learning, demonstrating the importance of temporal specificity for manipulating learning-related changes. Here we present evidence that extends our previous finding by demonstrating that temporally coordinated phasic deep brain stimulation across both the nucleus accumbens and caudate can further enhance associative learning. Monkeys performed a visual-motor associative learning task and received stimulation at time points critical to learning-related changes. The resulting performance revealed an enhancement in the rate, ceiling, and reaction times of learning. Stimulation of either brain region alone, or at different time points, did not produce the same effect.

  14. Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning

    PubMed Central

    Taylor, Jordan A.; Ivry, Richard B.

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295

  15. Modeling the behavioral substrates of associate learning and memory - Adaptive neural models

    NASA Technical Reports Server (NTRS)

    Lee, Chuen-Chien

    1991-01-01

    Three adaptive single-neuron models based on neural analogies of behavior modification episodes are proposed, which attempt to bridge the gap between psychology and neurophysiology. The proposed models capture the predictive nature of Pavlovian conditioning, which is essential to the theory of adaptive/learning systems. The models learn to anticipate the occurrence of a conditioned response before the presence of a reinforcing stimulus when training is complete. Furthermore, each model can find the most nonredundant and earliest predictor of reinforcement. The behavior of the models accounts for several aspects of basic animal learning phenomena in Pavlovian conditioning beyond previous related models. Computer simulations show how well the models fit empirical data from various animal learning paradigms.

  16. Enhancing Self-Efficacy in Elementary Science Teaching With Professional Learning Communities

    NASA Astrophysics Data System (ADS)

    Mintzes, Joel J.; Marcum, Bev; Messerschmidt-Yates, Christl; Mark, Andrew

    2013-11-01

    Emerging from Bandura's Social Learning Theory, this study of in-service elementary school teachers examined the effects of sustained Professional Learning Communities (PLCs) on self-efficacy in science teaching. Based on mixed research methods, and a non-equivalent control group experimental design, the investigation explored changes in personal self-efficacy and outcome expectancy among teachers engaged in PLCs that featured Demonstration Laboratories, Lesson Study, and annual Summer Institutes. Significant changes favoring the experimental group were found on all quantitative measures of self-efficacy. Structured clinical interviews revealed that observed changes were largely attributable to a wide range of direct (mastery) and vicarious experiences, as well as emotional reinforcement and social persuasion.

  17. Stimulus discriminability may bias value-based probabilistic learning.

    PubMed

    Schutte, Iris; Slagter, Heleen A; Collins, Anne G E; Frank, Michael J; Kenemans, J Leon

    2017-01-01

    Reinforcement learning tasks are often used to assess participants' tendency to learn more from the positive or from the negative consequences of their actions. However, this assessment often requires comparing learning performance across different task conditions, which may differ in the relative salience or discriminability of the stimuli associated with more and less rewarding outcomes, respectively. To address this issue, in a first set of studies, participants were subjected to two versions of a common probabilistic learning task. The two versions differed with respect to the stimulus (Hiragana) characters associated with reward probability. The assignment of character to reward probability was fixed within a version but reversed between versions. We found that performance was highly influenced by task version, which could be explained by the relative perceptual discriminability of the characters assigned to high or low reward probabilities, as assessed by a separate discrimination experiment. Participants were more reliable in selecting rewarding characters that were more discriminable, leading to differences in learning curves and their sensitivity to reward probability. This difference in experienced reinforcement history was accompanied by performance biases in a test phase assessing the ability to learn from positive vs. negative outcomes. In a subsequent large-scale web-based experiment, this impact of task version on learning and test measures was replicated and extended. Collectively, these findings imply a key role for perceptual factors in guiding reward learning and underscore the need to control stimulus discriminability when making inferences about individual differences in reinforcement learning.
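A common way to formalize a "tendency to learn more from positive or from negative consequences" is a Q-learning update with separate learning rates for positive and negative prediction errors. The sketch below is an illustrative assumption, not the model used in this study: it assumes 0/1 rewards, a single option, and hypothetical names (`update`, `fixed_point`, `simulate`, `alpha_pos`, `alpha_neg`). It shows how the asymmetry biases the learned value: for reward probability p, the expected update is zero at Q* = p·alpha_pos / (p·alpha_pos + (1 - p)·alpha_neg), so a learner with alpha_pos > alpha_neg overvalues the option relative to p.

```python
import random

def update(q, reward, alpha_pos, alpha_neg):
    """One value update with separate rates for positive and negative prediction errors."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta

def fixed_point(p, alpha_pos, alpha_neg):
    """Value at which the expected update is zero for a 0/1 reward with P(reward=1) = p.

    Solves p * alpha_pos * (1 - q) = (1 - p) * alpha_neg * q for q.
    """
    return p * alpha_pos / (p * alpha_pos + (1 - p) * alpha_neg)

def simulate(p, alpha_pos, alpha_neg, trials=20000, burn_in=1000, seed=1):
    """Average learned value after burn-in for an option rewarded with probability p."""
    rng = random.Random(seed)
    q, total = 0.5, 0.0
    for t in range(trials):
        reward = 1.0 if rng.random() < p else 0.0
        q = update(q, reward, alpha_pos, alpha_neg)
        if t >= burn_in:
            total += q
    return total / (trials - burn_in)
```

With symmetric rates the fixed point equals p; with alpha_pos = 0.3 and alpha_neg = 0.1, an option rewarded 80% of the time settles near 0.92 rather than 0.8.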

  18. Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer.

    PubMed

    Zhou, Luowei; Yang, Pei; Chen, Chunlin; Gao, Yang

    2017-05-01

    Reinforcement learning has significant applications for multiagent systems, especially in unknown dynamic environments. However, most multiagent reinforcement learning (MARL) algorithms suffer from problems such as exponential computational complexity in the joint state-action space, which makes it difficult to scale them up to realistic multiagent problems. In this paper, a novel algorithm named negotiation-based MARL with sparse interactions (NegoSI) is presented. In contrast to traditional sparse-interaction-based MARL algorithms, NegoSI adopts the equilibrium concept and makes it possible for agents to select the nonstrict equilibrium-dominating strategy profile (nonstrict EDSP) or meta equilibrium for their joint actions. The presented NegoSI algorithm consists of four parts: 1) the equilibrium-based framework for sparse interactions; 2) the negotiation for the equilibrium set; 3) the minimum variance method for selecting one joint action; and 4) the knowledge transfer of local Q-values. In this integrated algorithm, three techniques, i.e., unshared value functions, equilibrium solutions, and sparse interactions, are adopted to achieve privacy protection, better coordination, and lower computational complexity, respectively. To evaluate the performance of the presented NegoSI algorithm, two groups of experiments are carried out with respect to three criteria: 1) steps per episode; 2) rewards per episode; and 3) average runtime. The first group of experiments is conducted using six grid world games and shows fast convergence and high scalability of the presented algorithm. In the second group of experiments, NegoSI is applied to an intelligent warehouse problem, and simulated results demonstrate the effectiveness of the presented NegoSI algorithm compared with other state-of-the-art MARL algorithms.

  19. Stress enables reinforcement-elicited serotonergic consolidation of fear memory

    PubMed Central

    Baratta, Michael V.; Kodandaramaiah, Suhasa B.; Monahan, Patrick E.; Yao, Junmei; Weber, Michael D.; Lin, Pei-Ann; Gisabella, Barbara; Petrossian, Natalie; Amat, Jose; Kim, Kyungman; Yang, Aimei; Forest, Craig R.; Boyden, Edward S.; Goosens, Ki A.

    2015-01-01

    Background: Prior exposure to stress is a risk factor for developing post-traumatic stress disorder (PTSD) in response to trauma, yet the mechanisms by which this occurs are unclear. Using a rodent model of stress-based susceptibility to PTSD, we investigated the role of serotonin in this phenomenon. Methods: Adult mice were exposed to repeated immobilization stress or handling, and the role of serotonin in subsequent fear learning was assessed using pharmacological manipulation and western blot detection of serotonin receptors, measurements of serotonin, high-speed optogenetic silencing, and behavior. Results: Both dorsal raphe serotonergic activity during aversive reinforcement and amygdala serotonin 2c receptor (5-HT2CR) activity during memory consolidation are necessary for stress enhancement of fear memory, but neither process affects fear memory in unstressed mice. Additionally, prior stress increases amygdala sensitivity to serotonin by promoting surface expression of 5-HT2CR without affecting tissue levels of serotonin in the amygdala. We also show that the serotonin that drives stress enhancement of associative cued fear memory can arise from paired or unpaired footshock, an effect not predicted by theoretical models of associative learning. Conclusion: Stress bolsters the consequences of aversive reinforcement, not by simply enhancing the neurobiological signals used to encode fear in unstressed animals, but rather by engaging distinct mechanistic pathways. These results reveal that predictions from classical associative learning models do not always hold for stressed animals, and suggest that 5-HT2CR blockade may represent a promising therapeutic target for psychiatric disorders characterized by excessive fear responses such as that observed in PTSD. PMID:26248536

  20. Young Adults at Risk for Stimulant Dependence show Reward Dysfunction during Reinforcement-Based Decision Making

    PubMed Central

    Stewart, Jennifer L.; Flagan, Taru M.; May, April C.; Reske, Martina; Simmons, Alan N.; Paulus, Martin P.

    2012-01-01

    Background: While stimulant dependent individuals continue to make risky decisions in spite of poor outcomes, much less is known about the decision-making characteristics of occasional stimulant users (OSU) at risk for developing stimulant dependence. This study examines whether OSU exhibit inefficient learning and execution of reinforced decision-outcome contingencies. Methods: OSU (n=161) and stimulant-naïve comparison subjects (CTL; n=48) performed a Paper Scissors Rock task during functional magnetic resonance imaging. Selecting a particular option was associated with a pre-determined probability of winning, which was altered repeatedly to examine neural and behavioral characteristics of reinforced contingencies. Results: OSU displayed greater anterior insula, inferior frontal gyrus (IFG), and dorsal striatum activation than CTL during late trials when contingencies were familiar (as opposed to being learned) in the presence of comparable behavioral performance in both groups. Follow-up analyses demonstrated that during late trials: (1) OSU with high cannabis use displayed greater activation in these brain regions than CTL, whereas OSU with low cannabis use did not differ from the other two groups; and (2) OSU preferring cocaine exhibited greater anterior insula, IFG, and dorsal striatum activation than CTL and also displayed higher activation in the former two regions than OSU who preferred prescription stimulants. Conclusions: OSU exhibit inefficient resource allocation during the execution of reinforced contingencies that may be a result of additive effects of cocaine and cannabis use. A critical next step is to establish whether this inefficiency predicts transition to stimulant dependence. PMID:23021534

  1. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    PubMed

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  2. Feedback-related negativity is enhanced in adolescence during a gambling task with and without probabilistic reinforcement learning.

    PubMed

    Martínez-Velázquez, Eduardo S; Ramos-Loyo, Julieta; González-Garrido, Andrés A; Sequeira, Henrique

    2015-01-21

    Feedback-related negativity (FRN) is a negative deflection over frontocentral regions that appears around 250 ms after feedback signaling a gain or loss for the chosen alternative in a gambling task. Few studies have reported FRN enhancement in adolescents compared with adults in a gambling task without probabilistic reinforcement learning, even though learning from positive or negative consequences is crucial for decision-making during adolescence. Therefore, the aim of the present research was to identify differences in FRN amplitude and latency between adolescents and adults on a gambling task with favorable and unfavorable probabilistic reinforcement learning conditions, in addition to a nonlearning condition with monetary gains and losses. In both groups, the rate of high-magnitude choices was higher during the final 30 trials than during the first 30 trials in the favorable condition, and lower in the unfavorable condition. Higher FRN amplitude in all conditions and longer latency in the nonlearning condition were observed in adolescents compared with adults, and in response to losses. Results indicate that both the adolescents and the adults improved their performance in response to positive and negative feedback. However, the FRN findings suggest increased sensitivity to external loss feedback in adolescents compared with adults, irrespective of the presence or absence of probabilistic reinforcement learning. These results reflect processing differences in the neural monitoring system and provide new perspectives on the dynamic development of the adolescent brain.

  3. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations

    NASA Technical Reports Server (NTRS)

    Jani, Yashvant

    1992-01-01

    As part of the Research Institute for Computing and Information Systems (RICIS) activity, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Max satellite simulation. This activity is carried out in the software technology laboratory utilizing the Orbital Operations Simulator (OOS). This interim report provides the status of the project and outlines the future plans.

  4. Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration

    DTIC Science & Technology

    2013-09-01

    …iteration methods for reinforcement learning (RL) generalize experience from limited samples across large state-action spaces. The function approximators…
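One generic way multi-step sample trajectories can feed value iteration is through n-step bootstrapped targets, which a function approximator is then fitted to. The sketch below only computes such targets along a single trajectory; it assumes scalar rewards, a list of current value estimates, and a hypothetical helper name `n_step_targets`. The regression step is not shown, and this is not presented as the report's method.

```python
def n_step_targets(rewards, values, gamma, n):
    """n-step bootstrapped targets along one sampled trajectory.

    rewards[t] is the reward received after step t (length T).
    values[t] is the current estimate V(s_t) (length T + 1, with
    values[T] = 0.0 if s_T is terminal). Near the end of the
    trajectory the lookahead is truncated to the remaining steps.
    """
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values must have one more entry than rewards")
    targets = []
    for t in range(T):
        m = min(n, T - t)  # number of observed rewards used for this target
        g = sum(gamma ** k * rewards[t + k] for k in range(m))
        g += gamma ** m * values[t + m]  # bootstrap from the current value estimate
        targets.append(g)
    return targets
```

With n = 1 this reduces to the usual one-step backup r + gamma * V(s'); larger n uses more of the observed trajectory and relies less on the current (possibly poor) value estimates.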

  5. Disrupted reinforcement learning and maladaptive behavior in women with a history of childhood sexual abuse: a high-density event-related potential study.

    PubMed

    Pechtel, Pia; Pizzagalli, Diego A

    2013-05-01

    Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite the epidemiological data available, the mechanisms underlying these maladaptive outcomes remain poorly understood. We examined whether a history of CSA, particularly in conjunction with a past episode of MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. The study took place in an academic setting, with participants recruited from the community: 15 women with a history of CSA and remitted MDD (CSA + rMDD), 16 women with remitted MDD and no history of CSA (rMDD), and 18 healthy women (controls). CSA exposure comprised three or more episodes of coerced sexual contact (mean [SD] duration, 3.00 [2.20] years) between the ages of 7 and 12 years by at least 1 male perpetrator. Participants' preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity and error-related negativity, hypothesized to reflect activation in the anterior cingulate cortex, were used as electrophysiological indices of reinforcement learning. No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA + rMDD group showed (1) lower accuracy (relative to both controls and the rMDD group), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to the rMDD group). A history of CSA was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance on previously rewarded, but not previously punished, trials. Irrespective of past MDD episodes, women with a history of CSA showed neural and behavioral deficits in using previous reinforcement to optimize decision making in the absence of feedback (blunted "Go learning"). Although our study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine whether disrupted positive reinforcement learning predicts high-risk behavior following CSA.

  6. Emotion-based learning systems and the development of morality.

    PubMed

    Blair, R J R

    2017-10-01

    In this paper it is proposed that important components of moral development and moral judgment rely on two forms of emotional learning: stimulus-reinforcement and response-outcome learning. Data in support of this position will be primarily drawn from work with individuals with the developmental condition of psychopathy as well as fMRI studies with healthy individuals. Individuals with psychopathy show impairment on moral judgment tasks and a pronounced increased risk for instrumental antisocial behavior. It will be argued that these impairments are developmental consequences of impaired stimulus-aversive conditioning on the basis of distress cue reinforcers and response-outcome learning in individuals with this disorder. Copyright © 2017. Published by Elsevier B.V.

  7. Feedback-related brain activity predicts learning from feedback in multiple-choice testing.

    PubMed

    Ernst, Benjamin; Steinhauser, Marco

    2012-06-01

    Different event-related potentials (ERPs) have been shown to correlate with learning from feedback in decision-making tasks and with learning in explicit memory tasks. In the present study, we investigated which ERPs predict learning from corrective feedback in a multiple-choice test, which combines elements from both paradigms. Participants worked through sets of multiple-choice items of a Swahili-German vocabulary task. Whereas the initial presentation of an item required the participants to guess the answer, corrective feedback could be used to learn the correct response. Initial analyses revealed that corrective feedback elicited components related to reinforcement learning (FRN), as well as to explicit memory processing (P300) and attention (early frontal positivity). However, only the P300 and early frontal positivity were positively correlated with successful learning from corrective feedback, whereas the FRN was even larger when learning failed. These results suggest that learning from corrective feedback crucially relies on explicit memory processing and attentional orienting to corrective feedback, rather than on reinforcement learning.

  8. More Than the Sum of Its Parts: A Role for the Hippocampus in Configural Reinforcement Learning.

    PubMed

    Duncan, Katherine; Doll, Bradley B; Daw, Nathaniel D; Shohamy, Daphna

    2018-05-02

    People often perceive configurations rather than the elements they comprise, a bias that may emerge because configurations often predict outcomes. But how does the brain learn to associate configurations with outcomes and how does this learning differ from learning about individual elements? We combined behavior, reinforcement learning models, and functional imaging to understand how people learn to associate configurations of cues with outcomes. We found that configural learning depended on the relative predictive strength of elements versus configurations and was related to both the strength of BOLD activity and patterns of BOLD activity in the hippocampus. Configural learning was further related to functional connectivity between the hippocampus and nucleus accumbens. Moreover, configural learning was associated with flexible knowledge about associations and differential eye movements during choice. Together, this suggests that configural learning is associated with a distinct computational, cognitive, and neural profile that is well suited to support flexible and adaptive behavior. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Segmentation of neuronal structures using SARSA (λ)-based boundary amendment with reinforced gradient-descent curve shape fitting.

    PubMed

    Zhu, Fei; Liu, Quan; Fu, Yuchen; Shen, Bairong

    2014-01-01

    The segmentation of structures in electron microscopy (EM) images is very important for neurobiological research. Low-resolution neuronal EM images are noisy and generally offer few features for segmentation, so conventional approaches to identifying neuron structures in EM images are not successful. We therefore present a multi-scale fused structure boundary detection algorithm in this study. In the algorithm, we first generate a Gaussian pyramid of the EM image; at each level of the pyramid, we apply the Laplacian of Gaussian (LoG) function to obtain structure boundaries; finally, we assemble the detected boundaries with a fusion algorithm to obtain a combined neuron structure image. Since the resulting neuron structures usually have gaps, we propose a reinforcement learning-based boundary amendment method to connect the gaps in the detected boundaries. We use a SARSA (λ)-based curve traveling and amendment approach derived from reinforcement learning to repair the incomplete curves. In this algorithm, a moving point starts from one end of the incomplete curve and walks through the image, with its decisions supervised by the approximated curve model, aiming to minimize the connection cost until the gap is closed. Our approach provided stable and efficient structure segmentation. Tests on 30 EM images from ISBI 2012 indicated that both of our approaches, i.e., with and without boundary amendment, performed better than six conventional boundary detection approaches. In particular, after amendment, the Rand error and warping error, which are the most important performance measures for structure segmentation, were reduced to very low values. The comparison with the ISBI 2012 benchmark method and recently developed methods also indicates that our method performs better at accurately identifying substructures in EM images and is therefore useful for identifying imaging features related to brain diseases.
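    The gap-closing step relies on SARSA (λ), an on-policy temporal-difference method with eligibility traces. The sketch below is a hypothetical toy, not the authors' curve-traveling method: a point moves along a 1-D gap of cells from one curve end to the other, paying an assumed constant connection cost of 1 per step, and learns with tabular SARSA (λ) and accumulating traces. The corridor, cost, and all parameter values are illustrative assumptions.

```python
import random


def sarsa_lambda(n_states=6, episodes=500, alpha=0.1, gamma=0.95,
                 lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces.

    Hypothetical toy stand-in for gap closing: a point moves along a 1-D gap
    of n_states cells from one curve end (cell 0) to the other (last cell),
    paying an assumed constant connection cost of 1 per step.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    moves = (-1, +1)  # action 0 = step back, action 1 = step toward the far end
    q = [[0.0, 0.0] for _ in range(n_states)]

    def choose(s):
        # epsilon-greedy action selection with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(q[s])
        return rng.choice([a for a in range(2) if q[s][a] == best])

    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(n_states)]  # eligibility traces
        s, a = 0, choose(0)
        while s != goal:
            s_next = min(max(s + moves[a], 0), goal)  # walls at both ends
            r = -1.0  # connection cost of one step
            if s_next == goal:
                a_next = None
                delta = r - q[s][a]  # terminal transition: no bootstrap
            else:
                a_next = choose(s_next)  # on-policy: bootstrap from the action actually taken next
                delta = r + gamma * q[s_next][a_next] - q[s][a]
            e[s][a] += 1.0  # accumulating trace
            for i in range(n_states):
                for j in range(2):
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam  # decay every trace
            s, a = s_next, a_next
    return q


q = sarsa_lambda()
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(len(q) - 1)]
print(greedy)  # greedy action for each non-goal cell
```

    The traces spread each temporal-difference error back over recently visited state-action pairs, which is what lets a single step cost propagate along the whole path faster than one-step SARSA would.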

  10. Segmentation of Neuronal Structures Using SARSA (λ)-Based Boundary Amendment with Reinforced Gradient-Descent Curve Shape Fitting

    PubMed Central

    Zhu, Fei; Liu, Quan; Fu, Yuchen; Shen, Bairong

    2014-01-01

    The segmentation of structures in electron microscopy (EM) images is very important for neurobiological research. Low-resolution neuronal EM images are noisy and generally offer few features for segmentation, so conventional approaches to identifying neuron structures in EM images are not successful. We therefore present a multi-scale fused structure boundary detection algorithm in this study. In the algorithm, we first generate a Gaussian pyramid of the EM image; at each level of the pyramid, we apply the Laplacian of Gaussian (LoG) function to obtain structure boundaries; finally, we assemble the detected boundaries with a fusion algorithm to obtain a combined neuron structure image. Since the resulting neuron structures usually have gaps, we propose a reinforcement learning-based boundary amendment method to connect the gaps in the detected boundaries. We use a SARSA (λ)-based curve traveling and amendment approach derived from reinforcement learning to repair the incomplete curves. In this algorithm, a moving point starts from one end of the incomplete curve and walks through the image, with its decisions supervised by the approximated curve model, aiming to minimize the connection cost until the gap is closed. Our approach provided stable and efficient structure segmentation. Tests on 30 EM images from ISBI 2012 indicated that both of our approaches, i.e., with and without boundary amendment, performed better than six conventional boundary detection approaches. In particular, after amendment, the Rand error and warping error, which are the most important performance measures for structure segmentation, were reduced to very low values. The comparison with the ISBI 2012 benchmark method and recently developed methods also indicates that our method performs better at accurately identifying substructures in EM images and is therefore useful for identifying imaging features related to brain diseases. PMID:24625699

  11. The Interaction of Temporal Generalization Gradients Predicts the Context Effect

    ERIC Educational Resources Information Center

    de Castro, Ana Catarina; Machado, Armando

    2012-01-01

    In a temporal double bisection task, animals learn two discriminations. In the presence of Red and Green keys, responses to Red are reinforced after 1-s samples and responses to Green are reinforced after 4-s samples; in the presence of Blue and Yellow keys, responses to Blue are reinforced after 4-s samples and responses to Yellow are reinforced…

  12. Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.

    PubMed

    Costa, Vincent D; Dal Monte, Olga; Lucas, Daniel R; Murray, Elisabeth A; Averbeck, Bruno B

    2016-10-19

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy trade-off that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. Published by Elsevier Inc.

  13. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    PubMed Central

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys’ choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488
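    The abstract characterizes learning with an RL model whose parameters capture learning from positive feedback and choice consistency. A minimal sketch, assuming a common formulation rather than the authors' exact model: a delta-rule learner with separate learning rates for positive and negative prediction errors (`alpha_pos`, `alpha_neg`) and a softmax inverse temperature (`beta`) standing in for choice consistency, run on a two-arm bandit whose contingencies reverse mid-session. All function names and parameter values are hypothetical.

```python
import math
import random


def softmax_choice_prob(q, beta):
    """Choice probabilities under a softmax with inverse temperature beta."""
    m = max(q)
    exps = [math.exp(beta * (v - m)) for v in q]  # subtract max for numerical stability
    total = sum(exps)
    return [x / total for x in exps]


def update(q, choice, reward, alpha_pos, alpha_neg):
    """Delta-rule update with separate rates for positive and negative prediction errors."""
    q = list(q)
    delta = reward - q[choice]
    alpha = alpha_pos if delta > 0 else alpha_neg
    q[choice] += alpha * delta
    return q


def simulate_reversal(n_trials=200, p_good=0.8, reversal_at=100,
                      alpha_pos=0.4, alpha_neg=0.2, beta=5.0, seed=1):
    """Fraction of trials on which the currently better arm was chosen.

    p_good=1.0 gives a deterministic schedule; p_good<1.0 a stochastic one.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]
    correct = 0
    for t in range(n_trials):
        good = 0 if t < reversal_at else 1  # contingencies reverse mid-session
        probs = softmax_choice_prob(q, beta)
        choice = 0 if rng.random() < probs[0] else 1
        p_reward = p_good if choice == good else 1.0 - p_good
        reward = 1.0 if rng.random() < p_reward else 0.0
        q = update(q, choice, reward, alpha_pos, alpha_neg)
        correct += choice == good
    return correct / n_trials


print(simulate_reversal(p_good=1.0), simulate_reversal(p_good=0.8))
```

    In a model of this kind, a lower `alpha_pos` slows learning from rewarded choices and a lower `beta` makes choices less consistent with learned values, which is one way such lesion effects could be expressed as parameter changes.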

  14. Working memory contributions to reinforcement learning impairments in schizophrenia.

    PubMed

    Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J

    2014-10-08

    Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. Copyright © 2014 the authors.
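    The key idea is that a fast but capacity-limited WM process and a slow, incremental RL process each propose an action, and their proposals are mixed. A minimal sketch, assuming a mixture-weight form in which reliance on WM equals a reliability parameter `rho` scaled by `min(1, capacity / set_size)`; this form and the example probabilities are assumptions for illustration, not taken from the abstract. It shows how lowering capacity or reliability alone shifts behavior toward the RL policy as set size grows.

```python
def wm_weight(rho, capacity, set_size):
    """Reliance on working memory: full reliability rho up to capacity, diluted beyond it."""
    return rho * min(1.0, capacity / set_size)


def mixture_policy(p_wm, p_rl, w):
    """Blend the WM and RL action probabilities for one stimulus (weights sum to 1)."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_wm, p_rl)]


# Three possible actions per stimulus, as in the task described above.
p_rl = [0.5, 0.25, 0.25]   # hypothetical slow, incremental RL estimate
p_wm = [0.9, 0.05, 0.05]   # hypothetical fast, one-shot WM estimate
for ns in range(2, 7):      # set sizes two to six
    w = wm_weight(rho=0.9, capacity=3, set_size=ns)
    print(ns, round(w, 3), [round(p, 3) for p in mixture_policy(p_wm, p_rl, w)])
```

    Under this assumed form, a smaller `capacity` or `rho` reduces the weight on WM, and the predicted accuracy drop grows with set size even when the RL component is unchanged.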

  15. Evaluation of an advanced physical diagnosis course using consumer preferences methods: the nominal group technique.

    PubMed

    Coker, Joshua; Castiglioni, Analia; Kraemer, Ryan R; Massie, F Stanford; Morris, Jason L; Rodriguez, Martin; Russell, Stephen W; Shaneyfelt, Terrance; Willett, Lisa L; Estrada, Carlos A

    2014-03-01

    Current evaluation tools for medical school courses are limited by the scope of the questions asked and may not fully engage students in thinking about areas for improvement. The authors sought to explore whether a technique for studying consumer preferences would elicit specific and prioritized information for course evaluation from medical students. Using the nominal group technique (4 sessions), 12 senior medical students prioritized and weighted expectations and topics learned in a 100-hour advanced physical diagnosis course (4-week course; February 2012). Students weighted their top 3 responses (top = 3, middle = 2, and bottom = 1). Before the course, the 12 students identified 23 topics they expected to learn; the top 3 were review sensitivity/specificity and high-yield techniques (percentage of total weight, 18.5%), improving diagnosis (13.8%), and reinforce usual and less well-known techniques (13.8%). After the course, students generated 22 topics learned; the top 3 were practice and reinforce advanced maneuvers (25.4%), gaining confidence (22.5%), and learn the evidence (16.9%). The authors observed no differences in the priority of responses before and after the course (P = 0.07). In a physical diagnosis course, the nominal group technique elicited specific and prioritized information from medical students. The course met student expectations regarding education in the evidence-based physical examination, building skills and confidence in the proper techniques and maneuvers, and experiential learning. This novel use of the technique for curriculum evaluation may be applied to other courses, especially comprehensive and multicomponent ones.
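    The weighting scheme above (top = 3, middle = 2, bottom = 1, reported as a percentage of total weight) is simple arithmetic. A minimal sketch with made-up ballots; the topic labels and the `weighted_shares` helper are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

RANK_WEIGHTS = (3, 2, 1)  # top, middle, bottom


def weighted_shares(ballots):
    """Percentage of total weight per topic, ordered from highest to lowest."""
    totals = Counter()
    for ballot in ballots:
        # zip stops at the shorter sequence, so a ballot with fewer than
        # three items contributes only the weights for the ranks it filled
        for weight, topic in zip(RANK_WEIGHTS, ballot):
            totals[topic] += weight
    grand_total = sum(totals.values())
    return {topic: 100.0 * w / grand_total for topic, w in totals.most_common()}


ballots = [
    ("topic A", "topic B", "topic C"),
    ("topic B", "topic A", "topic D"),
    ("topic A", "topic C"),  # a student who ranked only two items
]
for topic, share in weighted_shares(ballots).items():
    print(f"{topic}: {share:.1f}%")
```

    Because the denominator is the total weight actually assigned, the shares always sum to 100% even when some students rank fewer than three items.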

  16. The role of first impression in operant learning.

    PubMed

    Shteingart, Hanan; Neiman, Tal; Loewenstein, Yonatan

    2013-05-01

    We quantified the effect of first experience on behavior in operant learning and studied its underlying computational principles. To that end, we analyzed more than 200,000 choices in a repeated-choice experiment. We found that the outcome of the first experience has a substantial and lasting effect on participants' subsequent behavior, which we term outcome primacy. We found that this outcome primacy can account for much of the underweighting of rare events, where participants apparently underestimate small probabilities. We modeled behavior in this task using a standard, model-free reinforcement learning algorithm. In this model, the values of the different actions are learned over time and are used to determine the next action according to a predefined action-selection rule. We used a novel nonparametric method to characterize this action-selection rule and showed that the substantial effect of first experience on behavior is consistent with the reinforcement learning model if we assume that the outcome of first experience resets the values of the experienced actions, but not if we assume arbitrary initial conditions. Moreover, our resetting model predicts aggregate choice behavior better than previously published models. These findings suggest that first experience has a disproportionately large effect on subsequent actions, similar to primacy effects in other fields of cognitive psychology. The mechanism of resetting the initial conditions that underlies outcome primacy may thus also account for other forms of primacy. PsycINFO Database Record (c) 2013 APA, all rights reserved.
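    The contrast between resetting and arbitrary initial conditions can be made concrete with a delta-rule value learner. A minimal sketch, assuming a single learning rate `alpha` and an initial value `q0`; the helper name and example numbers are hypothetical, and the action-selection rule (which the authors characterized nonparametrically) is omitted.

```python
def learn_values(choices, rewards, n_actions=2, alpha=0.2,
                 reset_on_first=True, q0=0.0):
    """Model-free action values; optionally let the first outcome of each action reset its value."""
    q = [q0] * n_actions
    seen = [False] * n_actions
    for a, r in zip(choices, rewards):
        if reset_on_first and not seen[a]:
            q[a] = r  # first outcome overwrites the initial value entirely
        else:
            q[a] += alpha * (r - q[a])  # standard incremental update
        seen[a] = True
    return q


choices = [0, 1, 0]
rewards = [10.0, 1.0, 0.0]
print(learn_values(choices, rewards))                        # resetting model
print(learn_values(choices, rewards, reset_on_first=False))  # arbitrary initial value q0
```

    With resetting, a single large first payoff keeps action 0 valued well above action 1 even after a zero outcome; with incremental learning from `q0`, the same history leaves both values small, so the first outcome has much less lasting influence.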

  17. Ontogeny of passive avoidance learning in domestic chicks: punishment of key-peck and running responses.

    PubMed

    Mattingly, B A; Zolman, J F

    1980-08-01

    The effect of the number of prepunishment acquisition trials on the age dependency of passive avoidance (PA) learning of the Vantress X Arbor Acre chick was determined in both key-peck and runway tests. In nine experiments, 1- and 4-day-old chicks were first trained to respond for heat reward, and then, following a variable number of reinforced acquisition trials, the chicks' responses were punished with aversive wing shocks. The major finding of these experiments was that the age dependency of PA learning of the young chick is related specifically to the number of reinforced training trials given prior to PA testing. When a large number of prepunishment acquisition trials were given, 1-day-old chicks learned as quickly as 4-day-old chicks to withhold responding when punished. However, when only a few acquisition trials preceded PA testing, 1-day-old chicks showed significantly less response suppression than 4-day-old chicks. These acquisition effects indicate that the age-dependent changes in PA learning of the chick are not solely due to developmental changes in general inhibitory ability. Rather, these PA results suggest that the 1-day-old chick, compared with the 4-day-old chick, is deficient in learning, or detecting changes in, stimulus- and/or response-reinforcement contingencies.

  18. Deficient reinforcement learning in medial frontal cortex as a model of dopamine-related motivational deficits in ADHD.

    PubMed

    Silvetti, Massimo; Wiersema, Jan R; Sonuga-Barke, Edmund; Verguts, Tom

    2013-10-01

    Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD, we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts: a probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood. Copyright © 2013 Elsevier Ltd. All rights reserved.

  19. The touchscreen operant platform for testing learning and memory in rats and mice.

    PubMed

    Horner, Alexa E; Heath, Christopher J; Hvoslef-Eide, Martha; Kent, Brianne A; Kim, Chi Hun; Nilsson, Simon R O; Alsiö, Johan; Oomen, Charlotte A; Holmes, Andrew; Saksida, Lisa M; Bussey, Timothy J

    2013-10-01

    An increasingly popular method of assessing cognitive functions in rodents is the automated touchscreen platform, on which a number of different cognitive tests can be run in a manner very similar to touchscreen methods currently used to test human subjects. This methodology is low stress (using appetitive rather than aversive reinforcement), has high translational potential and lends itself to a high degree of standardization and throughput. Applications include the study of cognition in rodent models of psychiatric and neurodegenerative diseases (e.g., Alzheimer's disease, schizophrenia, Huntington's disease, frontotemporal dementia), as well as the characterization of the role of select brain regions, neurotransmitter systems and genes in rodents. This protocol describes how to perform four touchscreen assays of learning and memory: visual discrimination, object-location paired-associates learning, visuomotor conditional learning and autoshaping. It is accompanied by two further protocols (also published in this issue) that use the touchscreen platform to assess executive function, working memory and pattern separation.

  20. Robotic action acquisition with cognitive biases in coarse-grained state space.

    PubMed

    Uragami, Daisuke; Kohno, Yu; Takahashi, Tatsuji

    2016-07-01

    Some of the authors have previously proposed a cognitively inspired reinforcement learning architecture (LS-Q) that mimics cognitive biases in humans. LS-Q adaptively learns under uniform, coarse-grained state division and performs well without parameter tuning in a giant-swing robot task. However, these results were shown only in simulations. In this study, we test the validity of the LS-Q implemented in a robot in a real environment. In addition, we analyze the learning process to elucidate the mechanism by which the LS-Q adaptively learns in a partially observable environment. We argue that the LS-Q may be a versatile reinforcement learning architecture, which is, despite its simplicity, easily applicable and does not require well-prepared settings. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  1. Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control

    NASA Technical Reports Server (NTRS)

    Bernstein, Daniel S.; Zilberstein, Shlomo

    2003-01-01

    Weakly-coupled Markov decision processes can be decomposed into subprocesses that interact only through a small set of bottleneck states. We study a hierarchical reinforcement learning algorithm designed to take advantage of this particular type of decomposability. To test our algorithm, we use a decision-making problem faced by autonomous planetary rovers. In this problem, a Mars rover must decide which activities to perform and when to traverse between science sites in order to make the best use of its limited resources. In our experiments, the hierarchical algorithm performs better than Q-learning in the early stages of learning, but unlike Q-learning it converges to a suboptimal policy. This suggests that it may be advantageous to use the hierarchical algorithm when training time is limited.
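    The flat baseline in this comparison is Q-learning, whose off-policy backup bootstraps from the best next action. A minimal sketch of that backup on a hypothetical 3-state chain (state 0 leads to state 1, which leads to a terminal state with reward 1); it does not model the rover problem or the hierarchical algorithm, and the chain, `alpha`, and `gamma` are illustrative assumptions.

```python
def q_learning_update(q, s, a, r, s_next, terminal, alpha=0.5, gamma=0.9):
    """One off-policy Q-learning backup: bootstrap from the best next action."""
    target = r if terminal else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Hypothetical chain: state 0 -> state 1 -> terminal (state 2), reward only at the end.
q = [[0.0, 0.0] for _ in range(3)]
transitions = [(0, 1, 0.0, 1, False), (1, 1, 1.0, 2, True)]
for _ in range(100):  # replay the same experience repeatedly
    for s, a, r, s_next, terminal in transitions:
        q_learning_update(q, s, a, r, s_next, terminal)
print(q[0][1], q[1][1])  # approach gamma * 1.0 = 0.9 and 1.0
```

    Value reaches the start state only one step per sweep; in large flat state spaces this slow propagation is one reason a decomposition through bottleneck states can learn faster early on.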

  2. The development and pilot testing of a multimedia CD-ROM for diabetes education.

    PubMed

    Castaldini, M; Saltmarch, M; Luck, S; Sucher, K

    1998-01-01

    The multimedia CD-ROM program, Take Charge of Diabetes, was found to be accurate, easy to use, and enjoyable by the clients and health professionals who completed the pilot study. Participants perceived an increase in knowledge after completing the five modules. Two of the participants verbally stated that the program clarified information for them and that they wished they had had such a program when they were first diagnosed with diabetes. Further evaluation is needed to generalize the effect of the program on knowledge of diabetes because the pilot study was not designed to fully evaluate the effectiveness of the program on knowledge level or behavior change. Behavior change resulting in better control of blood sugar levels and hemoglobin A1c within normal range is the goal for diabetes education. The person who lives with diabetes must learn self-care methods. To accomplish that, the person must be able to comprehend the material presented. Computer-assisted instruction (CAI) programs provide an individualized, interactive, and interesting way to learn about diabetes and self-care, using visual effects and audio to support the written text. CAI can provide an element of excitement that is not available with other conventional methods. Providing prompt reinforcement of correct answers in quiz sections and including positive written messages can increase patients' self-confidence and self-esteem. CAI is not intended to replace personal contact with physicians and diabetes educators, but rather to complement this contact, reinforce learning, and possibly increase self-motivation to take charge of one's diabetes.

  3. Moral learning: Psychological and philosophical perspectives.

    PubMed

    Cushman, Fiery; Kumar, Victor; Railton, Peter

    2017-10-01

    The past 15 years occasioned an extraordinary blossoming of research into the cognitive and affective mechanisms that support moral judgment and behavior. This growth in our understanding of moral mechanisms overshadowed a crucial and complementary question, however: How are they learned? As this special issue of the journal Cognition attests, a new crop of research into moral learning has now firmly taken root. This new literature draws on recent advances in formal methods developed in other domains, such as Bayesian inference, reinforcement learning and other machine learning techniques. Meanwhile, it also demonstrates how learning and deciding in a social domain, and especially in the moral domain, sometimes involves specialized cognitive systems. We review the contributions to this special issue and situate them within the broader contemporary literature. Our review focuses on how we learn moral values and moral rules, how we learn about personal moral character and relationships, and the philosophical implications of these emerging models. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. A Blended Learning Course Design in Clinical Pharmacology for Post-graduate Dental Students

    PubMed Central

    Rosenbaum, Paul-Erik Lillholm; Mikalsen, Øyvind; Lygre, Henning; Solheim, Einar; Schjøtt, Jan

    2012-01-01

    Postgraduate courses in clinical pharmacology are important for keeping dentists updated on drug therapy and information related to their clinical practice, as well as on relevant adverse effects and interactions. A traditional approach with classroom delivery as the only method of teaching and learning has shortcomings regarding flexibility, individual learning preferences, and problem-based learning (PBL) activities compared with online environments. This study examines a five-week postgraduate course in clinical pharmacology with 15 hours of lectures and online learning activities, i.e., a blended course design. Six postgraduate dental students participated and were interviewed at the end of the course. Our findings emphasize that a blended learning course design can be successfully used in postgraduate dental education. Key matters for discussion were time flexibility and location convenience, change in the teacher’s role, reinforced learning strategies oriented toward professional needs, scarcity of online communication, and proposed future use of e-learning components. PMID:23248716

  5. Dopaminergic Contributions to Vocal Learning

    PubMed Central

    Hoffmann, Lukas A.; Saravanan, Varun; Wood, Alynda N.; He, Li

    2016-01-01

    Although the brain relies on auditory information to calibrate vocal behavior, the neural substrates of vocal learning remain unclear. Here we demonstrate that lesions of the dopaminergic inputs to a basal ganglia nucleus in a songbird species (Bengalese finches, Lonchura striata var. domestica) greatly reduced the magnitude of vocal learning driven by disruptive auditory feedback in a negative reinforcement task. These lesions produced no measurable effects on the quality of vocal performance or the amount of song produced. Our results suggest that dopaminergic inputs to the basal ganglia selectively mediate reinforcement-driven vocal plasticity. In contrast, dopaminergic lesions produced no measurable effects on the birds' ability to restore song acoustics to baseline following the cessation of reinforcement training, suggesting that different forms of vocal plasticity may use different neural mechanisms. SIGNIFICANCE STATEMENT During skill learning, the brain relies on sensory feedback to improve motor performance. However, the neural basis of sensorimotor learning is poorly understood. Here, we investigate the role of the neurotransmitter dopamine in regulating vocal learning in the Bengalese finch, a songbird with an extremely precise singing behavior that can nevertheless be reshaped dramatically by auditory feedback. Our findings show that reduction of dopamine inputs to a region of the songbird basal ganglia greatly impairs vocal learning but has no detectable effect on vocal performance. These results suggest a specific role for dopamine in regulating vocal plasticity. PMID:26888928

  6. Characteristics of implicit chaining in cotton-top tamarins (Saguinus oedipus).

    PubMed

    Locurto, Charles; Gagne, Matthew; Nutile, Lauren

    2010-07-01

    In human cognition there has been considerable interest in observing the conditions under which subjects learn material without explicit instructions to learn. In the present experiments, we adapted this issue to nonhumans by asking what subjects learn in the absence of explicit reinforcement for correct responses. Two experiments examined the acquisition of sequence information by cotton-top tamarins (Saguinus oedipus) when such learning was not demanded by the experimental contingencies. An implicit chaining procedure was used in which visual stimuli were presented serially on a touchscreen. Subjects were required to touch one stimulus to advance to the next stimulus. Stimulus presentations followed a pattern, but learning the pattern was not necessary for reinforcement. In Experiment 1 the chain consisted of five different visual stimuli that were presented in the same order on each trial. Each stimulus could occur at any one of six touchscreen positions. In Experiment 2 the same visual element was presented serially in the same five locations on each trial, thereby allowing a behavioral pattern to be correlated with the visual pattern. In this experiment two new tests, a Wild-Card test and a Running-Start test, were used to assess what was learned in this procedure. Results from both experiments indicated that tamarins acquired more information from an implicit chain than was required by the contingencies of reinforcement. These results contribute to the developing literature on nonhuman analogs of implicit learning.

  7. Understanding Optimal Decision-Making

    DTIC Science & Technology

    2015-06-01

    Task (IGT) (Bechara, Damasio, Damasio, & Anderson, 1994), a very common test of reinforcement learning that has been used in hundreds of psychology ... psychology task that elicits reinforcement learning (Bechara et al., 1994) and has been used in hundreds of studies (Krain et al., 2006). Subjects...

  8. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    PubMed

    Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen

    2017-01-01

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.
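    Trained strategies are judged by round-robin tournaments, optionally with noise that flips intended moves. A minimal sketch of such a harness, assuming the conventional payoffs R = 3, S = 0, T = 5, P = 1 and three classic hand-written strategies; it does not reproduce the authors' trained strategies, opponent corpus, or tooling, and all function names are hypothetical.

```python
import itertools
import random

# Conventional Prisoner's Dilemma payoffs: (row player, column player).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(my_history, opp_history):
    return opp_history[-1] if opp_history else "C"


def always_defect(my_history, opp_history):
    return "D"


def always_cooperate(my_history, opp_history):
    return "C"


def flip(move):
    return "D" if move == "C" else "C"


def play_match(p1, p2, turns=200, noise=0.0, rng=None):
    """Total scores of two strategies over a match; each move flips with probability noise."""
    rng = rng or random.Random(0)
    h1, h2 = [], []
    s1 = s2 = 0
    for _ in range(turns):
        m1, m2 = p1(h1, h2), p2(h2, h1)
        if rng.random() < noise:
            m1 = flip(m1)
        if rng.random() < noise:
            m2 = flip(m2)
        a, b = PAYOFF[(m1, m2)]
        s1, s2 = s1 + a, s2 + b
        h1.append(m1)  # histories record the moves actually played
        h2.append(m2)
    return s1, s2


def tournament(players, turns=200, noise=0.0, seed=0):
    """Round robin over all pairs; returns (name, total score) sorted best first."""
    rng = random.Random(seed)
    scores = {name: 0 for name in players}
    for (n1, p1), (n2, p2) in itertools.combinations(players.items(), 2):
        s1, s2 = play_match(p1, p2, turns, noise, rng)
        scores[n1] += s1
        scores[n2] += s2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


players = {"TFT": tit_for_tat, "ALLD": always_defect, "ALLC": always_cooperate}
print(tournament(players, turns=10))
print(tournament(players, turns=200, noise=0.05))
```

    Rankings depend heavily on the population: in this three-player toy field, always-defect exploits always-cooperate and comes first, which is why a large and varied opponent corpus matters when evaluating trained strategies.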

  9. Unaltered radial maze performance and brain acetylcholine of the endothelial nitric oxide synthase knockout mouse.

    PubMed

    Dere, E; Frisch, C; De Souza Silva, M A; Gödecke, A; Schrader, J; Huston, J P

    2001-01-01

    Proceeding from previous findings of a beneficial effect of endothelial nitric oxide synthase (eNOS) gene inactivation on negatively reinforced water maze performance, we asked whether this improvement in place learning capacities also holds for a positively reinforced radial maze task. Unlike its beneficial effects on the water maze task, eNOS gene inactivation did not facilitate radial maze performance. The acquisition performance over the days of place learning did not differ between eNOS knockout (eNOS-/-) and wild-type mice (eNOS+/+). eNOS-/- mice displayed a slight and eNOS+/+ mice a more severe working memory deficit in the place learning version of the radial maze compared to the genetic background C57BL/6 strain. Possible differential effects of eNOS inactivation, related to differences in reinforcement contingencies between the Morris water maze and radial maze tasks, behavioral strategy requirements, or to different emotional and physiological concomitants inherent in the two tasks are discussed. These task-unique characteristics might be differentially affected by the reported anxiogenic and hypertensional effects of eNOS gene inactivation. Post-mortem determination of acetylcholine concentrations in diverse brain structures revealed that acetylcholine and choline contents were not different between eNOS-/- and eNOS+/+ mice, but were increased in eNOS+/+ mice compared to C57BL/6 mice in the frontal cortex. Our findings demonstrate that phenotyping of learning and memory capacities should not rely on one learning task only, but should include tasks employing both negative and positive reinforcement contingencies in order to allow valid statements regarding differences in learning capacities between rodent strains.

  10. Operant licking for intragastric sugar infusions: differential reinforcing actions of glucose, sucrose and fructose in mice

    PubMed Central

    Sclafani, Anthony; Ackroff, Karen

    2015-01-01

    Intragastric (IG) flavor conditioning studies in rodents indicate that isocaloric sugar infusions differ in their reinforcing actions, with glucose and sucrose more potent than fructose. Here we determined if the sugars also differ in their ability to maintain operant self-administration by licking an empty spout for IG infusions. Food-restricted C57BL/6J mice were trained 1 h/day to lick a food-baited spout, which triggered IG infusions of 16% sucrose. In testing, the mice licked an empty spout, which triggered IG infusions of different sugars. Mice shifted from sucrose to 16% glucose increased dry licking, whereas mice shifted to 16% fructose rapidly reduced licking to low levels. Other mice shifted from sucrose to IG water reduced licking more slowly but reached the same low levels. Thus IG fructose, like water, is not reinforcing to hungry mice. The more rapid decline in licking induced by fructose may be due to the sugar's satiating effects. Further tests revealed that the Glucose mice increased their dry licking when shifted from 16% to 8% glucose, and reduced their dry licking when shifted to 32% glucose. This may reflect caloric regulation and/or differences in satiation. The Glucose mice did not maintain caloric intake when tested with different sugars. They self-infused less sugar when shifted from 16% glucose to 16% sucrose, and even more so when shifted to 16% fructose. Reduced sucrose self-administration may occur because the fructose component of the disaccharide reduces its reinforcing potency. FVB mice also reduced operant licking when tested with 16% fructose, yet learned to prefer a flavor paired with IG fructose. These data indicate that sugars differ substantially in their ability to support IG self-administration and flavor preference learning. The same post-oral reinforcement process appears to mediate operant licking and flavor learning, although flavor learning provides a more sensitive measure of sugar reinforcement. PMID:26485294

  11. From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning.

    PubMed

    Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A

    2016-06-01

    Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. © The Author(s) 2016.
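    The model-based versus model-free distinction is often quantified with a hybrid valuation in which a weight `w` mixes the two estimates of first-stage action values. A minimal sketch, assuming a common hybrid formulation for sequential two-stage tasks (model-based values computed from known transition probabilities and the best second-stage values) rather than the exact model fitted in this study; the transition matrix and all values are hypothetical.

```python
import math


def model_based_values(transitions, stage2_values):
    """Q_MB(a) = sum over s' of P(s' | a) * max_b Q_stage2(s', b)."""
    best = [max(v) for v in stage2_values]
    return [sum(p * b for p, b in zip(row, best)) for row in transitions]


def hybrid_values(q_mb, q_mf, w):
    """Weighted mix: w = 1 is purely model-based, w = 0 purely model-free."""
    return [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax(values, beta):
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]  # subtract max for stability
    total = sum(exps)
    return [x / total for x in exps]


transitions = [[0.7, 0.3],   # action 0: common transition to state 0, rare to state 1
               [0.3, 0.7]]   # action 1: common transition to state 1, rare to state 0
stage2_values = [[0.8, 0.4], [0.2, 0.1]]  # hypothetical second-stage Q-values
q_mf = [0.3, 0.5]                          # hypothetical model-free first-stage values
q_mb = model_based_values(transitions, stage2_values)  # approx. [0.62, 0.38]
for w in (0.0, 0.5, 1.0):
    print(w, [round(p, 3) for p in softmax(hybrid_values(q_mb, q_mf, w), beta=3.0)])
```

    In this framing, a developmental increase in model-based control would appear as a larger fitted `w`: the model-free learner here prefers action 1 from reward history alone, while the model-based learner prefers action 0 because it uses the transition structure.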

  12. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    NASA Technical Reports Server (NTRS)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight with reactive obstacle avoidance in dense, cluttered environments. The goal is to learn an autonomous flight model using training experiences from a human piloting an sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer-vision-based features relating to edge and gradient information are extracted from the video. The recorded human-controlled inputs are used to train an autonomous control model that maps the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data-driven approach to autonomous obstacle avoidance is explored in simulated forest environments, furthering research on autonomous flight under the tree canopy. This enables flight in previously inaccessible environments that are of interest to NASA researchers in the Earth and atmospheric sciences.
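
    The data set aggregation loop described above can be sketched in a few lines of Python. The loop trains on the pilot's demonstrations, lets the model fly, has a human correct its outputs on the states it reaches, adds those corrections to the training set, and refits. Everything concrete here is hypothetical: a single scalar feature stands in for the edge and gradient feature vector, and a linear least-squares fit stands in for the control model. A function `expert_yaw()` simulates the human corrector.

```python
import random

# Hypothetical DAgger-style sketch. A scalar "obstacle feature", a linear yaw
# policy, toy dynamics, and expert_yaw() (standing in for the human who
# corrects the model) are all illustrative assumptions.


def expert_yaw(feature):
    """Stand-in for the human's corrective yaw command (assumed linear)."""
    return -0.8 * feature


def fit_linear(data):
    """Least-squares fit of yaw = a * feature + b on all aggregated pairs."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def rollout(policy, steps, rng):
    """Fly the *learner's* policy and record the features it encounters."""
    a, b = policy
    feature, visited = rng.uniform(-1.0, 1.0), []
    for _ in range(steps):
        visited.append(feature)
        yaw = a * feature + b
        feature += 0.5 * yaw + rng.gauss(0.0, 0.1)  # toy dynamics plus noise
    return visited


def dagger(iterations=5, steps=20, seed=0):
    rng = random.Random(seed)
    # Round 0: behavior cloning on the human-piloted demonstrations.
    demos = [rng.uniform(-1.0, 1.0) for _ in range(steps)]
    data = [(x, expert_yaw(x)) for x in demos]
    policy = fit_linear(data)
    for _ in range(iterations):
        # The human labels the states the learner visited; aggregate, then refit.
        data += [(x, expert_yaw(x)) for x in rollout(policy, steps, rng)]
        policy = fit_linear(data)
    return policy, len(data)


policy, n_samples = dagger()
print(policy, n_samples)
```

    Because the toy corrector is noise-free and linear, each refit recovers its gain exactly. With a real pilot the labels would be noisy. The point of aggregating is that the training set keeps growing with states the learner itself reaches, not only those the human flew through.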

  13. Closed-loop and robust control of quantum systems.

    PubMed

    Chen, Chunlin; Wang, Lin-Cheng; Wang, Yuanlong

    2013-01-01

    For most practical quantum control systems, attaining robustness and reliability is important yet difficult because of unavoidable uncertainties in the system dynamics or models. Three typical approaches (i.e., closed-loop learning control, feedback control, and robust control) have proven effective in addressing these problems. This work presents a self-contained survey of the closed-loop and robust control of quantum systems, together with a brief introduction to a selection of basic theories and methods in this research area, to give interested readers a general foundation for further study. For closed-loop learning control of quantum systems, we survey learning control methods such as gradient-based methods, genetic algorithms (GA), and reinforcement learning (RL) methods from the unified viewpoint of exploring quantum control landscapes. For the feedback control approach, the paper surveys three control strategies: Lyapunov control, measurement-based control, and coherent-feedback control. Topics in quantum robust control, such as H∞ control, sliding mode control, quantum risk-sensitive control, and quantum ensemble control, are then reviewed. The paper concludes with a perspective on future research directions that are likely to attract more attention.

  14. Gaussian Processes for Data-Efficient Learning in Robotics and Control.

    PubMed

    Deisenroth, Marc Peter; Fox, Dieter; Rasmussen, Carl Edward

    2015-02-01

    Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning reduces the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers. This is a practical limitation in real systems such as robots, where many interactions can be impractical and time-consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
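
    To make "probabilistic, non-parametric Gaussian process transition model" concrete, the following stdlib-only Python sketch fits a 1-D GP to observed state changes and returns a predictive mean and variance. The kernel, its hyperparameters, and the toy dynamics are assumptions. The paper's method additionally conditions on actions and propagates this uncertainty through multi-step predictions for policy search, which this sketch does not attempt.

```python
import math

# Hypothetical 1-D sketch of a Gaussian process transition model: inputs are
# states x_t, targets are x_{t+1} - x_t. The RBF kernel, its hyperparameters,
# and the toy sin() dynamics are assumptions; actions and long-horizon
# uncertainty propagation are omitted.


def rbf(x1, x2, length=1.0, signal_var=1.0):
    """Squared-exponential kernel."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T == a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_sub(low, b):
    """Solve L y = b."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def back_sub(low, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x


def gp_predict(xs, ys, x_star, noise_var=1e-4):
    """Posterior mean and variance of the latent function at x_star."""
    k = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = back_sub(low, forward_sub(low, ys))  # (K + noise I)^{-1} y
    k_star = [rbf(x_star, a) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = forward_sub(low, k_star)
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


# Observed transitions from toy dynamics x_{t+1} = x_t + sin(x_t).
states = [-2.0, -1.0, 0.0, 1.0, 2.0]
deltas = [math.sin(x) for x in states]
print(gp_predict(states, deltas, 1.0))   # near the data: low variance
print(gp_predict(states, deltas, 10.0))  # far from the data: variance near prior
```

    The predictive variance is small near observed transitions and returns to the prior far from them. That growing variance is the model uncertainty which, in the approach summarized above, is carried into long-term planning instead of being ignored.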

  15. Stimulating Deep Learning Using Active Learning Techniques

    ERIC Educational Resources Information Center

    Yew, Tee Meng; Dawood, Fauziah K. P.; a/p S. Narayansany, Kannaki; a/p Palaniappa Manickam, M. Kamala; Jen, Leong Siok; Hoay, Kuan Chin

    2016-01-01

    When students and teachers behave in ways that reinforce learning as a spectator sport, the result can often be a classroom and overall learning environment that is mostly limited to transmission of information and rote learning rather than deep approaches towards meaningful construction and application of knowledge. A group of college instructors…

  16. Central Reinforcing Effects of Ethanol Are Blocked by Catalase Inhibition

    PubMed Central

    Nizhnikov, Michael Edward; Molina, Juan Carlos; Spear, Norman

    2007-01-01

    Recent studies have systematically indicated that newborn rats are highly sensitive to ethanol’s positive reinforcing effects. Central administrations of ethanol (25–200 mg%) associated with an olfactory conditioned stimulus (CS) promote subsequent conditioned approach to the CS, as evaluated through the newborn’s response to a surrogate nipple scented with the CS. It has been shown that ethanol’s first metabolite, acetaldehyde, exerts significant reinforcing effects in the central nervous system. A significant amount of acetaldehyde is derived from ethanol metabolism via the catalase system. In newborn rats, catalase levels are particularly high in several brain structures. The present study tested the effect of catalase inhibition on central ethanol reinforcement. In the first experiment, pups experienced lemon odor either paired or unpaired with intracisternal (i.c.) administrations of 100 mg% ethanol. Half of the animals in each learning condition were pretreated with i.c. administrations of either physiological saline or a catalase inhibitor (sodium azide). Catalase inhibition completely suppressed ethanol reinforcement in paired groups without affecting responsiveness to the CS during conditioning or responding by unpaired control groups. A second experiment tested whether these effects were specific to ethanol reinforcement or were instead due to a general impairment in learning and expression capabilities. Central administration of an endogenous kappa opioid receptor agonist (dynorphin A-13) was employed as an alternative source of reinforcement. Inhibition of the catalase system had no effect on the reinforcing properties of dynorphin. The present results support the hypothesis that ethanol metabolism regulated by the catalase system plays a critical role in determining ethanol reinforcement in newborn rats. PMID:17980789

  17. Reinforced Adversarial Neural Computer for de Novo Molecular Design.

    PubMed

    Putin, Evgeny; Asadulaev, Arip; Ivanenkov, Yan; Aladinskiy, Vladimir; Sanchez-Lengeling, Benjamin; Aspuru-Guzik, Alán; Zhavoronkov, Alex

    2018-06-12

    In silico modeling is a crucial milestone in modern drug design and development. Although computer-aided approaches in this field are well studied, the application of deep learning methods in this research area is still in its early stages. In this work, we present an original deep neural network (DNN) architecture named RANC (Reinforced Adversarial Neural Computer) for the de novo design of novel small-molecule organic structures, based on the generative adversarial network (GAN) paradigm and reinforcement learning (RL). As its generator, RANC uses a differentiable neural computer (DNC), a category of neural networks with increased generation capabilities due to the addition of an explicit memory bank, which can mitigate common problems found in adversarial settings. Comparative results show that RANC trained on the SMILES string representation of the molecules outperforms its first DNN-based counterpart, ORGANIC, on several metrics relevant to drug discovery: the number of unique structures, passing medicinal chemistry filters (MCFs), Muegge criteria, and high QED scores. RANC is able to generate structures that match the distributions of the key chemical features/descriptors (e.g., MW, logP, TPSA) and lengths of the SMILES strings in the training data set. Therefore, RANC can reasonably be regarded as a promising starting point for developing novel molecules with activity against different biological targets or pathways. In addition, this approach saves scientists time and covers a broad chemical space populated with novel and diverse compounds.

  18. The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice.

    PubMed

    Dunne, Simon; D'Souza, Arun; O'Doherty, John P

    2016-06-01

    A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning. Copyright © 2016 the American Physiological Society.

  19. Methods for producing reinforced carbon nanotubes

    DOEpatents

    Ren, Zhifen [Newton, MA]; Wen, Jian Guo [Newton, MA]; Lao, Jing Y [Chestnut Hill, MA]; Li, Wenzhi [Brookline, MA]

    2008-10-28

    Methods are disclosed for producing reinforced carbon nanotubes having a plurality of microparticulate carbide or oxide materials formed substantially on the surface of the reinforced carbon nanotube composite materials. In particular, the present invention provides reinforced carbon nanotubes (CNTs) having a plurality of boron carbide nanolumps formed substantially on a surface of the reinforced CNTs. These nanolumps provide a reinforcing effect on the CNTs, enabling their use as effective reinforcing fillers for matrix materials to give high-strength composites. The present invention also provides methods for producing such carbide-reinforced CNTs.

  20. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning

    PubMed Central

    Zhu, Lusha; Mathewson, Kyle E.; Hsu, Ming

    2012-01-01

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents’ beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs. PMID:22307594
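
    The difference between a reinforcement prediction error and a belief-based prediction error can be illustrated with a hypothetical Python sketch. The RL error compares the reward obtained with the learned value of the action taken. The belief error compares the opponent's observed action with the predicted probability of that action. The matching-pennies-style payoff table, learning rates `alpha` and `eta`, and the fictitious-play-style belief update are illustrative assumptions, not the authors' model.

```python
# Hypothetical sketch contrasting two prediction errors in a repeated game.
# Learning rates, the payoff table, and the fictitious-play-style belief
# update are assumptions, not the study's model.

# Payoff to me for (my_action, opponent_action): I win by matching.
PAYOFF = {("H", "H"): 1.0, ("H", "T"): 0.0, ("T", "H"): 0.0, ("T", "T"): 1.0}


def rl_update(values, my_action, reward, alpha=0.2):
    """Reinforcement PE: obtained reward minus the chosen action's value."""
    delta = reward - values[my_action]
    values[my_action] += alpha * delta
    return delta


def belief_update(beliefs, opponent_action, eta=0.2):
    """Belief PE: observed opponent action (one-hot) minus predicted probability."""
    deltas = {}
    for a in beliefs:
        deltas[a] = (1.0 if a == opponent_action else 0.0) - beliefs[a]
        beliefs[a] += eta * deltas[a]  # stays normalized if beliefs summed to 1
    return deltas


def belief_values(beliefs):
    """Expected payoff of each of my actions under current beliefs."""
    mine = sorted({m for m, _ in PAYOFF})
    return {m: sum(p * PAYOFF[(m, o)] for o, p in beliefs.items()) for m in mine}


values = {"H": 0.0, "T": 0.0}
beliefs = {"H": 0.5, "T": 0.5}
# One round: I play "T", the opponent plays "H", so I earn nothing.
rl_pe = rl_update(values, "T", PAYOFF[("T", "H")])
belief_pe = belief_update(beliefs, "H")
print(rl_pe, belief_pe, belief_values(beliefs))
```

    In this round the reinforcement prediction error is zero, since nothing was expected and nothing was earned. The belief prediction error is nonzero, and it shifts the expected payoffs toward playing "H". This shows how the two signals can come apart on the same trial.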
