Sample records for temporal-difference reinforcement learning

  1. Reconciling Reinforcement Learning Models with Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

    ERIC Educational Resources Information Center

    Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb

    2007-01-01

    Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…
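
    As a point of reference for several records below, the core value update in temporal-difference reinforcement learning can be written in its textbook TD(0) form (a generic statement, not necessarily the exact variant used by Redish et al.):

      V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t,
      \qquad
      \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

    Under this rule, repeated omission of reward during extinction produces negative prediction errors that simply drive V(s) back toward zero, which is why a plain TDRL model "unlearns" the association rather than suppressing it and therefore cannot produce renewal.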

  2. Kernel Temporal Differences for Neural Decoding

    PubMed Central

    Bae, Jihye; Sanchez Giraldo, Luis G.; Pohlmeyer, Eric A.; Francis, Joseph T.; Sanchez, Justin C.; Príncipe, José C.

    2015-01-01

    We study the feasibility and capability of the kernel temporal difference (KTD)(λ) algorithm for neural decoding. KTD(λ) is an online, kernel-based learning algorithm, which has been introduced to estimate value functions in reinforcement learning. This algorithm combines kernel-based representations with the temporal difference approach to learning. One of our key observations is that by using strictly positive definite kernels, the algorithm's convergence can be guaranteed for policy evaluation. The algorithm's nonlinear functional approximation capabilities are shown in both simulations of policy evaluation and neural decoding problems (policy improvement). KTD can handle high-dimensional neural states containing spatial-temporal information at a reasonable computational complexity, allowing real-time applications. When the algorithm seeks a proper mapping between a monkey's neural states and desired positions of a computer cursor or a robot arm, in both open-loop and closed-loop experiments, it can effectively learn the neural state-to-action mapping. Finally, a visualization of the coadaptation process between the decoder and the subject shows the algorithm's capabilities in reinforcement learning brain-machine interfaces. PMID:25866504
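
    A minimal sketch of the kernel temporal-difference idea described above (an illustrative reconstruction, not the authors' code): the value function is represented as a kernel expansion over visited states, and each TD error adds a new weighted kernel centre. The kernel width, step sizes, and the lambda=0 simplification are assumptions made here for brevity.

      import numpy as np

      def gaussian_kernel(x, y, sigma=1.0):
          """Strictly positive definite Gaussian kernel."""
          return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

      class KernelTD:
          """Illustrative KTD(0)-style value estimator (lambda = 0 for brevity)."""

          def __init__(self, alpha=0.1, gamma=0.9, sigma=1.0):
              self.alpha, self.gamma, self.sigma = alpha, gamma, sigma
              self.centers, self.coeffs = [], []   # kernel expansion of V

          def value(self, x):
              return sum(a * gaussian_kernel(x, c, self.sigma)
                         for a, c in zip(self.coeffs, self.centers))

          def update(self, x, reward, x_next, terminal=False):
              target = reward + (0.0 if terminal else self.gamma * self.value(x_next))
              td_error = target - self.value(x)
              # Each transition adds one centre weighted by the TD error.
              self.centers.append(np.asarray(x, dtype=float))
              self.coeffs.append(self.alpha * td_error)
              return td_error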

  3. On the asymptotic equivalence between differential Hebbian and temporal difference learning.

    PubMed

    Kolodziejski, Christoph; Porr, Bernd; Wörgötter, Florentin

    2009-04-01

    In this theoretical contribution, we provide mathematical proof that two of the most important classes of network learning, correlation-based differential Hebbian learning and reward-based temporal difference learning, are asymptotically equivalent when the learning is timed by a modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation-based perspective that is more closely related to the biophysics of neurons.
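
    Generic textbook forms of the two rules being compared, shown only to make the contrast concrete (these are standard statements, not the paper's exact notation):

      \text{differential Hebbian:}\quad \frac{d\omega_i}{dt} \propto x_i(t)\,\frac{dv(t)}{dt}
      \qquad
      \text{TD:}\quad \Delta\omega_i \propto \delta_t\, x_i(t),\;\; \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

    The result stated in the abstract is that, when the Hebbian update is gated by a modulatory (third-factor) signal, the two rules become asymptotically equivalent.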

  4. Temporally Coordinated Deep Brain Stimulation in the Dorsal and Ventral Striatum Synergistically Enhances Associative Learning.

    PubMed

    Katnani, Husam A; Patel, Shaun R; Kwon, Churl-Su; Abdel-Aziz, Samer; Gale, John T; Eskandar, Emad N

    2016-01-04

    The primate brain has the remarkable ability of mapping sensory stimuli into motor behaviors that can lead to positive outcomes. We have previously shown that during the reinforcement of visual-motor behavior, activity in the caudate nucleus is correlated with the rate of learning. Moreover, phasic microstimulation in the caudate during the reinforcement period was shown to enhance associative learning, demonstrating the importance of temporal specificity to manipulate learning related changes. Here we present evidence that extends upon our previous finding by demonstrating that temporally coordinated phasic deep brain stimulation across both the nucleus accumbens and caudate can further enhance associative learning. Monkeys performed a visual-motor associative learning task and received stimulation at time points critical to learning related changes. Resulting performance revealed an enhancement in the rate, ceiling, and reaction times of learning. Stimulation of each brain region alone or at different time points did not generate the same effect.

  5. Implicit chaining in cotton-top tamarins (Saguinus oedipus) with elements equated for probability of reinforcement

    PubMed Central

    Dillon, Laura; Collins, Meaghan; Conway, Maura; Cunningham, Kate

    2013-01-01

    Three experiments examined the implicit learning of sequences under conditions in which the elements comprising a sequence were equated in terms of reinforcement probability. In Experiment 1 cotton-top tamarins (Saguinus oedipus) experienced a five-element sequence displayed serially on a touch screen in which reinforcement probability was equated across elements at .16 per element. Tamarins demonstrated learning of this sequence with higher latencies during a random test as compared to baseline sequence training. In Experiments 2 and 3, manipulations of the procedure used in the first experiment were undertaken to rule out a confound owing to the fact that the elements in Experiment 1 bore different temporal relations to the intertrial interval (ITI), an inhibitory period. The results of Experiments 2 and 3 indicated that the implicit learning observed in Experiment 1 was not due to temporal proximity between some elements and the inhibitory ITI. Taken together, the results support two conclusions: first, that tamarins engaged in sequence learning whether or not there was contingent reinforcement for learning the sequence, and second, that this learning was not due to subtle differences in associative strength between the elements of the sequence. PMID:23344718

  6. On the integration of reinforcement learning and approximate reasoning for control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1991-01-01

    The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.
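
    For readers unfamiliar with the TD and Q-learning methods mentioned here, a minimal tabular Q-learning step looks like the following (generic form, not specific to ARIC; the toy states and actions are invented for illustration):

      from collections import defaultdict

      def q_learning_update(Q, state, action, reward, next_state, actions,
                            alpha=0.1, gamma=0.99):
          """One tabular Q-learning step: Q maps (state, action) -> value."""
          best_next = max(Q[(next_state, a)] for a in actions)
          td_error = reward + gamma * best_next - Q[(state, action)]
          Q[(state, action)] += alpha * td_error
          return td_error

      Q = defaultdict(float)
      q_learning_update(Q, state=0, action="push", reward=1.0, next_state=1,
                        actions=["push", "pull"])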

  7. Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance

    DTIC Science & Technology

    2014-09-29

    Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance W. Bradley Knox...positive a trainer’s reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task...learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent’s performance on the task the trainer

  8. Apprenticeship Learning: Learning to Schedule from Human Experts

    DTIC Science & Technology

    2016-06-09

    approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning (Busoniu, Babuska, and De...via inverse reinforcement learning. In ICML. Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete...of tasks with temporal constraints. In Proc. AAAI, 2110–2116. Odom, P., and Natarajan, S. 2015. Active advice seeking for inverse reinforcement

  9. Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

    PubMed Central

    La Camera, Giancarlo; Richmond, Barry J.

    2008-01-01

    It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys. PMID:18688266

  10. Modeling the violation of reward maximization and invariance in reinforcement schedules.

    PubMed

    La Camera, Giancarlo; Richmond, Barry J

    2008-08-08

    It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as "schedule length effect"). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: "framing," wherein equivalent options are treated differently depending on the context in which they are presented, and the "sunk cost" effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.

  11. Navigating complex decision spaces: Problems and paradigms in sequential choice

    PubMed Central

    Walsh, Matthew M.; Anderson, John R.

    2015-01-01

    To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192
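
    The model-free/model-based distinction drawn in this review can be made concrete with a small sketch (illustrative only; the environment size, names, and update forms are assumptions, not the review's):

      import numpy as np

      n_states, n_actions, gamma, alpha = 4, 2, 0.9, 0.1

      # Model-free: cache action values and nudge them with a TD error.
      Q = np.zeros((n_states, n_actions))

      def model_free_step(s, a, r, s_next):
          td_error = r + gamma * Q[s_next].max() - Q[s, a]
          Q[s, a] += alpha * td_error

      # Model-based: learn transition counts and rewards, then plan by value iteration.
      counts = np.ones((n_states, n_actions, n_states))      # transition pseudo-counts
      R = np.zeros((n_states, n_actions))                     # learned reward estimates

      def model_based_plan(n_sweeps=50):
          P = counts / counts.sum(axis=2, keepdims=True)
          Qmb = np.zeros((n_states, n_actions))
          for _ in range(n_sweeps):
              V = Qmb.max(axis=1)
              Qmb = R + gamma * (P @ V)
          return Qmb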

  12. Prediction error and trace dominance determine the fate of fear memories after post-training manipulations

    PubMed Central

    Alfei, Joaquín M.; Ferrer Monti, Roque I.; Molina, Victor A.; Bueno, Adrián M.

    2015-01-01

    Different mnemonic outcomes have been observed when associative memories are reactivated by CS exposure and followed by amnestics. These outcomes include mere retrieval, destabilization–reconsolidation, a transitional period (which is insensitive to amnestics), and extinction learning. However, little is known about the interaction between initial learning conditions and these outcomes during a reinforced or nonreinforced reactivation. Here we systematically combined temporally specific memories with different reactivation parameters to observe whether these four outcomes are determined by the conditions established during training. First, we validated two training regimens with different temporal expectations about US arrival. Then, using Midazolam (MDZ) as an amnestic agent, fear memories in both learning conditions were submitted to retraining under parameters either identical to or different from those of the original training. Destabilization (i.e., susceptibility to MDZ) occurred when reactivation was reinforced, provided the occurrence of a temporal prediction error about US arrival. In subsequent experiments, both treatments were systematically reactivated by nonreinforced context exposure of different lengths, which allowed us to explore the interaction between training and reactivation lengths. These results suggest that temporal prediction error and trace dominance determine the extent to which reactivation produces the different outcomes. PMID:26179232

  13. Constructing Temporally Extended Actions through Incremental Community Detection

    PubMed Central

    Li, Ge

    2018-01-01

    Hierarchical reinforcement learning works on temporally extended actions or skills to facilitate learning. How to automatically form such abstraction is challenging, and many efforts tackle this issue in the options framework. While various approaches exist to construct options from different perspectives, few of them concentrate on options' adaptability during learning. This paper presents an algorithm to create options and enhance their quality online. Both aspects operate on detected communities of the learning environment's state transition graph. We first construct options from initial samples as the basis of online learning. Then a rule-based community revision algorithm is proposed to update graph partitions, based on which existing options can be continuously tuned. Experimental results in two problems indicate that options from initial samples may perform poorly in more complex environments, and our presented strategy can effectively improve options and get better results compared with flat reinforcement learning. PMID:29849543
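
    In the options framework referred to above, a temporally extended action is usually specified by an initiation set, an internal policy, and a termination condition; a minimal data structure (an illustrative sketch, not this paper's implementation, with invented example states) might look like:

      from dataclasses import dataclass
      from typing import Callable, Hashable, Set

      @dataclass
      class Option:
          """A temporally extended action in the options framework."""
          initiation_set: Set[Hashable]              # states where the option may start
          policy: Callable[[Hashable], Hashable]     # state -> primitive action
          termination: Callable[[Hashable], float]   # state -> probability of stopping

          def can_start(self, state) -> bool:
              return state in self.initiation_set

      # Example: an option that walks toward the exit of a detected community.
      exit_option = Option(
          initiation_set={"room_a1", "room_a2"},
          policy=lambda s: "move_toward_doorway",
          termination=lambda s: 1.0 if s == "doorway" else 0.0,
      )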

  14. Characterizing Reinforcement Learning Methods through Parameterized Learning Problems

    DTIC Science & Technology

    2011-06-03

    extraneous. The agent could potentially adapt these representational aspects by applying methods from feature selection (Kolter and Ng, 2009; Petrik et al...611–616. AAAI Press. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature selection in least-squares temporal difference learning. In A. P

  15. The basal ganglia is necessary for learning spectral, but not temporal features of birdsong

    PubMed Central

    Ali, Farhan; Fantana, Antoniu L.; Burak, Yoram; Ölveczky, Bence P.

    2013-01-01

    Executing a motor skill requires the brain to control which muscles to activate at what times. How these aspects of control - motor implementation and timing - are acquired, and whether the learning processes underlying them differ, is not well understood. To address this we used a reinforcement learning paradigm to independently manipulate both spectral and temporal features of birdsong, a complex learned motor sequence, while recording and perturbing activity in underlying circuits. Our results uncovered a striking dissociation in how neural circuits underlie learning in the two domains. The basal ganglia was required for modifying spectral, but not temporal structure. This functional dissociation extended to the descending motor pathway, where recordings from a premotor cortex analogue nucleus reflected changes to temporal, but not spectral structure. Our results reveal a strategy in which the nervous system employs different and largely independent circuits to learn distinct aspects of a motor skill. PMID:24075977

  16. Novelty and Inductive Generalization in Human Reinforcement Learning

    PubMed Central

    Gershman, Samuel J.; Niv, Yael

    2015-01-01

    In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: what is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: how can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. PMID:25808176

  17. A neural model of hierarchical reinforcement learning.

    PubMed

    Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris

    2017-01-01

    We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.

  18. The Effects of Interval Duration on Temporal Tracking and Alternation Learning

    ERIC Educational Resources Information Center

    Ludvig, Elliot A.; Staddon, John E. R.

    2005-01-01

    On cyclic-interval reinforcement schedules, animals typically show a postreinforcement pause that is a function of the immediately preceding time interval ("temporal tracking"). Animals, however, do not track single-alternation schedules--when two different intervals are presented in strict alternation on successive trials. In this experiment,…

  19. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    PubMed

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with the deep Q-network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
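
    A rough sketch of the kind of selection criterion described above (priority derived from the TD error plus a penalty for over-replayed transitions); the exact DCRL formula is not given in this record, so the combination and parameter names below are a plausible stand-in, not the published rule:

      import numpy as np

      def select_transitions(td_errors, replay_counts, batch_size,
                             priority_exponent=0.6, coverage_weight=0.1):
          """Sample indices with probability increasing in |TD error| and
          decreasing in how often a transition has already been replayed."""
          td_errors = np.asarray(td_errors, dtype=float)
          replay_counts = np.asarray(replay_counts, dtype=float)
          priority = np.abs(td_errors) ** priority_exponent
          penalty = coverage_weight * replay_counts      # discourage over-used samples
          score = np.maximum(priority - penalty, 1e-6)
          probs = score / score.sum()
          return np.random.choice(len(td_errors), size=batch_size,
                                  replace=False, p=probs)

      idx = select_transitions(td_errors=[0.5, 2.0, 0.1, 1.2],
                               replay_counts=[10, 1, 0, 4], batch_size=2)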

  20. The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces.

    PubMed

    Huertas, Marco A; Schwettmann, Sarah E; Shouval, Harel Z

    2016-01-01

    The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different from those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for expressing the LTP and LTD traces? Here we expand on our previous model to include several neuromodulators, and illustrate through various examples how these different neuromodulators contribute to learning reward timing within a wide set of training paradigms, and we propose further roles that multiple neuromodulators can play in encoding additional information about the rewarding signal.
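
    A toy version of the two-trace idea (a simplification of the model described, with invented time constants and gains): each synapse keeps separate LTP and LTD eligibility traces with different decay rates and effective magnitudes, and a delayed neuromodulatory reward signal converts whatever trace remains into a weight change.

      tau_ltp, tau_ltd = 1.0, 2.0          # trace time constants (s); illustrative values
      gain_ltp, gain_ltd = 1.0, 0.6        # effective magnitudes of the two traces
      dt = 0.01

      def run_trial(weight, stim_time, reward_time, t_end=5.0, activity_duration=0.2):
          """Accumulate LTP/LTD traces while the stimulus is active, let them decay,
          and apply the neuromodulated weight change when the reward arrives."""
          e_ltp = e_ltd = 0.0
          for step in range(int(t_end / dt)):
              t = step * dt
              active = stim_time <= t < stim_time + activity_duration
              drive = 1.0 if active else 0.0
              e_ltp += dt * (-e_ltp / tau_ltp + drive)
              e_ltd += dt * (-e_ltd / tau_ltd + drive)
              if abs(t - reward_time) < dt / 2:            # neuromodulator released
                  weight += gain_ltp * e_ltp - gain_ltd * e_ltd
          return weight

      print(run_trial(weight=0.5, stim_time=0.5, reward_time=1.5))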

  1. A reinforcement learning-based architecture for fuzzy logic control

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1992-01-01

    This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.

  2. A neural model of hierarchical reinforcement learning

    PubMed Central

    Rasmussen, Daniel; Eliasmith, Chris

    2017-01-01

    We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions. PMID:28683111

  3. Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.

    PubMed

    Blake, David T

    2017-06-18

    The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, by the strength of response within that activated area, and by the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long-term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests other factors also initiate changes. Studies that control learning find support for reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to a strengthening of cortical response strength and enlarging of response area with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning.

  4. The Interaction of Temporal Generalization Gradients Predicts the Context Effect

    ERIC Educational Resources Information Center

    de Castro, Ana Catarina; Machado, Armando

    2012-01-01

    In a temporal double bisection task, animals learn two discriminations. In the presence of Red and Green keys, responses to Red are reinforced after 1-s samples and responses to Green are reinforced after 4-s samples; in the presence of Blue and Yellow keys, responses to Blue are reinforced after 4-s samples and responses to Yellow are reinforced…

  5. Influence of temporal context on value in the multiple-chains and successive-encounters procedures.

    PubMed

    O'Daly, Matthew; Angulo, Samuel; Gipson, Cassandra; Fantino, Edmund

    2006-05-01

    This set of studies explored the influence of temporal context across multiple-chain and multiple-successive-encounters procedures. Following training with different temporal contexts, the value of stimuli sharing similar reinforcement schedules was assessed by presenting these stimuli in concurrent probes. The results for the multiple-chain schedule indicate that temporal context does impact the value of a conditioned reinforcer consistent with delay-reduction theory, such that a stimulus signaling a greater reduction in delay until reinforcement has greater value. Further, nonreinforced stimuli that are concurrently presented with the preferred terminal link also have greater value, consistent with value transfer. The effects of context on value for conditions with the multiple-successive-encounters procedure, however, appear to depend on whether the search schedule or alternate handling schedule was manipulated, as well as on whether the tested stimuli were the rich or lean schedules in their components. Overall, the results help delineate the conditions under which temporal context affects conditioned-reinforcement value (acting as a learning variable) and the conditions under which it does not (acting as a performance variable), an issue of relevance to theories of choice.

  6. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    PubMed

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  7. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    PubMed

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.

  8. Human-level control through deep reinforcement learning.

    PubMed

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
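
    The learning signal at the heart of the deep Q-network is still a temporal-difference target; a stripped-down sketch of that target and loss (a framework-agnostic illustration in plain Python/NumPy, not the published implementation):

      import numpy as np

      def dqn_targets(rewards, next_q_values, terminal, gamma=0.99):
          """TD targets y = r + gamma * max_a' Q_target(s', a'), with bootstrapping
          switched off on terminal transitions."""
          rewards = np.asarray(rewards, dtype=float)
          terminal = np.asarray(terminal, dtype=bool)
          bootstrap = np.asarray(next_q_values).max(axis=1)
          return rewards + gamma * np.where(terminal, 0.0, bootstrap)

      def td_loss(q_taken, targets):
          """Mean squared TD error between Q(s, a) for the taken actions and the targets."""
          return float(np.mean((np.asarray(targets) - np.asarray(q_taken)) ** 2))

      y = dqn_targets(rewards=[1.0, 0.0], next_q_values=[[0.2, 0.5], [0.1, 0.0]],
                      terminal=[False, True])
      print(td_loss(q_taken=[0.4, 0.1], targets=y))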

  9. Human-level control through deep reinforcement learning

    NASA Astrophysics Data System (ADS)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  10. Temporal structure of motor variability is dynamically regulated and predicts motor learning ability.

    PubMed

    Wu, Howard G; Miyamoto, Yohsuke R; Gonzalez Castro, Luis Nicolas; Ölveczky, Bence P; Smith, Maurice A

    2014-02-01

    Individual differences in motor learning ability are widely acknowledged, yet little is known about the factors that underlie them. Here we explore whether movement-to-movement variability in motor output, a ubiquitous if often unwanted characteristic of motor performance, predicts motor learning ability. Surprisingly, we found that higher levels of task-relevant motor variability predicted faster learning both across individuals and across tasks in two different paradigms, one relying on reward-based learning to shape specific arm movement trajectories and the other relying on error-based learning to adapt movements in novel physical environments. We proceeded to show that training can reshape the temporal structure of motor variability, aligning it with the trained task to improve learning. These results provide experimental support for the importance of action exploration, a key idea from reinforcement learning theory, showing that motor variability facilitates motor learning in humans and that our nervous systems actively regulate it to improve learning.

  11. Temporal structure of motor variability is dynamically regulated and predicts motor learning ability

    PubMed Central

    Wu, Howard G; Miyamoto, Yohsuke R; Castro, Luis Nicolas Gonzalez; Ölveczky, Bence P; Smith, Maurice A

    2015-01-01

    Individual differences in motor learning ability are widely acknowledged, yet little is known about the factors that underlie them. Here we explore whether movement-to-movement variability in motor output, a ubiquitous if often unwanted characteristic of motor performance, predicts motor learning ability. Surprisingly, we found that higher levels of task-relevant motor variability predicted faster learning both across individuals and across tasks in two different paradigms, one relying on reward-based learning to shape specific arm movement trajectories and the other relying on error-based learning to adapt movements in novel physical environments. We proceeded to show that training can reshape the temporal structure of motor variability, aligning it with the trained task to improve learning. These results provide experimental support for the importance of action exploration, a key idea from reinforcement learning theory, showing that motor variability facilitates motor learning in humans and that our nervous systems actively regulate it to improve learning. PMID:24413700

  12. Further tests of the Scalar Expectancy Theory (SET) and the Learning-to-Time (LeT) model in a temporal bisection task.

    PubMed

    Machado, Armando; Arantes, Joana

    2006-06-01

    To contrast two models of timing, Scalar Expectancy Theory (SET) and Learning to Time (LeT), pigeons were exposed to a double temporal bisection procedure. On half of the trials, they learned to choose a red key after a 1-s signal and a green key after a 4-s signal; on the other half of the trials, they learned to choose a blue key after a 4-s signal and a yellow key after a 16-s signal. This was Phase A of an ABA design. On Phase B, the pigeons were divided into two groups and exposed to a new bisection task in which the signals ranged from 1 to 16 s and the choice keys were blue and green. One group was reinforced for choosing blue after 1-s signals and green after 16-s signals and the other group was reinforced for the opposite mapping (green after 1-s signals and blue after 16-s signals). Whereas SET predicted no differences between the groups, LeT predicted that the former group would learn the new discrimination faster than the latter group. The results were consistent with LeT. Finally, the pigeons returned to Phase A. Only LeT made specific predictions regarding the reacquisition of the four temporal discriminations. These predictions were only partly consistent with the results.

  13. Model-free and model-based reward prediction errors in EEG.

    PubMed

    Sambrook, Thomas D; Hardwick, Ben; Wills, Andy J; Goslin, Jeremy

    2018-05-24

    Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign candidate actions with an expected value. Model-free learning is ignorant of the world's structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has been previously assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors questions a long-held assumption that model-free and model-based learning are dissociated in the brain.

  14. Organization in memory and behavior

    PubMed Central

    Shimp, Charles P.

    1976-01-01

    Some common reinforcement contingencies make the delivery of a reinforcer depend on the occurrence of behavior lacking significant temporal structure: a reinforcer may be contingent on nearly instantaneous responses such as a pigeon's key peck, a rat's lever press, a human's button press or brief verbal utterance, and so on. Such a reinforcement contingency conforms much more closely to the functionalist tradition in experimental psychology than to the structuralist tradition. Until recently, the functionalist tradition, in the form of a kind of associationism, typified most research on human learning and memory. Recently, however, research on human memory has focused more on structural issues: now the basic unit of analysis often involves an organized temporal pattern of behavior. A focus on the interrelations between the function and structure of behavior identifies a set of independent and dependent variables different from those identified by certain common kinds of “molar” behavioral analyses. In so doing, such a focus redefines some of the significant issues in the experimental analysis of behavior. PMID:16811925

  15. Parallel Online Temporal Difference Learning for Motor Control.

    PubMed

    Caarls, Wouter; Schuitema, Erik

    2016-07-01

    Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. However, in real systems, this method is often avoided in favor of policy search methods because of its long learning time. But policy search suffers from its own drawbacks, such as the necessity of informed policy parameterization and initialization. In this paper, we show that TD learning can work effectively in real robotic systems as well, using parallel model learning and planning. Using locally weighted linear regression and trajectory sampled planning with 14 concurrent threads, we can achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60×, with a real-time learning speed of less than half a minute. The results are competitive with state-of-the-art policy search.

  16. Policy improvement by a model-free Dyna architecture.

    PubMed

    Hwang, Kao-Shing; Lo, Chia-Yue

    2013-05-01

    The objective of this paper is to accelerate the process of policy improvement in reinforcement learning. The proposed Dyna-style system combines two learning schemes, one of which utilizes a temporal difference method for direct learning; the other uses relative values for indirect learning in planning between two successive direct learning cycles. Instead of establishing a complicated world model, the approach introduces a simple predictor of average rewards to actor-critic architecture in the simulation (planning) mode. The relative value of a state, defined as the accumulated differences between immediate reward and average reward, is used to steer the improvement process in the right direction. The proposed learning scheme is applied to control a pendulum system for tracking a desired trajectory to demonstrate its adaptability and robustness. Through reinforcement signals from the environment, the system takes the appropriate action to drive an unknown dynamic system to track desired outputs in a few learning cycles. Comparisons are made between the proposed model-free method, a connectionist adaptive heuristic critic, and an advanced method of Dyna-Q learning in the experiments of labyrinth exploration. The proposed method outperforms its counterparts in terms of elapsed time and convergence rate.
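
    The "relative value" used in the planning mode is described as the accumulated difference between immediate and average reward; a hedged sketch of that bookkeeping (the class name, step size, and structure are assumptions made for illustration):

      from collections import defaultdict

      class RelativeValuePlanner:
          """Track an average-reward baseline and per-state relative values, as a
          stand-in for the indirect-learning phase described in the abstract."""

          def __init__(self, beta=0.05):
              self.beta = beta                 # step size for the average-reward estimate
              self.avg_reward = 0.0
              self.relative_value = defaultdict(float)

          def observe(self, state, reward):
              self.avg_reward += self.beta * (reward - self.avg_reward)
              self.relative_value[state] += reward - self.avg_reward
              return self.relative_value[state]

      planner = RelativeValuePlanner()
      for r in [0.0, 0.0, 1.0]:
          planner.observe(state="s1", reward=r)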

  17. An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction, and punishment learning

    PubMed Central

    Balasubramani, Pragathi P.; Chakravarthy, V. Srinivasa; Ravindran, Balaraman; Moustafa, Ahmed A.

    2014-01-01

    Although empirical and neural studies show that serotonin (5HT) plays many functional roles in the brain, prior computational models mostly focus on its role in behavioral inhibition. In this study, we present a model of risk based decision making in a modified Reinforcement Learning (RL)-framework. The model depicts the roles of dopamine (DA) and serotonin (5HT) in Basal Ganglia (BG). In this model, the DA signal is represented by the temporal difference error (δ), while the 5HT signal is represented by a parameter (α) that controls risk prediction error. This formulation that accommodates both 5HT and DA reconciles some of the diverse roles of 5HT particularly in connection with the BG system. We apply the model to different experimental paradigms used to study the role of 5HT: (1) Risk-sensitive decision making, where 5HT controls risk assessment, (2) Temporal reward prediction, where 5HT controls time-scale of reward prediction, and (3) Reward/Punishment sensitivity, in which the punishment prediction error depends on 5HT levels. Thus the proposed integrated RL model reconciles several existing theories of 5HT and DA in the BG. PMID:24795614
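
    The abstract pins the DA analogue to the TD error δ and the 5HT analogue to a parameter α weighting a risk prediction error; below is a generic risk-sensitive TD sketch along those lines (a common formulation with invented names and constants, not necessarily the exact equations of this model):

      def risk_sensitive_td_step(V, H, s, r, s_next, alpha_risk=0.5,
                                 lr=0.1, gamma=0.95):
          """Update a value estimate V[s] with the TD error (the DA-like signal) and a
          risk estimate H[s] with the squared TD error's prediction error, weighted by
          alpha_risk (the 5HT-like parameter)."""
          delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)   # DA-like TD error
          xi = delta ** 2 - H.get(s, 0.0)                          # risk prediction error
          V[s] = V.get(s, 0.0) + lr * delta
          H[s] = H.get(s, 0.0) + lr * alpha_risk * xi
          return delta, xi

      V, H = {}, {}
      risk_sensitive_td_step(V, H, s="cue", r=1.0, s_next="outcome")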

  18. Reinforcing and timing properties of water in the schedule-induced drinking situation.

    PubMed

    Ruiz, Jorge A; López-Tolsa, Gabriela E; Pellón, Ricardo

    2016-06-01

    A series of recent studies from our laboratory have added to the preceding literature on the potential role of water (in addition to food) as a positive reinforcer in the schedule-induced drinking situation, thus suggesting that adjunctive behaviors might have motivational properties that make their engagement a preferable alternative. It has also been suggested that adjunctive behaviors serve as a behavioral clock that helps organisms to estimate time, making their engagement motivational, so that they enable more accurate time adjustment under temporal schedules. Here, we review some of these experiments on conditioned reinforcement and concurrent chains, as well as on temporal learning. Data presented in this article suggest that adjunctive behaviors may be a part of the behavior patterns maintained by reinforcement, thus serving towards a better performance in temporal tasks.

  19. Novelty and Inductive Generalization in Human Reinforcement Learning.

    PubMed

    Gershman, Samuel J; Niv, Yael

    2015-07-01

    In reinforcement learning (RL), a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional RL algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty.

  20. Engagement in Classroom Learning: Creating Temporal Participation Incentives for Extrinsically Motivated Students through Bonus Credits

    ERIC Educational Resources Information Center

    Rassuli, Ali

    2012-01-01

    Extrinsic inducements to adjust students' learning motivations have evolved within 2 opposing paradigms. Cognitive evaluation theories claim that controlling factors embedded in extrinsic rewards dissipate intrinsic aspirations. Behavioral theorists contend that if engagement is voluntary, extrinsic reinforcements enhance learning without ill…

  1. An architecture for designing fuzzy logic controllers using neural networks

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1991-01-01

    Described here is an architecture for designing fuzzy controllers through a hierarchical process of control rule acquisition and by using special classes of neural network learning techniques. A new method for learning to refine a fuzzy logic controller is introduced. A reinforcement learning technique is used in conjunction with a multi-layer neural network model of a fuzzy controller. The model learns by updating its prediction of the plant's behavior and is related to Sutton's Temporal Difference (TD) method. The method proposed here has the advantage of using the control knowledge of an experienced operator and fine-tuning it through the process of learning. The approach is applied to a cart-pole balancing system.

  2. Running Improves Pattern Separation during Novel Object Recognition.

    PubMed

    Bolz, Leoni; Heigele, Stefanie; Bischofberger, Josef

    2015-10-09

    Running increases adult neurogenesis and improves pattern separation in various memory tasks including context fear conditioning or touch-screen based spatial learning. However, it is unknown whether pattern separation is improved in spontaneous behavior, not emotionally biased by positive or negative reinforcement. Here we investigated the effect of voluntary running on pattern separation during novel object recognition in mice using relatively similar or substantially different objects. We show that running increases hippocampal neurogenesis but does not affect object recognition memory with 1.5 h delay after sample phase. By contrast, at 24 h delay, running significantly improves recognition memory for similar objects, whereas highly different objects can be distinguished by both, running and sedentary mice. These data show that physical exercise improves pattern separation, independent of negative or positive reinforcement. In sedentary mice there is a pronounced temporal gradient for remembering object details. In running mice, however, increased neurogenesis improves hippocampal coding and temporally preserves distinction of novel objects from familiar ones.

  3. Delayed temporal discrimination in pigeons: A comparison of two procedures

    PubMed Central

    Chatlosh, Diane L.; Wasserman, Edward A.

    1987-01-01

    A within-subjects comparison was made of pigeons' performance on two temporal discrimination procedures that were signaled by differently colored keylight samples. During stimulus trials, a peck on the key displaying a slanted line was reinforced following short keylight samples, and a peck on the key displaying a horizontal line was reinforced following long keylight samples, regardless of the location of the stimuli on those two choice keys. During position trials, a peck on the left key was reinforced following short keylight samples and a peck on the right key was reinforced following long keylight samples, regardless of which line stimulus appeared on the correct key. Thus, on stimulus trials, the correct choice key could not be discriminated prior to the presentation of the test stimuli, whereas on position trials, the correct choice key could be discriminated during the presentation of the sample stimulus. During Phase 1, with a 0-s delay between sample and choice stimuli, discrimination learning was faster on position trials than on stimulus trials for all 4 birds. During Phase 2, 0-, 0.5-, and 1.0-s delays produced differential loss of stimulus control under the two tasks for 2 birds. Response patterns during the delay intervals provided some evidence for differential mediation of the two delayed discriminations. These between-task differences suggest that the same processes may not mediate performance in each. PMID:16812483

  4. Seizure Control in a Computational Model Using a Reinforcement Learning Stimulation Paradigm.

    PubMed

    Nagaraj, Vivek; Lamperski, Andrew; Netoff, Theoden I

    2017-11-01

    Neuromodulation technologies such as vagus nerve stimulation and deep brain stimulation have shown some efficacy in controlling seizures in patients with medically intractable epilepsy. However, inherent patient-to-patient variability of seizure disorders leads to a wide range of therapeutic efficacy. A patient-specific approach to determining stimulation parameters may lead to increased therapeutic efficacy while minimizing stimulation energy and side effects. This paper presents a reinforcement learning algorithm that optimizes stimulation frequency for controlling seizures with minimum stimulation energy. We apply our method to a computational model called the Epileptor, which simulates inter-ictal and ictal local field potential data. In order to apply reinforcement learning to the Epileptor, we introduce a specialized reward function and state-space discretization. With the reward function and discretization fixed, we test the effectiveness of the temporal difference reinforcement learning algorithm (TD(0)). For periodic pulsatile stimulation, we derive a relation that describes, for any stimulation frequency, the minimal pulse amplitude required to suppress seizures. The TD(0) algorithm is able to quickly identify parameters that control seizures. Additionally, our results show that the TD(0) algorithm refines the stimulation frequency to minimize stimulation energy, thereby converging reliably to optimal parameters. An advantage of the TD(0) algorithm is that it is adaptive, so that the parameters necessary to control the seizures can change over time. We show that the algorithm can converge on the optimal solution in simulation with slow and fast inter-seizure intervals.
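
    The sketch below illustrates the general idea of a tabular, TD-style learner trading off seizure suppression against stimulation energy over a discretized frequency grid. The frequency grid, reward weights, and state encoding are hypothetical illustrations, not the configuration used in the study.

    ```python
    # Sketch of a tabular TD-style learner over discretized stimulation frequencies.
    # All constants and the reward shape are assumptions for illustration only.
    import random
    from collections import defaultdict

    FREQS = [5, 20, 50, 100, 130, 180]     # candidate stimulation frequencies (Hz), assumed grid
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

    Q = defaultdict(float)                 # Q[(state, freq)] -> estimated long-run return

    def reward(seizing, freq, w_seizure=10.0, w_energy=0.01):
        """Hypothetical reward: penalize seizure activity and stimulation energy."""
        return -w_seizure * float(seizing) - w_energy * freq

    def choose_freq(state):
        """Epsilon-greedy selection over the discretized frequencies."""
        if random.random() < EPSILON:
            return random.choice(FREQS)
        return max(FREQS, key=lambda f: Q[(state, f)])

    def td_update(state, freq, r, next_state, next_freq):
        """One Sarsa-style TD(0) backup on the visited (state, frequency) pair."""
        target = r + GAMMA * Q[(next_state, next_freq)]
        Q[(state, freq)] += ALPHA * (target - Q[(state, freq)])
    ```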

  5. Negative reinforcement learning is affected in substance dependence.

    PubMed

    Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody

    2012-06-01

    Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  6. Rational and Mechanistic Perspectives on Reinforcement Learning

    ERIC Educational Resources Information Center

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  7. Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning.

    PubMed

    Morimura, Tetsuro; Uchibe, Eiji; Yoshimoto, Junichiro; Peters, Jan; Doya, Kenji

    2010-02-01

    Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution, which corresponds to the sensitivity of that distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate gamma for the value functions close to 1, these algorithms do not permit gamma to be set exactly to 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD) as a useful form of the derivative of the stationary state distribution through a backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with an LSD is also proposed, in which the average reward gradient can be estimated by setting gamma = 0, so it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms using simple benchmark tasks and show that these can improve the performance of existing PG methods.
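
    The quantity at issue can be written schematically as follows; this is the standard average-reward policy-gradient identity with stationary state distribution d_theta(s) under policy pi_theta, and does not reproduce the paper's estimator or its backward-chain derivation:

    ```latex
    % Average-reward objective and its full gradient; \nabla_\theta \log d_\theta(s)
    % is the log stationary distribution derivative (LSD) discussed above.
    J(\theta) = \sum_{s} d_\theta(s) \sum_{a} \pi_\theta(a \mid s)\, r(s,a), \qquad
    \nabla_\theta J(\theta) = \mathbb{E}_{s \sim d_\theta,\, a \sim \pi_\theta}
      \big[ \big( \nabla_\theta \log d_\theta(s) + \nabla_\theta \log \pi_\theta(a \mid s) \big)\, r(s,a) \big]
    ```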

  8. A model for discriminating reinforcers in time and space.

    PubMed

    Cowie, Sarah; Davison, Michael; Elliffe, Douglas

    2016-06-01

    Both the response-reinforcer and stimulus-reinforcer relation are important in discrimination learning; differential responding requires a minimum of two discriminably-different stimuli and two discriminably-different associated contingencies of reinforcement. When elapsed time is a discriminative stimulus for the likely availability of a reinforcer, choice over time may be modeled by an extension of the Davison and Nevin (1999) model that assumes that local choice strictly matches the effective local reinforcer ratio. The effective local reinforcer ratio may differ from the obtained local reinforcer ratio for two reasons: Because the animal inaccurately estimates times associated with obtained reinforcers, and thus incorrectly discriminates the stimulus-reinforcer relation across time; and because of error in discriminating the response-reinforcer relation. In choice-based timing tasks, the two responses are usually highly discriminable, and so the larger contributor to differences between the effective and obtained reinforcer ratio is error in discriminating the stimulus-reinforcer relation. Such error may be modeled either by redistributing the numbers of reinforcers obtained at each time across surrounding times, or by redistributing the ratio of reinforcers obtained at each time in the same way. We assessed the extent to which these two approaches to modeling discrimination of the stimulus-reinforcer relation could account for choice in a range of temporal-discrimination procedures. The version of the model that redistributed numbers of reinforcers accounted for more variance in the data. Further, this version provides an explanation for shifts in the point of subjective equality that occur as a result of changes in the local reinforcer rate. The inclusion of a parameter reflecting error in discriminating the response-reinforcer relation enhanced the ability of each version of the model to describe data. The ability of this class of model to account for a range of data suggests that timing, like other conditional discriminations, is choice under the joint discriminative control of elapsed time and differential reinforcement. Understanding the role of differential reinforcement is therefore critical to understanding control by elapsed time. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Rational and mechanistic perspectives on reinforcement learning.

    PubMed

    Chater, Nick

    2009-12-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms--as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.

  10. Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.

    PubMed

    Chan, C K J; Harris, Justin A

    2017-08-01

    Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Mesolimbic Dopamine Signals the Value of Work

    PubMed Central

    Hamid, Arif A.; Pettibone, Jeffrey R.; Mabrouk, Omar S.; Hetrick, Vaughn L.; Schmidt, Robert; Vander Weele, Caitlin M.; Kennedy, Robert T.; Aragona, Brandon J.; Berke, Joshua D.

    2015-01-01

    Dopamine cell firing can encode errors in reward prediction, providing a learning signal to guide future behavior. Yet dopamine is also a key modulator of motivation, invigorating current behavior. Existing theories propose that fast (“phasic”) dopamine fluctuations support learning, while much slower (“tonic”) dopamine changes are involved in motivation. We examined dopamine release in the nucleus accumbens across multiple time scales, using complementary microdialysis and voltammetric methods during adaptive decision-making. We first show that minute-by-minute dopamine levels covary with reward rate and motivational vigor. We then show that second-by-second dopamine release encodes an estimate of temporally-discounted future reward (a value function). We demonstrate that changing dopamine immediately alters willingness to work, and reinforces preceding action choices by encoding temporal-difference reward prediction errors. Our results indicate that dopamine conveys a single, rapidly-evolving decision variable, the available reward for investment of effort, that is employed for both learning and motivational functions. PMID:26595651
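
    In standard TD notation, the decision variable described here corresponds to the temporally discounted value function, and the reinforcement signal to its prediction error (generic definitions, not the study's fitted quantities):

    ```latex
    % Temporally discounted state value and the TD reward prediction error
    V(s_t) = \mathbb{E}\Big[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\Big|\; s_t \Big], \qquad
    \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
    ```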

  12. Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.

    PubMed

    Bouton, Mark E; Woods, Amanda M; Todd, Travis P

    2014-01-01

    Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.

  13. Dynamical genetic programming in XCSF.

    PubMed

    Preen, Richard J; Bull, Larry

    2013-01-01

    A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to artificial neural networks. This paper presents results from an investigation into using a temporally dynamic symbolic representation within the XCSF learning classifier system. In particular, dynamical arithmetic networks are used to represent the traditional condition-action production system rules to solve continuous-valued reinforcement learning problems and to perform symbolic regression, finding competitive performance with traditional genetic programming on a number of composite polynomial tasks. In addition, the network outputs are later repeatedly sampled at varying temporal intervals to perform multistep-ahead predictions of a financial time series.

  14. Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies

    PubMed Central

    Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

    2016-01-01

    Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401

  15. Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies.

    PubMed

    Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

    2016-07-14

    Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects.

  16. Intolerance of uncertainty and startle potentiation in relation to different threat reinforcement rates.

    PubMed

    Chin, Brian; Nelson, Brady D; Jackson, Felicia; Hajcak, Greg

    2016-01-01

    Fear conditioning research on threat predictability has primarily examined the impact of temporal (i.e., timing) predictability on the startle reflex. However, there are other key features of threat that can vary in predictability. For example, the reinforcement rate (i.e., frequency) of threat is a crucial factor underlying fear learning. The present study examined the impact of threat reinforcement rate on the startle reflex and self-reported anxiety during a fear conditioning paradigm. Forty-five participants completed a fear learning task in which the conditioned stimulus was reinforced with an electric shock to the forearm on 50% of trials in one block and 75% of trials in a second block, in counter-balanced order. The present study also examined whether intolerance of uncertainty (IU), the tendency to perceive or experience uncertainty as stressful or unpleasant, was associated with the startle reflex during conditions of low (50%) vs. high (75%) reinforcement. Results indicated that, across all participants, startle was greater during the 75% relative to the 50% reinforcement condition. IU was positively correlated with startle potentiation (i.e., increased startle response to the CS+ relative to the CS-) during the 50%, but not the 75%, reinforcement condition. Thus, despite receiving fewer electric shocks during the 50% reinforcement condition, individuals with high IU uniquely demonstrated greater defense system activation when impending threat was more uncertain. The association between IU and startle was independent of state anxiety. The present study adds to a growing literature on threat predictability and aversive responding, and suggests IU is associated with abnormal responding in the context of uncertain threat. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Predictive representations can link model-based reinforcement learning to model-free mechanisms.

    PubMed

    Russek, Evan M; Momennejad, Ida; Botvinick, Matthew M; Gershman, Samuel J; Daw, Nathaniel D

    2017-09-01

    Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
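
    A minimal tabular sketch of the successor-representation scheme that anchors this framework is given below; the state-space size, learning rates, and discount factor are illustrative only and do not reproduce the paper's simulations.

    ```python
    # Tabular successor representation (SR) learned by TD; values are computed as
    # V(s) = M[s] . w. State count, learning rates, and discount are illustrative.
    import numpy as np

    n_states = 5
    alpha_M, alpha_w, gamma = 0.1, 0.1, 0.95

    M = np.eye(n_states)    # expected discounted future state occupancies
    w = np.zeros(n_states)  # per-state reward weights

    def sr_td_update(s, r, s_next):
        """One transition: TD update of the SR row for s, delta rule for the reward weight."""
        onehot = np.eye(n_states)[s]
        M[s] += alpha_M * (onehot + gamma * M[s_next] - M[s])
        w[s] += alpha_w * (r - w[s])

    def value(s):
        """State value under the SR decomposition."""
        return M[s] @ w
    ```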

  18. Predictive representations can link model-based reinforcement learning to model-free mechanisms

    PubMed Central

    Botvinick, Matthew M.

    2017-01-01

    Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation. PMID:28945743

  19. Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

    PubMed Central

    Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram

    2013-01-01

    Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. PMID:23592970

  20. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    PubMed

    Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram

    2013-04-01

    Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
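
    The continuous-time TD error that this work extends is, in Doya's (2000) formulation, the following (shown schematically; the spiking actor-critic network adds further machinery on top of it):

    ```latex
    % Continuous-time TD error (Doya, 2000), with value time constant \tau
    \delta(t) = r(t) - \frac{1}{\tau} V(t) + \frac{dV(t)}{dt}
    ```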

  1. Reinforcement learning for resource allocation in LEO satellite networks.

    PubMed

    Usaha, Wipawee; Barria, Javier A

    2007-06-01

    In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes computationally prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms reduce storage requirements, computational complexity, and computation time, and improve an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements.

  2. Stress affects instrumental learning based on positive or negative reinforcement in interaction with personality in domestic horses

    PubMed Central

    Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa

    2017-01-01

    The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities. PMID:28475581

  3. Online selective kernel-based temporal difference learning.

    PubMed

    Chen, Xingguo; Gao, Yang; Wang, Ruili

    2013-12-01

    In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large-scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernel-based value function. A new sparsification method (i.e., a kernel distance-based online sparsification method) is proposed based on selective ensemble learning, which is computationally less complex compared with other sparsification methods. With the proposed sparsification method, the sparsified dictionary of samples is constructed online by checking if a sample needs to be added to the sparsified dictionary. In addition, based on local validity, a selective kernel-based value function is proposed to select the best samples from the sample dictionary for the selective kernel-based value function approximator. The parameters of the selective kernel-based value function are iteratively updated by using the temporal difference (TD) learning algorithm combined with the gradient descent technique. The complexity of the online sparsification procedure in the OSKTD algorithm is O(n). In addition, two typical experiments (Maze and Mountain Car) are used to compare with both traditional and up-to-date O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of our proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and converges faster than both traditional and up-to-date algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time compared with other sparsification methods, reaches a better local optimum than the traditional algorithms, and converges much faster than the up-to-date algorithms. In addition, OSKTD can reach a competitive ultimate optimum compared with the up-to-date algorithms.
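
    A rough sketch of a kernel distance-based sparsification check of the kind described above follows; the kernel, threshold, and bookkeeping are illustrative assumptions and do not reproduce OSKTD's exact criterion or its selective value function.

    ```python
    # Online dictionary sparsification: add a new sample only if it is sufficiently
    # far, in kernel-induced feature space, from every sample already stored.
    import numpy as np

    def rbf_kernel(x, y, sigma=1.0):
        """Gaussian (RBF) kernel between two feature vectors."""
        return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

    def maybe_add(dictionary, x, threshold=0.5, sigma=1.0):
        """Squared feature-space distance: k(x,x) + k(d,d) - 2 k(x,d); O(n) scan."""
        if not dictionary:
            dictionary.append(x)
            return True
        dists = [rbf_kernel(x, x, sigma) + rbf_kernel(d, d, sigma) - 2 * rbf_kernel(x, d, sigma)
                 for d in dictionary]
        if min(dists) > threshold:   # novel enough relative to the stored samples: keep it
            dictionary.append(x)
            return True
        return False
    ```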

  4. Unifying Temporal and Structural Credit Assignment Problems

    NASA Technical Reports Server (NTRS)

    Agogino, Adrian K.; Tumer, Kagan

    2004-01-01

    Single-agent reinforcement learners in time-extended domains and multi-agent systems share a common dilemma known as the credit assignment problem. Multi-agent systems have the structural credit assignment problem of determining the contributions of a particular agent to a common task. In contrast, time-extended single-agent systems have the temporal credit assignment problem of determining the contribution of a particular action to the quality of the full sequence of actions. Traditionally, these two problems are considered different and are handled in separate ways. In this article, we show how these two forms of the credit assignment problem are equivalent. In this unified framework, a single-agent Markov decision process can be broken down into a single-time-step multi-agent process. Furthermore, we show that Monte-Carlo estimation or Q-learning (depending on whether the values of resulting actions in the episode are known at the time of learning) are equivalent to different agent utility functions in a multi-agent system. This equivalence shows how an often neglected issue in multi-agent systems is equivalent to a well-known deficiency in multi-time-step learning, and it lays the groundwork for solving time-extended multi-agent problems, where both credit assignment problems are present.

  5. Reinforcement Probability Modulates Temporal Memory Selection and Integration Processes

    PubMed Central

    Matell, Matthew S.; Kurti, Allison N.

    2013-01-01

    We have previously shown that rats trained in a mixed-interval peak procedure (tone = 4s, light = 12s) respond in a scalar manner at a time in between the trained peak times when presented with the stimulus compound (Swanton & Matell, 2011). In our previous work, the two component cues were reinforced with different probabilities (short = 20%, long = 80%) to equate response rates, and we found that the compound peak time was biased toward the cue with the higher reinforcement probability. Here, we examined the influence that different reinforcement probabilities have on the temporal location and shape of the compound response function. We found that the time of peak responding shifted as a function of the relative reinforcement probability of the component cues, becoming earlier as the relative likelihood of reinforcement associated with the short cue increased. However, as the relative probabilities of the component cues grew dissimilar, the compound peak became non-scalar, suggesting that the temporal control of behavior shifted from a process of integration to one of selection. As our previous work has utilized durations and reinforcement probabilities more discrepant than those used here, these data suggest that the processes underlying the integration/selection decision for time are based on cue value. PMID:23896560

  6. An Upside to Reward Sensitivity: The Hippocampus Supports Enhanced Reinforcement Learning in Adolescence.

    PubMed

    Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna

    2016-10-05

    Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Online Reinforcement Learning Using a Probability Density Estimation.

    PubMed

    Agostini, Alejandro; Celaya, Enric

    2017-01-01

    Function approximation in online, incremental reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a Gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
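
    The following toy sketch illustrates the general idea of modulating a forgetting factor by how much a new sample is explained by each mixture component; the mixture, the modulation rule, and all constants are hypothetical and are not the authors' update equations.

    ```python
    # Toy density-dependent forgetting: components that take responsibility for the
    # new sample are refreshed (forget more of their past) than unrelated components.
    import numpy as np

    means = np.array([0.0, 2.0, 4.0])   # illustrative 1-D mixture component means
    variances = np.ones(3)
    weights = np.ones(3) / 3
    counts = np.ones(3)                 # effective sample counts per component

    def responsibilities(x):
        """Posterior probability of each component given the scalar sample x."""
        dens = weights * np.exp(-(x - means) ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
        return dens / (dens.sum() + 1e-12)

    def density_dependent_forgetting(x, base_lambda=0.99):
        """Per-component forgetting factor modulated by responsibility (illustrative rule)."""
        resp = responsibilities(x)
        lam = base_lambda ** resp       # high responsibility -> smaller factor -> more forgetting
        counts[:] = lam * counts + resp # decay old evidence, add the new sample's contribution
        return lam
    ```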

  8. BEHAVIORAL MECHANISMS UNDERLYING NICOTINE REINFORCEMENT

    PubMed Central

    Rupprecht, Laura E.; Smith, Tracy T.; Schassburger, Rachel L.; Buffalari, Deanne M.; Sved, Alan F.; Donny, Eric C.

    2015-01-01

    Cigarette smoking is the leading cause of preventable deaths worldwide, and nicotine, the primary psychoactive constituent in tobacco, drives sustained use. The behavioral actions of nicotine are complex and extend well beyond the actions of the drug as a primary reinforcer. Stimuli that are consistently paired with nicotine can, through associative learning, take on reinforcing properties as conditioned stimuli. These conditioned stimuli can then impact the rate and probability of behavior and even function as conditioned reinforcers that maintain behavior in the absence of nicotine. Nicotine can also act as a conditioned stimulus, predicting the delivery of other reinforcers, which may allow nicotine to acquire value as a conditioned reinforcer. These associative effects, establishing non-nicotine stimuli as conditioned stimuli with discriminative stimulus and conditioned reinforcing properties as well as establishing nicotine as a conditioned stimulus, are predicted by basic conditioning principles. However, nicotine can also act non-associatively. Nicotine directly enhances the reinforcing efficacy of other reinforcing stimuli in the environment, an effect that does not require a temporal or predictive relationship between nicotine and either the stimulus or the behavior. Hence, the reinforcing actions of nicotine stem both from the primary reinforcing actions of the drug (and the subsequent associative learning effects) and from the reinforcement-enhancement action of nicotine, which is non-associative in nature. Gaining a better understanding of how nicotine impacts behavior will allow for maximally effective tobacco control efforts aimed at reducing the harm associated with tobacco use by reducing and/or treating its addictiveness. PMID:25638333

  9. Rules and mechanisms for efficient two-stage learning in neural circuits.

    PubMed

    Teşileanu, Tiberiu; Ölveczky, Bence; Balasubramanian, Vijay

    2017-04-04

    Trial-and-error learning requires evaluating variable actions and reinforcing successful variants. In songbirds, vocal exploration is induced by LMAN, the output of a basal ganglia-related circuit that also contributes a corrective bias to the vocal output. This bias is gradually consolidated in RA, a motor cortex analogue downstream of LMAN. We develop a new model of such two-stage learning. Using stochastic gradient descent, we derive how the activity in 'tutor' circuits (e.g., LMAN) should match plasticity mechanisms in 'student' circuits (e.g., RA) to achieve efficient learning. We further describe a reinforcement learning framework through which the tutor can build its teaching signal. We show that mismatches between the tutor signal and the plasticity mechanism can impair learning. Applied to birdsong, our results predict the temporal structure of the corrective bias from LMAN given a plasticity rule in RA. Our framework can be applied predictively to other paired brain areas showing two-stage learning.

  10. Context change explains resurgence after the extinction of operant behavior

    PubMed Central

    Trask, Sydney; Schepers, Scott T.; Bouton, Mark E.

    2016-01-01

    Extinguished operant behavior can return or “resurge” when a response that has replaced it is also extinguished. Typically studied in nonhuman animals, the resurgence effect may provide insight into relapse that is seen when reinforcement is discontinued following human contingency management (CM) and functional communication training (FCT) treatments, which both involve reinforcing alternative behaviors to reduce behavioral excess. Although the variables that affect resurgence have been studied for some time, the mechanisms through which they promote relapse are still debated. We discuss three explanations of resurgence (response prevention, an extension of behavioral momentum theory, and an account emphasizing context change) as well as studies that evaluate them. Several new findings from our laboratory concerning the effects of different temporal distributions of the reinforcer during response elimination and the effects of manipulating qualitative features of the reinforcer pose a particular challenge to the momentum-based model. Overall, the results are consistent with a contextual account of resurgence, which emphasizes that reinforcers presented during response elimination have a discriminative role controlling behavioral inhibition. Changing the “reinforcer context” at the start of testing produces relapse if the organism has not learned to suppress its responding under conditions similar to the ones that prevail during testing. PMID:27429503

  11. Histidine-decarboxylase knockout mice show deficient nonreinforced episodic object memory, improved negatively reinforced water-maze performance, and increased neo- and ventro-striatal dopamine turnover.

    PubMed

    Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P

    2003-01-01

    The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement and reward system, reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning dependent on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydroxyphenylacetic acid concentrations and a higher dihydroxyphenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydroxyphenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory-promoting and memory-suppressive effects via distinct and independent mechanisms, and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.

  12. Effects Of Reinforcement History On Response Rate And Response Pattern In Periodic Reinforcement

    PubMed Central

    López, Florente; Menez, Marina

    2005-01-01

    Several researchers have suggested that conditioning history may have long-term effects on fixed-interval performances of rats. To test this idea and to identify possible factors involved in temporal control development, groups of rats initially were exposed to different reinforcement schedules: continuous, fixed-time, and random-interval. Afterwards, half of the rats in each group were studied on a fixed-interval 30-s schedule of reinforcement and the other half on a fixed-interval 90-s schedule of reinforcement. No evidence of long-term effects attributable to conditioning history on either response output or response patterning was found; history effects were transitory. Different tendencies in trajectory across sessions were observed for measures of early and late responding within the interreinforcer interval, suggesting that temporal control is the result of two separate processes: one involved in response output and the other in time allocation of responding and not responding. PMID:16047607

  13. Racial bias shapes social reinforcement learning.

    PubMed

    Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas

    2014-03-01

    Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial-in-group and out-group individuals.

  14. Robust sensorimotor representation to physical interaction changes in humanoid motion learning.

    PubMed

    Shimizu, Toshihiko; Saegusa, Ryo; Ikemoto, Shuhei; Ishiguro, Hiroshi; Metta, Giorgio

    2015-05-01

    This paper proposes a learning from demonstration system based on a motion feature, called phase transfer sequence. The system aims to synthesize the knowledge on humanoid whole body motions learned during teacher-supported interactions, and apply this knowledge during different physical interactions between a robot and its surroundings. The phase transfer sequence represents the temporal order of the changing points in multiple time sequences. It encodes the dynamical aspects of the sequences so as to absorb the gaps in timing and amplitude derived from interaction changes. The phase transfer sequence was evaluated in reinforcement learning of sitting-up and walking motions conducted by a real humanoid robot and compatible simulator. In both tasks, the robotic motions were less dependent on physical interactions when learned by the proposed feature than by conventional similarity measurements. Phase transfer sequence also enhanced the convergence speed of motion learning. Our proposed feature is original primarily because it absorbs the gaps caused by changes of the originally acquired physical interactions, thereby enhancing the learning speed in subsequent interactions.

  15. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    PubMed

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question, we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  16. Prespeech motor learning in a neural network using reinforcement

    PubMed Central

    Warlaumont, Anne S.; Westermann, Gert; Buder, Eugene H.; Oller, D. Kimbrough

    2012-01-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. PMID:23275137

  17. Fuzzy Sarsa with Focussed Replacing Eligibility Traces for Robust and Accurate Control

    NASA Astrophysics Data System (ADS)

    Kamdem, Sylvain; Ohki, Hidehiro; Sueda, Naomichi

    Several methods of reinforcement learning in continuous state and action spaces that utilize fuzzy logic have been proposed in recent years. This paper introduces Fuzzy Sarsa(λ), an on-policy algorithm for fuzzy learning that relies on a novel way of computing replacing eligibility traces to accelerate the policy evaluation. It is tested against several temporal difference learning algorithms: Sarsa(λ), Fuzzy Q(λ), an earlier fuzzy version of Sarsa, and an actor-critic algorithm. We perform detailed evaluations on two benchmark problems: a maze domain and the cart pole. Results of various tests highlight the strengths and weaknesses of these algorithms and show that Fuzzy Sarsa(λ) outperforms all other algorithms tested for a larger granularity of design and under noisy conditions. It is a highly competitive method of learning in realistic noisy domains where a denser fuzzy design over the state space is needed for more precise control.
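
    For context, the tabular Sarsa(λ) update with replacing eligibility traces that the fuzzy variant builds on looks roughly as follows (a generic sketch; the rule-level trace computation introduced in the paper differs):

    ```python
    # Generic tabular Sarsa(lambda) with replacing eligibility traces.
    import numpy as np

    def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.99, lam=0.9):
        """One on-policy TD update; E holds eligibility traces that are replaced, not accumulated."""
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]
        E[s, :] = 0.0            # replacing traces: clear traces of other actions in state s
        E[s, a] = 1.0            # the visited state-action pair's trace is reset to 1
        Q += alpha * delta * E   # credit every recently visited state-action pair
        E *= gamma * lam         # decay all traces toward zero

    # Example tables for a toy task with 10 states and 2 actions
    Q = np.zeros((10, 2))
    E = np.zeros_like(Q)
    ```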

  18. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations.

    PubMed

    Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho

    2015-05-01

    This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system that is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems with the policies generated by the integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) methods. Based on these, three online I-RL algorithms named explorized I-PI and integral Q-learning I and II are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model-free, and can simultaneously explore the state space in a stable manner during the online learning processes. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and, in relation to these, we show design principles for the exploration that support safe learning. Neural-network-based implementation methods for the proposed schemes are also presented in this paper. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
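
    The "integral temporal difference" underlying such methods can be illustrated in generic form (standard in the continuous-time optimal-control literature, not quoted from this paper): for input-affine dynamics \dot{x} = f(x) + g(x)u, cost rate r(x, u), and any interval T > 0, the value of a policy \mu satisfies the integral Bellman equation

        V^{\mu}(x(t)) = \int_{t}^{t+T} r\bigl(x(\tau), \mu(x(\tau))\bigr)\, d\tau + V^{\mu}\bigl(x(t+T)\bigr),

    so the integral TD error, \int_{t}^{t+T} r\, d\tau + \hat{V}(x(t+T)) - \hat{V}(x(t)), can be driven to zero using measured trajectories alone, without explicit knowledge of the drift dynamics f(x).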

  19. Isolating Behavioral Mechanisms of Inter-Temporal Choice: Nicotine Effects on Delay Discounting and Amount Sensitivity

    ERIC Educational Resources Information Center

    Locey, Matthew L.; Dallery, Jesse

    2009-01-01

    Many drugs of abuse produce changes in impulsive choice, that is, choice for a smaller-sooner reinforcer over a larger-later reinforcer. Because the alternatives differ in both delay and amount, it is not clear whether these drug effects are due to the differences in reinforcer delay or amount. To isolate the effects of delay, we used a titrating…

  20. Reinforcement learning in complementarity game and population dynamics

    NASA Astrophysics Data System (ADS)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.

  1. The Memory Trace Supporting Lose-Shift Responding Decays Rapidly after Reward Omission and Is Distinct from Other Learning Mechanisms in Rats.

    PubMed

    Gruber, Aaron J; Thapa, Rajat

    2016-01-01

    The propensity of animals to shift choices immediately after unexpectedly poor reinforcement outcomes is a pervasive strategy across species and tasks. We report here that the memory supporting such lose-shift responding in rats rapidly decays during the intertrial interval and persists throughout training and testing on a binary choice task, despite being a suboptimal strategy. Lose-shift responding is not positively correlated with the prevalence and temporal dependence of win-stay responding, and it is inconsistent with predictions of reinforcement learning on the task. These data provide further evidence that win-stay and lose-shift are mediated by dissociated neural mechanisms and indicate that lose-shift responding presents a potential confound for the study of choice in the many operant choice tasks with short intertrial intervals. We propose that this immediate lose-shift responding is an intrinsic feature of the brain's choice mechanisms that is engaged as a choice reflex and works in parallel with reinforcement learning and other control mechanisms to guide action selection.

  2. Rules and mechanisms for efficient two-stage learning in neural circuits

    PubMed Central

    Teşileanu, Tiberiu; Ölveczky, Bence; Balasubramanian, Vijay

    2017-01-01

    Trial-and-error learning requires evaluating variable actions and reinforcing successful variants. In songbirds, vocal exploration is induced by LMAN, the output of a basal ganglia-related circuit that also contributes a corrective bias to the vocal output. This bias is gradually consolidated in RA, a motor cortex analogue downstream of LMAN. We develop a new model of such two-stage learning. Using stochastic gradient descent, we derive how the activity in ‘tutor’ circuits (e.g., LMAN) should match plasticity mechanisms in ‘student’ circuits (e.g., RA) to achieve efficient learning. We further describe a reinforcement learning framework through which the tutor can build its teaching signal. We show that mismatches between the tutor signal and the plasticity mechanism can impair learning. Applied to birdsong, our results predict the temporal structure of the corrective bias from LMAN given a plasticity rule in RA. Our framework can be applied predictively to other paired brain areas showing two-stage learning. DOI: http://dx.doi.org/10.7554/eLife.20944.001 PMID:28374674

  3. Risk-sensitive reinforcement learning.

    PubMed

    Shen, Yun; Tobia, Michael J; Sommer, Tobias; Obermayer, Klaus

    2014-07-01

    We derive a family of risk-sensitive reinforcement learning methods for agents who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subject's responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition, we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
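
    Schematically, and stated here as a generic form rather than the paper's exact formulation, the key move is to pass the TD error through a utility function u(.) before updating:

        \delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t), \qquad Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha_t\, u(\delta_t).

    With u(\delta) = \delta this reduces to standard Q-learning; a prospect-theory-like u that weights negative errors more steeply than positive ones yields risk-averse behavior.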

  4. Face processing in different brain areas, and critical band masking.

    PubMed

    Rolls, Edmund T

    2008-09-01

    Neurophysiological evidence is described showing that some neurons in the macaque inferior temporal visual cortex have responses that are invariant with respect to the position, size, view, and spatial frequency of faces and objects, and that these neurons show rapid processing and rapid learning. Critical band spatial frequency masking is shown to be a property of these face-selective neurons and of the human visual perception of faces. Which face or object is present is encoded using a distributed representation in which each neuron conveys independent information in its firing rate, with little information evident in the relative time of firing of different neurons. This ensemble encoding has the advantages of maximizing the information in the representation useful for discrimination between stimuli using a simple weighted sum of the neuronal firing by the receiving neurons, generalization, and graceful degradation. These invariant representations are ideally suited to provide the inputs to brain regions such as the orbitofrontal cortex and amygdala that learn the reinforcement associations of an individual's face, for then the learning, and the appropriate social and emotional responses generalize to other views of the same face. A theory is described of how such invariant representations may be produced by self-organizing learning in a hierarchically organized set of visual cortical areas with convergent connectivity. The theory utilizes either temporal or spatial continuity with an associative synaptic modification rule. Another population of neurons in the cortex in the superior temporal sulcus encodes other aspects of faces such as face expression, eye-gaze, face view, and whether the head is moving. These neurons thus provide important additional inputs to parts of the brain such as the orbitofrontal cortex and amygdala that are involved in social communication and emotional behaviour. Outputs of these systems reach the amygdala, in which face-selective neurons are found, and also the orbitofrontal cortex, in which some neurons are tuned to face identity and others to face expression. In humans, activation of the orbitofrontal cortex is found when a change of face expression acts as a social signal that behaviour should change; and damage to the human orbitofrontal and pregenual cingulate cortex can impair face and voice expression identification, and also the reversal of emotional behaviour that normally occurs when reinforcers are reversed.

  5. Saving the Best for Last? A Cross-Species Analysis of Choices between Reinforcer Sequences

    ERIC Educational Resources Information Center

    Andrade, Leonardo F.; Hackenberg, Timothy D.

    2012-01-01

    Two experiments were conducted to compare choices between sequences of reinforcers in pigeon (Experiment 1) and human (Experiment 2) subjects, using functionally analogous procedures. The subjects made pairwise choices among 3 sequence types, all of which provided the same overall reinforcement rate, but differed in their temporal patterning.…

  6. Reinforcer magnitude and rate dependency: evaluation of resistance-to-change mechanisms.

    PubMed

    Pinkston, Jonathan W; Ginsburg, Brett C; Lamb, Richard J

    2014-10-01

    Under many circumstances, reinforcer magnitude appears to modulate the rate-dependent effects of drugs such that when schedules arrange for relatively larger reinforcer magnitudes rate dependency is attenuated compared with behavior maintained by smaller magnitudes. The current literature on resistance to change suggests that increased reinforcer density strengthens operant behavior, and such strengthening effects appear to extend to the temporal control of behavior. As rate dependency may be understood as a loss of temporal control, the effects of reinforcer magnitude on rate dependency may be due to increased resistance to disruption of temporally controlled behavior. In the present experiments, pigeons earned different magnitudes of grain during signaled components of a multiple FI schedule. Three drugs, clonidine, haloperidol, and morphine, were examined. All three decreased overall rates of key pecking; however, only the effects of clonidine were attenuated as reinforcer magnitude increased. An analysis of within-interval performance found rate-dependent effects for clonidine and morphine; however, these effects were not modulated by reinforcer magnitude. In addition, we included prefeeding and extinction conditions, standard tests used to measure resistance to change. In general, rate-decreasing effects of prefeeding and extinction were attenuated by increasing reinforcer magnitudes. Rate-dependent analyses of prefeeding showed rate-dependency following those tests, but in no case were these effects modulated by reinforcer magnitude. The results suggest that a resistance-to-change interpretation of the effects of reinforcer magnitude on rate dependency is not viable.

  7. Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders

    PubMed Central

    Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.

    2017-01-01

    Background Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning; however, there have been few experimental studies taking this perspective. Methods We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state–space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results Both groups learned the task after training. However, there were group differences in early learning in the first task block where individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state–space learning curves; and outperformed typically developing individuals on the near-chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243

  8. Life Span Differences in Electrophysiological Correlates of Monitoring Gains and Losses during Probabilistic Reinforcement Learning

    ERIC Educational Resources Information Center

    Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman

    2011-01-01

    By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…

  9. Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.

    PubMed

    Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M

    2018-05-12

    Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside of the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. Twenty-nine adult females (15 PTSD with exposure to assaultive violence, 14 controls) underwent a neutral reinforcement-learning task (i.e., a two-armed bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify time courses for large-scale a priori brain networks. We found that an anticorrelated and risk-sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group demonstrated significantly less neural encoding of prediction errors in both a ventral striatum/mPFC and anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside of the context of fear stimuli, suggests the possibility of a broader conceptualization of learning differences in PTSD than currently proposed in neurocircuitry models of PTSD. Copyright © 2018 Elsevier Ltd. All rights reserved.
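
    A minimal sketch of the kind of Rescorla-Wagner bandit model such studies fit (the anticorrelated and risk-sensitive variants mentioned above add further terms not shown; the two-armed task, parameter values, and softmax choice rule here are illustrative assumptions):

        import numpy as np

        rng = np.random.default_rng(2)
        alpha, beta = 0.3, 5.0                      # learning rate and softmax inverse temperature (assumed)
        p_reward = np.array([0.7, 0.3])             # latent reward probabilities of the two arms (assumed)
        Q = np.zeros(2)

        for trial in range(200):
            p_choose = np.exp(beta * Q) / np.exp(beta * Q).sum()   # softmax choice rule
            choice = rng.choice(2, p=p_choose)
            reward = float(rng.random() < p_reward[choice])
            delta = reward - Q[choice]                             # prediction error
            Q[choice] += alpha * delta                             # Rescorla-Wagner (delta-rule) update

        print("learned values:", Q.round(2))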

  10. Neural correlates of forward planning in a spatial decision task in humans

    PubMed Central

    Simon, Dylan Alexander; Daw, Nathaniel D.

    2011-01-01

    Although reinforcement learning (RL) theories have been influential in characterizing the brain’s mechanisms for reward-guided choice, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model-based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used fMRI in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms employed. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and BOLD signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, though most often associated with TD learning, were better explained by the model-based theory. Further, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain. PMID:21471389
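
    To make the contrast concrete, a toy sketch of the two algorithm families compared in such studies is given below: a model-free TD(0) learner that caches state values directly from sampled transitions, and a model-based learner that estimates a transition model and replans by iterating over it. The chain of states, reward convention, and parameters are illustrative assumptions, not the task used in the study.

        import numpy as np

        gamma, alpha = 0.9, 0.1
        n_states = 4                                   # a toy chain of states (assumed)

        # Model-free TD(0): cache state values directly from sampled transitions.
        V_td = np.zeros(n_states)
        def td_update(s, r, s_next):
            V_td[s] += alpha * (r + gamma * V_td[s_next] - V_td[s])

        # Model-based: learn a transition model and a reward table, then re-evaluate by iteration.
        counts = np.ones((n_states, n_states))         # pseudo-counts for the transition model
        R = np.zeros(n_states)                         # reward on entering each state (assumed convention)
        def model_update_and_plan(s, r, s_next, sweeps=50):
            counts[s, s_next] += 1
            R[s_next] = r
            P = counts / counts.sum(axis=1, keepdims=True)
            V = np.zeros(n_states)
            for _ in range(sweeps):                    # evaluate the current model by repeated backups
                V = P @ (R + gamma * V)
            return V

        # After the maze layout changes, the model-based values shift as soon as the model does,
        # whereas the cached TD values must be relearned transition by transition.
        td_update(0, 0.0, 1)
        print(model_update_and_plan(0, 0.0, 1).round(2))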

  11. The Role of the Posterior Temporal and Medial Prefrontal Cortices in Mediating Learning from Romantic Interest and Rejection

    PubMed Central

    Cooper, Jeffrey C.; Dunne, Simon; Furey, Teresa; O'Doherty, John P.

    2014-01-01

    Romantic interest or rejection can be powerful incentives not merely for their emotional impact, but for their potential to transform, in a single interaction, what we think we know about another person—or ourselves. Little is known, though, about how the brain computes expectations for, and learns from, real-world romantic signals. In a novel “speed-dating” paradigm, we had participants meet potential romantic partners in a series of 5-min “dates,” and decide whether they would be interested in seeing each partner again. Afterward, participants were scanned with functional magnetic resonance imaging while they were told, for the first time, whether that partner was interested in them or rejected them. Expressions of interest and rejection activated regions previously associated with “mentalizing,” including the posterior superior temporal sulcus (pSTS) and rostromedial prefrontal cortex (RMPFC); while pSTS responded to differences from the participant's own decision, RMPFC responded to prediction errors from a reinforcement-learning model of personal desirability. Responses in affective regions were also highly sensitive to participants' expectations. Far from being inscrutable, then, responses to romantic expressions seem to involve a quantitative learning process, rooted in distinct sources of expectations, and encoded in neural networks that process both affective value and social beliefs. PMID:23599165

  12. Can model-free reinforcement learning explain deontological moral judgments?

    PubMed

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently, Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account; e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, accounting for the effect of intention within the framework requires certain assumptions that lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Learning and altering behaviours by reinforcement: neurocognitive differences between children and adults.

    PubMed

    Shephard, E; Jackson, G M; Groom, M J

    2014-01-01

    This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.

  14. Prespeech motor learning in a neural network using reinforcement.

    PubMed

    Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough

    2013-02-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Clipping in neurocontrol by adaptive dynamic programming.

    PubMed

    Fairbank, Michael; Prokhorov, Danil; Alonso, Eduardo

    2014-10-01

    In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to clip the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is to be truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal difference learning, or policy-gradient learning algorithms.
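
    A hedged sketch of the clipping idea on a one-dimensional toy problem (the dynamics, time step, and cost are assumptions for illustration): when straight-line motion within a step would cross the terminal boundary, the step and its cost are truncated at the crossing point rather than charged in full.

        def rollout(x0, action, dt=0.2, goal=1.0, max_steps=50, clip=True):
            """Roll a 1-D agent toward a terminal boundary at `goal`, accumulating a time cost."""
            x, total_cost = x0, 0.0
            for _ in range(max_steps):
                x_next = x + action * dt
                if x_next >= goal:                         # the terminal state is reached inside this step
                    if clip:
                        frac = (goal - x) / (action * dt)  # fraction of the step actually traversed
                        total_cost += frac * dt            # truncate the final step's cost
                        x = goal
                    else:
                        total_cost += dt                   # overshoot: the full step cost is (wrongly) charged
                        x = x_next
                    return x, total_cost
                total_cost += dt
                x = x_next
            return x, total_cost

        print(rollout(0.0, action=0.3, clip=True))    # stops exactly at the boundary
        print(rollout(0.0, action=0.3, clip=False))   # cost is biased upward by the overshoot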

  16. Effect of reinforcement learning on coordination of multiagent systems

    NASA Astrophysics Data System (ADS)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.

  17. Infant Contingency Learning in Different Cultural Contexts

    ERIC Educational Resources Information Center

    Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika

    2012-01-01

    Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…

  18. Neural Basis of Reinforcement Learning and Decision Making

    PubMed Central

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  19. The effects of aging on the interaction between reinforcement learning and attention.

    PubMed

    Radulescu, Angela; Daniel, Reka; Niv, Yael

    2016-11-01

    Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In two experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only one stimulus dimension was relevant to predicting reward, and within it, one "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on one or three dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
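
    A minimal sketch of an attention-weighted feature learner of the general kind described above (a Pavlovian-style simplification with no choice between stimuli; the dimensions, feature counts, attention rule, and parameters are illustrative assumptions rather than the fitted model):

        import numpy as np

        rng = np.random.default_rng(3)
        n_dims, n_feats = 3, 3                     # e.g. colour, shape, texture, three features each (assumed)
        alpha, beta = 0.2, 4.0
        V = np.zeros((n_dims, n_feats))            # one learned value per feature

        def attention(V):
            """Softmax over dimensions, favouring whichever dimension currently looks most informative."""
            spread = V.max(axis=1) - V.min(axis=1)
            w = np.exp(beta * spread)
            return w / w.sum()

        def update(stim, reward, V):
            """Credit the reward prediction error to the shown features in proportion to attention."""
            w = attention(V)
            value = sum(w[d] * V[d, stim[d]] for d in range(n_dims))
            delta = reward - value
            for d in range(n_dims):
                V[d, stim[d]] += alpha * w[d] * delta

        # Only dimension 0 predicts reward; feature 2 on that dimension is the rewarded "target".
        for _ in range(300):
            stim = rng.integers(n_feats, size=n_dims)
            update(stim, float(stim[0] == 2), V)

        print("attention after learning:", attention(V).round(2))   # weight concentrates on dimension 0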

  20. Heterogeneous Suppression of Sequential Effects in Random Sequence Generation, but Not in Operant Learning.

    PubMed

    Shteingart, Hanan; Loewenstein, Yonatan

    2016-01-01

    There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the contribution of the most recent trial has only a weak effect on behavior, compared with trials further in the past, a result that seems irreconcilable with standard sequential effects that decay monotonically with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonically decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogeneous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
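
    The lagged logistic-regression analysis described above can be sketched as follows; synthetic data with a dependence on past choices that decays with lag stands in for real participants, and the number of lags, generating weights, and use of scikit-learn are assumptions for illustration.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(4)
        true_w = np.array([1.2, 0.6, 0.3, 0.15, 0.05])   # tendency to repeat choices, decaying over lags 1..5 (assumed)

        # Synthetic "participant": the probability of choosing 1 depends on the last five choices.
        choices = [int(c) for c in rng.integers(0, 2, size=len(true_w))]
        for _ in range(2000):
            past = np.array(choices[-len(true_w):][::-1]) * 2 - 1    # recode to -1/+1, most recent first
            p = 1.0 / (1.0 + np.exp(-(true_w @ past)))
            choices.append(int(rng.random() < p))

        # Regress the current choice on the previous five choices, one column per lag.
        y = np.array(choices[len(true_w):])
        X = np.column_stack([np.array(choices[len(true_w) - k:-k]) * 2 - 1
                             for k in range(1, len(true_w) + 1)])
        model = LogisticRegression().fit(X, y)
        print("recovered lag weights:", model.coef_.round(2))        # should decay with lag, like true_w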

  1. Relatively high motivation for context-evoked reward produces the magnitude effect in rats.

    PubMed

    Yuki, Shoko; Okanoya, Kazuo

    2014-09-01

    Using a concurrent-chain schedule, we demonstrated the effect of absolute reinforcement (i.e., the magnitude effect) on choice behavior in rats. In general, animals' simultaneous choices conform to a relative reinforcement ratio between alternatives. However, studies in pigeons and rats have found that on a concurrent-chain schedule, the overall reinforcement ratio, or absolute amount, also influences choice. The effect of reinforcement amount has also been studied in inter-temporal choice situations, and this effect has been referred to as the magnitude effect. The magnitude effect has been observed in humans under various conditions, but little research has assessed it in animals (e.g., pigeons and rats). The present study confirmed the effect of reinforcement amount in rats during simultaneous and inter-temporal choice situations. We used a concurrent-chain procedure to examine the cause of the magnitude effect during inter-temporal choice. Our results suggest that rats can use differences in reinforcement amount as a contextual cue during choice, and the direction of the magnitude effect in rats might be similar to that in humans under the present procedure. Furthermore, our results indicate that the magnitude effect was caused by the initial-link effect when the reinforcement amount was relatively small, while a loss aversion tendency was observed when the reinforcement amount changed within a session. The emergence of the initial-link effect and loss aversion suggests that rats make choices through cognitive processes predicted by prospect theory. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. Nature vs Nurture: Effects of Learning on Evolution

    NASA Astrophysics Data System (ADS)

    Nagrani, Nagina

    In the field of Evolutionary Robotics, the design, development and application of artificial neural networks as controllers have derived their inspiration from biology. Biologists and artificial intelligence researchers are trying to understand, through qualitative and quantitative analyses, how neural network learning during the lifetime of individuals affects the evolution of those individuals. The conclusions of these analyses can help develop optimized artificial neural networks to perform any given task. The purpose of this thesis is to study the effects of learning on evolution. This has been done by applying Temporal Difference Reinforcement Learning methods to the evolution of an Artificial Neural Tissue controller. The controller has been assigned the task of collecting resources in a designated area in a simulated environment. The performance of the individuals is measured by the amount of resources collected. A comparison has been made between the results obtained by incorporating learning in evolution and evolution alone. The effects of the learning parameters (learning rate, training period, discount rate, and policy) on evolution have also been studied. It was observed that learning delays the performance of the evolving individuals over the generations. However, the non-zero learning rate maintained throughout the evolution process indicates that natural selection favors individuals possessing plasticity.

  3. The nature of sexual reinforcement.

    PubMed Central

    Crawford, L L; Holloway, K S; Domjan, M

    1993-01-01

    Sexual reinforcers are not part of a regulatory system involved in the maintenance of critical metabolic processes, they differ for males and females, they differ as a function of species and mating system, and they show ontogenetic and seasonal changes related to endocrine conditions. Exposure to a member of the opposite sex without copulation can be sufficient for sexual reinforcement. However, copulatory access is a stronger reinforcer, and copulatory opportunity can serve to enhance the reinforcing efficacy of stimulus features of a sexual partner. Conversely, under certain conditions, noncopulatory exposure serves to decrease reinforcer efficacy. Many common learning phenomena such as acquisition, extinction, discrimination learning, second-order conditioning, and latent inhibition have been demonstrated in sexual conditioning. These observations extend the generality of findings obtained with more conventional reinforcers, but the mechanisms of these effects and their gender and species specificity remain to be explored. PMID:8354970

  4. Reinforcement learning agents providing advice in complex video games

    NASA Astrophysics Data System (ADS)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

  5. Dorsal Striatal-Midbrain Connectivity in Humans Predicts How Reinforcements Are Used to Guide Decisions

    ERIC Educational Resources Information Center

    Kahnt, Thorsten; Park, Soyoung Q.; Cohen, Michael X.; Beck, Anne; Heinz, Andreas; Wrase, Jana

    2009-01-01

    It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to…

  6. Influence of Temporal Context on Value in the Multiple-Chains and Successive-Encounters Procedures

    ERIC Educational Resources Information Center

    O'Daly, Matthew; Angulo, Samuel; Gipson, Cassandra; Fantino, Edmund

    2006-01-01

    This set of studies explored the influence of temporal context across multiple-chain and multiple-successive-encounters procedures. Following training with different temporal contexts, the value of stimuli sharing similar reinforcement schedules was assessed by presenting these stimuli in concurrent probes. The results for the multiple-chain…

  7. Knockout crickets for the study of learning and memory: Dopamine receptor Dop1 mediates aversive but not appetitive reinforcement in crickets.

    PubMed

    Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto

    2015-11-02

    Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit-flies concluded that dopamine neurons mediate both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in neurotransmitters mediating appetitive reinforcement between crickets and fruit-flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory.

  8. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    PubMed

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a form of interactive learning whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features and convergence becomes slow. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and speed up convergence. Applied to online learning in the game of Tetris, the experimental results show that convergence speed is markedly improved by the new method, which combines the hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameter settings is compared and analyzed as well.

  9. Generating Adaptive Behaviour within a Memory-Prediction Framework

    PubMed Central

    Rawlinson, David; Kowadlo, Gideon

    2012-01-01

    The Memory-Prediction Framework (MPF) and its Hierarchical-Temporal Memory implementation (HTM) have been widely applied to unsupervised learning problems, for both classification and prediction. To date, there has been no attempt to incorporate MPF/HTM in reinforcement learning or other adaptive systems; that is, to use knowledge embodied within the hierarchy to control a system, or to generate behaviour for an agent. This problem is interesting because the human neocortex is believed to play a vital role in the generation of behaviour, and the MPF is a model of the human neocortex. We propose some simple and biologically-plausible enhancements to the Memory-Prediction Framework. These cause it to explore and interact with an external world, while trying to maximize a continuous, time-varying reward function. All behaviour is generated and controlled within the MPF hierarchy. The hierarchy develops from a random initial configuration by interaction with the world and reinforcement learning only. Among other demonstrations, we show that a 2-node hierarchy can learn to successfully play “rocks, paper, scissors” against a predictable opponent. PMID:22272231

  10. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    PubMed

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features markedly different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  11. Learning the specific quality of taste reinforcement in larval Drosophila.

    PubMed

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-27

    The only property of reinforcement that insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing, in any brain.

  12. The probability of reinforcement per trial affects posttrial responding and subsequent extinction but not within-trial responding.

    PubMed

    Harris, Justin A; Kwok, Dorothy W S

    2018-01-01

    During magazine approach conditioning, rats do not discriminate between a conditional stimulus (CS) that is consistently reinforced with food and a CS that is occasionally (partially) reinforced, as long as the CSs have the same overall reinforcement rate per second. This implies that rats are indifferent to the probability of reinforcement per trial. However, in the same rats, the per-trial reinforcement rate will affect subsequent extinction: responding extinguishes more rapidly for a CS that was consistently reinforced than for a partially reinforced CS. Here, we trained rats with consistently and partially reinforced CSs that were matched for overall reinforcement rate per second. We measured conditioned responding both during and immediately after the CSs. Differences in the per-trial probability of reinforcement did not affect the acquisition of responding during the CS but did affect subsequent extinction of that responding, and also affected the post-CS response rates during conditioning. Indeed, CSs with the same probability of reinforcement per trial evoked the same amount of post-CS responding even when they differed in overall reinforcement rate and thus evoked different amounts of responding during the CS. We conclude that reinforcement rate per second controls rats' acquisition of responding during the CS, but at the same time, rats also learn specifically about the probability of reinforcement per trial. The latter learning affects the rats' expectation of reinforcement as an outcome of the trial, which influences their ability to detect retrospectively that an opportunity for reinforcement was missed, and, in turn, drives extinction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  13. Stimulus discriminability may bias value-based probabilistic learning.

    PubMed

    Schutte, Iris; Slagter, Heleen A; Collins, Anne G E; Frank, Michael J; Kenemans, J Leon

    2017-01-01

    Reinforcement learning tasks are often used to assess participants' tendency to learn more from the positive or more from the negative consequences of one's action. However, this assessment often requires comparison in learning performance across different task conditions, which may differ in the relative salience or discriminability of the stimuli associated with more and less rewarding outcomes, respectively. To address this issue, in a first set of studies, participants were subjected to two versions of a common probabilistic learning task. The two versions differed with respect to the stimulus (Hiragana) characters associated with reward probability. The assignment of character to reward probability was fixed within version but reversed between versions. We found that performance was highly influenced by task version, which could be explained by the relative perceptual discriminability of characters assigned to high or low reward probabilities, as assessed by a separate discrimination experiment. Participants were more reliable in selecting rewarding characters that were more discriminable, leading to differences in learning curves and their sensitivity to reward probability. This difference in experienced reinforcement history was accompanied by performance biases in a test phase assessing ability to learn from positive vs. negative outcomes. In a subsequent large-scale web-based experiment, this impact of task version on learning and test measures was replicated and extended. Collectively, these findings imply a key role for perceptual factors in guiding reward learning and underscore the need to control stimulus discriminability when making inferences about individual differences in reinforcement learning.

  14. Dissociating hippocampal and striatal contributions to sequential prediction learning

    PubMed Central

    Bornstein, Aaron M.; Daw, Nathaniel D.

    2011-01-01

    Behavior may be generated on the basis of many different kinds of learned contingencies. For instance, responses could be guided by the direct association between a stimulus and response, or by sequential stimulus-stimulus relationships (as in model-based reinforcement learning or goal-directed actions). However, the neural architecture underlying sequential predictive learning is not well-understood, in part because it is difficult to isolate its effect on choice behavior. To track such learning more directly, we examined reaction times (RTs) in a probabilistic sequential picture identification task. We used computational learning models to isolate trial-by-trial effects of two distinct learning processes in behavior, and used these as signatures to analyze the separate neural substrates of each process. RTs were best explained via the combination of two delta rule learning processes with different learning rates. To examine neural manifestations of these learning processes, we used functional magnetic resonance imaging to seek correlates of timeseries related to expectancy or surprise. We observed such correlates in two regions, hippocampus and striatum. By estimating the learning rates best explaining each signal, we verified that they were uniquely associated with one of the two distinct processes identified behaviorally. These differential correlates suggest that complementary anticipatory functions drive each region's effect on behavior. Our results provide novel insights as to the quantitative computational distinctions between medial temporal and basal ganglia learning networks and enable experiments that exploit trial-by-trial measurement of the unique contributions of both hippocampus and striatum to response behavior. PMID:22487032
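
    A hedged sketch of the dual delta-rule idea: two expectancy signals updated at different learning rates, whose combination is then related to reaction times. The learning rates, the synthetic RT model, and the three-item stimulus set are illustrative assumptions, not the paper's fitted values.

        import numpy as np

        alpha_fast, alpha_slow = 0.5, 0.05           # two delta-rule learning rates (assumed)

        def dual_delta_expectancies(outcomes, n_items):
            """Track, trial by trial, how strongly each appearing item was expected by a fast and a slow process."""
            fast = np.full(n_items, 1.0 / n_items)
            slow = np.full(n_items, 1.0 / n_items)
            exp_fast, exp_slow = [], []
            for o in outcomes:
                exp_fast.append(fast[o])
                exp_slow.append(slow[o])
                target = np.eye(n_items)[o]
                fast += alpha_fast * (target - fast)  # delta-rule updates at two timescales
                slow += alpha_slow * (target - slow)
            return np.array(exp_fast), np.array(exp_slow)

        # Toy use: assume reaction times are faster (smaller) for more expected items, plus noise.
        rng = np.random.default_rng(5)
        outcomes = rng.choice(3, size=500, p=[0.6, 0.3, 0.1])
        ef, es = dual_delta_expectancies(outcomes, n_items=3)
        rt = 600 - 80 * ef - 150 * es + rng.normal(0, 20, size=500)
        print("correlation of RT with the slow expectancy:", round(float(np.corrcoef(rt, es)[0, 1]), 2))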

  15. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective

    PubMed Central

    Story, Giles W.; Vlaev, Ivo; Seymour, Ben; Darzi, Ara; Dolan, Raymond J.

    2014-01-01

    The tendency to make unhealthy choices is hypothesized to be related to an individual's temporal discount rate, the theoretical rate at which they devalue delayed rewards. Furthermore, a particular form of temporal discounting, hyperbolic discounting, has been proposed to explain why unhealthy behavior can occur despite healthy intentions. We examine these two hypotheses in turn. We first systematically review studies which investigate whether discount rates can predict unhealthy behavior. These studies reveal that high discount rates for money (and in some instances food or drug rewards) are associated with several unhealthy behaviors and markers of health status, establishing discounting as a promising predictive measure. We secondly examine whether intention-incongruent unhealthy actions are consistent with hyperbolic discounting. We conclude that intention-incongruent actions are often triggered by environmental cues or changes in motivational state, whose effects are not parameterized by hyperbolic discounting. We propose a framework for understanding these state-based effects in terms of the interplay of two distinct reinforcement learning mechanisms: a “model-based” (or goal-directed) system and a “model-free” (or habitual) system. Under this framework, while discounting of delayed health may contribute to the initiation of unhealthy behavior, with repetition, many unhealthy behaviors become habitual; if health goals then change, habitual behavior can still arise in response to environmental cues. We propose that the burgeoning development of computational models of these processes will permit further identification of health decision-making phenotypes. PMID:24659960
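
    The two discounting forms at issue can be stated compactly (textbook forms, not quoted from this review): for a reward of amount A delayed by D, exponential and hyperbolic discounting give

        V_{\mathrm{exp}}(A, D) = A\, e^{-kD}, \qquad V_{\mathrm{hyp}}(A, D) = \frac{A}{1 + kD},

    where k is the individual's discount rate. The hyperbolic form implies that preferences between a smaller-sooner and a larger-later reward can reverse as both delays shrink by the same amount, which is the property invoked to explain choices that contradict prior intentions.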

  16. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective.

    PubMed

    Story, Giles W; Vlaev, Ivo; Seymour, Ben; Darzi, Ara; Dolan, Raymond J

    2014-01-01

    The tendency to make unhealthy choices is hypothesized to be related to an individual's temporal discount rate, the theoretical rate at which they devalue delayed rewards. Furthermore, a particular form of temporal discounting, hyperbolic discounting, has been proposed to explain why unhealthy behavior can occur despite healthy intentions. We examine these two hypotheses in turn. We first systematically review studies which investigate whether discount rates can predict unhealthy behavior. These studies reveal that high discount rates for money (and in some instances food or drug rewards) are associated with several unhealthy behaviors and markers of health status, establishing discounting as a promising predictive measure. We secondly examine whether intention-incongruent unhealthy actions are consistent with hyperbolic discounting. We conclude that intention-incongruent actions are often triggered by environmental cues or changes in motivational state, whose effects are not parameterized by hyperbolic discounting. We propose a framework for understanding these state-based effects in terms of the interplay of two distinct reinforcement learning mechanisms: a "model-based" (or goal-directed) system and a "model-free" (or habitual) system. Under this framework, while discounting of delayed health may contribute to the initiation of unhealthy behavior, with repetition, many unhealthy behaviors become habitual; if health goals then change, habitual behavior can still arise in response to environmental cues. We propose that the burgeoning development of computational models of these processes will permit further identification of health decision-making phenotypes.

  17. A Novel Task for the Investigation of Action Acquisition

    PubMed Central

    Stafford, Tom; Thirkettle, Martin; Walton, Tom; Vautrelle, Nicolas; Hetherington, Len; Port, Michael; Gurney, Kevin; Redgrave, Pete

    2012-01-01

    We present a behavioural task designed for the investigation of how novel instrumental actions are discovered and learnt. The task consists of free movement with a manipulandum, during which the full range of possible movements can be explored by the participant and recorded. A subset of these movements, the ‘target’, is set to trigger a reinforcing signal. The task is to discover what movements of the manipulandum evoke the reinforcement signal. Targets can be defined in spatial, temporal, or kinematic terms, can be a combination of these aspects, or can represent the concatenation of actions into a larger gesture. The task allows the study of how the specific elements of behaviour which cause the reinforcing signal are identified, refined and stored by the participant. The task provides a paradigm where the exploratory motive drives learning and as such we view it as in the tradition of Thorndike [1]. Most importantly it allows for repeated measures, since when a novel action is acquired the criterion for triggering reinforcement can be changed requiring a new action to be discovered. Here, we present data using both humans and rats as subjects, showing that our task is easily scalable in difficulty, adaptable across species, and produces a rich set of behavioural measures offering new and valuable insight into the action learning process. PMID:22675490

  18. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    PubMed Central

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063

  19. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    PubMed

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  20. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.

    PubMed

    Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca

    2013-03-01

    An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
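
    The hypothesis summarized above can be written as a standard temporal-difference update driven by a composite reinforcement signal. The sketch below is a minimal, assumed formulation (tabular value function, generic state indices, illustrative learning rate and discount factor), not the authors' simulated robotic system.

```python
import numpy as np

def td_update(V, s, s_next, r_extrinsic, r_intrinsic, alpha=0.1, gamma=0.95):
    """One TD update where the reinforcement combines extrinsic and intrinsic terms."""
    r = r_extrinsic + r_intrinsic          # composite reinforcement signal
    delta = r + gamma * V[s_next] - V[s]   # TD-like prediction error ("phasic dopamine")
    V[s] += alpha * delta
    return delta

V = np.zeros(5)                            # toy tabular value function
delta = td_update(V, s=0, s_next=1, r_extrinsic=0.0, r_intrinsic=1.0)
print(delta, V[0])
```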

  1. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    PubMed

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.

  2. Learning to make collective decisions: the impact of confidence escalation.

    PubMed

    Mahmoodi, Ali; Bang, Dan; Ahmadabadi, Majid Nili; Bahrami, Bahador

    2013-01-01

    Little is known about how people learn to take into account others' opinions in joint decisions. To address this question, we combined computational and empirical approaches. Human dyads made individual and joint visual perceptual decisions and rated their confidence in those decisions (data previously published). We trained a reinforcement (temporal difference) learning agent to take the participants' confidence levels as input and learn to arrive at a dyadic decision by finding the policy that either maximized the accuracy of the model decisions or maximally conformed to the empirical dyadic decisions. When confidences were shared visually without verbal interaction, RL agents successfully captured social learning. When participants exchanged confidences visually and interacted verbally, no collective benefit was achieved and the model failed to predict the dyadic behaviour. Behaviourally, dyad members' confidence increased progressively and verbal interaction accelerated this escalation. The success of the model in drawing collective benefit from dyad members was inversely related to the confidence escalation rate. The findings show that an automated learning agent can, in principle, combine individual opinions and achieve collective benefit, but the same agent cannot discount the escalation, suggesting that one cognitive component of collective decision making in humans may involve discounting of overconfidence arising from interactions.

  3. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise.

    PubMed

    Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J

    2016-01-01

    Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.

  4. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise

    PubMed Central

    Therrien, Amanda S.; Wolpert, Daniel M.

    2016-01-01

    Abstract See Miall and Galea (doi: 10.1093/awv343 ) for a scientific commentary on this article. Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368

  5. Mobile robots exploration through cnn-based reinforcement learning.

    PubMed

    Tai, Lei; Liu, Ming

    2016-01-01

    Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outline a reinforcement learning method aimed at solving the exploration problem in a corridor environment. The learning model takes the depth image from an RGB-D sensor as its only input. The feature representation of the depth image is extracted through a pre-trained convolutional neural network model. Building on the recent success of the deep Q-network in artificial intelligence, the robot controller achieved exploration and obstacle-avoidance abilities in several different simulated environments. This is the first time that reinforcement learning has been used to build an exploration strategy for mobile robots from raw sensor information.

  6. Delayed Reinforcement of Operant Behavior

    ERIC Educational Resources Information Center

    Lattal, Kennon A.

    2010-01-01

    The experimental analysis of delay of reinforcement is considered from the perspective of three questions that seem basic not only to understanding delay of reinforcement but also, by implication, the contributions of temporal relations between events to operant behavior. The first question is whether effects of the temporal relation between…

  7. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    PubMed

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for deep reinforcement learning (DRL) of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1 to 5 Gy), the DRL automatically favored dose escalation/de-escalation between 1.5 and 3.8 Gy, a range similar to that used in the clinical protocol. The same DQN yielded two patterns of dose escalation for the 34 test patients, but with different reward variants. First, using the baseline P+ reward function, individual adaptive fraction doses of the DQN had similar tendencies to the clinical data with an RMSE = 0.76 Gy; but adaptations suggested by the DQN were generally lower in magnitude (less aggressive). Second, by adjusting the P+ reward function with higher emphasis on mitigating local failure, better matching of doses between the DQN and the clinical protocol was achieved with an RMSE = 0.5 Gy. Moreover, the decisions selected by the DQN seemed to have better concordance with patients' eventual outcomes. In comparison, the traditional temporal difference (TD) algorithm for reinforcement learning yielded an RMSE = 3.3 Gy due to numerical instabilities and lack of sufficient learning. We demonstrated that automated dose adaptation by DRL is a feasible and promising approach for achieving similar results to those chosen by clinicians. The process may require customization of the reward function if individual cases were to be considered. However, development of this framework into a fully credible autonomous system for clinical decision support would require further validation on larger multi-institutional datasets. © 2017 American Association of Physicists in Medicine.
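
    The dose-selection component of such a framework can be sketched, in greatly simplified form, as a value-based learner choosing among discretized fraction doses. The toy code below assumes a tabular state space and a generic reward standing in for the P+ objective; it is not the GAN + RAE + DQN architecture described in the abstract, and all names and values are illustrative.

```python
import numpy as np

doses = np.round(np.arange(1.0, 5.01, 0.5), 1)   # candidate doses per fraction (Gy)
n_states = 10                                     # toy discretization of patient state
Q = np.zeros((n_states, len(doses)))

def choose_dose(state, eps=0.1):
    """Epsilon-greedy selection over the candidate doses."""
    if np.random.rand() < eps:
        return np.random.randint(len(doses))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Q-learning backup; 'reward' stands in for the treatment objective (e.g., P+)."""
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])

a = choose_dose(state=0)                  # pick a dose for the current toy state
update(state=0, action=a, reward=1.0, next_state=1)
```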

  8. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning

    PubMed Central

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents are controlled so as to be the same, different tutorial tactics make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in an offline manner. Therefore, we introduce a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, which generates new rules from old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics, suggesting that the GBML method is favorable for developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018

  9. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    PubMed

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents are controlled so as to be the same, different tutorial tactics make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in an offline manner. Therefore, we introduce a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, which generates new rules from old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics, suggesting that the GBML method is favorable for developing real-world ITS applications in the domain of tutorial tactics induction.

  10. Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals

    PubMed Central

    Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan

    2017-01-01

    Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976

  11. Learning the specific quality of taste reinforcement in larval Drosophila

    PubMed Central

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-01

    The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing—in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533

  12. Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture.

    PubMed

    Li, Cai; Lowe, Robert; Ziemke, Tom

    2013-01-01

    The identification of learning mechanisms for locomotion has been the subject of much research for some time but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates the above perspectives and applies it to the case of a humanoid (NAO) robot learning to walk, an ability which emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, least-squares temporal-difference-based learning converges to the optimal solution quickly by using natural gradient learning and balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator that adapts to the environment. The results obtained are analyzed using a novel DST-based embodied cognition approach. Learning to walk, from this perspective, is a process of integrating levels of sensorimotor activity and value.

  13. Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture

    PubMed Central

    Li, Cai; Lowe, Robert; Ziemke, Tom

    2013-01-01

    The identification of learning mechanisms for locomotion has been the subject of much research for some time but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates the above perspectives and applies it to the case of a humanoid (NAO) robot learning to walk, an ability which emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, least-squares temporal-difference-based learning converges to the optimal solution quickly by using natural gradient learning and balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator that adapts to the environment. The results obtained are analyzed using a novel DST-based embodied cognition approach. Learning to walk, from this perspective, is a process of integrating levels of sensorimotor activity and value. PMID:23675345
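
    The critic in such an actor-critic scheme can be illustrated with a batch least-squares temporal-difference (LSTD) solve over linear features. The sketch below uses synthetic features and rewards purely for illustration; it is not the cpg-actor-critic implementation described above, and all names and dimensions are assumptions.

```python
import numpy as np

def lstd(features, next_features, rewards, gamma=0.95, reg=1e-6):
    """Solve A w = b for linear value-function weights from a batch of transitions."""
    dim = features.shape[1]
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for phi, phi_next, r in zip(features, next_features, rewards):
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A + reg * np.eye(dim), b)

phi = np.random.rand(100, 4)        # features of visited states (synthetic)
phi_next = np.random.rand(100, 4)   # features of successor states (synthetic)
r = np.random.rand(100)             # rewards (synthetic)
w = lstd(phi, phi_next, r)          # critic weights; V(s) is approximated by phi(s) @ w
```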

  14. Deficits in Positive Reinforcement Learning and Uncertainty-Driven Exploration are Associated with Distinct Aspects of Negative Symptoms in Schizophrenia

    PubMed Central

    Strauss, Gregory P.; Frank, Michael J.; Waltz, James A.; Kasanova, Zuzana; Herbener, Ellen S.; Gold, James M.

    2011-01-01

    Background Negative symptoms are core features of schizophrenia; however, the cognitive and neural basis for individual negative symptom domains remains unclear. Converging evidence suggests a role for striatal and prefrontal dopamine in reward learning and the exploration of actions that might produce outcomes that are better than the status quo. The current study examines whether deficits in reinforcement learning and uncertainty-driven exploration predict specific negative symptoms domains. Methods We administered a temporal decision making task, which required trial-by-trial adjustment of reaction time (RT) to maximize reward receipt, to 51 patients with schizophrenia and 39 age-matched healthy controls. Task conditions were designed such that expected value (probability * magnitude) increased (IEV), decreased (DEV), or remained constant (CEV) with increasing response times. Computational analyses were applied to estimate the degree to which trial-by-trial responses are influenced by reinforcement history. Results Individuals with schizophrenia showed impaired Go learning, but intact NoGo learning relative to controls. These effects were pronounced as a function of global measures of negative symptom. Uncertainty-based exploration was substantially reduced in individuals with schizophrenia, and selectively correlated with clinical ratings of anhedonia. Conclusions Schizophrenia patients, particularly those with high negative symptoms, failed to speed RT's to increase positive outcomes and showed reduced tendency to explore when alternative actions could lead to better outcomes than the status quo. Results are interpreted in the context of current computational, genetic, and pharmacological data supporting the roles of striatal and prefrontal dopamine in these processes. PMID:21168124

  15. Operant conditioning of enhanced pain sensitivity by heat-pain titration.

    PubMed

    Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert

    2008-11-15

    Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior and not verbal pain report was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.

  16. Operant licking for intragastric sugar infusions: differential reinforcing actions of glucose, sucrose and fructose in mice

    PubMed Central

    Sclafani, Anthony; Ackroff, Karen

    2015-01-01

    Intragastric (IG) flavor conditioning studies in rodents indicate that isocaloric sugar infusions differ in their reinforcing actions, with glucose and sucrose more potent than fructose. Here we determined if the sugars also differ in their ability to maintain operant self-administration by licking an empty spout for IG infusions. Food-restricted C57BL/6J mice were trained 1 h/day to lick a food-baited spout, which triggered IG infusions of 16% sucrose. In testing, the mice licked an empty spout, which triggered IG infusions of different sugars. Mice shifted from sucrose to 16% glucose increased dry licking, whereas mice shifted to 16% fructose rapidly reduced licking to low levels. Other mice shifted from sucrose to IG water reduced licking more slowly but reached the same low levels. Thus IG fructose, like water, is not reinforcing to hungry mice. The more rapid decline in licking induced by fructose may be due to the sugar's satiating effects. Further tests revealed that the Glucose mice increased their dry licking when shifted from 16% to 8% glucose, and reduced their dry licking when shifted to 32% glucose. This may reflect caloric regulation and/or differences in satiation. The Glucose mice did not maintain caloric intake when tested with different sugars. They self-infused less sugar when shifted from 16% glucose to 16% sucrose, and even more so when shifted to 16% fructose. Reduced sucrose self-administration may occur because the fructose component of the disaccharide reduces its reinforcing potency. FVB mice also reduced operant licking when tested with 16% fructose, yet learned to prefer a flavor paired with IG fructose. These data indicate that sugars differ substantially in their ability to support IG self-administration and flavor preference learning. The same post-oral reinforcement process appears to mediate operant licking and flavor learning, although flavor learning provides a more sensitive measure of sugar reinforcement. PMID:26485294

  17. Reinforcement learning techniques for controlling resources in power networks

    NASA Astrophysics Data System (ADS)

    Kowli, Anupama Sunil

    As power grids transition towards increased reliance on renewable generation, energy storage and demand response resources, an effective control architecture is required to harness the full functionalities of these resources. There is a critical need for control techniques that recognize the unique characteristics of the different resources and exploit the flexibility afforded by them to provide ancillary services to the grid. The work presented in this dissertation addresses these needs. Specifically, new algorithms are proposed, which allow control synthesis in settings wherein the precise distribution of the uncertainty and its temporal statistics are not known. These algorithms are based on recent developments in Markov decision theory, approximate dynamic programming and reinforcement learning. They impose minimal assumptions on the system model and allow the control to be "learned" based on the actual dynamics of the system. Furthermore, they can accommodate complex constraints such as capacity and ramping limits on generation resources, state-of-charge constraints on storage resources, comfort-related limitations on demand response resources and power flow limits on transmission lines. Numerical studies demonstrating applications of these algorithms to practical control problems in power systems are discussed. Results demonstrate how the proposed control algorithms can be used to improve the performance and reduce the computational complexity of the economic dispatch mechanism in a power network. We argue that the proposed algorithms are eminently suitable to develop operational decision-making tools for large power grids with many resources and many sources of uncertainty.

  18. Unaltered radial maze performance and brain acetylcholine of the endothelial nitric oxide synthase knockout mouse.

    PubMed

    Dere, E; Frisch, C; De Souza Silva, M A; Gödecke, A; Schrader, J; Huston, J P

    2001-01-01

    Proceeding from previous findings of a beneficial effect of endothelial nitric oxide synthase (eNOS) gene inactivation on negatively reinforced water maze performance, we asked whether this improvement in place learning capacities also holds for a positively reinforced radial maze task. Unlike its beneficial effects on the water maze task, eNOS gene inactivation did not facilitate radial maze performance. The acquisition performance over the days of place learning did not differ between eNOS knockout (eNOS-/-) and wild-type mice (eNOS+/+). eNOS-/- mice displayed a slight and eNOS+/+ mice a more severe working memory deficit in the place learning version of the radial maze compared to the genetic background C57BL/6 strain. Possible differential effects of eNOS inactivation, related to differences in reinforcement contingencies between the Morris water maze and radial maze tasks, behavioral strategy requirements, or to different emotional and physiological concomitants inherent in the two tasks are discussed. These task-unique characteristics might be differentially affected by the reported anxiogenic and hypertensional effects of eNOS gene inactivation. Post-mortem determination of acetylcholine concentrations in diverse brain structures revealed that acetylcholine and choline contents were not different between eNOS-/- and eNOS+/+ mice, but were increased in eNOS+/+ mice compared to C57BL/6 mice in the frontal cortex. Our findings demonstrate that phenotyping of learning and memory capacities should not rely on one learning task only, but should include tasks employing both negative and positive reinforcement contingencies in order to allow valid statements regarding differences in learning capacities between rodent strains.

  19. Framework for robot skill learning using reinforcement learning

    NASA Astrophysics Data System (ADS)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Skill acquisition by a robot is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skills. The reinforcement function is the critical component, since it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way to incorporate prior knowledge and experience into the RL controller, and the difference form of the augmented reward function is considered carefully. The additional reward, beyond the conventional reward, provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal but practical framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control online.

  20. Learning Theory and the Typewriter Teacher

    ERIC Educational Resources Information Center

    Wakin, B. Bertha

    1974-01-01

    Eight basic principles of learning are described and discussed in terms of practical learning strategies for typewriting. Described are goal setting, preassessment, active participation, individual differences, reinforcement, practice, transfer of learning, and evaluation. (SC)

  1. A simple computational algorithm of model-based choice preference.

    PubMed

    Toyama, Asako; Katahira, Kentaro; Ohira, Hideki

    2017-08-01

    A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences-namely, the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, can explain choices better under both model-free and model-based controls and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variation, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters used in this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.
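
    The flavor of the proposed eligibility adjustment can be conveyed with a TD(lambda) update in which a model-based weight scales the eligibility trace. The sketch below is a loose, assumed rendering of that idea (the authors' exact model, parameterization, and variable names may differ).

```python
import numpy as np

def td_lambda_step(V, e, s, s_next, r, alpha=0.1, gamma=0.95, lam=0.9, w=0.5):
    """One TD(lambda) backup in which a model-based weight w scales the eligibility trace."""
    delta = r + gamma * V[s_next] - V[s]   # prediction error
    e *= gamma * lam                       # decay existing traces
    e[s] += w                              # model-based weighting of the visited state
    V += alpha * delta * e                 # assign credit along the weighted trace
    return V, e

V, e = np.zeros(6), np.zeros(6)            # toy value function and trace
V, e = td_lambda_step(V, e, s=2, s_next=3, r=1.0)
```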

  2. Sex Differences in Reinforcement and Punishment on Prime-Time Television.

    ERIC Educational Resources Information Center

    Downs, A. Chris; Gowan, Darryl C.

    1980-01-01

    Television programs were analyzed for frequencies of positive reinforcement and punishment exchanged among performers varying in age and sex. Females were found to more often exhibit and receive reinforcement, whereas males more often exhibited and received punishment. These findings have implications for children's learning of positive and…

  3. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    PubMed

    Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors remain largely unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. In contrast to previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity for cooperation is modulated at every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
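
    Aspiration learning of this kind reduces to a very simple update rule. The toy agent below (payoffs, aspiration level, learning rate, and class name are illustrative assumptions, not the authors' fitted model) reinforces its propensity to cooperate after satisfactory outcomes and anti-reinforces it after unsatisfactory ones, without any information about the co-players' actions.

```python
import random

class AspirationLearner:
    def __init__(self, aspiration=1.5, lr=0.2, p_cooperate=0.5):
        self.aspiration = aspiration     # fixed aspiration level
        self.lr = lr                     # learning rate
        self.p = p_cooperate             # unconditional propensity to cooperate

    def act(self):
        return random.random() < self.p  # True = cooperate, False = defect

    def learn(self, cooperated, payoff):
        """Reinforce the chosen action if satisfied, anti-reinforce it otherwise."""
        satisfied = payoff > self.aspiration
        target = 1.0 if cooperated else 0.0
        step = self.lr if satisfied else -self.lr
        self.p = min(1.0, max(0.0, self.p + step * (target - self.p)))

agent = AspirationLearner()
cooperated = agent.act()
agent.learn(cooperated, payoff=2.0)      # a satisfactory round reinforces the chosen action
```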

  4. Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

    PubMed Central

    Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.

    2011-01-01

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
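
    The "confirmation bias" account described above can be summarized as a value update whose gain depends on whether the outcome agrees with the instruction. The sketch below is one assumed way to express that asymmetry (the parameter names and the specific gain scheme are illustrative, not the authors' fitted Q-learning or Bayesian models).

```python
import numpy as np

def biased_q_update(Q, stim, reward, instructed_good, alpha=0.2, bias=2.0):
    """Q update amplifying instruction-consistent outcomes and diminishing inconsistent ones."""
    delta = reward - Q[stim]
    consistent = (instructed_good and reward > 0) or (not instructed_good and reward <= 0)
    gain = bias if consistent else 1.0 / bias
    Q[stim] += alpha * gain * delta
    return Q[stim]

Q = np.zeros(4)                                   # toy stimulus values
biased_q_update(Q, stim=0, reward=1.0, instructed_good=True)
```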

  5. How partial reinforcement of food cues affects the extinction and reacquisition of appetitive responses. A new model for dieting success?

    PubMed

    van den Akker, Karolien; Havermans, Remco C; Bouton, Mark E; Jansen, Anita

    2014-10-01

    Animals and humans can easily learn to associate an initially neutral cue with food intake through classical conditioning, but extinction of learned appetitive responses can be more difficult. Intermittent or partial reinforcement of food cues causes especially persistent behaviour in animals: after exposure to such learning schedules, the decline in responding that occurs during extinction is slow. After extinction, increases in responding with renewed reinforcement of food cues (reacquisition) might be less rapid after acquisition with partial reinforcement. In humans, it may be that the eating behaviour of some individuals resembles partial reinforcement schedules to a greater extent, possibly affecting dieting success by interacting with extinction and reacquisition. Furthermore, impulsivity has been associated with less successful dieting, and this association might be explained by impulsivity affecting the learning and extinction of appetitive responses. In the present two studies, the effects of different reinforcement schedules and impulsivity on the acquisition, extinction, and reacquisition of appetitive responses were investigated in a conditioning paradigm involving food rewards in healthy humans. Overall, the results indicate both partial reinforcement schedules and, possibly, impulsivity to be associated with worse extinction performance. A new model of dieting success is proposed: learning histories and, perhaps, certain personality traits (impulsivity) can interfere with the extinction and reacquisition of appetitive responses to food cues and they may be causally related to unsuccessful dieting. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. The representation of information about faces in the temporal and frontal lobes.

    PubMed

    Rolls, Edmund T

    2007-01-07

    Neurophysiological evidence is described showing that some neurons in the macaque inferior temporal visual cortex have responses that are invariant with respect to the position, size and view of faces and objects, and that these neurons show rapid processing and rapid learning. Which face or object is present is encoded using a distributed representation in which each neuron conveys independent information in its firing rate, with little information evident in the relative time of firing of different neurons. This ensemble encoding has the advantages of maximising the information in the representation useful for discrimination between stimuli using a simple weighted sum of the neuronal firing by the receiving neurons, generalisation and graceful degradation. These invariant representations are ideally suited to provide the inputs to brain regions such as the orbitofrontal cortex and amygdala that learn the reinforcement associations of an individual's face, for then the learning, and the appropriate social and emotional responses, generalise to other views of the same face. A theory is described of how such invariant representations may be produced in a hierarchically organised set of visual cortical areas with convergent connectivity. The theory proposes that neurons in these visual areas use a modified Hebb synaptic modification rule with a short-term memory trace to capture whatever can be captured at each stage that is invariant about objects as the objects change in retinal view, position, size and rotation. Another population of neurons in the cortex in the superior temporal sulcus encodes other aspects of faces such as face expression, eye gaze, face view and whether the head is moving. These neurons thus provide important additional inputs to parts of the brain such as the orbitofrontal cortex and amygdala that are involved in social communication and emotional behaviour. Outputs of these systems reach the amygdala, in which face-selective neurons are found, and also the orbitofrontal cortex, in which some neurons are tuned to face identity and others to face expression. In humans, activation of the orbitofrontal cortex is found when a change of face expression acts as a social signal that behaviour should change; and damage to the orbitofrontal cortex can impair face and voice expression identification, and also the reversal of emotional behaviour that normally occurs when reinforcers are reversed.

  7. Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia.

    PubMed

    Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E

    2014-03-01

    Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.

  8. The Computational Development of Reinforcement Learning during Adolescence

    PubMed Central

    Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-01-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574

  9. Does overall reinforcer rate affect discrimination of time-based contingencies?

    PubMed

    Cowie, Sarah; Davison, Michael; Blumhardt, Luca; Elliffe, Douglas

    2016-05-01

    Overall reinforcer rate appears to affect choice. The mechanism for such an effect is uncertain, but may relate to reinforcer rate changing the discrimination of the relation between stimuli and reinforcers. We assessed whether a quantitative model based on a stimulus-control approach could be used to account for the effects of overall reinforcer rate on choice under changing time-based contingencies. On a two-key concurrent schedule, the likely availability of a reinforcer reversed when a fixed time had elapsed since the last reinforcer, and the overall reinforcer rate was varied across conditions. Changes in the overall reinforcer rate produced a change in response bias, and some indication of a change in discrimination. These changes in bias and discrimination always occurred quickly, usually within the first session of a condition. The stimulus-control approach provided an excellent account of the data, suggesting that changes in overall reinforcer rate affect choice because they alter the frequency of reinforcers obtained at different times, or in different stimulus contexts, and thus change the discriminated relation between stimuli and reinforcers. These findings support the notion that temporal and spatial discriminations can be understood in terms of discrimination of reinforcers across time and space. © 2016 Society for the Experimental Analysis of Behavior.

  10. Prenatal choline supplementation increases sensitivity to time by reducing non-scalar sources of variance in adult temporal processing

    PubMed Central

    Cheng, Ruey-Kuang; Meck, Warren H.

    2009-01-01

    Choline supplementation of the maternal diet has a long-term facilitative effect on timing and temporal memory of the offspring. To further delineate the impact of early nutritional status on interval timing, we examined effects of prenatal-choline supplementation on the temporal sensitivity of adult (6 mo) male rats. Rats that were given sufficient choline in their chow (CON: 1.1 g/kg) or supplemental choline added to their drinking water (SUP: 3.5 g/kg) during embryonic days (ED) 12–17 were trained with a peak-interval procedure that was shifted among 75%, 50%, and 25% probabilities of reinforcement with transitions from 18 s → 36 s → 72 s temporal criteria. Prenatal-choline supplementation systematically sharpened interval-timing functions by reducing the associative/non-temporal response enhancing effects of reinforcement probability on the Start response threshold, thereby reducing non-scalar sources of variance in the left-hand portion of the Gaussian-shaped response functions. No effect was observed for the Stop response threshold as a function of any of these manipulations. In addition, independence of peak time and peak rate was demonstrated as a function of reinforcement probability for both prenatal-choline supplemented and control rats. Overall, these results suggest that prenatal-choline supplementation facilitates timing by reducing impulsive responding early in the interval, thereby improving the superimposition of peak functions for different temporal criteria. PMID:17996223

  11. Spatial and temporal relations in conditioned reinforcement and observing behavior

    PubMed Central

    Bowe, Craig A.; Dinsmoor, James A.

    1983-01-01

    In Experiment 1, depressing one perch produced stimuli indicating which of two keys, if pecked, could produce food (spatial information) and depressing the other perch produced stimuli indicating whether a variable-interval or an extinction schedule was operating (temporal information). The pigeons increased the time they spent depressing the perch that produced the temporal information but did not increase the time they spent depressing the perch that produced the spatial information. In Experiment 2, pigeons that were allowed to produce combined spatial and temporal information did not acquire the perch pressing any faster or maintain it at a higher level than pigeons allowed to produce only temporal information. Later, when perching produced only spatial information, the time spent depressing the perch eventually declined. The results are not those implied by the statement that information concerning biologically important events is reinforcing but are consistent with an interpretation in terms of the acquisition of reinforcing properties by a stimulus associated with a higher density of primary reinforcement. PMID:16812316

  12. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    NASA Astrophysics Data System (ADS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-06-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model (a reinforcement learning model with a dynamic aspiration level) is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and the public goods game, the most noteworthy finding is that moderate greediness (i.e., a moderate aspiration level) best promotes the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations.
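
    A hedged sketch of an aspiration-based reinforcement rule of the kind described above, in the spirit of Bush-Mosteller learning with a time-dependent aspiration level (the parameter names and the exact update form are illustrative assumptions, not the authors' notation):

        import math

        def update(p_coop, chose_coop, payoff, aspiration, beta=0.5, h=0.2):
            """Update cooperation probability from the payoff-aspiration gap."""
            stimulus = math.tanh(beta * (payoff - aspiration))  # satisfaction signal
            if chose_coop:
                p_coop += (1 - p_coop) * stimulus if stimulus > 0 else p_coop * stimulus
            else:
                p_coop += -p_coop * stimulus if stimulus > 0 else -(1 - p_coop) * stimulus
            aspiration = (1 - h) * aspiration + h * payoff      # dynamic (time-dependent) aspiration
            return min(max(p_coop, 0.0), 1.0), aspiration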

  13. Spatial-Temporal Reasoning Applications of Computational Intelligence in the Game of Go and Computer Networks

    DTIC Science & Technology

    2012-01-01

    dimensionality, Tesauro used a backpropagation-based, three-layer neural network and implemented the outcome from a self-play game as the reinforcement signal... a school of fish, flock of birds, and colony of ants. Our literature review reveals that no one has used PSO to train the neural network... trained with a variant of PSO called cellular PSO (CPSO). CSRN is a supervised learning neural network (SLNN). The proposed algorithm for the

  14. Reinforcement Learning in Information Searching

    ERIC Educational Resources Information Center

    Cen, Yonghua; Gan, Liren; Bai, Chen

    2013-01-01

    Introduction: The study seeks to answer two questions: How do university students learn to use correct strategies to conduct scholarly information searches without instructions? and, What are the differences in learning mechanisms between users at different cognitive levels? Method: Two groups of users, thirteen first year undergraduate students…

  15. Social stress reactivity alters reward and punishment learning

    PubMed Central

    Frank, Michael J.; Allen, John J. B.

    2011-01-01

    To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems. PMID:20453038

  16. Social stress reactivity alters reward and punishment learning.

    PubMed

    Cavanagh, James F; Frank, Michael J; Allen, John J B

    2011-06-01

    To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.

  17. Discounting of reward sequences: a test of competing formal models of hyperbolic discounting

    PubMed Central

    Zarr, Noah; Alexander, William H.; Brown, Joshua W.

    2014-01-01

    Humans are known to discount future rewards hyperbolically in time. Nevertheless, a formal recursive model of hyperbolic discounting has been elusive until recently, with the introduction of the hyperbolically discounted temporal difference (HDTD) model. Prior to that, models of learning (especially reinforcement learning) have relied on exponential discounting, which generally provides poorer fits to behavioral data. Recently, it has been shown that hyperbolic discounting can also be approximated by a summed distribution of exponentially discounted values, instantiated in the μAgents model. The HDTD model and the μAgents model differ in one key respect, namely how they treat sequences of rewards. The μAgents model is a particular implementation of a Parallel discounting model, which values sequences based on the summed value of the individual rewards whereas the HDTD model contains a non-linear interaction. To discriminate among these models, we observed how subjects discounted a sequence of three rewards, and then we tested how well each candidate model fit the subject data. The results show that the Parallel model generally provides a better fit to the human data. PMID:24639662
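
    As an illustrative contrast (the discount parameter k and the example sequence are arbitrary choices, not the fitted models), hyperbolic discounting values a reward A delayed by D as A/(1 + kD), exponential discounting values it as A(1 - k)^D, and a Parallel-style model values a sequence as the sum of its individually discounted rewards:

        def hyperbolic(amount, delay, k=0.05):
            return amount / (1.0 + k * delay)

        def exponential(amount, delay, k=0.05):
            return amount * (1.0 - k) ** delay

        def parallel_sequence_value(rewards_with_delays, k=0.05):
            """Value a sequence as the summed value of hyperbolically discounted rewards."""
            return sum(hyperbolic(a, d, k) for a, d in rewards_with_delays)

        print(parallel_sequence_value([(10, 0), (10, 5), (10, 10)]))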

  18. Team Learning: New Insights Through a Temporal Lens.

    PubMed

    Lehmann-Willenbrock, Nale

    2017-04-01

    Team learning is a complex social phenomenon that develops and changes over time. Hence, to promote understanding of the fine-grained dynamics of team learning, research should account for the temporal patterns of team learning behavior. Taking important steps in this direction, this special issue offers novel insights into the dynamics of team learning by advocating a temporal perspective. Based on a symposium presented at the 2016 Interdisciplinary Network for Group Research (INGRoup) Conference in Helsinki, the four empirical articles in this special issue showcase four different and innovative approaches to implementing a temporal perspective in team learning research. Specifically, the contributions highlight team learning dynamics in student teams, self-managing teams, teacher teams, and command and control teams. The articles cover a broad range of methods and designs, including both qualitative and quantitative methodologies, and longitudinal as well as micro-temporal approaches. The contributors represent four countries and five different disciplines in group research.

  19. Kernel-based least squares policy iteration for reinforcement learning.

    PubMed

    Xu, Xin; Hu, Dewen; Lu, Xicheng

    2007-07-01

    In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared with previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing approaches. One is a better convergence and (near-)optimality guarantee, obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection using the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and convergence guarantees for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
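
    A hedged sketch of an ALD-style dependency test of the kind used for kernel sparsification; the RBF kernel, the threshold nu, and the small regulariser are illustrative choices rather than the paper's settings. A new sample is added to the kernel dictionary only when it cannot be approximated by the existing dictionary elements in feature space.

        import numpy as np

        def rbf(x, y, sigma=1.0):
            return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

        def ald_is_dependent(dictionary, x, nu=0.01, sigma=1.0):
            """True if x is approximately linearly dependent on the dictionary."""
            if not dictionary:
                return False
            K = np.array([[rbf(a, b, sigma) for b in dictionary] for a in dictionary])
            k_vec = np.array([rbf(a, x, sigma) for a in dictionary])
            c = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k_vec)
            delta = rbf(x, x, sigma) - k_vec @ c   # residual of the best approximation
            return delta <= nu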

  20. Effect of quinolinic acid-induced lesions of the nucleus accumbens core on performance on a progressive ratio schedule of reinforcement: implications for inter-temporal choice.

    PubMed

    Bezzina, G; Body, S; Cheung, T H C; Hampson, C L; Deakin, J F W; Anderson, I M; Szabadi, E; Bradshaw, C M

    2008-04-01

    The nucleus accumbens core (AcbC) is believed to contribute to the control of operant behaviour by reinforcers. Recent evidence suggests that it is not crucial for determining the incentive value of immediately available reinforcers, but is important for maintaining the values of delayed reinforcers. This study aims to examine the effect of AcbC lesions on performance on a progressive-ratio schedule using a quantitative model that dissociates effects of interventions on motor and motivational processes (Killeen 1994 Mathematical principles of reinforcement. Behav Brain Sci 17:105-172). Rats with bilateral quinolinic acid-induced lesions of the AcbC (n = 15) or sham lesions (n = 14) were trained to lever-press for food-pellet reinforcers under a progressive-ratio schedule. In Phase 1 (90 sessions) the reinforcer was one pellet; in Phase 2 (30 sessions), it was two pellets; in Phase 3, (30 sessions) it was one pellet. The performance of both groups conformed to the model of progressive-ratio performance (group mean data: r2 > 0.92). The motor parameter, delta, was significantly higher in the AcbC-lesioned than the sham-lesioned group, reflecting lower overall response rates in the lesioned group. The motivational parameter, a, was sensitive to changes in reinforcer size, but did not differ significantly between the two groups. The AcbC-lesioned group showed longer post-reinforcement pauses and lower running response rates than the sham-lesioned group. The results suggest that destruction of the AcbC impairs response capacity but does not alter the efficacy of food reinforcers. The results are consistent with recent findings that AcbC lesions do not alter sensitivity to reinforcer size in inter-temporal choice schedules.

  1. Stress enhances model-free reinforcement learning only after negative outcome

    PubMed Central

    Lee, Daeyeol

    2017-01-01

    Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one’s ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts. PMID:28723943

  2. Stress enhances model-free reinforcement learning only after negative outcome.

    PubMed

    Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung

    2017-01-01

    Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one's ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts.
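
    Analyses of two-stage tasks like the one above typically fit a hybrid learner in which a weight w arbitrates between model-based and model-free value estimates; a minimal sketch under that standard assumption (generic names, not the authors' exact parameterisation):

        def first_stage_value(q_model_based, q_model_free, w):
            """w = 1 gives purely model-based control; w = 0 purely model-free."""
            return w * q_model_based + (1.0 - w) * q_model_free

        def model_free_update(q, reward, alpha):
            """Delta-rule update; alpha is the learning rate discussed above."""
            return q + alpha * (reward - q)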

  3. Neural signals of vicarious extinction learning

    PubMed Central

    Haaker, Jan; Selbing, Ida; Olsson, Andreas

    2016-01-01

    Social transmission of both threat and safety is ubiquitous, but little is known about the neural circuitry underlying vicarious safety learning. This is surprising given that these processes are critical for flexibly adapting to a changeable environment. To address how the expression of previously learned fears can be modified by the transmission of social information, two conditioned stimuli (CS+s) were paired with shock and a third was not. During extinction, we held constant the amount of direct, non-reinforced exposure to the CSs (i.e. direct extinction), and critically varied whether another individual, acting as a demonstrator, experienced safety (CS+vic safety) or aversive reinforcement (CS+vic reinf). During extinction, ventromedial prefrontal cortex (vmPFC) responses to the CS+vic reinf increased but decreased to the CS+vic safety. This pattern of vmPFC activity was reversed during a subsequent fear reinstatement test, suggesting a temporal shift in the involvement of the vmPFC. Moreover, only the CS+vic reinf association recovered. Our data suggest that vicarious extinction prevents the return of conditioned fear responses, and that this efficacy is reflected by diminished vmPFC involvement during extinction learning. The present findings may have important implications for understanding how social information influences the persistence of fear memories in individuals suffering from emotional disorders. PMID:27278792

  4. Genetic Dissociation of Acquisition and Memory Strength in the Heat-Box Spatial Learning Paradigm in "Drosophila"

    ERIC Educational Resources Information Center

    Diegelmann, Soeren; Zars, Melissa; Zars, Troy

    2006-01-01

    Memories can have different strengths, largely dependent on the intensity of reinforcers encountered. The relationship between reinforcement and memory strength is evident in asymptotic memory curves, with the level of the asymptote related to the intensity of the reinforcer. Although this is likely a fundamental property of memory formation,…

  5. Context-Outcome Associations Underlie Context-Switch Effects after Partial Reinforcement in Human Predictive Learning

    ERIC Educational Resources Information Center

    Moreno-Fernandez, Maria M.; Abad, Maria J. F.; Ramos-Alvarez, Manuel M.; Rosas, Juan M.

    2011-01-01

    Predictive value for continuously reinforced cues is affected by context changes when they are trained within a context in which a different cue undergoes partial reinforcement. An experiment was conducted with the goal of exploring the mechanisms underlying this context-switch effect. Human participants were trained in a predictive learning…

  6. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning

    PubMed Central

    McGregor, Heather R.; Mohatarem, Ayman

    2017-01-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634

  7. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.

    PubMed

    Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L

    2017-07-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
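
    A toy numerical illustration of the mean-versus-mode dissociation described above (the skewed distribution and the target half-width are assumptions, not the study's values): a squared-error loss is minimised by aiming to cancel the mean shift, whereas an all-or-none hit/miss loss is minimised by cancelling a value near the mode.

        import numpy as np

        rng = np.random.default_rng(0)
        shifts = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # skewed: mode 1, mean 2

        aims = np.linspace(0.0, 5.0, 201)
        squared_error = [np.mean((shifts - a) ** 2) for a in aims]       # error-based loss
        miss_rate = [np.mean(np.abs(shifts - a) > 0.5) for a in aims]    # reinforcement loss

        print("aim minimising squared error:", aims[int(np.argmin(squared_error))])
        print("aim minimising miss rate:    ", aims[int(np.argmin(miss_rate))])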

  8. Translingual Literacy, Language Difference, and Matters of Agency

    ERIC Educational Resources Information Center

    Lu, Min-Zhan; Horner, Bruce

    2013-01-01

    We argue that composition scholarship's defenses of language differences in student writing reinforce dominant ideology's spatial framework conceiving language difference as deviation from a norm of sameness. We argue instead for adopting a temporal-spatial framework defining difference as the norm of utterances, and defining languages,…

  9. Opposing acute and chronic behavioural effects of a beta-blocker, propranolol, in the rat.

    PubMed

    Salmon, P; Gray, J A

    1985-01-01

    Rats were trained over 40 days to lever-press for food reward under a schedule of differential reinforcement of low rates of response with a 20-s criterion (DRL 20), following seven sessions of continuous reinforcement. The effects of injecting a beta-adrenergic blocker, propranolol (5 mg/kg IP), before and at two different delays after each daily session of DRL were investigated. In Experiment I, rats drugged 5-8 min before every session earned fewer reinforcements compared to controls, and showed impaired temporal discrimination. In Experiment II, this result was not replicated, but similar effects were clear in animals drugged pre-session from the 15th day of acquisition. By contrast, improved temporal discrimination and an increased number of reinforcements were seen in rats drugged 5-8 min after every session. In Experiment III, the post-session effects were replicated and were also found in rats drugged 4-5.5 h after each session. These results suggest that propranolol has an acute effect on DRL responding which resembles that of anxiolytics, and a chronic effect which opposes the acute one.

  10. Fuzzy self-learning control for magnetic servo system

    NASA Technical Reports Server (NTRS)

    Tarn, J. H.; Kuo, L. T.; Juang, K. Y.; Lin, C. E.

    1994-01-01

    It is known that an effective control system is the key condition for successful implementation of high-performance magnetic servo systems. Major issues in designing such control systems are nonlinearity; unmodeled dynamics, such as secondary effects of copper resistance, stray fields, and saturation; and disturbance rejection, since the load acts directly on the servo system without transmission elements. One typical approach to designing control systems under these conditions is a special type of nonlinear feedback called gain scheduling. It accommodates linear regulators whose parameters are changed as a function of operating conditions in a preprogrammed way. In this paper, an on-line learning fuzzy control strategy is proposed. To inherit the wealth of linear control design, the relations between linear feedback and fuzzy logic controllers have been established. The exercise of engineering axioms of linear control design is thus transformed into the tuning of appropriate fuzzy parameters. Furthermore, fuzzy logic control brings the domain of candidate control laws from linear into nonlinear, and brings new prospects into the design of the local controllers. On the other hand, a self-learning scheme is utilized to automatically tune the fuzzy rule base. It is based on a network learning infrastructure; statistical approximation to assign credit; an animal-learning method to update the reinforcement map with a fast learning rate; and a temporal-difference predictive scheme to optimize the control laws. Different from supervised and statistical unsupervised learning schemes, the proposed method learns on-line from past experience and information from the process and forms the rule base of an FLC system from randomly assigned initial control rules.

  11. Novel reinforcement learning paradigm based on response patterning under interval schedules of reinforcement.

    PubMed

    Schifani, Christin; Sukhanov, Ilya; Dorofeikova, Mariia; Bespalov, Anton

    2017-07-28

    There is a need to develop cognitive tasks that address valid neuropsychological constructs implicated in disease mechanisms and can be used in animals and humans to guide novel drug discovery. Present experiments aimed to characterize a novel reinforcement learning task based on a classical operant behavioral phenomenon observed in multiple species - differences in response patterning under variable (VI) vs fixed interval (FI) schedules of reinforcement. Wistar rats were trained to press a lever for food under VI30s and later weekly test sessions were introduced with reinforcement schedule switched to FI30s. During the FI30s test session, post-reinforcement pauses (PRPs) gradually grew towards the end of the session reaching 22-43% of the initial values. Animals could be retrained under VI30s conditions, and FI30s test sessions were repeated over a period of several months without appreciable signs of a practice effect. Administration of the non-competitive N-methyl-d-aspartate (NMDA) receptor antagonist MK-801 ((5S,10R)-(+)-5-Methyl-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5,10-imine maleate) prior to FI30s sessions prevented adjustment of PRPs associated with the change from VI to FI schedule. This effect was most pronounced at the highest tested dose of MK-801 and appeared to be independent of the effects of this dose on response rates. These results provide initial evidence for the possibility to use different response patterning under VI and FI schedules with equivalent reinforcement density for studying effects of drug treatment on reinforcement learning. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Visual paired-associate learning: in search of material-specific effects in adult patients who have undergone temporal lobectomy.

    PubMed

    Smith, Mary Lou; Bigel, Marla; Miller, Laurie A

    2011-02-01

    The mesial temporal lobes are important for learning arbitrary associations. It has previously been demonstrated that left mesial temporal structures are involved in learning word pairs, but it is not yet known whether comparable lesions in the right temporal lobe impair visually mediated associative learning. Patients who had undergone left (n=16) or right (n=18) temporal lobectomy for relief of intractable epilepsy and healthy controls (n=13) were administered two paired-associate learning tasks assessing their learning and memory of pairs of abstract designs or pairs of symbols in unique locations. Both patient groups had deficits in learning the designs, but only the right temporal group was impaired in recognition. For the symbol location task, differences were not found in learning, but again a recognition deficit was found for the right temporal group. The findings implicate the mesial temporal structures in relational learning. They support a material-specific effect for recognition but not for learning and recall of arbitrary visual and visual-spatial associative information. Copyright © 2010 Elsevier Inc. All rights reserved.

  13. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    PubMed

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases, the (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  14. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    PubMed

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  15. Alpha7 Nicotinic Acetylcholine Receptors and Temporal Memory: Synergistic Effects of Combining Prenatal Choline and Nicotine on Reinforcement-Induced Resetting of an Interval Clock

    ERIC Educational Resources Information Center

    Cheng, Ruey-Kuang; Meck, Warren H.; Williams, Christina L.

    2006-01-01

    We previously showed that prenatal choline supplementation could increase the precision of timing and temporal memory and facilitate simultaneous temporal processing in mature and aged rats. In the present study, we investigated the ability of adult rats to selectively control the reinforcement-induced resetting of an internal clock as a function…

  16. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

    PubMed

    Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

    2016-02-01

    Attention includes processes that evaluate stimulus relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors once behavior had reached asymptote. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important for understanding how attentional subprocesses are implemented in primate brain networks.
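
    A hedged sketch of the three components listed above, with a softmax choice rule over feature values plus a value-independent stickiness bonus toward the previously chosen feature (parameter names are illustrative, not the published model's):

        import numpy as np

        def choice_probabilities(values, prev_choice, beta=3.0, kappa=0.5):
            """Softmax over feature values with a stickiness bonus for the last choice."""
            utilities = np.array(values, dtype=float)
            if prev_choice is not None:
                utilities[prev_choice] += kappa          # value-independent stickiness
            expu = np.exp(beta * (utilities - utilities.max()))
            return expu / expu.sum()

        def update_feature_value(values, choice, reward, alpha=0.2):
            """Learning restricted to the task-relevant feature dimension."""
            values[choice] += alpha * (reward - values[choice])
            return values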

  17. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    PubMed

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
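
    A generic continuous actor-critic update with linear function approximation and a scalar TD error is sketched below; the feature function, the step sizes, and the update form are assumptions for illustration, not the specific method used in the study.

        import numpy as np

        def actor_critic_step(w_critic, w_actor, phi_s, phi_s_next, action, reward,
                              alpha=0.1, beta=0.01, gamma=0.95):
            """One update from a transition (s, action, reward, s')."""
            delta = reward + gamma * (w_critic @ phi_s_next) - (w_critic @ phi_s)  # TD error
            w_critic = w_critic + alpha * delta * phi_s             # critic update
            mu = w_actor @ phi_s                                    # actor's mean action
            w_actor = w_actor + beta * delta * (action - mu) * phi_s
            return w_critic, w_actor, delta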

  18. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    PubMed

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  19. Learning of Temporal and Spatial Movement Aspects: A Comparison of Four Types of Haptic Control and Concurrent Visual Feedback.

    PubMed

    Rauter, Georg; Sigrist, Roland; Riener, Robert; Wolf, Peter

    2015-01-01

    In literature, the effectiveness of haptics for motor learning is controversially discussed. Haptics is believed to be effective for motor learning in general; however, different types of haptic control enhance different movement aspects. Thus, in dependence on the movement aspects of interest, one type of haptic control may be effective whereas another one is not. Therefore, in the current work, it was investigated if and how different types of haptic controllers affect learning of spatial and temporal movement aspects. In particular, haptic controllers that enforce active participation of the participants were expected to improve spatial aspects. Only haptic controllers that provide feedback about the task's velocity profile were expected to improve temporal aspects. In a study on learning a complex trunk-arm rowing task, the effect of training with four different types of haptic control was investigated: position control, path control, adaptive path control, and reactive path control. A fifth group (control) trained with visual concurrent augmented feedback. As hypothesized, the position controller was most effective for learning of temporal movement aspects, while the path controller was most effective in teaching spatial movement aspects of the rowing task. Visual feedback was also effective for learning temporal and spatial movement aspects.

  20. Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective

    PubMed Central

    Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.

    2009-01-01

    Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527
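
    One standard formulation of the core computational move in option-style hierarchical reinforcement learning is an SMDP Q-learning update, in which a temporally extended subroutine that ran for k steps is credited as a single action; a minimal sketch, not a specific published implementation:

        def smdp_q_update(Q, s, option, s_next, R, k, options, alpha=0.1, gamma=0.95):
            """R is the discounted return accumulated while the option ran for k steps."""
            best_next = max(Q.get((s_next, o), 0.0) for o in options)
            old = Q.get((s, option), 0.0)
            Q[(s, option)] = old + alpha * (R + (gamma ** k) * best_next - old)
            return Q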

  1. Hierarchical extreme learning machine based reinforcement learning for goal localization

    NASA Astrophysics Data System (ADS)

    AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini

    2017-03-01

    The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Only a few regions should be processed by the agent, in order to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning (RL) method was utilized to find an optimal series of actions to localize the goal region. The visual data, a set of images, is high-dimensional, unstructured data and needs to be represented efficiently to obtain a robust detector. Different deep reinforcement learning models have already been used to localize a goal, but most of them take a long time to learn the model. This long learning time results from the weight fine-tuning stage that is applied iteratively to find an accurate model. The Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that does not fine-tune the weights. In other words, the hidden weights are generated randomly and the output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed using MATLAB.
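
    The extreme-learning-machine idea mentioned above (random hidden weights, analytically solved output weights) can be sketched as follows; the layer size, ridge term, and activation function are illustrative assumptions, not the paper's configuration.

        import numpy as np

        def train_elm(X, Y, n_hidden=100, ridge=1e-3, seed=0):
            rng = np.random.default_rng(seed)
            W = rng.standard_normal((X.shape[1], n_hidden))   # random, never fine-tuned
            b = rng.standard_normal(n_hidden)
            H = np.tanh(X @ W + b)                            # hidden-layer activations
            beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y)
            return W, b, beta                                 # output weights solved analytically

        def predict_elm(X, W, b, beta):
            return np.tanh(X @ W + b) @ beta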

  2. Feedback-related negativity is enhanced in adolescence during a gambling task with and without probabilistic reinforcement learning.

    PubMed

    Martínez-Velázquez, Eduardo S; Ramos-Loyo, Julieta; González-Garrido, Andrés A; Sequeira, Henrique

    2015-01-21

    Feedback-related negativity (FRN) is a negative deflection over frontocentral regions that appears around 250 ms after feedback about gains or losses for chosen alternatives in a gambling task. Few studies have reported FRN enhancement in adolescents compared with adults in a gambling task without probabilistic reinforcement learning, despite the fact that learning from positive or negative consequences is crucial for decision-making during adolescence. Therefore, the aim of the present research was to identify differences in FRN amplitude and latency between adolescents and adults on a gambling task with favorable and unfavorable probabilistic reinforcement learning conditions, in addition to a nonlearning condition with monetary gains and losses. Higher rate scores of high-magnitude choices during the final 30 trials compared with the first 30 trials were observed during the favorable condition, whereas lower rates were observed during the unfavorable condition in both groups. Higher FRN amplitude in all conditions and longer latency in the nonlearning condition were observed in adolescents compared with adults and in relation to losses. Results indicate that both the adolescents and the adults improved their performance in relation to positive and negative feedback. However, the FRN findings suggest an increased sensitivity to external feedback about losses in adolescents compared with adults, irrespective of the presence or absence of probabilistic reinforcement learning. These results reflect processing differences in the neural monitoring system and provide new perspectives on the dynamic development of an adolescent's brain.

  3. Identification of animal behavioral strategies by inverse reinforcement learning.

    PubMed

    Yamaguchi, Shoichiro; Naoki, Honda; Ikeda, Muneki; Tsukada, Yuki; Nakano, Shunji; Mori, Ikue; Ishii, Shin

    2018-05-01

    Animals are able to reach a desired state in an environment by controlling various behavioral patterns. Identification of the behavioral strategy used for this control is important for understanding animals' decision-making and is fundamental to dissect information processing done by the nervous system. However, methods for quantifying such behavioral strategies have not been fully established. In this study, we developed an inverse reinforcement-learning (IRL) framework to identify an animal's behavioral strategy from behavioral time-series data. We applied this framework to C. elegans thermotactic behavior; after cultivation at a constant temperature with or without food, fed worms prefer, while starved worms avoid the cultivation temperature on a thermal gradient. Our IRL approach revealed that the fed worms used both the absolute temperature and its temporal derivative and that their behavior involved two strategies: directed migration (DM) and isothermal migration (IM). With DM, worms efficiently reached specific temperatures, which explains their thermotactic behavior when fed. With IM, worms moved along a constant temperature, which reflects isothermal tracking, well-observed in previous studies. In contrast to fed animals, starved worms escaped the cultivation temperature using only the absolute, but not the temporal derivative of temperature. We also investigated the neural basis underlying these strategies, by applying our method to thermosensory neuron-deficient worms. Thus, our IRL-based approach is useful in identifying animal strategies from behavioral time-series data and could be applied to a wide range of behavioral studies, including decision-making, in other organisms.

  4. Associative Processes in Early Olfactory Preference Acquisition

    PubMed Central

    Sullivan, Regina M.; Wilson, Donald A.; Leon, Michael

    2007-01-01

    Acquisition of behavioral conditioned responding and learned odor preferences during olfactory classical conditioning in rat pups requires forward or simultaneous pairings of the conditioned stimulus (CS) and the unconditioned stimulus (US). Other temporal relationships between the CS and US do not usually result in learning. The present study examined the influence of this CS-US relationship upon the neural olfactory bulb modifications that are acquired during early classical conditioning. Wistar rat pups were trained from Postnatal Days (PN) 1-18 with either forward (odor overlapping temporally with reinforcing stroking) or backward (stroking followed by odor) CS-US pairings. On PN 19, pups received either a behavioral odor preference test to the odor CS or an injection of 14C 2-DG and exposure to the odor CS, or olfactory bulb single unit responses were recorded in response to exposure to the odor CS. Only pups that received forward presentations of the CS and US exhibited both a preference for the CS and modified olfactory bulb neural responses to the CS. These results, then, suggest that the modified olfactory bulb neural responses acquired during classical conditioning are guided by the same temporal constraints as those which govern the acquisition of behavioral conditioned responses. PMID:17572798

  5. Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents

    ERIC Educational Resources Information Center

    Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

    2009-01-01

    Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…

  6. Homeostatic reinforcement learning for integrating reward collection and physiological stability.

    PubMed

    Keramati, Mehdi; Gutkin, Boris

    2014-12-02

    Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system.
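
    One way to formalise the central idea above, that the reward of an outcome is the reduction in drive (the distance of the internal state from its setpoint) it produces, is sketched below; the exponents and the setpoint vector are illustrative assumptions.

        import numpy as np

        def drive(state, setpoint, m=3.0, n=4.0):
            """Distance of the internal (physiological) state from its setpoint."""
            return np.sum(np.abs(setpoint - state) ** n) ** (m / n)

        def homeostatic_reward(state, outcome, setpoint):
            """Reward of an outcome = drive before it minus drive after it."""
            return drive(state, setpoint) - drive(state + outcome, setpoint)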

  7. Context transfer in reinforcement learning using action-value functions.

    PubMed

    Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid

    2014-01-01

    This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task.

  8. Context Transfer in Reinforcement Learning Using Action-Value Functions

    PubMed Central

    Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid

    2014-01-01

    This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task. PMID:25610457
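
    A minimal sketch of the transfer step described above, assuming a hand-specified mapping between target and source state-action pairs (the dictionary representation is an illustration, not the paper's formalism, and the merging of several source tasks is omitted): the mapped Q-values initialise learning in the target task.

        def transfer_q_values(q_source, mapping, default=0.0):
            """mapping: dict from target (state, action) to source (state, action)."""
            q_target = {}
            for target_sa, source_sa in mapping.items():
                q_target[target_sa] = q_source.get(source_sa, default)
            return q_target   # used as the initial Q-table for target-task Q-learning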

  9. Dopaminergic Contributions to Vocal Learning

    PubMed Central

    Hoffmann, Lukas A.; Saravanan, Varun; Wood, Alynda N.; He, Li

    2016-01-01

    Although the brain relies on auditory information to calibrate vocal behavior, the neural substrates of vocal learning remain unclear. Here we demonstrate that lesions of the dopaminergic inputs to a basal ganglia nucleus in a songbird species (Bengalese finches, Lonchura striata var. domestica) greatly reduced the magnitude of vocal learning driven by disruptive auditory feedback in a negative reinforcement task. These lesions produced no measurable effects on the quality of vocal performance or the amount of song produced. Our results suggest that dopaminergic inputs to the basal ganglia selectively mediate reinforcement-driven vocal plasticity. In contrast, dopaminergic lesions produced no measurable effects on the birds' ability to restore song acoustics to baseline following the cessation of reinforcement training, suggesting that different forms of vocal plasticity may use different neural mechanisms. SIGNIFICANCE STATEMENT During skill learning, the brain relies on sensory feedback to improve motor performance. However, the neural basis of sensorimotor learning is poorly understood. Here, we investigate the role of the neurotransmitter dopamine in regulating vocal learning in the Bengalese finch, a songbird with an extremely precise singing behavior that can nevertheless be reshaped dramatically by auditory feedback. Our findings show that reduction of dopamine inputs to a region of the songbird basal ganglia greatly impairs vocal learning but has no detectable effect on vocal performance. These results suggest a specific role for dopamine in regulating vocal plasticity. PMID:26888928

  10. Toward a Science of Learning Games

    ERIC Educational Resources Information Center

    Howard-Jones, Paul; Demetriou, Skevi; Bogacz, Rafal; Yoo, Jee H.; Leonards, Ute

    2011-01-01

    Reinforcement learning involves a tight coupling of reward-associated behavior and a type of learning that is very different from that promoted by education. However, the emerging understanding of its underlying processes may help derive principles for effective learning games that have, until now, been elusive. This article first reviews findings…

  11. Childhood ADHD and Delayed Reinforcement: A Direct Comparison of Performance on Hypothetical and Real-Time Delay Tasks.

    PubMed

    Yu, Xue; Sonuga-Barke, Edmund

    2016-07-28

    Individuals with ADHD have been shown to prefer smaller, sooner rewards over larger, later ones. This has been explained in terms of abnormally steep discounting of the value of delayed reinforcers. Evidence for this comes from different experimental paradigms. In some, participants experience the delay in the laboratory (real-time delay tasks; R-TD); in others, they imagine the delay to reinforcers (hypothetical delay tasks; HD). We directly contrasted the performance of 7- to 12-year-old children with ADHD (n = 23) and matched controls (n = 23) on R-TD and HD tasks with monetary rewards. Children with ADHD displayed steeper temporal discounting on the R-TD, but not the HD, tasks. These findings suggest that the experience of waiting prior to the delivery of rewards is an important determinant of heightened temporal discounting in ADHD, a finding consistent with models that emphasize the aversive nature of delay for children. © The Author(s) 2016.

  12. What is the optimal task difficulty for reinforcement learning of brain self-regulation?

    PubMed

    Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza

    2016-09-01

    The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent on β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for the low-difficulty run but a positive learning incentive for the moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold that resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
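
    As a rough illustration of the incentive measure described above (an assumption-laden reading, not the authors' analysis code), the quantity "change in reward duration per movement onset" could be estimated as the slope of per-onset reward durations over a run; the function name and data layout below are invented.

```python
# Hypothetical sketch: the "learning incentive" as the slope of reward duration
# against movement-onset index within one run (positive slope = positive incentive).
import numpy as np

def learning_incentive(reward_durations):
    """reward_durations: reward (hand-opening) duration in seconds for each
    movement onset of one run, in temporal order."""
    onsets = np.arange(1, len(reward_durations) + 1)
    slope, _intercept = np.polyfit(onsets, reward_durations, 1)
    return slope
```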

  13. 11.2 YIP Human In the Loop Statistical Relational Learners

    DTIC Science & Technology

    2017-10-23

    …learning formalisms including inverse reinforcement learning [4] and statistical relational learning [7, 5, 8]. We have also applied our algorithms in… one introduced for label preferences. [Figure 2: Active Advice Seeking for Inverse Reinforcement Learning.] …active advice seeking is in selecting the… learning tasks. 1.2.1 Sequential Decision-Making: Our previous work on advice for inverse reinforcement learning (IRL) defined advice as action…

  14. Different dimensions of the prediction error as a decisive factor for the triggering of the reconsolidation process.

    PubMed

    Agustina López, M; Jimena Santos, M; Cortasa, Santiago; Fernández, Rodrigo S; Carbó Tano, Martin; Pedreira, María E

    2016-12-01

    The reconsolidation process is the mechanism by which the strength and/or content of consolidated memories are updated. Prediction error (PE) is the difference between what is predicted and what actually occurs, and it is proposed as a necessary condition for triggering the reconsolidation process. Here we analyzed in depth the role of the PE in associative memory reconsolidation in the crab Neohelice granulata. An incongruence with the learned temporal relationship between the conditioned and unconditioned stimuli (CS-US) was enough to trigger the reconsolidation process. Moreover, after partially reinforced training, a PE of 50% opened the possibility of labilizing the consolidated memory with a reminder that either included or omitted the US. Further, during extinction training, a small PE in the first interval between CSs was enough to trigger reconsolidation. Overall, we highlight the relation between training history and the different reactivation conditions that recruit the process responsible for memory updating. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Learning by subtraction: Hippocampal activity and effects of ethanol during the acquisition and performance of response sequences.

    PubMed

    Ketchum, Myles J; Weyand, Theodore G; Weed, Peter F; Winsauer, Peter J

    2016-05-01

    Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n = 5) implanted with electrodes (n = 14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks, though, could be completed using an identical response topography and used the same sensory stimuli and schedule of reinforcement. More important, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, and provided a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. © 2015 Wiley Periodicals, Inc.

  16. Learning by Subtraction: Hippocampal Activity and Effects of Ethanol during the Acquisition and Performance of Response Sequences

    PubMed Central

    Ketchum, Myles J.; Weyand, Theodore G.; Weed, Peter F.; Winsauer, Peter J.

    2015-01-01

    Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n=5) implanted with electrodes (n=14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks, though, could be completed using an identical response topography and used the same sensory stimuli and schedule of reinforcement. More important, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, and provided a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. PMID:26482846

  17. Striatum-medial prefrontal cortex connectivity predicts developmental changes in reinforcement learning.

    PubMed

    van den Bos, Wouter; Cohen, Michael X; Kahnt, Thorsten; Crone, Eveline A

    2012-06-01

    During development, children improve in learning from feedback to adapt their behavior. However, it is still unclear which neural mechanisms might underlie these developmental changes. In the current study, we used a reinforcement learning model to investigate neurodevelopmental changes in the representation and processing of learning signals. Sixty-seven healthy volunteers between ages 8 and 22 (children: 8-11 years, adolescents: 13-16 years, and adults: 18-22 years) performed a probabilistic learning task while in a magnetic resonance imaging scanner. The behavioral data demonstrated age differences in learning parameters with a stronger impact of negative feedback on expected value in children. Imaging data revealed that the neural representation of prediction errors was similar across age groups, but functional connectivity between the ventral striatum and the medial prefrontal cortex changed as a function of age. Furthermore, the connectivity strength predicted the tendency to alter expectations after receiving negative feedback. These findings suggest that the underlying mechanisms of developmental changes in learning are not related to differences in the neural representation of learning signals per se but rather in how learning signals are used to guide behavior and expectations.

  18. View Estimation Based on Value System

    NASA Astrophysics Data System (ADS)

    Takahashi, Yasutake; Shimada, Kouki; Asada, Minoru

    Estimation of a caregiver's view is one of the most important capabilities for a child to understand the behavior demonstrated by the caregiver, that is, to infer the intention of the behavior and/or to learn the observed behavior efficiently. We hypothesize that the child develops this ability in the same way as behavior learning motivated by an intrinsic reward; that is, he/she updates his/her own view-estimation model during behavior imitated from observation of the caregiver's demonstration, based on minimizing the estimation error of the reward during the behavior. Accordingly, this paper presents a method for acquiring such a capability based on a value system from which values can be obtained by reinforcement learning. The parameters of the view estimation are updated based on the temporal difference error (hereafter TD error: the estimation error of the state value), analogous to the way the parameters of the state value of the behavior are updated based on the TD error. Experiments with simple humanoid robots show the validity of the method, and the developmental process, parallel to young children's estimation of their own view during imitation of the observed behavior of the caregiver, is discussed.
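
    A minimal sketch of the shared update rule described above follows; the linear parameterization, learning rates, and variable names are assumptions, not the authors' humanoid implementation.

```python
# Both the state-value weights and the view-estimation parameters are adjusted
# in proportion to the same temporal difference (TD) error. Linear features
# and the specific learning rates are illustrative assumptions.
import numpy as np

def td_update(w_value, w_view, phi_t, phi_next, reward,
              alpha=0.1, beta=0.05, gamma=0.95):
    """phi_t, phi_next: feature vectors of the current and next state."""
    td_error = reward + gamma * float(w_value @ phi_next) - float(w_value @ phi_t)
    w_value = w_value + alpha * td_error * phi_t   # standard TD(0) value update
    w_view = w_view + beta * td_error * phi_t      # view parameters follow the same error
    return w_value, w_view, td_error
```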

  19. Statistical learning: A powerful mechanism that operates by mere exposure

    PubMed Central

    Aslin, Richard N.

    2015-01-01

    How do infants learn so rapidly and with little apparent effort? In 1996, Saffran, Aslin, and Newport reported that 8-month-old human infants could learn the underlying temporal structure of a stream of speech syllables after only two minutes of passive listening. This demonstration of what was called statistical learning, involving no instruction, reinforcement, or feedback, led to dozens of confirmations of this powerful mechanism of implicit learning in a variety of modalities, domains, and species. These findings reveal that infants are not nearly as dependent on explicit forms of instruction as we might have assumed from studies of learning in which children or adults are taught facts such as math or problem solving skills. Instead, at least in some domains, infants soak up the information around them by mere exposure. Learning and development in these domains thus appear to occur automatically and with little active involvement by an instructor (parent or teacher). The details of this statistical learning mechanism are discussed, including how exposure to specific types of information can, under some circumstances, generalize to never-before-observed information, thereby enabling transfer of learning. PMID:27906526

  20. Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.

    PubMed

    Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

    2017-07-10

    Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.

  1. Enhanced Experience Replay for Deep Reinforcement Learning

    DTIC Science & Technology

    2015-11-01

    ARL-TR-7538, November 2015, US Army Research Laboratory. Enhanced Experience Replay for Deep Reinforcement Learning, by David Doria, Bryan Dawson, and Manuel Vindiola, Computational and Information Sciences Directorate…

  2. Reinforcement learning in professional basketball players

    PubMed Central

    Neiman, Tal; Loewenstein, Yonatan

    2011-01-01

    Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3-point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. PMID:22146388

  3. [Functional neuroanatomy of implicit learning: associative, motor and habit].

    PubMed

    Correa, M

    The present review focuses on the neuroanatomy of aspects of implicit learning that involve stimulus-response associations, such as classical and instrumental conditioning, motor learning and habit formation. These types of learning all require a progression in the acquisition of procedural information about 'how to do things' instead of 'what things are'. These forms of implicit learning share the neural substrate formed mainly by brain circuits involving basal ganglia, prefrontal cortex and amygdala. The relationship between Pavlovian and instrumental learning is shown in transfer and autoshaping studies. There has been a resurgence of interest in habit learning because of the suggestion that addiction is a process that progresses from a reinforced response to a habit in which the stimulus-response association is supraselected and becomes independent of voluntary cognitive control. Dopamine has been shown to be involved in the acquisition of these procedures. The different forms of procedural learning studied here are all characterized by stimulus-response-reinforcement associations, but there are differences between them in terms of the degree to which some of these associations or components are strengthened. These different patterns of association are partially regulated by the degree of involvement of the frontal-striatal-amygdala circuits.

  4. Memory-Guided Attention: Independent Contributions of the Hippocampus and Striatum.

    PubMed

    Goldfarb, Elizabeth V; Chun, Marvin M; Phelps, Elizabeth A

    2016-01-20

    Memory can strongly influence how attention is deployed in future encounters. Though memory dependent on the medial temporal lobes has been shown to drive attention, how other memory systems could concurrently and comparably enhance attention is less clear. Here, we demonstrate that both reinforcement learning and context memory facilitate attention in a visual search task. Using functional magnetic resonance imaging, we dissociate the mechanisms by which these memories guide attention: trial by trial, the hippocampus (not the striatum) predicted attention benefits from context memory, while the striatum (not the hippocampus) predicted facilitation from rewarded stimulus-response associations. Responses in these regions were also distinctly correlated with individual differences in each type of memory-guided attention. This study provides novel evidence for the role of the striatum in guiding attention, dissociable from hippocampus-dependent context memory.

  5. Memory-Guided Attention: Independent Contributions of the Hippocampus and Striatum

    PubMed Central

    Goldfarb, Elizabeth V.; Chun, Marvin M.; Phelps, Elizabeth A.

    2015-01-01

    SUMMARY Memory can strongly influence how attention is deployed in future encounters. Though memory dependent on the medial temporal lobes has been shown to drive attention, how other memory systems could concurrently and comparably enhance attention is less clear. Here, we demonstrate that both reinforcement learning and context memory facilitate attention in a visual search task. Using functional magnetic resonance imaging, we dissociate the mechanisms by which these memories guide attention: trial by trial, the hippocampus (not the striatum) predicted attention benefits from context memory, while the striatum (not the hippocampus) predicted facilitation from rewarded stimulus-response associations. Responses in these regions were also distinctly correlated with individual differences in each type of memory-guided attention. This study provides novel evidence for the role of the striatum in guiding attention, dissociable from hippocampus-dependent context memory. PMID:26777274

  6. The Origins of Individual Differences in How Learning Is Expressed in Rats: A General-Process Perspective

    PubMed Central

    2016-01-01

    Laboratory rats can exhibit marked, qualitative individual differences in the form of acquired behaviors. For example, when exposed to a signal-reinforcer relationship some rats show marked and consistent changes in sign-tracking (interacting with the signal; e.g., a lever) and others show marked and consistent changes in goal-tracking (interacting with the location of the predicted reinforcer; e.g., the food well). Here, stable individual differences in rats’ sign-tracking and goal-tracking emerged over the course of training, but these differences did not generalize across different signal-reinforcer relationships (Experiment 1). This selectivity suggests that individual differences in sign- and goal-tracking reflect differences in the value placed on individual reinforcers. Two findings provide direct support for this interpretation: the palatability of a reinforcer (as measured by an analysis of lick-cluster size) was positively correlated with goal-tracking (and negatively correlated with sign-tracking); and sating rats with a reinforcer affected goal-tracking but not sign-tracking (Experiment 2). These results indicate that the observed individual differences in sign- and goal-tracking behavior arise from the interaction between the palatability or value of the reinforcer and processes of association as opposed to dispositional differences (e.g., in sensory processes, “temperament,” or response repertoire). PMID:27732045

  7. Transfer of Learning from One Language to Another.

    ERIC Educational Resources Information Center

    Liu, Shirley

    Can the way children learn Chinese help them learn English? This study notes that, despite the structural differences between the two languages, learning may transfer from Chinese to English. Skinner's four major reinforcement schedules were used in this study to promote this transfer…

  8. Cocaine addiction as a homeostatic reinforcement learning disorder.

    PubMed

    Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H

    2017-03-01

    Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  9. Behavioral and neural properties of social reinforcement learning

    PubMed Central

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ

    2011-01-01

    Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis - social preferences, response latencies and modeling neural responses – are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787

  10. Open Source Tools for Temporally Controlled Rodent Behavior Suitable for Electrophysiology and Optogenetic Manipulations.

    PubMed

    Solari, Nicola; Sviatkó, Katalin; Laszlovszky, Tamás; Hegedüs, Panna; Hangya, Balázs

    2018-01-01

    Understanding how the brain controls behavior requires observing and manipulating neural activity in awake behaving animals. Neuronal firing is timed at millisecond precision. Therefore, to decipher temporal coding, it is necessary to monitor and control animal behavior at the same level of temporal accuracy. However, it is technically challenging to deliver sensory stimuli and reinforcers as well as to read the behavioral responses they elicit with millisecond precision. Presently available commercial systems often excel in specific aspects of behavior control, but they do not provide a customizable environment allowing flexible experimental design while maintaining high standards for temporal control necessary for interpreting neuronal activity. Moreover, delay measurements of stimulus and reinforcement delivery are largely unavailable. We combined microcontroller-based behavior control with a sound delivery system for playing complex acoustic stimuli, fast solenoid valves for precisely timed reinforcement delivery and a custom-built sound attenuated chamber using high-end industrial insulation materials. Together this setup provides a physical environment to train head-fixed animals, enables calibrated sound stimuli and precisely timed fluid and air puff presentation as reinforcers. We provide latency measurements for stimulus and reinforcement delivery and an algorithm to perform such measurements on other behavior control systems. Combined with electrophysiology and optogenetic manipulations, the millisecond timing accuracy will help interpret temporally precise neural signals and behavioral changes. Additionally, since software and hardware provided here can be readily customized to achieve a large variety of paradigms, these solutions enable an unusually flexible design of rodent behavioral experiments.

  11. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    ERIC Educational Resources Information Center

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  12. Discrete Serotonin Systems Mediate Memory Enhancement and Escape Latencies after Unpredicted Aversive Experience in Drosophila Place Memory

    PubMed Central

    Sitaraman, Divya; Kramer, Elizabeth F.; Kahsai, Lily; Ostrowski, Daniela; Zars, Troy

    2017-01-01

    Feedback mechanisms in operant learning are critical for animals to increase reward or reduce punishment. However, not all conditions have a behavior that can readily resolve an event. Animals must then try out different behaviors to better their situation through outcome learning. This form of learning allows for novel solutions and with positive experience can lead to unexpected behavioral routines. Learned helplessness, as a type of outcome learning, manifests in part as increases in escape latency in the face of repeated unpredicted shocks. Little is known about the mechanisms of outcome learning. When fruit fly Drosophila melanogaster are exposed to unpredicted high temperatures in a place learning paradigm, flies both increase escape latencies and have a higher memory when given control of a place/temperature contingency. Here we describe discrete serotonin neuronal circuits that mediate aversive reinforcement, escape latencies, and memory levels after place learning in the presence and absence of unexpected aversive events. The results show that two features of learned helplessness depend on the same modulatory system as aversive reinforcement. Moreover, changes in aversive reinforcement and escape latency depend on local neural circuit modulation, while memory enhancement requires larger modulation of multiple behavioral control circuits. PMID:29321732

  13. The cerebellum: a neural system for the study of reinforcement learning.

    PubMed

    Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F

    2011-01-01

    In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated, and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed); any neural system that supports reinforcement learning must therefore also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.

  14. Robust reinforcement learning.

    PubMed

    Morimoto, Jun; Doya, Kenji

    2005-02-01

    This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H(infinity) control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H(infinity) control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
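
    The min-max formulation can be caricatured in a small tabular sketch. This is not the letter's continuous-domain actor-critic H(infinity) algorithm; it only illustrates, with invented names and a finite disturbance set, a value function in which the controller maximizes while a penalized "disturbing" agent minimizes.

```python
# Tabular caricature of a robust (min-max) backup:
#   V(s) = max_a min_w [ r(s,a) + eta*|w|^2 + gamma * V(s') ],
# where the disturber pays eta*|w|^2 for large disturbances.
# Shapes, names, and parameters are illustrative assumptions.
import numpy as np

def robust_value_iteration(next_state, rewards, disturbances,
                           eta=1.0, gamma=0.95, iters=500):
    """next_state[s][a][d]: successor state under action a and disturbance d.
    rewards: array (n_states, n_actions). disturbances: disturbance magnitudes."""
    n_states, n_actions = rewards.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        V_new = np.empty(n_states)
        for s in range(n_states):
            action_values = []
            for a in range(n_actions):
                worst = min(eta * w * w + gamma * V[next_state[s][a][d]]
                            for d, w in enumerate(disturbances))
                action_values.append(rewards[s, a] + worst)
            V_new[s] = max(action_values)
        V = V_new
    return V
```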

  15. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.

    PubMed

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  16. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    PubMed Central

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  17. An analysis of value function learning with piecewise linear control

    NASA Astrophysics Data System (ADS)

    Tutsoy, Onder; Brown, Martin

    2016-05-01

    Reinforcement learning (RL) algorithms attempt to learn optimal control actions by iteratively estimating a long-term measure of system performance, the so-called value function. For example, RL algorithms have been applied to walking robots to examine the connection between robot motion and the brain, which is known as embodied cognition. In this paper, RL algorithms are analysed using an exemplar test problem. A closed form solution for the value function is calculated and this is represented in terms of a set of basis functions and parameters, which is used to investigate parameter convergence. The value function expression is shown to have a polynomial form where the polynomial terms depend on the plant's parameters and the value function's discount factor. It is shown that the temporal difference error introduces a null space for the differenced higher order basis associated with the effects of controller switching (saturated to linear control or terminating an experiment) apart from the time of the switch. This leads to slow convergence in the relevant subspace. It is also shown that badly conditioned learning problems can occur, and this is a function of the value function discount factor and the controller switching points. Finally, a comparison is performed between the residual gradient and TD(0) learning algorithms, and it is shown that the former has a faster rate of convergence for this test problem.
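
    The two algorithms compared above differ only in which gradient they follow. Below is a minimal sketch with a linear value approximation (not the paper's exemplar test problem; names and step sizes are assumptions).

```python
# Linear value approximation V(s) = w . phi(s); delta is the TD error.
import numpy as np

def td0_update(w, phi, phi_next, r, alpha=0.05, gamma=0.9):
    delta = r + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * delta * phi                       # semi-gradient TD(0)

def residual_gradient_update(w, phi, phi_next, r, alpha=0.05, gamma=0.9):
    delta = r + gamma * (w @ phi_next) - (w @ phi)
    return w - alpha * delta * (gamma * phi_next - phi)  # true gradient of delta^2 / 2
```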

  18. Incorporation of perception-based information in robot learning using fuzzy reinforcement learning agents

    NASA Astrophysics Data System (ADS)

    Zhou, Changjiu; Meng, Qingchun; Guo, Zhongwen; Qu, Wiefen; Yin, Bo

    2002-04-01

    Robot learning in unstructured environments has proved to be an extremely challenging problem, mainly because of the many uncertainties always present in the real world. Human beings, on the other hand, seem to cope very well with uncertain and unpredictable environments, often relying on perception-based information. Furthermore, human beings can also utilize perceptions to guide their learning on those parts of the perception-action space that are actually relevant to the task. Therefore, we conducted research aimed at improving robot learning through the incorporation of both perception-based and measurement-based information. To this end, a fuzzy reinforcement learning (FRL) agent is proposed in this paper. Based on a neural-fuzzy architecture, different kinds of information can be incorporated into the FRL agent to initialise its action network, critic network and evaluation feedback module so as to accelerate its learning. By making use of the global optimisation capability of GAs (genetic algorithms), a GA-based FRL (GAFRL) agent is presented to solve the local minima problem in traditional actor-critic reinforcement learning. On the other hand, with the prediction capability of the critic network, GAs can perform a more effective global search. Different GAFRL agents are constructed and verified using the simulation model of a physical biped robot. The simulation analysis shows that the biped learning rate for dynamic balance can be improved by incorporating perception-based information on biped balancing and walking evaluation. The biped robot can find applications in ocean exploration, detection and sea rescue activities, as well as military maritime activity.

  19. A Reinforcement Learning Model Equipped with Sensors for Generating Perception Patterns: Implementation of a Simulated Air Navigation System Using ADS-B (Automatic Dependent Surveillance-Broadcast) Technology.

    PubMed

    Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M; Lara, Juan A; Lizcano, David

    2017-01-19

    Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and it outperformed traditional methods existing in the literature with respect to learning reliability and efficiency.
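
    A hedged sketch of the general scheme described above, with all names invented: sensor readings are quantized into perception patterns so that similar perceptions share a pattern, and each perception-action pair accumulates positive and negative outcome statistics that drive action selection.

```python
from collections import defaultdict
import random

def perceive(sensor_readings, resolution=10):
    """Quantize continuous sensor readings into a discrete perception pattern,
    so similar perceptions map to the same pattern wherever they occur."""
    return tuple(round(x * resolution) for x in sensor_readings)

class PerceptionActionLearner:
    """Statistically associates perception patterns with actions by counting
    positive and negative outcomes (a simplification of the model above)."""
    def __init__(self, actions):
        self.actions = actions
        self.stats = defaultdict(lambda: [0, 0])  # (pattern, action) -> [pos, neg]

    def choose(self, pattern, epsilon=0.1):
        if random.random() < epsilon:              # occasional exploration
            return random.choice(self.actions)
        def success_rate(a):
            pos, neg = self.stats[(pattern, a)]
            return (pos + 1) / (pos + neg + 2)      # smoothed success rate
        return max(self.actions, key=success_rate)

    def reinforce(self, pattern, action, positive):
        self.stats[(pattern, action)][0 if positive else 1] += 1
```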

  20. A Reinforcement Learning Model Equipped with Sensors for Generating Perception Patterns: Implementation of a Simulated Air Navigation System Using ADS-B (Automatic Dependent Surveillance-Broadcast) Technology

    PubMed Central

    Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M.; Lara, Juan A.; Lizcano, David

    2017-01-01

    Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and it outperformed traditional methods existing in the literature with respect to learning reliability and efficiency. PMID:28106849

  1. Biases in probabilistic category learning in relation to social anxiety

    PubMed Central

    Abraham, Anna; Hermann, Christiane

    2015-01-01

    Instrumental learning paradigms are rarely employed to investigate the mechanisms underlying acquired fear responses in social anxiety. Here, we adapted a probabilistic category learning paradigm to assess information processing biases as a function of the degree of social anxiety traits in a sample of healthy individuals without a diagnosis of social phobia. Participants were presented with three pairs of neutral faces with differing probabilistic accuracy contingencies (A/B: 80/20, C/D: 70/30, E/F: 60/40). Upon making their choice, negative and positive feedback was conveyed using angry and happy faces, respectively. The highly socially anxious group showed a strong tendency to be more accurate at learning the probability contingency associated with the most ambiguous stimulus pair (E/F: 60/40). Moreover, when pairing the most positively reinforced stimulus or the most negatively reinforced stimulus with all the other stimuli in a test phase, the highly socially anxious group avoided the most negatively reinforced stimulus significantly more than the control group. The results are discussed with reference to avoidance learning and hypersensitivity to negative socially evaluative information associated with social anxiety. PMID:26347685

  2. The role of GABAB receptors in human reinforcement learning.

    PubMed

    Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X

    2014-10-01

    Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well-established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50 mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with larger sample sizes are needed to corroborate this conclusion and, furthermore, to test this effect in patients with addiction spectrum disorders. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.

  3. Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease

    PubMed Central

    Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

    2017-01-01

    Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson’s disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905

  4. Attractor concretion as a mechanism for the formation of context representations

    PubMed Central

    Rigotti, Mattia; Ben Dayan Rubin, Daniel; Morrison, Sara E.; Salzman, C. Daniel; Fusi, Stefano

    2010-01-01

    Complex tasks often require the memory of recent events, the knowledge about the context in which they occur, and the goals we intend to reach. All this information is stored in our mental states. Given a set of mental states, reinforcement learning (RL) algorithms predict the optimal policy that maximizes future reward. RL algorithms assign a value to each already-known state so that discovering the optimal policy reduces to selecting the action leading to the state with the highest value. But how does the brain create representations of these mental states in the first place? We propose a mechanism for the creation of mental states that contain information about the temporal statistics of the events in a particular context. We suggest that the mental states are represented by stable patterns of reverberating activity, which are attractors of the neural dynamics. These representations are built from neurons that are selective to specific combinations of external events (e.g. sensory stimuli) and pre-existent mental states. Consistent with this notion, we find that neurons in the amygdala and in orbito-frontal cortex (OFC) often exhibit this form of mixed selectivity. We propose that activating different mixed selectivity neurons in a fixed temporal order modifies synaptic connections so that conjunctions of events and mental states merge into a single pattern of reverberating activity. This process corresponds to the birth of a new different mental state that encodes a different temporal context. The concretion process depends on temporal contiguity, i.e. on the probability that a combination of an event and mental states follows or precedes the events and states that define a certain context. The information contained in the context thereby allows an animal to assign unambiguously a value to the events that initially appeared in different situations with different meanings. PMID:20100580

  5. Effects of Ventral Striatum Lesions on Stimulus-Based versus Action-Based Reinforcement Learning.

    PubMed

    Rothenhoefer, Kathryn M; Costa, Vincent D; Bartolo, Ramón; Vicario-Feliciano, Raquel; Murray, Elisabeth A; Averbeck, Bruno B

    2017-07-19

    Learning the values of actions versus stimuli may depend on separable neural circuits. In the current study, we evaluated the performance of rhesus macaques with ventral striatum (VS) lesions on a two-arm bandit task that had randomly interleaved blocks of stimulus-based and action-based reinforcement learning (RL). Compared with controls, monkeys with VS lesions had deficits in learning to select rewarding images but not rewarding actions. We used a RL model to quantify learning and choice consistency and found that, in stimulus-based RL, the VS lesion monkeys were more influenced by negative feedback and had lower choice consistency than controls. Using a Bayesian model to parse the groups' learning strategies, we also found that VS lesion monkeys defaulted to an action-based choice strategy. Therefore, the VS is involved specifically in learning the value of stimuli, not actions. SIGNIFICANCE STATEMENT Reinforcement learning models of the ventral striatum (VS) often assume that it maintains an estimate of state value. This suggests that it plays a general role in learning whether rewards are assigned based on a chosen action or stimulus. In the present experiment, we examined the effects of VS lesions on monkeys' ability to learn that choosing a particular action or stimulus was more likely to lead to reward. We found that VS lesions caused a specific deficit in the monkeys' ability to discriminate between images with different values, whereas their ability to discriminate between actions with different values remained intact. Our results therefore suggest that the VS plays a specific role in learning to select rewarded stimuli. Copyright © 2017 the authors 0270-6474/17/376902-13$15.00/0.
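
    For readers unfamiliar with such models, the sketch below shows the generic form of a bandit learner with separate learning rates for positive and negative prediction errors and a softmax inverse temperature standing in for choice consistency; it illustrates the model class only and is not the authors' fitted model (all names and parameter values are assumptions).

```python
import numpy as np

def simulate_block(outcomes, alpha_pos=0.3, alpha_neg=0.3, beta=5.0, seed=0):
    """outcomes: array (n_trials, 2) of 0/1 rewards available for each option.
    alpha_pos/alpha_neg weight positive/negative prediction errors; beta is the
    softmax inverse temperature (higher = more consistent choices)."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    choices = []
    for trial in outcomes:
        p_right = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))  # softmax over 2 options
        c = int(rng.random() < p_right)
        delta = trial[c] - q[c]                                  # prediction error
        q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta
        choices.append(c)
    return np.array(choices)
```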

  6. Pleasurable music affects reinforcement learning according to the listener

    PubMed Central

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  7. Feedback from the heart: Emotional learning and memory is controlled by cardiac cycle, interoceptive accuracy and personality.

    PubMed

    Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D

    2017-05-01

    Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlies poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Interactions between prefrontal cortex and cerebellum revealed by trace eyelid conditioning.

    PubMed

    Kalmbach, Brian E; Ohyama, Tatsuya; Kreider, Joy C; Riusech, Frank; Mauk, Michael D

    2009-01-01

    Eyelid conditioning has proven useful for analysis of learning and computation in the cerebellum. Two variants, delay and trace conditioning, differ only by the relative timing of the training stimuli. Despite the subtlety of this difference, trace eyelid conditioning is prevented by lesions of the cerebellum, hippocampus, or medial prefrontal cortex (mPFC), whereas delay eyelid conditioning is prevented by cerebellar lesions and is largely unaffected by forebrain lesions. Here we test whether these lesion results can be explained by two assertions: (1) Cerebellar learning requires temporal overlap between the mossy fiber inputs activated by the tone conditioned stimulus (CS) and the climbing fiber inputs activated by the reinforcing unconditioned stimulus (US), and therefore (2) trace conditioning requires activity that outlasts the presentation of the CS in a subset of mossy fibers separate from those activated directly by the CS. By use of electrical stimulation of mossy fibers as a CS, we show that cerebellar learning during trace eyelid conditioning requires an input that persists during the stimulus-free trace interval. By use of reversible inactivation experiments, we provide evidence that this input arises from the mPFC and arrives at the cerebellum via a previously unidentified site in the pontine nuclei. In light of previous PFC recordings in various species, we suggest that trace eyelid conditioning involves an interaction between the persistent activity of delay cells in mPFC-a putative mechanism of working memory-and motor learning in the cerebellum.

  9. Fear of losing money? Aversive conditioning with secondary reinforcers.

    PubMed

    Delgado, M R; Labouliere, C D; Phelps, E A

    2006-12-01

    Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.

  10. Reinforcement learning and Tourette syndrome.

    PubMed

    Palminteri, Stefano; Pessiglione, Mathias

    2013-01-01

    In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, carried out by our team in the last few years. This report is preceded by an introduction aimed at providing the reader with the state of the art of knowledge concerning the neural bases of reinforcement learning at the time of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evident maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.

  11. Better or Worse than Expected? Aging, Learning, and the ERN

    ERIC Educational Resources Information Center

    Eppinger, Ben; Kray, Jutta; Mock, Barbara; Mecklinger, Axel

    2008-01-01

    This study examined age differences in error processing and reinforcement learning. We were interested in whether the electrophysiological correlates of error processing, the error-related negativity (ERN) and the feedback-related negativity (FRN), reflect learning-related changes in younger and older adults. To do so, we applied a probabilistic…

  12. Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.

    PubMed

    Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E

    2017-03-15

    Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. Copyright © 2017 the authors 0270-6474/17/373009-09$15.00/0.

  13. Instant transformation of learned repulsion into motivational "wanting".

    PubMed

    Robinson, Mike J F; Berridge, Kent C

    2013-02-18

    Learned cues for pleasant reward often elicit desire, which, in addicts, may become compulsive. According to the dominant view in addiction neuroscience and reinforcement modeling, such desires are the simple products of learning, coming from a past association with reward outcome. We demonstrate that cravings are more than merely the products of accumulated pleasure memories-even a repulsive learned cue for unpleasantness can become suddenly desired via the activation of mesocorticolimbic circuitry. Rats learned repulsion toward a Pavlovian cue (a briefly-inserted metal lever) that always predicted an unpleasant Dead Sea saltiness sensation. Yet, upon first reencounter in a novel sodium-depletion state to promote mesocorticolimbic reactivity (reflected by elevated Fos activation in ventral tegmentum, nucleus accumbens, ventral pallidum, and the orbitofrontal prefrontal cortex), the learned cue was instantly transformed into an attractive and powerful motivational magnet. Rats jumped and gnawed on the suddenly attractive Pavlovian lever cue, despite never having tasted intense saltiness as anything other than disgusting. Instant desire transformation of a learned cue contradicts views that Pavlovian desires are essentially based on previously learned values (e.g., prediction error or temporal difference models). Instead desire is recomputed at reencounter by integrating Pavlovian information with the current brain/physiological state. This powerful brain transformation reverses strong learned revulsion into avid attraction. When applied to addiction, related mesocorticolimbic transformations (e.g., drugs or neural sensitization) of cues for already-pleasant drug experiences could create even more intense cravings. This cue/state transformation helps define what it means to say that addiction hijacks brain limbic circuits of natural reward. Copyright © 2013 Elsevier Ltd. All rights reserved.
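
    For contrast with the state-dependent account argued for above, a minimal Python sketch of the cached-value (temporal difference) update that the authors argue is insufficient on its own; parameter values are illustrative:

        def td_value_update(v_cue, reward, v_next, alpha=0.1, gamma=0.95):
            """Standard temporal-difference update: a cue's cached value changes only
            through experienced prediction errors, so a cue paired exclusively with an
            aversive outcome keeps a negative value until new outcomes are experienced."""
            return v_cue + alpha * (reward + gamma * v_next - v_cue)

        # A salt-paired cue with a strongly negative cached value stays negative here,
        # whereas the rats' behavior flipped immediately with the new physiological state.
        print(td_value_update(v_cue=-0.8, reward=-1.0, v_next=0.0))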

  14. Effect of Round Window Reinforcement on Hearing: A Temporal Bone Study With Clinical Implications for Surgical Reinforcement of the Round Window.

    PubMed

    Wegner, Inge; Eldaebes, Mostafa M A S; Landry, Thomas G; Adamson, Robert B; Grolman, Wilko; Bance, Manohar L

    2016-06-01

    Round window reinforcement leads to conductive hearing loss. The round window is stiffened surgically as therapy for various conditions, including perilymphatic fistula and superior semicircular canal dehiscence. Round window reinforcement reduces symptoms in these patients. However, it also reduces fluid displacement in the cochlea and might therefore increase conductive hearing loss. Perichondrium was applied to the round window membrane in nine fresh-frozen, nonpathologic temporal bones. In four temporal bones cartilage was applied subsequently. Acoustic stimuli in the form of frequency sweeps from 250 to 8000 Hz were generated at 110 dB sound pressure level. A total of 16 frequencies in a 1/3-octave series were used. Stapes velocities in response to the acoustic stimuli were measured at equally spaced multiple points covering the stapes footplate using a scanning laser Doppler interferometry system. Measurements were made at baseline, after applying perichondrium, and after applying cartilage. At frequencies up to 1000 Hz perichondrium reinforcement decreased stapes velocities by 1.5 to 2.9 dB compared with no reinforcement (p value = 0.003). Reinforcement with cartilage led to a further deterioration of stapes velocities by 2.6 to 4.2 dB at frequencies up to 1000 Hz (p value = 0.050). The higher frequencies were not affected by perichondrium reinforcement (p value = 0.774) or cartilage reinforcement (p value = 0.644). Our results seem to suggest a modest, clinically negligible effect of reinforcement with perichondrium. Placing cartilage on the round window resulted in a graded effect on stapes velocities in keeping with the increased stiffness of cartilage compared with perichondrium. Even so, the effect was relatively small.

  15. Risk of coinfection outbreaks in temporal networks: a case study of a hospital contact network

    NASA Astrophysics Data System (ADS)

    Rodríguez, Jorge P.; Ghanbarnejad, Fakhteh; Eguíluz, Víctor M.

    2017-10-01

    We study the spreading of cooperative infections in an empirical temporal network of contacts between people, including health care workers and patients, in a hospital. The system exhibits a phase transition leading to one or several endemic branches, depending on the connectivity pattern and the temporal correlations. There are two endemic branches in the original setting and the non-cooperative case. However, the cooperative interaction between infections reinforces the upper branch, leading to a smaller epidemic threshold and a higher probability of a large outbreak. We show the microscopic mechanisms leading to these differences, characterize three different risks, and use the features of influenza as an example of these dynamics.

  16. Intelligence moderates reinforcement learning: a mini-review of the neural evidence

    PubMed Central

    2014-01-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. PMID:25185818

  17. Intelligence moderates reinforcement learning: a mini-review of the neural evidence.

    PubMed

    Chen, Chong

    2015-06-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.

  18. Exploring the spatio-temporal neural basis of face learning

    PubMed Central

    Yang, Ying; Xu, Yang; Jew, Carol A.; Pyles, John A.; Kass, Robert E.; Tarr, Michael J.

    2017-01-01

    Humans are experts at face individuation. Although previous work has identified a network of face-sensitive regions and some of the temporal signatures of face processing, as yet, we do not have a clear understanding of how such face-sensitive regions support learning at different time points. To study the joint spatio-temporal neural basis of face learning, we trained subjects to categorize two groups of novel faces and recorded their neural responses using magnetoencephalography (MEG) throughout learning. A regression analysis of neural responses in face-sensitive regions against behavioral learning curves revealed significant correlations with learning in the majority of the face-sensitive regions in the face network, mostly between 150–250 ms, but also after 300 ms. However, the effect was smaller in nonventral regions (within the superior temporal areas and prefrontal cortex) than that in the ventral regions (within the inferior occipital gyri (IOG), midfusiform gyri (mFUS) and anterior temporal lobes). A multivariate discriminant analysis also revealed that IOG and mFUS, which showed strong correlation effects with learning, exhibited significant discriminability between the two face categories at different time points both between 150–250 ms and after 300 ms. In contrast, the nonventral face-sensitive regions, where correlation effects with learning were smaller, did exhibit some significant discriminability, but mainly after 300 ms. In sum, our findings indicate that early and recurring temporal components arising from ventral face-sensitive regions are critically involved in learning new faces. PMID:28570739

  19. Exploring the spatio-temporal neural basis of face learning.

    PubMed

    Yang, Ying; Xu, Yang; Jew, Carol A; Pyles, John A; Kass, Robert E; Tarr, Michael J

    2017-06-01

    Humans are experts at face individuation. Although previous work has identified a network of face-sensitive regions and some of the temporal signatures of face processing, as yet, we do not have a clear understanding of how such face-sensitive regions support learning at different time points. To study the joint spatio-temporal neural basis of face learning, we trained subjects to categorize two groups of novel faces and recorded their neural responses using magnetoencephalography (MEG) throughout learning. A regression analysis of neural responses in face-sensitive regions against behavioral learning curves revealed significant correlations with learning in the majority of the face-sensitive regions in the face network, mostly between 150-250 ms, but also after 300 ms. However, the effect was smaller in nonventral regions (within the superior temporal areas and prefrontal cortex) than that in the ventral regions (within the inferior occipital gyri (IOG), midfusiform gyri (mFUS) and anterior temporal lobes). A multivariate discriminant analysis also revealed that IOG and mFUS, which showed strong correlation effects with learning, exhibited significant discriminability between the two face categories at different time points both between 150-250 ms and after 300 ms. In contrast, the nonventral face-sensitive regions, where correlation effects with learning were smaller, did exhibit some significant discriminability, but mainly after 300 ms. In sum, our findings indicate that early and recurring temporal components arising from ventral face-sensitive regions are critically involved in learning new faces.

  20. Psychopathy: cognitive and neural dysfunction.

    PubMed

    R Blair, R James

    2013-06-01

    Psychopathy is a developmental disorder marked by emotional deficits and an increased risk for antisocial behavior. It is not equivalent to the diagnosis Antisocial Personality Disorder, which concentrates only on the increased risk for antisocial behavior and not a specific cause, i.e., the reduced empathy and guilt that constitute the emotional deficit. The current review considers data from adults with psychopathy with respect to the main cognitive accounts of the disorder that stress either a primary attention deficit or a primary emotion deficit. In addition, the current review considers data regarding the neurobiology of this disorder. Dysfunction within the amygdala's role in reinforcement learning and the role of ventromedial frontal cortex in the representation of reinforcement value is stressed. Data are also presented indicating potential difficulties within parts of temporal and posterior cingulate cortex. Suggestions are made with respect to why these deficits lead to the development of the disorder.

  1. Psychopathy: cognitive and neural dysfunction

    PubMed Central

    R. Blair, R. James

    2013-01-01

    Psychopathy is a developmental disorder marked by emotional deficits and an increased risk for antisocial behavior. It is not equivalent to the diagnosis Antisocial Personality Disorder, which concentrates only on the increased risk for antisocial behavior and not a specific cause—ie, the reduced empathy and guilt that constitutes the emotional deficit. The current review considers data from adults with psychopathy with respect to the main cognitive accounts of the disorder that stress either a primary attention deficit or a primary emotion deficit. In addition, the current review considers data regarding the neurobiology of this disorder. Dysfunction within the amygdala's role in reinforcement learning and the role of ventromedial frontal cortex in the representation of reinforcement value is stressed. Data is also presented indicating potential difficulties within parts of temporal and posterior cingulate cortex. Suggestions are made with respect to why these deficits lead to the development of the disorder. PMID:24174892

  2. Motor Learning Enhances Use-Dependent Plasticity

    PubMed Central

    2017-01-01

    Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961

  3. Feedback-related brain activity predicts learning from feedback in multiple-choice testing.

    PubMed

    Ernst, Benjamin; Steinhauser, Marco

    2012-06-01

    Different event-related potentials (ERPs) have been shown to correlate with learning from feedback in decision-making tasks and with learning in explicit memory tasks. In the present study, we investigated which ERPs predict learning from corrective feedback in a multiple-choice test, which combines elements from both paradigms. Participants worked through sets of multiple-choice items of a Swahili-German vocabulary task. Whereas the initial presentation of an item required the participants to guess the answer, corrective feedback could be used to learn the correct response. Initial analyses revealed that corrective feedback elicited components related to reinforcement learning (FRN), as well as to explicit memory processing (P300) and attention (early frontal positivity). However, only the P300 and early frontal positivity were positively correlated with successful learning from corrective feedback, whereas the FRN was even larger when learning failed. These results suggest that learning from corrective feedback crucially relies on explicit memory processing and attentional orienting to corrective feedback, rather than on reinforcement learning.

  4. More Than the Sum of Its Parts: A Role for the Hippocampus in Configural Reinforcement Learning.

    PubMed

    Duncan, Katherine; Doll, Bradley B; Daw, Nathaniel D; Shohamy, Daphna

    2018-05-02

    People often perceive configurations rather than the elements they comprise, a bias that may emerge because configurations often predict outcomes. But how does the brain learn to associate configurations with outcomes and how does this learning differ from learning about individual elements? We combined behavior, reinforcement learning models, and functional imaging to understand how people learn to associate configurations of cues with outcomes. We found that configural learning depended on the relative predictive strength of elements versus configurations and was related to both the strength of BOLD activity and patterns of BOLD activity in the hippocampus. Configural learning was further related to functional connectivity between the hippocampus and nucleus accumbens. Moreover, configural learning was associated with flexible knowledge about associations and differential eye movements during choice. Together, this suggests that configural learning is associated with a distinct computational, cognitive, and neural profile that is well suited to support flexible and adaptive behavior. Copyright © 2018 Elsevier Inc. All rights reserved.
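
    A simplified Python sketch contrasting elemental and configural value learning in the spirit of the models described above; this is not the authors' fitted model, and the learning rule and parameters are illustrative:

        def learn_values(trials, alpha=0.2):
            """Each trial is (frozenset of cues, outcome). Elemental values are learned
            per cue; configural values are learned for the exact cue combination."""
            elemental, configural = {}, {}
            for cues, outcome in trials:
                v_elem = sum(elemental.get(c, 0.0) for c in cues)
                delta_e = outcome - v_elem                 # shared elemental prediction error
                for c in cues:
                    elemental[c] = elemental.get(c, 0.0) + alpha * delta_e
                v_conf = configural.get(cues, 0.0)         # the configuration acts as its own cue
                configural[cues] = v_conf + alpha * (outcome - v_conf)
            return elemental, configural

        # Negative patterning: each cue alone predicts the outcome, but the compound does not.
        # An elemental learner cannot capture this; a configural learner can.
        trials = [(frozenset("A"), 1.0), (frozenset("B"), 1.0), (frozenset("AB"), 0.0)] * 20
        elemental, configural = learn_values(trials)
        print(elemental, configural[frozenset("AB")])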

  5. Open Source Tools for Temporally Controlled Rodent Behavior Suitable for Electrophysiology and Optogenetic Manipulations

    PubMed Central

    Solari, Nicola; Sviatkó, Katalin; Laszlovszky, Tamás; Hegedüs, Panna; Hangya, Balázs

    2018-01-01

    Understanding how the brain controls behavior requires observing and manipulating neural activity in awake behaving animals. Neuronal firing is timed at millisecond precision. Therefore, to decipher temporal coding, it is necessary to monitor and control animal behavior at the same level of temporal accuracy. However, it is technically challenging to deliver sensory stimuli and reinforcers as well as to read the behavioral responses they elicit with millisecond precision. Presently available commercial systems often excel in specific aspects of behavior control, but they do not provide a customizable environment allowing flexible experimental design while maintaining high standards for temporal control necessary for interpreting neuronal activity. Moreover, delay measurements of stimulus and reinforcement delivery are largely unavailable. We combined microcontroller-based behavior control with a sound delivery system for playing complex acoustic stimuli, fast solenoid valves for precisely timed reinforcement delivery and a custom-built sound attenuated chamber using high-end industrial insulation materials. Together this setup provides a physical environment to train head-fixed animals, enables calibrated sound stimuli and precisely timed fluid and air puff presentation as reinforcers. We provide latency measurements for stimulus and reinforcement delivery and an algorithm to perform such measurements on other behavior control systems. Combined with electrophysiology and optogenetic manipulations, the millisecond timing accuracy will help interpret temporally precise neural signals and behavioral changes. Additionally, since software and hardware provided here can be readily customized to achieve a large variety of paradigms, these solutions enable an unusually flexible design of rodent behavioral experiments. PMID:29867383
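
    A generic Python sketch of the kind of latency measurement the abstract refers to: log the time at which the controller issues a stimulus or reinforcer command, record the physical event with a sensor (e.g., a microphone or photodiode sampled by a data-acquisition board), and summarize the differences. This is a hedged illustration of the general idea, not the authors' published procedure:

        import numpy as np

        def delivery_latencies(command_times, signal, sample_rate, threshold):
            """Estimate per-trial delivery latency.

            command_times : times (s) at which the controller issued the stimulus command
            signal        : recorded sensor trace (e.g., microphone or photodiode)
            sample_rate   : sensor sampling rate (Hz)
            threshold     : amplitude above which the stimulus counts as detected
            """
            onsets = np.flatnonzero(np.diff((signal > threshold).astype(int)) == 1) / sample_rate
            latencies = []
            for t_cmd in command_times:
                later = onsets[onsets >= t_cmd]
                if later.size:
                    latencies.append(later[0] - t_cmd)
            return np.array(latencies)

        # Toy example: 1 kHz sensor trace with stimulus energy appearing ~12 ms after each command
        fs = 1000
        trace = np.zeros(5 * fs)
        commands = [1.0, 2.5, 4.0]
        for t in commands:
            trace[int((t + 0.012) * fs):int((t + 0.012) * fs) + 50] = 1.0
        lat = delivery_latencies(commands, trace, fs, threshold=0.5)
        print(lat.mean(), lat.std())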

  6. Inhibition, Reinforcement Sensitivity and Temporal Information Processing in ADHD and ADHD+ODD: Evidence of a Separate Entity?

    ERIC Educational Resources Information Center

    Luman, Marjolein; van Noesel, Steffen J. P.; Papanikolau, Alky; Van Oostenbruggen-Scheffer, Janneke; Veugelers, Diane; Sergeant, Joseph A.; Oosterlaan, Jaap

    2009-01-01

    This study compared children with ADHD-only, ADHD+ODD and normal controls (age 8-12) on three key neurocognitive functions: response inhibition, reinforcement sensitivity, and temporal information processing. The goal was twofold: (a) to investigate neurocognitive impairments in children with ADHD-only and children with ADHD+ODD, and (b) to test…

  7. The prefrontal cortex and hybrid learning during iterative competitive games.

    PubMed

    Abe, Hiroshi; Seo, Hyojung; Lee, Daeyeol

    2011-12-01

    Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning. © 2011 New York Academy of Sciences.
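
    A minimal Python sketch of a hybrid update that learns from the experienced outcome of the chosen action and from hypothetical (fictive) outcomes of unchosen actions, in the spirit of the account above; the update form and learning rates are illustrative assumptions:

        def hybrid_update(q, chosen, rewards_all, alpha_real=0.3, alpha_fictive=0.15):
            """q: dict mapping action -> value.  rewards_all: the outcome each action
            would have produced on this trial (observable in matching-pennies style games).
            The chosen action is updated from its actual outcome; the others from their
            hypothetical outcomes, with a smaller learning rate."""
            for a, r in rewards_all.items():
                alpha = alpha_real if a == chosen else alpha_fictive
                q[a] += alpha * (r - q[a])
            return q

        q = {"left": 0.0, "right": 0.0}
        q = hybrid_update(q, chosen="left", rewards_all={"left": 1.0, "right": 0.0})
        print(q)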

  8. Reinforcement Learning and Dopamine in Schizophrenia: Dimensions of Symptoms or Specific Features of a Disease Group?

    PubMed Central

    Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian

    2013-01-01

    Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, which encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603

  9. Generalization of value in reinforcement learning by humans.

    PubMed

    Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna

    2012-04-01

    Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
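
    A minimal Python sketch of the contrast the study draws: a basic delta-rule update of the chosen option, augmented so that feedback also generalizes (with weight g) to a correlated partner option that received no direct feedback. The names and values are illustrative; setting g = 0 recovers basic reinforcement learning:

        def generalizing_update(q, chosen, reward, partner, alpha=0.25, g=0.5):
            """Standard delta-rule update of the chosen option, plus a scaled update
            of its correlated partner option (g is an illustrative generalization weight)."""
            q[chosen] += alpha * (reward - q[chosen])
            q[partner] += g * alpha * (reward - q[partner])
            return q

        q = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
        q = generalizing_update(q, chosen="A", reward=1.0, partner="B")
        print(q)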

  10. Instrumental learning and relearning in individuals with psychopathy and in patients with lesions involving the amygdala or orbitofrontal cortex.

    PubMed

    Mitchell, D G V; Fine, C; Richell, R A; Newman, C; Lumsden, J; Blair, K S; Blair, R J R

    2006-05-01

    Previous work has shown that individuals with psychopathy are impaired on some forms of associative learning, particularly stimulus-reinforcement learning (Blair et al., 2004; Newman & Kosson, 1986). Animal work suggests that the acquisition of stimulus-reinforcement associations requires the amygdala (Baxter & Murray, 2002). Individuals with psychopathy also show impoverished reversal learning (Mitchell, Colledge, Leonard, & Blair, 2002). Reversal learning is supported by the ventrolateral and orbitofrontal cortex (Rolls, 2004). In this paper we present experiments investigating stimulus-reinforcement learning and relearning in patients with lesions of the orbitofrontal cortex or amygdala, and individuals with developmental psychopathy without known trauma. The results are interpreted with reference to current neurocognitive models of stimulus-reinforcement learning, relearning, and developmental psychopathy. Copyright (c) 2006 APA, all rights reserved.

  11. Model-based reinforcement learning with dimension reduction.

    PubMed

    Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi

    2016-12-01

    The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.
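
    A minimal Python sketch of the generic model-based reinforcement learning pipeline the paper builds on: estimate a transition model from data, then plan on that model. A plain tabular count-based estimate stands in for the paper's LSCE-based estimation with dimension reduction; all sizes and values are illustrative:

        import numpy as np

        def fit_transition_model(transitions, n_states, n_actions):
            """Estimate P(s'|s,a) from observed (s, a, s') tuples by counting
            (add-one smoothing; a stand-in for the paper's estimator)."""
            counts = np.ones((n_states, n_actions, n_states))
            for s, a, s2 in transitions:
                counts[s, a, s2] += 1
            return counts / counts.sum(axis=2, keepdims=True)

        def plan(P, rewards, gamma=0.95, iters=200):
            """Value iteration on the learned model to derive a greedy policy."""
            v = np.zeros(P.shape[0])
            for _ in range(iters):
                q = rewards[:, None] + gamma * P @ v   # shape (n_states, n_actions)
                v = q.max(axis=1)
            return q.argmax(axis=1)

        P = fit_transition_model([(0, 0, 1), (1, 1, 0), (0, 1, 0)], n_states=2, n_actions=2)
        print(plan(P, rewards=np.array([0.0, 1.0])))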

  12. Ongoing behavior predicts perceptual report of interval duration

    PubMed Central

    Gouvêa, Thiago S.; Monteiro, Tiago; Soares, Sofia; Atallah, Bassam V.; Paton, Joseph J.

    2014-01-01

    The ability to estimate the passage of time is essential for adaptive behavior in complex environments. Yet, it is not known how the brain encodes time over the durations necessary to explain animal behavior. Under temporally structured reinforcement schedules, animals tend to develop temporally structured behavior, and interval timing has been suggested to be accomplished by learning sequences of behavioral states. If this is true, trial-to-trial fluctuations in behavioral sequences should be predictive of fluctuations in time estimation. We trained rodents in a duration categorization task while continuously monitoring their behavior with a high-speed camera. Animals developed highly reproducible behavioral sequences during the interval being timed. Moreover, those sequences were often predictive of perceptual report from early in the trial, providing support to the idea that animals may use learned behavioral patterns to estimate the duration of time intervals. To better resolve the issue, we propose that continuous and simultaneous behavioral and neural monitoring will enable identification of neural activity related to time perception that is not explained by ongoing behavior. PMID:24672473

  13. Reinforcement of Science Learning through Local Culture: A Delphi Study

    ERIC Educational Resources Information Center

    Nuangchalerm, Prasart

    2008-01-01

    This study aims to explore the ways to reinforce science learning through local culture by using Delphi technique. Twenty four participants in various fields of study were selected. The result of study provides a framework for reinforcement of science learning through local culture on the theme life and environment. (Contains 1 table.)

  14. Hippocampal Processing of Ambiguity Enhances Fear Memory

    PubMed Central

    Amadi, Ugwechi; Lim, Seh Hong; Liu, Elizabeth; Baratta, Michael V.; Goosens, Ki Ann

    2016-01-01

    Despite the ubiquitous use of Pavlovian fear conditioning as a model for fear learning, the highly predictable conditions used in the laboratory do not resemble real-world conditions, where dangerous situations can lead to unpleasant outcomes in unpredictable ways. Here we varied the timing of aversive events following predictive cues in rodents and discovered that temporal ambiguity of aversive events greatly enhances fear. During fear conditioning with unpredictably timed aversive events, pharmacological inactivation of the dorsal hippocampus or optogenetic silencing of CA1 cells during aversive negative prediction errors prevented this enhancement of fear without impacting fear learning for predictable events. Dorsal hippocampal inactivation also prevented ambiguity-related enhancement of fear during auditory fear conditioning under a partial reinforcement schedule. These results reveal that information about the timing and occurrence of aversive events is rapidly acquired and that unexpectedly timed or omitted aversive events generate hippocampal signals to enhance fear learning. PMID:28182526

  15. Hippocampal Processing of Ambiguity Enhances Fear Memory.

    PubMed

    Amadi, Ugwechi; Lim, Seh Hong; Liu, Elizabeth; Baratta, Michael V; Goosens, Ki A

    2017-02-01

    Despite the ubiquitous use of Pavlovian fear conditioning as a model for fear learning, the highly predictable conditions used in the laboratory do not resemble real-world conditions, in which dangerous situations can lead to unpleasant outcomes in unpredictable ways. In the current experiments, we varied the timing of aversive events after predictive cues in rodents and discovered that temporal ambiguity of aversive events greatly enhances fear. During fear conditioning with unpredictably timed aversive events, pharmacological inactivation of the dorsal hippocampus or optogenetic silencing of cornu ammonis 1 cells during aversive negative prediction errors prevented this enhancement of fear without affecting fear learning for predictable events. Dorsal hippocampal inactivation also prevented ambiguity-related enhancement of fear during auditory fear conditioning under a partial-reinforcement schedule. These results reveal that information about the timing and occurrence of aversive events is rapidly acquired and that unexpectedly timed or omitted aversive events generate hippocampal signals to enhance fear learning.

  16. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    PubMed

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, still relatively large and continuous state-action spaces need to be handled efficiently. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields) and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.
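
    A compact Python sketch of value function approximation with overlapping Gaussian receptive fields over the joint state-action space and a sampled continuous action choice, in the spirit of the approach described above; the kernel count, widths, and toy 1-D problem are illustrative, not the paper's 4-D setup:

        import numpy as np

        rng = np.random.default_rng(1)

        class RBFQ:
            """Q(s, a) approximated as a weighted sum of Gaussian receptive fields
            placed on a grid over the joint state-action space."""
            def __init__(self, centers, width=0.3, alpha=0.05):
                self.centers = centers            # shape (n_kernels, dim_state + dim_action)
                self.width = width
                self.alpha = alpha
                self.w = np.zeros(len(centers))

            def features(self, s, a):
                x = np.concatenate([s, a])
                d2 = ((self.centers - x) ** 2).sum(axis=1)
                return np.exp(-d2 / (2 * self.width ** 2))

            def q(self, s, a):
                return self.features(s, a) @ self.w

            def greedy_action(self, s, n_candidates=50):
                """Continuous action chosen by sampling candidates and taking the best."""
                cands = rng.uniform(-1, 1, size=(n_candidates, 1))
                vals = [self.q(s, c) for c in cands]
                return cands[int(np.argmax(vals))]

            def update(self, s, a, target):
                phi = self.features(s, a)
                self.w += self.alpha * (target - self.q(s, a)) * phi

        # Toy usage: 1-D state, 1-D continuous action, grid of kernel centers
        grid = np.array([[si, ai] for si in np.linspace(-1, 1, 7) for ai in np.linspace(-1, 1, 7)])
        qfun = RBFQ(grid)
        qfun.update(np.array([0.2]), np.array([0.5]), target=1.0)
        print(qfun.greedy_action(np.array([0.2])))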

  17. Variable Behavior and Repeated Learning in Two Mouse Strains: Developmental and Genetic Contributions.

    PubMed

    Arnold, Megan A; Newland, M Christopher

    2018-06-16

    Behavioral inflexibility is often assessed using reversal learning tasks, which require a relatively low degree of response variability. No studies have assessed sensitivity to reinforcement contingencies that specifically select highly variable response patterns in mice, let alone in models of neurodevelopmental disorders involving limited response variation. Operant variability and incremental repeated acquisition (IRA) were used to assess unique aspects of behavioral variability of two mouse strains: BALB/c, a model of some deficits in ASD, and C57Bl/6. On the operant variability task, BALB/c mice responded more repetitively during adolescence than C57Bl/6 mice when reinforcement did not require variability but responded more variably when reinforcement required variability. During IRA testing in adulthood, both strains acquired an unchanging performance sequence equally well. Strain differences emerged, however, after novel learning sequences began alternating with the performance sequence: BALB/c mice substantially outperformed C57Bl/6 mice. Using litter-mate controls, it was found that adolescent experience with variability did not affect either learning or performance on the IRA task in adulthood. These findings constrain the use of BALB/c mice as a model of ASD, but once again reveal that this strain is highly sensitive to reinforcement contingencies and that these mice are fast and robust learners. Copyright © 2018. Published by Elsevier B.V.

  18. Relapse processes after the extinction of instrumental learning: Renewal, resurgence, and reacquisition

    PubMed Central

    Bouton, Mark E.; Winterbauer, Neil E.; Todd, Travis P.

    2012-01-01

    It is widely recognized that extinction (the procedure in which a Pavlovian conditioned stimulus or an instrumental action is repeatedly presented without its reinforcer) weakens behavior without erasing the original learning. Most of the experiments that support this claim have focused on several “relapse” effects that occur after Pavlovian extinction, which collectively suggest that the original learning is saved through extinction. However, although such effects do occur after instrumental extinction, they have not been explored there in as much detail. This article reviews recent research in our laboratory that has investigated three relapse effects that occur after the extinction of instrumental (operant) learning. In renewal, responding returns after extinction when the behavior is tested in a different context; in resurgence, responding recovers when a second response that has been reinforced during extinction of the first is itself put on extinction; and in rapid reacquisition, extinguished responding returns rapidly when the response is reinforced again. The results provide new insights into extinction and relapse, and are consistent with principles that have been developed to explain extinction and relapse as they occur after Pavlovian conditioning. Extinction of instrumental learning, like Pavlovian learning, involves new learning that is relatively dependent on the context for expression. PMID:22450305

  19. Does Feedback-Related Brain Response during Reinforcement Learning Predict Socio-motivational (In-)dependence in Adolescence?

    PubMed Central

    Raufelder, Diana; Boehme, Rebecca; Romund, Lydia; Golde, Sabrina; Lorenz, Robert C.; Gleich, Tobias; Beck, Anne

    2016-01-01

    This multi-methodological study applied functional magnetic resonance imaging to investigate neural activation in a group of adolescent students (N = 88) during a probabilistic reinforcement learning task. We related patterns of emerging brain activity and individual learning rates to socio-motivational (in-)dependence manifested in four different motivation types (MTs): (1) peer-dependent MT, (2) teacher-dependent MT, (3) peer-and-teacher-dependent MT, (4) peer-and-teacher-independent MT. A multinomial regression analysis revealed that the individual learning rate predicts students’ membership in the independent MT or the peer-and-teacher-dependent MT. Additionally, the striatum, a brain region associated with behavioral adaptation and flexibility, showed increased learning-related activation in students with motivational independence. Moreover, the prefrontal cortex, which is involved in behavioral control, was more active in students of the peer-and-teacher-dependent MT. Overall, this study offers new insights into the interplay of motivation and learning with (1) a focus on inter-individual differences in the role of peers and teachers as a source of students’ individual motivation and (2) its potential neurobiological basis. PMID:27199873

  20. Advances in Temporal Analysis in Learning and Instruction

    ERIC Educational Resources Information Center

    Molenaar, Inge

    2014-01-01

    This paper focuses on a trend to analyse temporal characteristics of constructs important to learning and instruction. Different researchers have indicated that we should pay more attention to time in our research to enhance explanatory power and increase validity. Constructs formerly viewed as personal traits, such as self-regulated learning and…

  1. A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.

    PubMed

    Murakoshi, Kazushi; Mizuno, Junya

    2004-11-01

    In order to rapidly follow unexpected environmental changes, we propose a parameter control method for reinforcement learning that changes each learning parameter in an appropriate direction. We determine each appropriate direction on the basis of relationships between behaviors and neuromodulators, taking the notion of an emergency as a key word. Computer experiments show that agents using the proposed method can respond rapidly to unexpected environmental changes, regardless of which of two reinforcement learning algorithms (Q-learning or the actor-critic (AC) architecture) and which of two learning problems (discontinuous or continuous state-action problems) is used.
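
    A crude Python sketch of the general idea of adjusting learning parameters when the environment appears to have changed: recent prediction errors are monitored and, when they spike, the learning rate and exploration rate are raised. The trigger and target values are illustrative assumptions, not the neuromodulator-based rules proposed in the paper:

        from collections import deque

        class AdaptiveParams:
            """Raise the learning rate and exploration rate when recent prediction
            errors spike (a crude stand-in for 'emergency'-driven parameter changes)."""
            def __init__(self, alpha=0.1, epsilon=0.05, window=20, spike=0.5):
                self.alpha, self.epsilon = alpha, epsilon
                self.base = (alpha, epsilon)
                self.errors = deque(maxlen=window)
                self.spike = spike

            def observe(self, td_error):
                self.errors.append(abs(td_error))
                mean_err = sum(self.errors) / len(self.errors)
                if mean_err > self.spike:          # environment probably changed
                    self.alpha, self.epsilon = 0.5, 0.3
                else:                              # relax back to baseline
                    self.alpha, self.epsilon = self.base

        params = AdaptiveParams()
        for err in [0.1, 0.1, 0.9, 0.8, 0.7]:
            params.observe(err)
        print(params.alpha, params.epsilon)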

  2. Ducklings imprint on the relational concept of "same or different".

    PubMed

    Martinho, Antone; Kacelnik, Alex

    2016-07-15

    The ability to identify and retain logical relations between stimuli and apply them to novel stimuli is known as relational concept learning. This has been demonstrated in a few animal species after extensive reinforcement training, and it reveals the brain's ability to deal with abstract properties. Here we describe relational concept learning in newborn ducklings without reinforced training. Newly hatched domesticated mallards that were briefly exposed to a pair of objects that were either the same or different in shape or color later preferred to follow pairs of new objects exhibiting the imprinted relation. Thus, even in a seemingly rigid and very rapid form of learning such as filial imprinting, the brain operates with abstract conceptual reasoning, a faculty often assumed to be reserved to highly intelligent organisms. Copyright © 2016, American Association for the Advancement of Science.

  3. Rescaling of temporal expectations during extinction

    PubMed Central

    Drew, Michael R.; Walsh, Carolyn; Balsam, Peter D

    2016-01-01

    Previous research suggests that extinction learning is temporally specific. Changing the CS duration between training and extinction can facilitate the loss of the CR within the extinction session but impairs long-term retention of extinction. In two experiments using conditioned magazine approach with rats, we examined the relation between temporal specificity of extinction and CR timing. In Experiment 1 rats were trained on a 12-s, fixed CS-US interval and then extinguished with CS presentations that were 6, 12, or 24 s in duration. The design of Experiment 2 was the same except rats were trained using partial rather than continuous reinforcement. In both experiments, extending the CS duration in extinction facilitated the diminution of CRs during the extinction session, but shortening the CS duration failed to slow extinction. In addition, extending (but not shortening) the CS duration caused temporal rescaling of the CR, in that the peak CR rate migrated later into the trial over the course of extinction training. This migration partially accounted for the faster loss of the CR when the CS duration was extended. Results are incompatible with the hypothesis that extinction is driven by cumulative CS exposure and suggest that temporally extended nonreinforced CS exposure reduces conditioned responding via temporal displacement rather than through extinction per se. PMID:28045291

  4. Partial Planning Reinforcement Learning

    DTIC Science & Technology

    2012-08-31

    Subject terms: reinforcement learning, Bayesian optimization, active learning, action model learning, decision-theoretic assistance. Authors: Prasad Tadepalli and Alan Fern, Oregon State University.

  5. Dorsolateral prefrontal lesions do not impair tests of scene learning and decision-making that require frontal–temporal interaction

    PubMed Central

    Baxter, Mark G; Gaffan, David; Kyriazis, Diana A; Mitchell, Anna S

    2008-01-01

    Theories of dorsolateral prefrontal cortex (DLPFC) involvement in cognitive function variously emphasize its involvement in rule implementation, cognitive control, or working and/or spatial memory. These theories predict broad effects of DLPFC lesions on tests of visual learning and memory. We evaluated the effects of DLPFC lesions (including both banks of the principal sulcus) in rhesus monkeys on tests of scene learning and strategy implementation that are severely impaired following crossed unilateral lesions of frontal cortex and inferotemporal cortex. Dorsolateral lesions had no effect on learning of new scene problems postoperatively, or on the implementation of preoperatively acquired strategies. They were also without effect on the ability to adjust choice behaviour in response to a change in reinforcer value, a capacity that requires interaction between the amygdala and frontal lobe. These intact abilities following DLPFC damage support specialization of function within the prefrontal cortex, and suggest that many aspects of memory and strategic and goal-directed behaviour can survive ablation of this structure. PMID:18702721

  6. Homeostatic reinforcement learning for integrating reward collection and physiological stability

    PubMed Central

    Keramati, Mehdi; Gutkin, Boris

    2014-01-01

    Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system. DOI: http://dx.doi.org/10.7554/eLife.04811.001 PMID:25457346
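
    A minimal Python sketch of the core quantity in this framework: a drive function measuring the distance of the internal state from its setpoint, with the primary reward of an outcome defined as the drive reduction it produces. The exponents and numbers below are illustrative, not the paper's fitted values:

        import numpy as np

        def drive(h, setpoint, n=4.0, m=3.0):
            """Drive = distance of internal state h from the homeostatic setpoint
            (a power-law distance in the spirit of the framework; exponents illustrative)."""
            return np.sum(np.abs(setpoint - h) ** n) ** (1.0 / m)

        def reward(h, k, setpoint):
            """Primary reward of outcome k = the drive reduction it produces."""
            return drive(h, setpoint) - drive(h + k, setpoint)

        h = np.array([0.6])                 # current internal state (arbitrary units)
        setpoint = np.array([1.0])
        print(reward(h, np.array([0.2]), setpoint))   # moves toward the setpoint: positive reward
        print(reward(h, np.array([-0.2]), setpoint))  # moves away: negative reward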

  7. Token Reinforcement: A Review and Analysis

    ERIC Educational Resources Information Center

    Hackenberg, Timothy D.

    2009-01-01

    Token reinforcement procedures and concepts are reviewed and discussed in relation to general principles of behavior. The paper is divided into four main parts. Part I reviews and discusses previous research on token systems in relation to common behavioral functions--reinforcement, temporal organization, antecedent stimulus functions, and…

  8. Reinforcement learning in supply chains.

    PubMed

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.
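
    A toy Python sketch of one simple reinforcement learning rule of the kind compared in such agent-based supply chain studies (a Roth-Erev style propensity learner choosing an order quantity); the demand process, payoffs, and parameters are illustrative assumptions, not the paper's algorithms. Even in this toy setting, the propensities drift only slowly toward the best order quantity, consistent with the abstract's point about the many periods required for stability:

        import numpy as np

        rng = np.random.default_rng(2)

        class RothErevAgent:
            """Simple reinforcement learner for one supply chain stage: propensities
            over discrete order quantities are reinforced by realized profit and
            mapped to choice probabilities."""
            def __init__(self, order_options, recency=0.1):
                self.options = order_options
                self.propensity = np.ones(len(order_options))
                self.recency = recency

            def choose(self):
                p = self.propensity / self.propensity.sum()
                self.last = rng.choice(len(self.options), p=p)
                return self.options[self.last]

            def learn(self, profit):
                self.propensity *= (1 - self.recency)            # forgetting
                self.propensity[self.last] += max(profit, 0.0)   # reinforce realized payoff

        agent = RothErevAgent(order_options=[10, 20, 30])
        for _ in range(100):
            order = agent.choose()
            demand = rng.integers(15, 26)                        # toy downstream demand
            profit = 2.0 * min(order, demand) - 1.0 * order      # toy revenue minus cost
            agent.learn(profit)
        print(agent.propensity / agent.propensity.sum())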

  9. Reinforcement learning in scheduling

    NASA Technical Reports Server (NTRS)

    Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad

    1994-01-01

    The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.

  10. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.

    PubMed

    Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D

    2013-05-01

    A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior-and under what circumstances-are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
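
    The contrast between the two systems can be made concrete with a toy two-stage task: a model-free learner caches stage-one action values directly from experienced outcomes, whereas a model-based learner recomputes them from a transition model and stage-two values. The task structure and parameters below are illustrative assumptions, not the authors' experimental paradigm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-stage task (illustrative): stage-1 actions 0/1 lead to stage-2
# states 0/1 with common (0.7) or rare (0.3) transitions; each stage-2
# state pays a Bernoulli reward.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])          # T[a, s2] = transition probability
p_reward = np.array([0.8, 0.2])     # reward probability of each stage-2 state

alpha = 0.2
q_mf = np.zeros(2)                  # cached (model-free) stage-1 values
v_stage2 = np.zeros(2)              # learned stage-2 state values

for trial in range(2000):
    a = rng.integers(2)                          # explore uniformly for simplicity
    s2 = rng.choice(2, p=T[a])
    r = float(rng.random() < p_reward[s2])
    v_stage2[s2] += alpha * (r - v_stage2[s2])   # learn stage-2 values
    q_mf[a] += alpha * (r - q_mf[a])             # model-free: cache the sampled return

q_mb = T @ v_stage2                              # model-based: plan through the model

print("model-free Q:", q_mf)    # both converge near T @ p_reward here,
print("model-based Q:", q_mb)   # but only q_mb re-plans instantly if T or v changes
```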

  11. The Curse of Planning: Dissecting multiple reinforcement learning systems by taxing the central executive

    PubMed Central

    Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.

    2013-01-01

    A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545

  12. Minimalist Social-Affective Value for Use in Joint Action: A Neural-Computational Hypothesis

    PubMed Central

    Lowe, Robert; Almér, Alexander; Lindblad, Gustaf; Gander, Pierre; Michael, John; Vesper, Cordula

    2016-01-01

    Joint Action is typically described as social interaction that requires coordination among two or more co-actors in order to achieve a common goal. In this article, we put forward a hypothesis for the existence of a neural-computational mechanism of affective valuation that may be critically exploited in Joint Action. Such a mechanism would serve to facilitate coordination between co-actors permitting a reduction of required information. Our hypothesized affective mechanism provides a value function based implementation of Associative Two-Process (ATP) theory that entails the classification of external stimuli according to outcome expectancies. This approach has been used to describe animal and human action that concerns differential outcome expectancies. Until now it has not been applied to social interaction. We describe our Affective ATP model as applied to social learning consistent with an “extended common currency” perspective in the social neuroscience literature. We contrast this to an alternative mechanism that provides an example implementation of the so-called social-specific value perspective. In brief, our Social-Affective ATP mechanism builds upon established formalisms for reinforcement learning (temporal difference learning models) nuanced to accommodate expectations (consistent with ATP theory) and extended to integrate non-social and social cues for use in Joint Action. PMID:27601989

  13. Frontostriatal anatomical connections predict age- and difficulty-related differences in reinforcement learning.

    PubMed

    van de Vijver, Irene; Ridderinkhof, K Richard; Harsay, Helga; Reneman, Liesbeth; Cavanagh, James F; Buitenweg, Jessika I V; Cohen, Michael X

    2016-10-01

    Reinforcement learning (RL) is supported by a network of striatal and frontal cortical structures that are connected through white-matter fiber bundles. With age, the integrity of these white-matter connections declines. The role of structural frontostriatal connectivity in individual and age-related differences in RL is unclear, although local white-matter density and diffusivity have been linked to individual differences in RL. Here we show that frontostriatal tract counts in young human adults (aged 18-28), as assessed noninvasively with diffusion-weighted magnetic resonance imaging and probabilistic tractography, positively predicted individual differences in RL when learning was difficult (70% valid feedback). In older adults (aged 63-87), in contrast, learning under both easy (90% valid feedback) and difficult conditions was predicted by tract counts in the same frontostriatal network. Furthermore, network-level analyses showed a double dissociation between the task-relevant networks in young and older adults, suggesting that older adults relied on different frontostriatal networks than young adults to obtain the same task performance. These results highlight the importance of successful information integration across striatal and frontal regions during RL, especially with variable outcomes. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. No impact of repeated extinction exposures on operant responding maintained by different reinforcer rates.

    PubMed

    Bai, John Y H; Podlesnik, Christopher A

    2017-05-01

    Greater rates of intermittent reinforcement in the presence of discriminative stimuli generally produce greater resistance to extinction, consistent with predictions of behavioral momentum theory. Other studies reveal more rapid extinction with higher rates of reinforcers - the partial reinforcement extinction effect. Further, repeated extinction often produces more rapid decreases in operant responding due to learning a discrimination between training and extinction contingencies. The present study examined repeated extinction following training with different rates of intermittent reinforcement in a multiple schedule. We assessed whether repeated extinction would reverse the pattern of greater resistance to extinction with greater reinforcer rates. Counter to this prediction, resistance to extinction remained consistently greater with greater reinforcer rates across twelve assessments of training, each followed by six successive sessions of extinction. Moreover, patterns of responding during extinction resembled those observed during satiation tests, which should not alter discrimination processes with repeated testing. These findings join others suggesting operant responding in extinction can be durable across repeated tests. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Disrupted Reinforcement Learning and Maladaptive Behavior in Women with a History of Childhood Sexual Abuse: A High-Density Event-Related Potential Study

    PubMed Central

    Pechtel, Pia; Pizzagalli, Diego A.

    2013-01-01

    Context Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite grave epidemiological data, the mechanisms underlying these maladaptive outcomes remain poorly understood. Objective We examined whether CSA history, particularly in conjunction with past MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Design Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Setting Academic setting; participants recruited from the community. Participants Fifteen remitted depressed females with CSA history (CSA+rMDD), 16 remitted depressed females without CSA history (rMDD), and 18 healthy females. Main Outcome Measures Participants’ preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity (FRN) and error-related negativity (ERN)–hypothesized to reflect activation in the anterior cingulate cortex–were used as electrophysiological indices of reinforcement learning. Results No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA+rMDD group showed (1) lower accuracy (relative to both controls and rMDD), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to rMDD). CSA history was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance on previously rewarded–but not previously punished–trials. Conclusions Irrespective of past MDD, women with CSA histories showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision-making in the absence of feedback (blunted “Go learning”). While the current study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine if disrupted positive reinforcement learning predicts high-risk behavior following CSA. PMID:23487253

  16. B-tree search reinforcement learning for model based intelligent agent

    NASA Astrophysics Data System (ADS)

    Bhuvaneswari, S.; Vignashwaran, R.

    2013-03-01

    Agents trained by learning techniques provide a powerful approximation of active solutions compared with naive approaches. In this study, B-trees combined with reinforcement learning are used to moderate data search for information retrieval, achieving accuracy with minimum search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques perform at a satisfactory baseline and act as finite agents based on the predetermined model against competitors from the course.

  17. Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap S.

    1992-01-01

    Current reinforcement learning algorithms require long training periods which generally limit their applicability to small size problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to control of dynamic systems and it is demonstrated that it is possible to start with an approximate prior knowledge and learn to refine it through experiments using reinforcement learning.

  18. Learning to Obtain Reward, but Not Avoid Punishment, Is Affected by Presence of PTSD Symptoms in Male Veterans: Empirical Data and Computational Model

    PubMed Central

    Myers, Catherine E.; Moustafa, Ahmed A.; Sheynin, Jony; VanMeenen, Kirsten M.; Gilbertson, Mark W.; Orr, Scott P.; Beck, Kevin D.; Pang, Kevin C. H.; Servatius, Richard J.

    2013-01-01

    Post-traumatic stress disorder (PTSD) symptoms include behavioral avoidance which is acquired and tends to increase with time. This avoidance may represent a general learning bias; indeed, individuals with PTSD are often faster than controls on acquiring conditioned responses based on physiologically-aversive feedback. However, it is not clear whether this learning bias extends to cognitive feedback, or to learning from both reward and punishment. Here, male veterans with self-reported current, severe PTSD symptoms (PTSS group) or with few or no PTSD symptoms (control group) completed a probabilistic classification task that included both reward-based and punishment-based trials, where feedback could take the form of reward, punishment, or an ambiguous “no-feedback” outcome that could signal either successful avoidance of punishment or failure to obtain reward. The PTSS group outperformed the control group in total points obtained; the PTSS group specifically performed better than the control group on reward-based trials, with no difference on punishment-based trials. To better understand possible mechanisms underlying observed performance, we used a reinforcement learning model of the task, and applied maximum likelihood estimation techniques to derive estimated parameters describing individual participants’ behavior. Estimations of the reinforcement value of the no-feedback outcome were significantly greater in the control group than the PTSS group, suggesting that the control group was more likely to value this outcome as positively reinforcing (i.e., signaling successful avoidance of punishment). This is consistent with the control group’s generally poorer performance on reward trials, where reward feedback was to be obtained in preference to the no-feedback outcome. Differences in the interpretation of ambiguous feedback may contribute to the facilitated reinforcement learning often observed in PTSD patients, and may in turn provide new insight into how pathological behaviors are acquired and maintained in PTSD. PMID:24015254
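
    The model-fitting step described here, deriving per-participant parameters by maximum likelihood, can be sketched for a simple Q-learning model with a softmax choice rule and a free parameter for the subjective value of the ambiguous no-feedback outcome. The parameterization, bounds, and data format are illustrative assumptions rather than the authors' exact model.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, outcomes, n_options=4):
    """params = (learning rate, softmax inverse temperature, value of 'no feedback')."""
    alpha, beta, v_nofeedback = params
    q = np.zeros(n_options)
    nll = 0.0
    for choice, outcome in zip(choices, outcomes):
        p = np.exp(beta * q - np.max(beta * q))
        p /= p.sum()
        nll -= np.log(p[choice] + 1e-12)
        # Ambiguous 'no feedback' outcomes take the fitted subjective value.
        r = v_nofeedback if outcome is None else outcome
        q[choice] += alpha * (r - q[choice])
    return nll

# Hypothetical data: chosen option index and outcome (+1 reward, -1 punishment, None).
choices  = [0, 0, 1, 2, 0, 3, 0, 1]
outcomes = [1, None, -1, None, 1, -1, None, 1]

fit = minimize(neg_log_likelihood, x0=[0.3, 2.0, 0.0],
               args=(choices, outcomes),
               bounds=[(0.01, 1.0), (0.1, 20.0), (-1.0, 1.0)])
print(fit.x)   # estimated alpha, beta, and subjective value of the no-feedback outcome
```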

  19. Reinforcement learning in multidimensional environments relies on attention mechanisms.

    PubMed

    Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C

    2015-05-27

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors 0270-6474/15/358145-13$15.00/0.

  20. Changes in corticostriatal connectivity during reinforcement learning in humans.

    PubMed

    Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S

    2015-02-01

    Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.

  1. A common neural circuit mechanism for internally guided and externally reinforced forms of motor learning.

    PubMed

    Hisey, Erin; Kearney, Matthew Gene; Mooney, Richard

    2018-04-01

    The complex skills underlying verbal and musical expression can be learned without external punishment or reward, indicating their learning is internally guided. The neural mechanisms that mediate internally guided learning are poorly understood, but a circuit comprising dopamine-releasing neurons in the midbrain ventral tegmental area (VTA) and their targets in the basal ganglia are important to externally reinforced learning. Juvenile zebra finches copy a tutor song in a process that is internally guided and, in adulthood, can learn to modify the fundamental frequency (pitch) of a target syllable in response to external reinforcement with white noise. Here we combined intersectional genetic ablation of VTA neurons, reversible blockade of dopamine receptors in the basal ganglia, and singing-triggered optogenetic stimulation of VTA terminals to establish that a common VTA-basal ganglia circuit enables internally guided song copying and externally reinforced syllable pitch learning.

  2. Brain Mechanisms Underlying Individual Differences in Reaction to Stress: An Animal Model

    DTIC Science & Technology

    1988-10-29

    Schooler, et al., 1976; Gershon & Buchsbaum, 1977; Buchsbaum, et al., 1977), personality scales of extraversion-introversion (Haier, 1984) and sensation...exploratory and learned to bar press more quickly and efficiently. Reducers with a lower inhibitory threshold learned the differential reinforcement of

  3. Towards a genetics-based adaptive agent to support flight testing

    NASA Astrophysics Data System (ADS)

    Cribbs, Henry Brown, III

    Although the benefits of aircraft simulation have been known since the late 1960s, simulation almost always entails interaction with a human test pilot. This "pilot-in-the-loop" simulation process provides useful evaluative information to the aircraft designer and provides a training tool to the pilot. Emulation of a pilot during the early phases of the aircraft design process might provide designers a useful evaluative tool. Machine learning might emulate a pilot in a simulated aircraft/cockpit setting. Preliminary work in the application of machine learning techniques, such as reinforcement learning, to aircraft maneuvering has shown promise. These studies used simplified interfaces between the machine learning agent and the aircraft simulation. The simulations employed low-order equivalent system models. High-fidelity aircraft simulations exist, such as the simulations developed by NASA at its Dryden Flight Research Center. To expand the application domain of reinforcement learning to aircraft design, this study presents a series of experiments that examine a reinforcement learning agent in the role of test pilot. The NASA X-31 and F-106 high-fidelity simulations provide realistic aircraft for the agent to maneuver. The approach of the study is to examine an agent possessing a genetic-based, artificial neural network to approximate long-term, expected cost (Bellman value) in a basic maneuvering task. The experiments evaluate different learning methods based on a common feedback function and an identical task. The learning methods evaluated are: Q-learning, Q(lambda)-learning, SARSA learning, and SARSA(lambda) learning. Experimental results indicate that, while prediction errors remained quite high, similar, repeatable behaviors occur in both aircraft. This similarity in behavior demonstrates the portability of the agent between aircraft with different handling qualities (dynamics). Besides the adaptive behavior aspects of the study, the genetic algorithm used in the agent is shown to play an additive role in the shaping of the artificial neural network to the prediction task.
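
    For reference, the learning methods compared in this study differ mainly in their temporal-difference targets. The sketch below contrasts the one-step Q-learning (off-policy) and SARSA (on-policy) updates in tabular form; the lambda variants add eligibility traces, and the study itself approximates these values with a genetically evolved neural network rather than a table. Names here are illustrative.

```python
from collections import defaultdict

# Minimal one-step updates contrasting the off-policy (Q-learning) and
# on-policy (SARSA) temporal-difference targets.
Q = defaultdict(float)
ACTIONS = ["pitch_up", "pitch_down", "hold"]     # illustrative maneuvering actions

def q_learning_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)   # greedy bootstrap
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    target = r + gamma * Q[(s_next, a_next)]     # bootstrap on the action actually taken
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```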

  4. Extinction of specific stimulus-outcome (S-O) associations in Pavlovian learning with an extended CS procedure.

    PubMed

    Delamater, Andrew R; Schneider, Kevin; Derman, Rifka C

    2017-07-01

    Three experiments with male and female rats were conducted to examine the effects of Pavlovian extinction training on Pavlovian-to-instrumental transfer (PIT) in a task in which the unconditioned stimulus (US) was presented at an early time point within an extended conditioned stimulus (CS). Two instrumental responses were trained with different reinforcing outcomes (R1-O1, R2-O2) and then, independently, 2 stimuli were trained with those outcomes (S1-O1, S2-O2). One group then underwent an extinction treatment (S1-, S2-) and a second was merely exposed to the experimental contexts without any stimulus events. Finally, the effects of the 2 stimuli on instrumental responding were assessed in PIT tests. Across experiments we varied the number of Pavlovian training trials prior to extinction (8, 16, or 64 trials) and the length of time following extinction prior to test (i.e., 1 or 21 days, in a test for spontaneous recovery). We observed that outcome-specific PIT was reduced by extinction in all of our training conditions and that this extinction effect was durable, surviving a 3-week spontaneous recovery interval even though conditioned magazine approach spontaneously recovered over this interval. Although extinction reduced the magnitude of PIT, the temporal expression of PIT was mostly unaffected. We found these effects in both male and female rats, though in 1 study females were extinction-resistant. These data suggest that under the conditions studied here Pavlovian extinction may permanently weaken the ability of Pavlovian cues to retrieve a representation of their associated outcomes without impacting the temporal organization of responding. This suggests that different features of learning may be differentially sensitive to extinction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. Temporal discounting and heart rate reactivity to stress.

    PubMed

    Diller, James W; Patros, Connor H G; Prentice, Paula R

    2011-07-01

    Temporal discounting is the reduction of the value of a reinforcer as a function of increasing delay to its presentation. Impulsive individuals discount delayed consequences more rapidly than self-controlled individuals, and impulsivity has been related to substance abuse, gambling, and other problem behaviors. A growing body of literature has identified biological correlates of impulsivity, though little research to date has examined relations between delay discounting and markers of poor health (e.g., cardiovascular reactivity to stress). We evaluated the relation between one aspect of impulsivity, measured using a computerized temporal discounting task, and heart rate reactivity, measured as a change in heart rate from rest during a serial subtraction task. A linear regression showed that individuals who were more reactive to stress responded more impulsively (i.e., discounted delayed reinforcers more rapidly). When results were stratified by gender, the effect was observed for females, but not for males. This finding supports previous research on gender differences in cardiovascular reactivity and suggests that this type of reactivity may be an important correlate of impulsive behavior. Copyright © 2011 Elsevier B.V. All rights reserved.

  6. Properties of behavior under different random ratio and random interval schedules: A parametric study.

    PubMed

    Dembo, M; De Penfold, J B; Ruiz, R; Casalta, H

    1985-03-01

    Four pigeons were trained to peck a key under different values of a temporally defined independent variable (T) and different probabilities of reinforcement (p). Parameter T is a fixed repeating time cycle and p the probability of reinforcement for the first response of each cycle T. Two dependent variables were used: mean response rate and mean postreinforcement pause. For all values of p a critical value for the independent variable T was found (T=1 sec) at which marked changes took place in response rate and postreinforcement pauses. Behavior typical of random ratio schedules was obtained at values of T below 1 sec and behavior typical of random interval schedules at values of T above 1 sec. Copyright © 1985. Published by Elsevier B.V.

  7. Different Levels of Food Restriction Reveal Genotype-Specific Differences in Learning a Visual Discrimination Task

    PubMed Central

    Makowiecki, Kalina; Hammond, Geoff; Rodger, Jennifer

    2012-01-01

    In behavioural experiments, motivation to learn can be achieved using food rewards as positive reinforcement in food-restricted animals. Previous studies reduce animal weights to 80–90% of free-feeding body weight as the criterion for food restriction. However, effects of different degrees of food restriction on task performance have not been assessed. We compared learning task performance in mice food-restricted to 80 or 90% body weight (BW). We used adult wildtype (WT; C57Bl/6j) and knockout (ephrin-A2−/−) mice, previously shown to have a reverse learning deficit. Mice were trained in a two-choice visual discrimination task with food reward as positive reinforcement. When mice reached criterion for one visual stimulus (80% correct in three consecutive 10 trial sets) they began the reverse learning phase, where the rewarded stimulus was switched to the previously incorrect stimulus. For the initial learning and reverse phase of the task, mice at 90%BW took almost twice as many trials to reach criterion as mice at 80%BW. Furthermore, WT 80 and 90%BW groups significantly differed in percentage correct responses and learning strategy in the reverse learning phase, whereas no differences between weight restriction groups were observed in ephrin-A2−/− mice. Most importantly, genotype-specific differences in reverse learning strategy were only detected in the 80%BW groups. Our results indicate that increased food restriction not only results in better performance and a shorter training period, but may also be necessary for revealing behavioural differences between experimental groups. This has important ethical and animal welfare implications when deciding extent of diet restriction in behavioural studies. PMID:23144936

  8. Different levels of food restriction reveal genotype-specific differences in learning a visual discrimination task.

    PubMed

    Makowiecki, Kalina; Hammond, Geoff; Rodger, Jennifer

    2012-01-01

    In behavioural experiments, motivation to learn can be achieved using food rewards as positive reinforcement in food-restricted animals. Previous studies reduce animal weights to 80-90% of free-feeding body weight as the criterion for food restriction. However, effects of different degrees of food restriction on task performance have not been assessed. We compared learning task performance in mice food-restricted to 80 or 90% body weight (BW). We used adult wildtype (WT; C57Bl/6j) and knockout (ephrin-A2⁻/⁻) mice, previously shown to have a reverse learning deficit. Mice were trained in a two-choice visual discrimination task with food reward as positive reinforcement. When mice reached criterion for one visual stimulus (80% correct in three consecutive 10 trial sets) they began the reverse learning phase, where the rewarded stimulus was switched to the previously incorrect stimulus. For the initial learning and reverse phase of the task, mice at 90%BW took almost twice as many trials to reach criterion as mice at 80%BW. Furthermore, WT 80 and 90%BW groups significantly differed in percentage correct responses and learning strategy in the reverse learning phase, whereas no differences between weight restriction groups were observed in ephrin-A2⁻/⁻ mice. Most importantly, genotype-specific differences in reverse learning strategy were only detected in the 80%BW groups. Our results indicate that increased food restriction not only results in better performance and a shorter training period, but may also be necessary for revealing behavioural differences between experimental groups. This has important ethical and animal welfare implications when deciding extent of diet restriction in behavioural studies.

  9. SSCC TD: A Serial and Simultaneous Configural-Cue Compound Stimuli Representation for Temporal Difference Learning

    PubMed Central

    Mondragón, Esther; Gray, Jonathan; Alonso, Eduardo; Bonardi, Charlotte; Jennings, Dómhnall J.

    2014-01-01

    This paper presents a novel representational framework for the Temporal Difference (TD) model of learning, which allows the computation of configural stimuli – cumulative compounds of stimuli that generate perceptual emergents known as configural cues. This Simultaneous and Serial Configural-cue Compound Stimuli Temporal Difference model (SSCC TD) can model both simultaneous and serial stimulus compounds, as well as compounds including the experimental context. This modification significantly broadens the range of phenomena which the TD paradigm can explain, and allows it to predict phenomena which traditional TD solutions cannot, particularly effects that depend on compound stimuli functioning as a whole, such as pattern learning and serial structural discriminations, and context-related effects. PMID:25054799
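
    The TD(0) value-learning rule that such representational schemes feed into can be sketched as follows, with a trial represented as a serial compound of time-step features. The one-active-unit-per-time-step coding below is a simplified stand-in for the configural-cue compounds the SSCC TD model constructs.

```python
import numpy as np

# TD(0) over a serial compound stimulus: each time step of the CS activates
# one feature unit, and reward (the US) arrives at CS offset.
n_steps = 10
gamma, alpha = 0.98, 0.1
w = np.zeros(n_steps)                        # one weight per time-step feature

def features(t):
    x = np.zeros(n_steps)
    if t < n_steps:
        x[t] = 1.0
    return x

for trial in range(200):
    for t in range(n_steps):
        x, x_next = features(t), features(t + 1)
        r = 1.0 if t == n_steps - 1 else 0.0       # US at CS offset
        delta = r + gamma * w @ x_next - w @ x      # TD error
        w += alpha * delta * x

print(np.round(w, 2))   # predicted values ramp up toward the time of reward
```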

  10. Use of Inverse Reinforcement Learning for Identity Prediction

    NASA Technical Reports Server (NTRS)

    Hayes, Roy; Bao, Jonathan; Beling, Peter; Horowitz, Barry

    2011-01-01

    We adopt Markov Decision Processes (MDP) to model sequential decision problems, which have the characteristic that the current decision made by a human decision maker has an uncertain impact on future opportunity. We hypothesize that the individuality of decision makers can be modeled as differences in the reward function under a common MDP model. A machine learning technique, Inverse Reinforcement Learning (IRL), was used to learn an individual's reward function based on limited observation of his or her decision choices. This work serves as an initial investigation for using IRL to analyze decision making, conducted through a human experiment in a cyber shopping environment. Specifically, the ability to determine the demographic identity of users is conducted through prediction analysis and supervised learning. The results show that IRL can be used to correctly identify participants, at a rate of 68% for gender and 66% for one of three college major categories.
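
    The premise that individuality can be modeled as differences in the reward function over a shared MDP can be illustrated by solving one small MDP under two reward vectors and observing that the resulting greedy policies differ; recovering such reward functions from observed choices is the harder inverse problem that IRL addresses. Everything in the sketch below is an illustrative assumption.

```python
import numpy as np

# Shared toy MDP: 3 states, 2 actions; P[a, s, s'] are transition probabilities.
P = np.array([[[0.9, 0.1, 0.0],   # action 0: mostly stay / drift right
               [0.0, 0.9, 0.1],
               [0.0, 0.0, 1.0]],
              [[0.1, 0.9, 0.0],   # action 1: mostly move right
               [0.1, 0.0, 0.9],
               [0.0, 0.1, 0.9]]])
gamma = 0.9

def greedy_policy(reward, iters=500):
    """Value iteration; 'reward' is a per-state reward vector (the individual part)."""
    v = np.zeros(3)
    for _ in range(iters):
        q = reward + gamma * P @ v          # q[a, s]
        v = q.max(axis=0)
    return q.argmax(axis=0)

print(greedy_policy(np.array([0.0, 0.0, 1.0])))   # an individual who values state 2
print(greedy_policy(np.array([1.0, 0.0, 0.0])))   # an individual who values state 0
```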

  11. Deficient reinforcement learning in medial frontal cortex as a model of dopamine-related motivational deficits in ADHD.

    PubMed

    Silvetti, Massimo; Wiersema, Jan R; Sonuga-Barke, Edmund; Verguts, Tom

    2013-10-01

    Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts; probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Characteristics of implicit chaining in cotton-top tamarins (Saguinus oedipus).

    PubMed

    Locurto, Charles; Gagne, Matthew; Nutile, Lauren

    2010-07-01

    In human cognition there has been considerable interest in observing the conditions under which subjects learn material without explicit instructions to learn. In the present experiments, we adapted this issue to nonhumans by asking what subjects learn in the absence of explicit reinforcement for correct responses. Two experiments examined the acquisition of sequence information by cotton-top tamarins (Saguinus oedipus) when such learning was not demanded by the experimental contingencies. An implicit chaining procedure was used in which visual stimuli were presented serially on a touchscreen. Subjects were required to touch one stimulus to advance to the next stimulus. Stimulus presentations followed a pattern, but learning the pattern was not necessary for reinforcement. In Experiment 1 the chain consisted of five different visual stimuli that were presented in the same order on each trial. Each stimulus could occur at any one of six touchscreen positions. In Experiment 2 the same visual element was presented serially in the same five locations on each trial, thereby allowing a behavioral pattern to be correlated with the visual pattern. In this experiment two new tests, a Wild-Card test and a Running-Start test, were used to assess what was learned in this procedure. Results from both experiments indicated that tamarins acquired more information from an implicit chain than was required by the contingencies of reinforcement. These results contribute to the developing literature on nonhuman analogs of implicit learning.

  13. Developing PFC representations using reinforcement learning

    PubMed Central

    Reynolds, Jeremy R.; O'Reilly, Randall C.

    2009-01-01

    From both functional and biological considerations, it is widely believed that action production, planning, and goal-oriented behaviors supported by the frontal cortex are organized hierarchically (Fuster, 1990; Koechlin, Ody, & Kouneiher, 2003; Miller, Galanter, & Pribram, 1960). However, the nature of the different levels of the hierarchy remains unclear, and little attention has been paid to the origins of such a hierarchy. We address these issues through biologically-inspired computational models that develop representations through reinforcement learning. We explore several different factors in these models that might plausibly give rise to a hierarchical organization of representations within the PFC, including an initial connectivity hierarchy within PFC, a hierarchical set of connections between PFC and subcortical structures controlling it, and differential synaptic plasticity schedules. Simulation results indicate that architectural constraints contribute to the segregation of different types of representations, and that this segregation facilitates learning. These findings are consistent with the idea that there is a functional hierarchy in PFC, as captured in our earlier computational models of PFC function and a growing body of empirical data. PMID:19591977

  14. Role of dopamine D2 receptors in human reinforcement learning.

    PubMed

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-09-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.

  15. Role of Dopamine D2 Receptors in Human Reinforcement Learning

    PubMed Central

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-01-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well. PMID:24713613

  16. Reinforcement learning interfaces for biomedical database systems.

    PubMed

    Rudowsky, I; Kulyba, O; Kunin, M; Parsons, S; Raphan, T

    2006-01-01

    Studies of neural function that are carried out in different laboratories and that address different questions use a wide range of descriptors for data storage, depending on the laboratory and the individuals that input the data. A common approach to describe non-textual data that are referenced through a relational database is to use metadata descriptors. We have recently designed such a prototype system, but to maintain efficiency and a manageable metadata table, free formatted fields were designed as table entries. The database interface application utilizes an intelligent agent to improve integrity of operation. The purpose of this study was to investigate how reinforcement learning algorithms can assist the user in interacting with the database interface application that has been developed to improve the performance of the system.

  17. Reinforcement learning in computer vision

    NASA Astrophysics Data System (ADS)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. The solutions to these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for applied tasks such as the processing and analysis of visual information and for specific computer vision problems such as filtering, extracting image features, and localizing objects in scenes. The paper briefly describes reinforcement learning and its use for solving computer vision problems.

  18. Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms

    PubMed Central

    Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.

    2015-01-01

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331

  19. Effects of Intrinsic Motivation on Feedback Processing During Learning

    PubMed Central

    DePasque, Samantha; Tricomi, Elizabeth

    2015-01-01

    Learning commonly requires feedback about the consequences of one’s actions, which can drive learners to modify their behavior. Motivation may determine how sensitive an individual might be to such feedback, particularly in educational contexts where some students value academic achievement more than others. Thus, motivation for a task might influence the value placed on performance feedback and how effectively it is used to improve learning. To investigate the interplay between intrinsic motivation and feedback processing, we used functional magnetic resonance imaging (fMRI) during feedback-based learning before and after a novel manipulation based on motivational interviewing, a technique for enhancing treatment motivation in mental health settings. Because of its role in the reinforcement learning system, the striatum is situated to play a significant role in the modulation of learning based on motivation. Consistent with this idea, motivation levels during the task were associated with sensitivity to positive versus negative feedback in the striatum. Additionally, heightened motivation following a brief motivational interview was associated with increases in feedback sensitivity in the left medial temporal lobe. Our results suggest that motivation modulates neural responses to performance-related feedback, and furthermore that changes in motivation facilitate processing in areas that support learning and memory. PMID:26112370

  20. Network congestion control algorithm based on Actor-Critic reinforcement learning model

    NASA Astrophysics Data System (ADS)

    Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

    2018-04-01

    To address the network congestion control problem, a congestion control algorithm based on an Actor-Critic reinforcement learning model is designed. By incorporating a genetic algorithm into the congestion control strategy, network congestion can be detected and prevented more effectively. A simulation experiment for the network congestion control algorithm is designed around Actor-Critic reinforcement learning. The simulations verify that the AQM controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is adopted to optimize network performance, and the packet-dropping probability is adaptively adjusted to improve performance and avoid congestion. Based on these findings, it is concluded that the congestion control algorithm based on the Actor-Critic reinforcement learning model can effectively avoid TCP network congestion.
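
    The actor-critic structure referred to here can be sketched in a few lines: the critic learns a state-value estimate from the temporal-difference error, and the actor shifts its action preferences (here, over candidate packet-dropping probabilities) in the direction of that same error. The state encoding, action set, and reward signal below are stand-in assumptions, not the paper's AQM controller.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of a tabular actor-critic loop; states discretize queue occupancy and
# actions are candidate packet-drop probabilities (illustrative assumptions).
n_states, drop_levels = 10, np.array([0.0, 0.05, 0.1, 0.2])
value = np.zeros(n_states)                         # critic
prefs = np.zeros((n_states, len(drop_levels)))     # actor preferences
alpha_v, alpha_p, gamma = 0.1, 0.05, 0.95

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(state, drop_prob):
    """Stand-in environment: reward penalizes both long queues and heavy dropping."""
    next_state = int(np.clip(state + rng.integers(-1, 3) - 4 * drop_prob, 0, n_states - 1))
    reward = -(next_state / n_states) - drop_prob
    return next_state, reward

state = 0
for t in range(5000):
    probs = softmax(prefs[state])
    action = rng.choice(len(drop_levels), p=probs)
    next_state, reward = step(state, drop_levels[action])
    td_error = reward + gamma * value[next_state] - value[state]
    value[state] += alpha_v * td_error                 # critic update
    prefs[state, action] += alpha_p * td_error         # actor update
    state = next_state
```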

  1. From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model

    ERIC Educational Resources Information Center

    Fu, Wai-Tat; Anderson, John R.

    2006-01-01

    The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…

  2. Effects of Dopamine Medication on Sequence Learning with Stochastic Feedback in Parkinson's Disease

    PubMed Central

    Seo, Moonsang; Beigi, Mazda; Jahanshahi, Marjan; Averbeck, Bruno B.

    2010-01-01

    A growing body of evidence suggests that the midbrain dopamine system plays a key role in reinforcement learning and disruption of the midbrain dopamine system in Parkinson's disease (PD) may lead to deficits on tasks that require learning from feedback. We examined how changes in dopamine levels (“ON” and “OFF” their dopamine medication) affect sequence learning from stochastic positive and negative feedback using Bayesian reinforcement learning models. We found deficits in sequence learning in patients with PD when they were “ON” and “OFF” medication relative to healthy controls, but smaller differences between patients “OFF” and “ON”. The deficits were mainly due to decreased learning from positive feedback, although across all participant groups learning was more strongly associated with positive than negative feedback in our task. The learning in our task is likely mediated by the relatively depleted dorsal striatum and not the relatively intact ventral striatum. Therefore, the changes we see in our task may be due to a strong loss of phasic dopamine signals in the dorsal striatum in PD. PMID:20740077

  3. Stochastic Reinforcement Benefits Skill Acquisition

    ERIC Educational Resources Information Center

    Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.

    2014-01-01

    Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…

  4. Multi-agent Reinforcement Learning Model for Effective Action Selection

    NASA Astrophysics Data System (ADS)

    Youk, Sang Jo; Lee, Bong Keun

    Reinforcement learning is a subarea of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the multi-agent case especially, the state and action spaces become enormous compared with the single-agent case, so the most effective available means of selecting an action strategy is needed for effective reinforcement learning. This paper proposes a multi-agent reinforcement learning model based on a fuzzy inference system in order to improve learning speed and select effective actions in multi-agent settings. An effective action selection strategy is verified through evaluation tests based on RoboCup Keepaway, one of the standard test-beds for multi-agent learning. The proposed model can be applied to evaluate the efficiency of various intelligent multi-agent systems, as well as to the strategy and tactics of robot soccer systems.

  5. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models

    PubMed Central

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027

  6. Reinforcement learning improves behaviour from evaluative feedback

    NASA Astrophysics Data System (ADS)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  7. Reinforcement learning improves behaviour from evaluative feedback.

    PubMed

    Littman, Michael L

    2015-05-28

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  8. A Comparison of Emotional-Motivational (A-R-D Theory) Personality Characteristics in Learning Disabled, Normal Achieving, and High Achieving Children.

    ERIC Educational Resources Information Center

    Hufano, Linda D.

    The study examined emotional-motivational personality characteristics of 15 learning disabled, 15 normal achieving, and 15 high achieving students (grades 3-5). The study tested the hypothesis derived from the A-R-D (attitude-reinforcer-discriminative) theory of motivation that learning disabled (LD) children differ from normal and high achieving…

  9. Object-location training elicits an overlapping but temporally distinct transcriptional profile from contextual fear conditioning.

    PubMed

    Poplawski, Shane G; Schoch, Hannah; Wimmer, Mathieu; Hawk, Joshua D; Walsh, Jennifer L; Giese, Karl P; Abel, Ted

    2014-12-01

    Hippocampus-dependent learning is known to induce changes in gene expression, but information on gene expression differences between different learning paradigms that require the hippocampus is limited. The bulk of studies investigating RNA expression after learning use the contextual fear conditioning task, which couples a novel environment with a footshock. Although contextual fear conditioning has been useful in discovering gene targets, gene expression after spatial memory tasks has received less attention. In this study, we used the object-location memory task and studied gene expression at two time points after learning in a high-throughput manner using a microfluidic qPCR approach. We found that expression of the classic immediate-early genes changes after object-location training in a fashion similar to that observed after contextual fear conditioning. However, the temporal dynamics of gene expression are different between the two tasks, with object-location memory producing gene expression changes that last at least 2 hours. Our findings indicate that different training paradigms may give rise to distinct temporal dynamics of gene expression after learning. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. Object-Location Training Elicits an Overlapping but Temporally Distinct Transcriptional Profile from Contextual Fear Conditioning

    PubMed Central

    Wimmer, Mathieu; Hawk, Joshua D.; Walsh, Jennifer L.; Giese, Karl P.; Abel, Ted

    2014-01-01

    Hippocampus-dependent learning is known to induce changes in gene expression, but information on gene expression differences between different learning paradigms that require the hippocampus is limited. The bulk of studies investigating RNA expression after learning use the contextual fear conditioning task, which couples a novel environment with a footshock. Although contextual fear conditioning has been useful in discovering gene targets, gene expression after spatial memory tasks has received less attention. In this study, we used the object-location memory task and studied gene expression at two time points after learning in a high-throughput manner using a microfluidic qPCR approach. We found that expression of the classic immediate-early genes changes after object-location training in a fashion similar to that observed after contextual fear conditioning. However, the temporal dynamics of gene expression are different between the two tasks, with object-location memory producing gene expression changes that last at least 2 hours. Our findings indicate that different training paradigms may give rise to distinct temporal dynamics of gene expression after learning. PMID:25242102

  11. Does Sensitivity to Magnitude Depend on the Temporal Distribution of Reinforcement?

    ERIC Educational Resources Information Center

    Grace, Randolph C.; Bragason, Orn

    2005-01-01

    Our research addressed the question of whether sensitivity to relative reinforcer magnitude in concurrent chains depends on the distribution of reinforcer delays when the terminal-link schedules are equal. In Experiment 1, 12 pigeons responded in a two-component procedure. In both components, the initial links were concurrent variable-interval 40…

  12. Cultural Complexity That Affects Young Children's Contemporary Growth, Change, and Learning.

    ERIC Educational Resources Information Center

    Hyun, Eunsook

    Based on the view that the group orientation to multicultural education reinforces group stereotyping and seldom allows acknowledgement of diverse children's unique capabilities and differences or helps children build self-identity while learning to appreciate others, this paper presents and discusses contemporary cultures of young children's…

  13. Quantum-Enhanced Machine Learning

    NASA Astrophysics Data System (ADS)

    Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

    2016-09-01

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  14. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning

    PubMed Central

    Zhu, Lusha; Mathewson, Kyle E.; Hsu, Ming

    2012-01-01

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents’ beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs. PMID:22307594

  15. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    PubMed

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.

  16. Suppression of Striatal Prediction Errors by the Prefrontal Cortex in Placebo Hypoalgesia.

    PubMed

    Schenk, Lieven A; Sprenger, Christian; Onat, Selim; Colloca, Luana; Büchel, Christian

    2017-10-04

    Classical learning theories predict extinction after the discontinuation of reinforcement through prediction errors. However, placebo hypoalgesia, although mediated by associative learning, has been shown to be resistant to extinction. We tested the hypothesis that this is mediated by the suppression of prediction error processing through the prefrontal cortex (PFC). We compared pain modulation through treatment cues (placebo hypoalgesia, treatment context) with pain modulation through stimulus intensity cues (stimulus context) during functional magnetic resonance imaging in 48 male and female healthy volunteers. During acquisition, our data show that expectations are correctly learned and that this is associated with prediction error signals in the ventral striatum (VS) in both contexts. However, in the nonreinforced test phase, pain modulation and expectations of pain relief persisted to a larger degree in the treatment context, indicating that the expectations were not correctly updated in the treatment context. Consistently, we observed significantly stronger neural prediction error signals in the VS in the stimulus context compared with the treatment context. A connectivity analysis revealed negative coupling between the anterior PFC and the VS in the treatment context, suggesting that the PFC can suppress the expression of prediction errors in the VS. Consistent with this, a participant's conceptual views and beliefs about treatments influenced the pain modulation only in the treatment context. Our results indicate that in placebo hypoalgesia contextual treatment information engages prefrontal conceptual processes, which can suppress prediction error processing in the VS and lead to reduced updating of treatment expectancies, resulting in less extinction of placebo hypoalgesia. SIGNIFICANCE STATEMENT In aversive and appetitive reinforcement learning, learned effects show extinction when reinforcement is discontinued. This is thought to be mediated by prediction errors (i.e., the difference between expectations and outcome). Although reinforcement learning has been central in explaining placebo hypoalgesia, placebo hypoalgesic effects show little extinction and persist after the discontinuation of reinforcement. Our results support the idea that conceptual treatment beliefs bias the neural processing of expectations in a treatment context compared with a more stimulus-driven processing of expectations with stimulus intensity cues. We provide evidence that this is associated with the suppression of prediction error processing in the ventral striatum by the prefrontal cortex. This provides a neural basis for persisting effects in reinforcement learning and placebo hypoalgesia. Copyright © 2017 the authors 0270-6474/17/379715-09$15.00/0.
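
    A toy illustration of the mechanism summarized above, assuming a simple expectation updated by a gated prediction error (the gain parameter pe_gate stands in, purely for illustration, for prefrontal suppression of error processing); this is not the study's computational model:

        # Toy sketch: expectations extinguish through prediction-error updates once
        # reinforcement stops, but if the prediction error is suppressed (scaled by
        # pe_gate, an illustrative stand-in), the learned expectation persists.
        def run_phase(expectation, outcomes, alpha=0.2, pe_gate=1.0):
            for outcome in outcomes:
                delta = outcome - expectation          # prediction error
                expectation += alpha * pe_gate * delta  # gated update
            return expectation

        acquired = run_phase(0.0, [1.0] * 20)                          # reinforced acquisition
        print(round(run_phase(acquired, [0.0] * 20), 2))               # normal extinction -> near 0
        print(round(run_phase(acquired, [0.0] * 20, pe_gate=0.1), 2))  # suppressed error -> persists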

  17. The drift diffusion model as the choice rule in reinforcement learning.

    PubMed

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
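
    A hedged sketch of the general idea, not the authors' implementation: a two-option learning model in which a drift diffusion process generates the choice (drift rate scaled by the learned value difference) and the chosen option's value is updated with a prediction-error rule. All parameter values are illustrative:

        import random

        def ddm_choice(v_diff, drift_scale=2.0, threshold=1.0, dt=0.005, noise=1.0):
            """Simulate one drift-diffusion decision; returns (choice, reaction_time)."""
            x, t = 0.0, 0.0
            drift = drift_scale * v_diff          # evidence favors option 0 when v_diff > 0
            while abs(x) < threshold:
                x += drift * dt + noise * random.gauss(0.0, dt ** 0.5)
                t += dt
            return (0 if x > 0 else 1), t

        def run_trials(p_reward=(0.8, 0.2), alpha=0.1, n_trials=100):
            q = [0.5, 0.5]
            for _ in range(n_trials):
                choice, _ = ddm_choice(q[0] - q[1])
                reward = 1.0 if random.random() < p_reward[choice] else 0.0
                q[choice] += alpha * (reward - q[choice])   # prediction-error update
            return q

        print(run_trials())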

  18. The drift diffusion model as the choice rule in reinforcement learning

    PubMed Central

    Frank, Michael J.

    2017-01-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups. PMID:27966103

  19. An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    PubMed Central

    Potjans, Wiebke; Diesmann, Markus; Morrison, Abigail

    2011-01-01

    An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards. PMID:21589888
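
    For orientation, a minimal discrete-time actor-critic TD sketch of the kind the spiking model is mapped onto (illustrative only, not the spiking network itself); the clipping constant that floors negative TD errors is an assumption meant to mimic the limited range of a dopamine-like signal below baseline:

        import math
        import random

        n_states, goal = 5, 4
        gamma, alpha, beta = 0.9, 0.1, 0.1
        V = [0.0] * n_states                           # critic: state values
        prefs = [[0.0, 0.0] for _ in range(n_states)]  # actor: preferences for (left, right)

        def act(s):
            # softmax over the two action preferences
            e = [math.exp(p) for p in prefs[s]]
            return 0 if random.random() < e[0] / sum(e) else 1

        for episode in range(500):
            s = 0
            while s != goal:
                a = act(s)
                s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
                r = 1.0 if s2 == goal else 0.0
                delta = r + gamma * V[s2] - V[s]   # TD error
                delta = max(delta, -0.2)           # asymmetric, dopamine-like floor (assumption)
                V[s] += alpha * delta
                prefs[s][a] += beta * delta
                s = s2

        print([round(v, 2) for v in V])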

  20. General functioning predicts reward and punishment learning in schizophrenia.

    PubMed

    Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A

    2011-04-01

    Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performances. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.

  1. Agent-based traffic management and reinforcement learning in congested intersection network.

    DOT National Transportation Integrated Search

    2012-08-01

    This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programmi...
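
    For context, a generic tabular Q-learning update of the kind evaluated for signal control (a sketch under assumed state and reward definitions, queue-length bins and negative delay, not the study's implementation):

        import random
        from collections import defaultdict

        Q = defaultdict(float)
        alpha, gamma, epsilon = 0.1, 0.95, 0.1
        actions = ["extend_green_NS", "extend_green_EW"]

        def choose(state):
            # epsilon-greedy action selection over the two signal actions
            if random.random() < epsilon:
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(state, a)])

        def update(state, action, reward, next_state):
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

        # one illustrative step: queues binned as (NS, EW), reward = -(total delay)
        s = (3, 1)
        a = choose(s)
        update(s, a, reward=-4.0, next_state=(2, 1))
        print(Q[(s, a)])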

  2. Dissociable Learning Processes Underlie Human Pain Conditioning

    PubMed Central

    Zhang, Suyi; Mano, Hiroaki; Ganesh, Gowrishankar; Robbins, Trevor; Seymour, Ben

    2016-01-01

    Summary Pavlovian conditioning underlies many aspects of pain behavior, including fear and threat detection [1], escape and avoidance learning [2], and endogenous analgesia [3]. Although a central role for the amygdala is well established [4], both human and animal studies implicate other brain regions in learning, notably ventral striatum and cerebellum [5]. It remains unclear whether these regions make different contributions to a single aversive learning process or represent independent learning mechanisms that interact to generate the expression of pain-related behavior. We designed a human parallel aversive conditioning paradigm in which different Pavlovian visual cues probabilistically predicted thermal pain primarily to either the left or right arm and studied the acquisition of conditioned Pavlovian responses using combined physiological recordings and fMRI. Using computational modeling based on reinforcement learning theory, we found that conditioning involves two distinct types of learning process. First, a non-specific “preparatory” system learns aversive facial expressions and autonomic responses such as skin conductance. The associated learning signals—the learned associability and prediction error—were correlated with fMRI brain responses in amygdala-striatal regions, corresponding to the classic aversive (fear) learning circuit. Second, a specific lateralized system learns “consummatory” limb-withdrawal responses, detectable with electromyography of the arm to which pain is predicted. Its related learned associability was correlated with responses in ipsilateral cerebellar cortex, suggesting a novel computational role for the cerebellum in pain. In conclusion, our results show that the overall phenotype of conditioned pain behavior depends on two dissociable reinforcement learning circuits. PMID:26711494

  3. Exploring Temporal Sequences of Regulatory Phases and Associated Interactions in Low- and High-Challenge Collaborative Learning Sessions

    ERIC Educational Resources Information Center

    Sobocinski, Márta; Malmberg, Jonna; Järvelä, Sanna

    2017-01-01

    Investigating the temporal order of regulatory processes can explain in more detail the mechanisms behind success or lack of success during collaborative learning. The aim of this study is to explore the differences between high- and low-challenge collaborative learning sessions. This is achieved through examining how the three phases of…

  4. Spatio-temporal alignment of multiple sensors

    NASA Astrophysics Data System (ADS)

    Zhang, Tinghua; Ni, Guoqiang; Fan, Guihua; Sun, Huayan; Yang, Biao

    2018-01-01

    To achieve spatio-temporal alignment of multiple sensors on the same platform for space target observation, a joint spatio-temporal alignment method is proposed. To calibrate camera parameters and measure camera attitude, an astronomical calibration method is proposed based on star chart simulation and on collinear invariant features of quadrilateral diagonals in the observed star charts. To satisfy temporal correspondence and spatial alignment similarity simultaneously, the method builds on this astronomical calibration and attitude measurement and folds spatial and temporal alignment into a joint video-alignment framework. The method is further strengthened by exploiting similarities and prior knowledge of the velocity vector field between adjacent frames, computed with the SIFT Flow algorithm. The proposed method provides the highest spatio-temporal alignment accuracy compared to state-of-the-art methods on sequences recorded from multiple sensors at different times.

  5. Statistical learning: a powerful mechanism that operates by mere exposure.

    PubMed

    Aslin, Richard N

    2017-01-01

    How do infants learn so rapidly and with little apparent effort? In 1996, Saffran, Aslin, and Newport reported that 8-month-old human infants could learn the underlying temporal structure of a stream of speech syllables after only 2 min of passive listening. This demonstration of what was called statistical learning, involving no instruction, reinforcement, or feedback, led to dozens of confirmations of this powerful mechanism of implicit learning in a variety of modalities, domains, and species. These findings reveal that infants are not nearly as dependent on explicit forms of instruction as we might have assumed from studies of learning in which children or adults are taught facts such as math or problem solving skills. Instead, at least in some domains, infants soak up the information around them by mere exposure. Learning and development in these domains thus appear to occur automatically and with little active involvement by an instructor (parent or teacher). The details of this statistical learning mechanism are discussed, including how exposure to specific types of information can, under some circumstances, generalize to never-before-observed information, thereby enabling transfer of learning. WIREs Cogn Sci 2017, 8:e1373. doi: 10.1002/wcs.1373. © 2016 Wiley Periodicals, Inc.

  6. Neural Basis of Strategic Decision Making

    PubMed Central

    Lee, Daeyeol; Seo, Hyojung

    2015-01-01

    Human choice behaviors during social interactions often deviate from the predictions of game theory. This might arise partly from the limitations in cognitive abilities necessary for recursive reasoning about the behaviors of others. In addition, during iterative social interactions, choices might change dynamically, as knowledge about the intentions of others and estimates for choice outcomes are incrementally updated via reinforcement learning. Some of the brain circuits utilized during social decision making might be general-purpose and contribute to isomorphic individual and social decision making. By contrast, regions in the medial prefrontal cortex and temporal parietal junction might be recruited for cognitive processes unique to social decision making. PMID:26688301

  7. Posterior parietal cortex is critical for the encoding, consolidation, and retrieval of a memory that guides attention for learning.

    PubMed

    Schiffino, Felipe L; Zhou, Vivian; Holland, Peter C

    2014-02-01

    Within most contemporary learning theories, reinforcement prediction error, the difference between the obtained and expected reinforcer value, critically influences associative learning. In some theories, this prediction error determines the momentary effectiveness of the reinforcer itself, such that the same physical event produces more learning when its presentation is surprising than when it is expected. In other theories, prediction error enhances attention to potential cues for that reinforcer by adjusting cue-specific associability parameters, biasing the processing of those stimuli so that they more readily enter into new associations in the future. A unique feature of these latter theories is that such alterations in stimulus associability must be represented in memory in an enduring fashion. Indeed, considerable data indicate that altered associability may be expressed days after its induction. Previous research from our laboratory identified brain circuit elements critical to the enhancement of stimulus associability by the omission of an expected event, and to the subsequent expression of that altered associability in more rapid learning. Here, for the first time, we identified a brain region, the posterior parietal cortex, as a potential site for a memorial representation of altered stimulus associability. In three experiments using rats and a serial prediction task, we found that intact posterior parietal cortex function was essential during the encoding, consolidation, and retrieval of an associability memory enhanced by surprising omissions. We discuss these new results in the context of our previous findings and additional plausible frontoparietal and subcortical networks. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
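
    The associability idea described here is commonly formalized with a Pearce-Hall-style update; a minimal sketch of that textbook rule (illustrative parameters, not the authors' model):

        # Pearce-Hall-style sketch: the cue-specific associability alpha is pushed toward
        # the absolute prediction error, so a surprising omission raises it and makes the
        # cue enter new associations faster on later trials.
        def pearce_hall_step(V, alpha, outcome, eta=0.3, kappa=0.5):
            delta = outcome - V                             # prediction error
            alpha = (1 - eta) * alpha + eta * abs(delta)    # associability tracks surprise
            V = V + kappa * alpha * delta                   # learning rate scaled by associability
            return V, alpha

        V, alpha = 0.8, 0.2
        V, alpha = pearce_hall_step(V, alpha, outcome=0.0)  # surprising omission
        print(round(V, 3), round(alpha, 3))                 # associability increases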

  8. Neurocomputational mechanisms of prosocial learning and links to empathy.

    PubMed

    Lockwood, Patricia L; Apps, Matthew A J; Valton, Vincent; Viding, Essi; Roiser, Jonathan P

    2016-08-30

    Reinforcement learning theory powerfully characterizes how we learn to benefit ourselves. In this theory, prediction errors-the difference between a predicted and actual outcome of a choice-drive learning. However, we do not operate in a social vacuum. To behave prosocially we must learn the consequences of our actions for other people. Empathy, the ability to vicariously experience and understand the affect of others, is hypothesized to be a critical facilitator of prosocial behaviors, but the link between empathy and prosocial behavior is still unclear. During functional magnetic resonance imaging (fMRI) participants chose between different stimuli that were probabilistically associated with rewards for themselves (self), another person (prosocial), or no one (control). Using computational modeling, we show that people can learn to obtain rewards for others but do so more slowly than when learning to obtain rewards for themselves. fMRI revealed that activity in a posterior portion of the subgenual anterior cingulate cortex/basal forebrain (sgACC) drives learning only when we are acting in a prosocial context and signals a prosocial prediction error conforming to classical principles of reinforcement learning theory. However, there is also substantial variability in the neural and behavioral efficiency of prosocial learning, which is predicted by trait empathy. More empathic people learn more quickly when benefitting others, and their sgACC response is the most selective for prosocial learning. We thus reveal a computational mechanism driving prosocial learning in humans. This framework could provide insights into atypical prosocial behavior in those with disorders of social cognition.

  9. Punishment and psychopathy: a case-control functional MRI investigation of reinforcement learning in violent antisocial personality disordered men.

    PubMed

    Gregory, Sarah; Blair, R James; Ffytche, Dominic; Simmons, Andrew; Kumari, Veena; Hodgins, Sheilagh; Blackwood, Nigel

    2015-02-01

    Men with antisocial personality disorder show lifelong abnormalities in adaptive decision making guided by the weighing up of reward and punishment information. Among men with antisocial personality disorder, modification of the behaviour of those with additional diagnoses of psychopathy seems particularly resistant to punishment. We did a case-control functional MRI (fMRI) study in 50 men, of whom 12 were violent offenders with antisocial personality disorder and psychopathy, 20 were violent offenders with antisocial personality disorder but not psychopathy, and 18 were healthy non-offenders. We used fMRI to measure brain activation associated with the representation of punishment or reward information during an event-related probabilistic response-reversal task, assessed with standard general linear-model-based analysis. Offenders with antisocial personality disorder and psychopathy displayed discrete regions of increased activation in the posterior cingulate cortex and anterior insula in response to punished errors during the task reversal phase, and decreased activation to all correct rewarded responses in the superior temporal cortex. This finding was in contrast to results for offenders without psychopathy and healthy non-offenders. Punishment prediction error signalling in offenders with antisocial personality disorder and psychopathy was highly atypical. This finding challenges the widely held view that such men are simply characterised by diminished neural sensitivity to punishment. Instead, this finding indicates altered organisation of the information-processing system responsible for reinforcement learning and appropriate decision making. This difference between violent offenders with antisocial personality disorder with and without psychopathy has implications for the causes of these disorders and for treatment approaches. National Forensic Mental Health Research and Development Programme, UK Ministry of Justice, Psychiatry Research Trust, NIHR Biomedical Research Centre. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    PubMed

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  11. Observing Responses and Serial Stimuli: Searching for the Reinforcing Properties of the S-

    ERIC Educational Resources Information Center

    Escobar, Rogelio; Bruner, Carlos A.

    2009-01-01

    The control exerted by a stimulus associated with an extinction component (S-) on observing responses was determined as a function of its temporal relation with the onset of the reinforcement component (S+). Lever pressing by rats was reinforced on a mixed random-interval extinction schedule. Each press on a second lever produced stimuli…

  12. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    ERIC Educational Resources Information Center

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  13. Machine Learning Control For Highly Reconfigurable High-Order Systems

    DTIC Science & Technology

    2015-01-02

    develop and flight test a Reinforcement Learning-based approach for autonomous tracking of ground targets using a fixed-wing Unmanned... Reinforcement Learning-based algorithms are developed for learning agents' time-dependent dynamics while also learning to control them. Three algorithms... to a wide range of engineering-based problems. Implementation of these solutions, however, is often complicated by the hysteretic, non-linear,

  14. Multi-Objective Reinforcement Learning for Cognitive Radio-Based Satellite Communications

    NASA Technical Reports Server (NTRS)

    Ferreira, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2016-01-01

    Previous research on cognitive radios has addressed the performance of various machine-learning and optimization techniques for decision making of terrestrial link properties. In this paper, we present our recent investigations with respect to reinforcement learning that potentially can be employed by future cognitive radios installed onboard satellite communications systems specifically tasked with radio resource management. This work analyzes the performance of learning, reasoning, and decision making while considering multiple objectives for time-varying communications channels, as well as different cross-layer requirements. Based on the urgent demand for increased bandwidth, which is being addressed by the next generation of high-throughput satellites, the performance of cognitive radio is assessed considering links between a geostationary satellite and a fixed ground station operating at Ka-band (26 GHz). Simulation results show multiple objective performance improvements of more than 3.5 times for clear sky conditions and 6.8 times for rain conditions.

  15. Applying reinforcement learning techniques to detect hepatocellular carcinoma under limited screening capacity.

    PubMed

    Lee, Elliot; Lavieri, Mariel S; Volk, Michael L; Xu, Yongcai

    2015-09-01

    We investigate the problem faced by a healthcare system wishing to allocate its constrained screening resources across a population at risk for developing a disease. A patient's risk of developing the disease depends on his/her biomedical dynamics. However, knowledge of these dynamics must be learned by the system over time. Three classes of reinforcement learning policies are designed to address this problem of simultaneously gathering and utilizing information across multiple patients. We investigate a case study based upon the screening for Hepatocellular Carcinoma (HCC), and optimize each of the three classes of policies using the indifference zone method. A simulation is built to gauge the performance of these policies, and their performance is compared to current practice. We then demonstrate how the benefits of learning-based screening policies differ across various levels of resource scarcity and provide metrics of policy performance.

  16. Multi-Objective Reinforcement Learning for Cognitive Radio Based Satellite Communications

    NASA Technical Reports Server (NTRS)

    Ferreira, Paulo; Paffenroth, Randy; Wyglinski, Alexander; Hackett, Timothy; Bilen, Sven; Reinhart, Richard; Mortensen, Dale John

    2016-01-01

    Previous research on cognitive radios has addressed the performance of various machine learning and optimization techniques for decision making of terrestrial link properties. In this paper, we present our recent investigations with respect to reinforcement learning that potentially can be employed by future cognitive radios installed onboard satellite communications systems specifically tasked with radio resource management. This work analyzes the performance of learning, reasoning, and decision making while considering multiple objectives for time-varying communications channels, as well as different cross-layer requirements. Based on the urgent demand for increased bandwidth, which is being addressed by the next generation of high-throughput satellites, the performance of cognitive radio is assessed considering links between a geostationary satellite and a fixed ground station operating at Ka-band (26 GHz). Simulation results show multiple objective performance improvements of more than 3.5 times for clear sky conditions and 6.8 times for rain conditions.

  17. Reinforcement and inference in cross-situational word learning.

    PubMed

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
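
    A minimal sketch of the trade-off described above, assuming a single reinforcement-strength parameter (chi) and a mutual-exclusivity discount (beta); the parameter names and update form are illustrative rather than the authors' exact model:

        from collections import defaultdict

        assoc = defaultdict(float)   # (word, referent) -> association strength
        known = {}                   # referent -> word, once a mapping is considered learned

        def observe(words, referents, chi=0.3, beta=0.5):
            """Reinforce all co-occurring pairs; down-weight referents already claimed
            by another word (a crude mutual-exclusivity inference)."""
            for w in words:
                for r in referents:
                    discount = beta if (r in known and known[r] != w) else 0.0
                    assoc[(w, r)] += chi * (1.0 - discount)

        observe(["dax", "blick"], ["toy1", "toy2"])
        known["toy1"] = "dax"                        # suppose 'dax' is now treated as learned
        observe(["wug"], ["toy1", "toy3"])           # 'wug' is inferred to mean toy3, not toy1
        print(assoc[("wug", "toy3")] > assoc[("wug", "toy1")])   # True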

  18. The limits and motivating potential of sensory stimuli as reinforcers for autistic children.

    PubMed

    Ferrari, M; Harris, S L

    1981-01-01

    This study investigated the reinforcing properties, limits, and motivating potentials of sensory stimuli with autistic children. In the first phase of the study, four intellectually retarded autistic children were exposed to three different types of sensory stimulation (vibration, music, and strobe light) as well as edible and social reinforcers for ten-second intervals contingent upon six simple bar pressing responses. In the second phase, the same events were used as reinforcers for correct responses in learning object labels. The results indicated that: (a) sensory stimuli can be used effectively as reinforcers to maintain high, durable rates of responding in a simple pressing task; (b) ranked preferences for sensory stimuli revealed a unique configuration of responding for each child; and (c) sensory stimuli have motivating potentials comparable to those of the traditional food and social reinforcers even when training receptive language tasks.

  19. Use of Frontal Lobe Hemodynamics as Reinforcement Signals to an Adaptive Controller

    PubMed Central

    DiStasio, Marcello M.; Francis, Joseph T.

    2013-01-01

    Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment. Then higher valued states can be pursued and lower (or negative) valued states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain, and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptable artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrate that both oxy- and deoxyhemoglobin concentrations in the frontal lobe show differences in response to both primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy. We also show that decoding state desirabilities is more accurate when using relative concentrations of both oxyhemoglobin and deoxyhemoglobin, rather than either species alone. PMID:23894500

  20. Explicit and implicit reinforcement learning across the psychosis spectrum.

    PubMed

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  1. Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning

    PubMed Central

    Carl Aberg, Kristoffer; Doell, Kimberly C.; Schwartz, Sophie

    2016-01-01

    Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedbacks, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depend on associated reward values and internal motivational drives, possibly determined by personality traits. PMID:27851807
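
    Models of this kind typically separate learning from positive and negative outcomes with distinct learning rates; a hedged sketch (alpha_pos and alpha_neg are illustrative, not the study's estimates):

        import random

        def simulate(alpha_pos=0.3, alpha_neg=0.1, p_reward=0.8, n_trials=100):
            v = 0.0
            for _ in range(n_trials):
                outcome = 1.0 if random.random() < p_reward else -1.0
                delta = outcome - v
                v += (alpha_pos if delta > 0 else alpha_neg) * delta
            return v

        # a higher alpha_pos than alpha_neg biases learning toward rewarded outcomes
        print(round(simulate(), 2))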

  2. Impact of identifying factors which trigger bothersome tinnitus on the treatment outcome in tinnitus retraining therapy.

    PubMed

    Molini, Egisto; Faralli, Mario; Calzolaro, Lucia; Ricci, Giampietro

    2014-01-01

    The aim of this work was to ascertain any differences in the effectiveness of rehabilitation therapy in relation to the presence or absence of a known negative reinforcement responsible for the tinnitus-related pathology. Between 1 January 2001 and 31 December 2008, we recruited 294 subjects suffering from incapacitating tinnitus and/or hyperacusis. The patients underwent tinnitus retraining therapy (TRT) according to the methods described by Jastreboff and Hazell [Tinnitus Retraining Therapy: Implementing the Neurophysiological Model. Cambridge, Cambridge University Press, 2004, pp 121-133]. We clinically assessed the presence or absence of known phenomena of associative learning, regarding the presence of adverse events temporally correlated with tinnitus and the treatment outcome. The separate analysis of the 2 subgroups shows a statistically significant difference in the improvement rate between the group with a known triggering factor and the group without a triggering factor, with a preponderance of the former with a 91% improvement rate versus approximately 56% for the latter. In our study, the inability to identify factors triggering bothersome tinnitus negatively affected the treatment outcome in TRT. © 2014 S. Karger AG, Basel.

  3. Disrupted reinforcement learning and maladaptive behavior in women with a history of childhood sexual abuse: a high-density event-related potential study.

    PubMed

    Pechtel, Pia; Pizzagalli, Diego A

    2013-05-01

    Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite the epidemiological data available, the mechanisms underlying these maladaptive outcomes remain poorly understood. We examined whether a history of CSA, particularly in conjunction with a past episode of MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Academic setting; participants recruited from the community. Fifteen women with a history of CSA and remitted MDD (CSA + rMDD), 16 women with remitted MDD with no history of CSA (rMDD), and 18 healthy women (controls). Three or more episodes of coerced sexual contact (mean [SD] duration, 3.00 [2.20] years) between the ages of 7 and 12 years by at least 1 male perpetrator. Participants' preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity and error-related negativity-hypothesized to reflect activation in the anterior cingulate cortex-were used as electrophysiological indices of reinforcement learning. No group differences emerged in the acquisition of reinforcement contingencies. In trials requiring participants to rely partially or exclusively on previously rewarded information, the CSA + rMDD group showed (1) lower accuracy (relative to both controls and the rMDD group), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to the rMDD group). A history of CSA was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance of previously rewarded, but not previously punished, trials. Irrespective of past MDD episodes, women with a history of CSA showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision making in the absence of feedback (blunted "Go learning"). Although our study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine if disrupted positive reinforcement learning predicts high-risk behavior following CSA.

  4. Optimizing microstimulation using a reinforcement learning framework.

    PubMed

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of the desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response from stimulation. We explore a preliminary architecture--treating the task as a k-armed bandit--using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best-matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
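
    A sketch of the k-armed bandit formulation described above (illustrative, not the authors' implementation), where each arm is one stimulation parameter set and the reward stands in for the similarity between the evoked response and the natural-touch template:

        import random

        def bandit_select(similarity_fn, k=68, n_trials=2000, epsilon=0.1):
            counts, values = [0] * k, [0.0] * k
            for _ in range(n_trials):
                if random.random() < epsilon:
                    arm = random.randrange(k)                      # explore
                else:
                    arm = max(range(k), key=lambda i: values[i])   # exploit best estimate
                reward = similarity_fn(arm)
                counts[arm] += 1
                values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
            return max(range(k), key=lambda i: values[i])

        # toy similarity function standing in for the template-match score
        best = bandit_select(lambda arm: random.gauss(1.0 - abs(arm - 30) / 68, 0.1))
        print(best)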

  5. Evolution with Reinforcement Learning in Negotiation

    PubMed Central

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stableness than their counterparts using classic evolutionary algorithm. PMID:25048108

  6. Evolution with reinforcement learning in negotiation.

    PubMed

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stableness than their counterparts using classic evolutionary algorithm.

  7. Developmental Changes in Learning: Computational Mechanisms and Social Influences

    PubMed Central

    Bolenz, Florian; Reiter, Andrea M. F.; Eppinger, Ben

    2017-01-01

    Our ability to learn from the outcomes of our actions and to adapt our decisions accordingly changes over the course of the human lifespan. In recent years, there has been an increasing interest in using computational models to understand developmental changes in learning and decision-making. Moreover, extensions of these models are currently applied to study socio-emotional influences on learning in different age groups, a topic that is of great relevance for applications in education and health psychology. In this article, we aim to provide an introduction to basic ideas underlying computational models of reinforcement learning and focus on parameters and model variants that might be of interest to developmental scientists. We then highlight recent attempts to use reinforcement learning models to study the influence of social information on learning across development. The aim of this review is to illustrate how computational models can be applied in developmental science, what they can add to our understanding of developmental mechanisms and how they can be used to bridge the gap between psychological and neurobiological theories of development. PMID:29250006

  8. Learning tactile skills through curious exploration

    PubMed Central

    Pape, Leo; Oddo, Calogero M.; Controzzi, Marco; Cipriani, Christian; Förster, Alexander; Carrozza, Maria C.; Schmidhuber, Jürgen

    2012-01-01

    We present curiosity-driven, autonomous acquisition of tactile exploratory skills on a biomimetic robot finger equipped with an array of microelectromechanical touch sensors. Instead of building tailored algorithms for solving a specific tactile task, we employ a more general curiosity-driven reinforcement learning approach that autonomously learns a set of motor skills in absence of an explicit teacher signal. In this approach, the acquisition of skills is driven by the information content of the sensory input signals relative to a learner that aims at representing sensory inputs using fewer and fewer computational resources. We show that, from initially random exploration of its environment, the robotic system autonomously develops a small set of basic motor skills that lead to different kinds of tactile input. Next, the system learns how to exploit the learned motor skills to solve supervised texture classification tasks. Our approach demonstrates the feasibility of autonomous acquisition of tactile skills on physical robotic platforms through curiosity-driven reinforcement learning, overcomes typical difficulties of engineered solutions for active tactile exploration and underactuated control, and provides a basis for studying developmental learning through intrinsic motivation in robots. PMID:22837748
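
    A toy sketch of the intrinsic-reward idea, assuming the reward is the reduction in prediction error of a simple per-action forward model (learning progress); this is only an illustration, not the paper's architecture:

        import random

        model = {"press": 0.0, "slide": 0.0}        # per-action prediction of a sensor reading
        prev_err = {"press": 1.0, "slide": 1.0}

        def intrinsic_reward(action, observation, lr=0.2):
            err = abs(observation - model[action])
            reward = prev_err[action] - err                     # learning progress
            model[action] += lr * (observation - model[action]) # update forward model
            prev_err[action] = err
            return reward

        # the intrinsic reward shrinks as the action becomes predictable
        for t in range(5):
            print(round(intrinsic_reward("press", 0.8 + random.gauss(0, 0.01)), 3))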

  9. Distributed reinforcement learning for adaptive and robust network intrusion response

    NASA Astrophysics Data System (ADS)

    Malialis, Kleanthis; Devlin, Sam; Kudenko, Daniel

    2015-07-01

    Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
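
    A small sketch of the difference-reward idea mentioned above (illustrative, not the paper's router-throttling code): each agent is credited with the change in a global objective caused by its own action, estimated by replacing that action with a default:

        def global_reward(throttle_rates, attack_load=100.0, capacity=60.0):
            # toy objective: keep the traffic reaching the server near its capacity
            served = max(0.0, attack_load * (1.0 - sum(throttle_rates) / len(throttle_rates)))
            return -abs(served - capacity)

        def difference_reward(throttle_rates, agent_idx, default=0.0):
            counterfactual = list(throttle_rates)
            counterfactual[agent_idx] = default
            return global_reward(throttle_rates) - global_reward(counterfactual)

        rates = [0.5, 0.2, 0.4]
        print([round(difference_reward(rates, i), 2) for i in range(len(rates))])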

  10. Overcoming Learned Helplessness in Community College Students.

    ERIC Educational Resources Information Center

    Roueche, John E.; Mink, Oscar G.

    1982-01-01

    Reviews research on the effects of repeated experiences of helplessness and on locus of control. Identifies conditions necessary for overcoming learned helplessness; i.e., the potential for learning to occur; consistent reinforcement; relevant, valued reinforcers; and favorable psychological situation. Recommends eight ways for teachers to…

  11. Dopamine D2 Receptor Signaling in the Nucleus Accumbens Comprises a Metabolic-Cognitive Brain Interface Regulating Metabolic Components of Glucose Reinforcement.

    PubMed

    Michaelides, Michael; Miller, Michael L; DiNieri, Jennifer A; Gomez, Juan L; Schwartz, Elizabeth; Egervari, Gabor; Wang, Gene Jack; Mobbs, Charles V; Volkow, Nora D; Hurd, Yasmin L

    2017-11-01

    Appetitive drive is influenced by coordinated interactions between brain circuits that regulate reinforcement and homeostatic signals that control metabolism. Glucose modulates striatal dopamine (DA) and regulates appetitive drive and reinforcement learning. Striatal DA D2 receptors (D2Rs) also regulate reinforcement learning and are implicated in glucose-related metabolic disorders. Nevertheless, interactions between striatal D2R and peripheral glucose have not been previously described. Here we show that manipulations involving striatal D2R signaling coincide with perseverative and impulsive-like responding for sucrose, a disaccharide consisting of fructose and glucose. Fructose conveys orosensory (ie, taste) reinforcement but does not convey metabolic (ie, nutrient-derived) reinforcement. Glucose however conveys orosensory reinforcement but unlike fructose, it is a major metabolic energy source, underlies sustained reinforcement, and activates striatal circuitry. We found that mice with deletion of dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) exclusively in D2R-expressing cells exhibited preferential D2R changes in the nucleus accumbens (NAc), a striatal region that critically regulates sucrose reinforcement. These changes coincided with perseverative and impulsive-like responding for sucrose pellets and sustained reinforcement learning of glucose-paired flavors. These mice were also characterized by significant glucose intolerance (ie, impaired glucose utilization). Systemic glucose administration significantly attenuated sucrose operant responding and D2R activation or blockade in the NAc bidirectionally modulated blood glucose levels and glucose tolerance. Collectively, these results implicate NAc D2R in regulating both peripheral glucose levels and glucose-dependent reinforcement learning behaviors and highlight the notion that glucose metabolic impairments arising from disrupted NAc D2R signaling are involved in compulsive and perseverative feeding behaviors.

  12. A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

    DOE PAGES

    Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

    2015-01-31

    Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.
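
    The record does not spell out the coordinated-learning machinery; as a toy illustration of the underlying idea (joint action values factored into local and pairwise components, with the best joint action found by exact maximization), the following sketch handles just two neighbouring intersections by enumeration. The paper performs this maximization over a full network with the Junction Tree Algorithm; everything below, including names and the even splitting of the TD error, is an assumption for illustration.

      # Toy sketch of coordinated Q-learning on a two-intersection coordination
      # graph (Python). Exact enumeration stands in for junction-tree inference.
      import itertools
      from collections import defaultdict

      ACTIONS = [0, 1]                 # e.g., 0 = keep current phase, 1 = switch
      AGENTS = [0, 1]
      EDGES = [(0, 1)]
      ALPHA, GAMMA = 0.1, 0.95
      q_local = defaultdict(float)     # Q_i(s_i, a_i)
      q_pair = defaultdict(float)      # Q_ij(s_i, s_j, a_i, a_j)

      def joint_value(state, joint_action):
          v = sum(q_local[(i, state[i], joint_action[i])] for i in AGENTS)
          v += sum(q_pair[(i, j, state[i], state[j], joint_action[i], joint_action[j])]
                   for i, j in EDGES)
          return v

      def best_joint_action(state):
          return max(itertools.product(ACTIONS, repeat=len(AGENTS)),
                     key=lambda ja: joint_value(state, ja))

      def update(state, joint_action, reward, next_state):
          """One coordinated Q-learning step; the global TD error is split evenly
          across the local and pairwise components (a simple, common choice)."""
          target = reward + GAMMA * joint_value(next_state, best_joint_action(next_state))
          td = target - joint_value(state, joint_action)
          share = ALPHA * td / (len(AGENTS) + len(EDGES))
          for i in AGENTS:
              q_local[(i, state[i], joint_action[i])] += share
          for i, j in EDGES:
              q_pair[(i, j, state[i], state[j], joint_action[i], joint_action[j])] += share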

  13. Design issues for a reinforcement-based self-learning fuzzy controller

    NASA Technical Reports Server (NTRS)

    Yen, John; Wang, Haojin; Dauherity, Walter

    1993-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control: easy implementation, accommodation of natural language, the ability to cover a wider range of operating conditions, and others. One major obstacle that hinders their broader application is the lack of a systematic way to develop and modify the rules; as a result, the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One proposed approach to address this issue is the self-learning fuzzy logic controller (SFLC), which uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of fuzzy control rules accordingly. Because the dynamics of controlled processes differ, the performance of a self-learning fuzzy controller is highly contingent on its design, yet the design issue has not received sufficient attention. The issues related to the design of an SFLC for application to a chemical process are discussed, and its performance is compared with that of PID and self-tuning fuzzy logic controllers.

  14. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    PubMed

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for the development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize the future of artificial intelligence. The growth in computational power, together with faster and larger data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques to data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey of the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  15. The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.

    ERIC Educational Resources Information Center

    Dockstader, Steven L.

    The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…

  16. Enhancing second-order conditioning with lesions of the basolateral amygdala.

    PubMed

    Holland, Peter C

    2016-04-01

    Because the occurrence of primary reinforcers in natural environments is relatively rare, conditioned reinforcement plays an important role in many accounts of behavior, including pathological behaviors such as the abuse of alcohol or drugs. As a result of pairing with natural or drug reinforcers, initially neutral cues acquire the ability to serve as reinforcers for subsequent learning. Accepting a major role for conditioned reinforcement in everyday learning is complicated by the often-evanescent nature of this phenomenon in the laboratory, especially when primary reinforcers are entirely absent from the test situation. Here, I found that under certain conditions, the impact of conditioned reinforcement could be extended by lesions of the basolateral amygdala (BLA). Rats received first-order Pavlovian conditioning pairings of 1 visual conditioned stimulus (CS) with food prior to receiving excitotoxic or sham lesions of the BLA, and first-order pairings of another visual CS with food after that surgery. Finally, each rat received second-order pairings of a different auditory cue with each visual first-order CS. As in prior studies, relative to sham-lesioned control rats, lesioned rats were impaired in their acquisition of second-order conditioning to the auditory cue paired with the first-order CS that was trained after surgery. However, lesioned rats showed enhanced and prolonged second-order conditioning to the auditory cue paired with the first-order CS that was trained before amygdala damage was made. Implications for an enhanced role for conditioned reinforcement by drug-related cues after drug-induced alterations in neural plasticity are discussed. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  17. Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning

    PubMed Central

    Ramayya, Ashwin G.; Misra, Amrit

    2014-01-01

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643

  18. An Investigation of Ways to Reduce the Failure Rate of Student Pilots during Flying Training in the Royal Australian Air Force.

    DTIC Science & Technology

    1987-09-01

    Luthans (28) expanded the concept of learning as follows: 1. Learning involves a change, though not necessarily an improvement, in behaviour. Learning...that results in an unpleasant outcome is not likely to be repeated (36:244). Luthans and Kreitner (27) described the various forms of reinforcement as...four alternatives (defined previously on page 24 and taken from Luthans) of positive reinforcement, negative reinforcement, extinction and punishment

  19. An Assessment of Fixed Interval Timing in Free-Flying Honey Bees (Apis mellifera ligustica): An Analysis of Individual Performance

    PubMed Central

    Craig, David Philip Arthur; Varnon, Christopher A.; Sokolowski, Michel B. C.; Wells, Harrington; Abramson, Charles I.

    2014-01-01

    Interval timing is a key element of foraging theory, models of predator avoidance, and competitive interactions. Although interval timing is well documented in vertebrate species, it is virtually unstudied in invertebrates. In the present experiment, we used free-flying honey bees (Apis mellifera ligustica) as a model for timing behaviors. Subjects were trained to enter a hole in an automated artificial flower to receive a nectar reinforcer (i.e. reward). Responses were continuously reinforced prior to exposure to either a fixed interval (FI) 15-sec, FI 30-sec, FI 60-sec, or FI 120-sec reinforcement schedule. We measured response rate and post-reinforcement pause within each fixed interval trial between reinforcers. Honey bees responded at higher frequencies earlier in the fixed interval suggesting subject responding did not come under traditional forms of temporal control. Response rates were lower during FI conditions compared to performance on continuous reinforcement schedules, and responding was more resistant to extinction when previously reinforced on FI schedules. However, no “scalloped” or “break-and-run” patterns of group or individual responses reinforced on FI schedules were observed; no traditional evidence of temporal control was found. Finally, longer FI schedules eventually caused all subjects to cease returning to the operant chamber indicating subjects did not tolerate the longer FI schedules. PMID:24983960

  20. Genetic reinforcement learning through symbiotic evolution for fuzzy controller design.

    PubMed

    Juang, C F; Lin, J Y; Lin, C T

    2000-01-01

    An efficient genetic reinforcement learning algorithm for designing fuzzy controllers is proposed in this paper. The genetic algorithm (GA) adopted in this paper is based upon symbiotic evolution which, when applied to fuzzy controller design, complements the local mapping property of a fuzzy rule. Using this Symbiotic-Evolution-based Fuzzy Controller (SEFC) design method, the number of control trials and the consumed CPU time are considerably reduced when compared to traditional GA-based fuzzy controller design methods and other types of genetic reinforcement learning schemes. Moreover, unlike traditional fuzzy controllers, which partition the input space into a grid, SEFC partitions the input space in a flexible way, thus creating fewer fuzzy rules. In SEFC, different types of fuzzy rules whose consequent parts are singletons, fuzzy sets, or linear equations (TSK-type fuzzy rules) are allowed. Further, the free parameters (e.g., centers and widths of membership functions) and fuzzy rules are all tuned automatically. For TSK-type fuzzy rules in particular, the proposed learning algorithm selects only the significant input variables to participate in the consequent of a rule. The proposed SEFC design method has been applied to different simulated control problems, including the cart-pole balancing system, a magnetic levitation system, and a water bath temperature control system. On these control problems, and in comparisons with some traditional GA-based fuzzy systems, the SEFC has been verified to be efficient and superior.

  1. Frontal Theta Links Prediction Errors to Behavioral Adaptation in Reinforcement Learning

    PubMed Central

    Cavanagh, James F.; Frank, Michael J.; Klein, Theresa J.; Allen, John J.B.

    2009-01-01

    Investigations into action monitoring have consistently detailed a fronto-central voltage deflection in the Event-Related Potential (ERP) following the presentation of negatively valenced feedback, sometimes termed the Feedback Related Negativity (FRN). The FRN has been proposed to reflect a neural response to prediction errors during reinforcement learning, yet the single trial relationship between neural activity and the quanta of expectation violation remains untested. Although ERP methods are not well suited to single trial analyses, the FRN has been associated with theta band oscillatory perturbations in the medial prefrontal cortex. Medio-frontal theta oscillations have been previously associated with expectation violation and behavioral adaptation and are well suited to single trial analysis. Here, we recorded EEG activity during a probabilistic reinforcement learning task and fit the performance data to an abstract computational model (Q-learning) for calculation of single-trial reward prediction errors. Single-trial theta oscillatory activities following feedback were investigated within the context of expectation (prediction error) and adaptation (subsequent reaction time change). Results indicate that interactive medial and lateral frontal theta activities reflect the degree of negative and positive reward prediction error in the service of behavioral adaptation. These different brain areas use prediction error calculations for different behavioral adaptations: with medial frontal theta reflecting the utilization of prediction errors for reaction time slowing (specifically following errors), but lateral frontal theta reflecting prediction errors leading to working memory-related reaction time speeding for the correct choice. PMID:19969093
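
    The single-trial reward prediction errors referred to above come from a fitted Q-learning model; a minimal sketch of that computation follows (the learning rate and initial values are assumptions, and in practice these parameters, together with a softmax temperature, would be estimated from the behavioral data, e.g., by maximum likelihood).

      # Minimal Q-learning sketch for extracting single-trial reward prediction
      # errors from choice/outcome sequences (parameter values are assumptions).
      def single_trial_rpes(choices, rewards, n_options=2, alpha=0.2):
          """choices: list of chosen option indices; rewards: list of 0/1 outcomes.
          Returns the prediction error delta_t = r_t - Q(choice_t) for every trial."""
          q = [0.5] * n_options          # initial expected values
          rpes = []
          for c, r in zip(choices, rewards):
              delta = r - q[c]           # reward prediction error on this trial
              rpes.append(delta)
              q[c] += alpha * delta      # incremental value update
          return rpes

      # Example: a few trials of a two-alternative task
      print(single_trial_rpes(choices=[0, 0, 1, 0], rewards=[1, 0, 1, 1]))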

  2. Mastery Learning through Individualized Instruction: A Reinforcement Strategy

    ERIC Educational Resources Information Center

    Sagy, John; Ravi, R.; Ananthasayanam, R.

    2009-01-01

    The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…

  3. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    PubMed Central

    Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia

    2018-01-01

    Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822

  4. Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning

    PubMed Central

    Konovalov, Arkady; Krajbich, Ian

    2016-01-01

    Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process between two differentially valued items, consistent with previous work on sequential-sampling models of decision making. These findings illustrate a problem with assuming that experimental subjects make their decisions at the same prescribed time. PMID:27511383

  5. The left hemisphere learns what is right: Hemispatial reward learning depends on reinforcement learning processes in the contralateral hemisphere.

    PubMed

    Aberg, Kristoffer Carl; Doell, Kimberly Crystal; Schwartz, Sophie

    2016-08-01

    Orienting biases refer to consistent, trait-like direction of attention or locomotion toward one side of space. Recent studies suggest that such hemispatial biases may determine how well people memorize information presented in the left or right hemifield. Moreover, lesion studies indicate that learning rewarded stimuli in one hemispace depends on the integrity of the contralateral striatum. However, the exact neural and computational mechanisms underlying the influence of individual orienting biases on reward learning remain unclear. Because reward-based behavioural adaptation depends on the dopaminergic system and prediction error (PE) encoding in the ventral striatum, we hypothesized that hemispheric asymmetries in dopamine (DA) function may determine individual spatial biases in reward learning. To test this prediction, we acquired fMRI in 33 healthy human participants while they performed a lateralized reward task. Learning differences between hemispaces were assessed by presenting stimuli, assigned to different reward probabilities, to the left or right of central fixation, i.e. presented in the left or right visual hemifield. Hemispheric differences in DA function were estimated through differential fMRI responses to positive vs. negative feedback in the left vs. right ventral striatum, and a computational approach was used to identify the neural correlates of PEs. Our results show that spatial biases favoring reward learning in the right (vs. left) hemifield were associated with increased reward responses in the left hemisphere and relatively better neural encoding of PEs for stimuli presented in the right (vs. left) hemifield. These findings demonstrate that trait-like spatial biases implicate hemisphere-specific learning mechanisms, with individual differences between hemispheres contributing to reinforcing spatial biases. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Deletion of the δ opioid receptor gene impairs place conditioning but preserves morphine reinforcement.

    PubMed

    Le Merrer, Julie; Plaza-Zabala, Ainhoa; Del Boca, Carolina; Matifas, Audrey; Maldonado, Rafael; Kieffer, Brigitte L

    2011-04-01

    Converging experimental data indicate that δ opioid receptors contribute to mediate drug reinforcement processes. Whether their contribution reflects a role in the modulation of drug reward or an implication in conditioned learning, however, has not been explored. In the present study, we investigated the impact of δ receptor gene knockout on reinforced conditioned learning under several experimental paradigms. We assessed the ability of δ receptor knockout mice to form drug-context associations with either morphine (appetitive)- or lithium (aversive)-induced Pavlovian place conditioning. We also examined the efficiency of morphine to serve as a positive reinforcer in these mice and their motivation to gain drug injections, with operant intravenous self-administration under fixed and progressive ratio schedules and at two different doses. Mutant mice showed impaired place conditioning in both appetitive and aversive conditions, indicating disrupted context-drug association. In contrast, mutant animals displayed intact acquisition of morphine self-administration and reached breaking-points comparable to control subjects. Thus, reinforcing effects of morphine and motivation to obtain the drug were maintained. Collectively, the data suggest that δ receptor activity is not involved in morphine reinforcement but facilitates place conditioning. This study reveals a novel aspect of δ opioid receptor function in addiction-related behaviors. Copyright © 2011 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  7. Dissociable Learning Processes Underlie Human Pain Conditioning.

    PubMed

    Zhang, Suyi; Mano, Hiroaki; Ganesh, Gowrishankar; Robbins, Trevor; Seymour, Ben

    2016-01-11

    Pavlovian conditioning underlies many aspects of pain behavior, including fear and threat detection [1], escape and avoidance learning [2], and endogenous analgesia [3]. Although a central role for the amygdala is well established [4], both human and animal studies implicate other brain regions in learning, notably ventral striatum and cerebellum [5]. It remains unclear whether these regions make different contributions to a single aversive learning process or represent independent learning mechanisms that interact to generate the expression of pain-related behavior. We designed a human parallel aversive conditioning paradigm in which different Pavlovian visual cues probabilistically predicted thermal pain primarily to either the left or right arm and studied the acquisition of conditioned Pavlovian responses using combined physiological recordings and fMRI. Using computational modeling based on reinforcement learning theory, we found that conditioning involves two distinct types of learning process. First, a non-specific "preparatory" system learns aversive facial expressions and autonomic responses such as skin conductance. The associated learning signals-the learned associability and prediction error-were correlated with fMRI brain responses in amygdala-striatal regions, corresponding to the classic aversive (fear) learning circuit. Second, a specific lateralized system learns "consummatory" limb-withdrawal responses, detectable with electromyography of the arm to which pain is predicted. Its related learned associability was correlated with responses in ipsilateral cerebellar cortex, suggesting a novel computational role for the cerebellum in pain. In conclusion, our results show that the overall phenotype of conditioned pain behavior depends on two dissociable reinforcement learning circuits. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  8. Geochemical studies of granitic rocks of Kallur area, Manvi Taluk, Raichur district, Karnataka (India).

    PubMed

    Raghavendra, N R; Reddy, R Purushottam; Nijagunappa, R

    2011-01-01

    Geochemical data are widely used in establishing the overall chemical relations between different rock types and their parentage. A major impetus for this shift comes not only from the need to better understand and quantify the spatial and temporal evolution, with emphasis on the younger greenstone belts (Kallur copper formations), but also from the recognition that such knowledge could form the basis for the sustainable development of our natural resources. In addition, the recurrence of natural hazards has reinforced the need to learn more about the mechanics involved and to develop predictive modeling with advanced technical tools. This paper focuses on the granodiorites of the Kallur area of Manvi Taluk, Raichur District, to substantiate the classical approaches of exploration and data gathering through quantitative methods of data processing and interpretation. The trilinear diagram indicates that the granites are rich in potash and soda, and that they are richer in K2O than in Na2O.

  9. Regulating recognition decisions through incremental reinforcement learning.

    PubMed

    Han, Sanghoon; Dobbins, Ian G

    2009-06-01

    Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback had been altogether removed. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.

  10. Adaptive Educational Software by Applying Reinforcement Learning

    ERIC Educational Resources Information Center

    Bennane, Abdellah

    2013-01-01

    The introduction of intelligence into teaching software is the object of this paper. In the software development process, one uses learning techniques in order to adapt the teaching software to the characteristics of the student. Generally, one uses artificial intelligence techniques such as reinforcement learning or Bayesian networks in order to adapt…

  11. Neural Correlates of Temporal Credit Assignment in the Parietal Lobe

    PubMed Central

    Eisenberg, Ian; Gottlieb, Jacqueline

    2014-01-01

    Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the “F” step) but ignore changes in this reward at the remaining step (the “I” step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting. PMID:24523935

  12. Gaining Insight by Transforming between Temporal Representations of Human Interaction

    ERIC Educational Resources Information Center

    Lund, Kristine; Quignard, Matthieu; Shaffer, David Williamson

    2017-01-01

    Recordings of human interaction data can be organized into temporal representations with different affordances. We use audio data of a learning-related discussion analyzed for its low-level emotional indicators and divided into four phases, each characterized by an overarching emotion. After arguing for the relevance of emotion to learning, we…

  13. Working Memory Load Strengthens Reward Prediction Errors.

    PubMed

    Collins, Anne G E; Ciullo, Brittany; Frank, Michael J; Badre, David

    2017-04-19

    Reinforcement learning (RL) in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors (RPEs) are used to update expected values of choice options. This modeling ignores the different contributions of different memory and decision-making systems thought to contribute even to simple learning. In an fMRI experiment, we investigated how working memory (WM) and incremental RL processes interact to guide human learning. WM load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive WM process together with slower RL. Model-based analysis of fMRI data showed that striatum and lateral prefrontal cortex were sensitive to RPE, as shown previously, but, critically, these signals were reduced when the learning problem was within capacity of WM. The degree of this neural interaction related to individual differences in the use of WM to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning. SIGNIFICANCE STATEMENT Reinforcement learning (RL) theory has been remarkably productive at improving our understanding of instrumental learning as well as dopaminergic and striatal network function across many mammalian species. However, this neural network is only one contributor to human learning and other mechanisms such as prefrontal cortex working memory also play a key role. Our results also show that these other players interact with the dopaminergic RL system, interfering with its key computation of reward prediction errors. Copyright © 2017 the authors 0270-6474/17/374332-11$15.00/0.
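
    As a rough illustration of the mixture described here, choice probabilities can be written as a weighted blend of a fast working-memory policy and a slower RL policy, with the working-memory weight scaled down once the number of stimuli exceeds capacity. All parameter names and values below are assumptions, not the fitted model from the study.

      # Hedged sketch of an RL + working-memory mixture policy (Python).
      import math

      def softmax(values, beta):
          m = max(values)
          exps = [math.exp(beta * (v - m)) for v in values]
          s = sum(exps)
          return [e / s for e in exps]

      def rlwm_choice_probs(q_values, wm_values, set_size, capacity=3.0, w0=0.9, beta=8.0):
          """Mixture policy: P(a) = w * P_WM(a) + (1 - w) * P_RL(a),
          where the WM weight w shrinks when the set size exceeds capacity."""
          w = w0 * min(1.0, capacity / set_size)
          p_rl = softmax(q_values, beta)
          p_wm = softmax(wm_values, beta)
          return [w * pw + (1.0 - w) * pr for pw, pr in zip(p_wm, p_rl)]

      # Example: WM remembers the correct response, but set size 6 exceeds capacity
      print(rlwm_choice_probs(q_values=[0.4, 0.6], wm_values=[1.0, 0.0], set_size=6))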

  14. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    NASA Astrophysics Data System (ADS)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method that combines reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model of the system; however, it does not guarantee the stability of the system. Robust control, on the other hand, guarantees stability and robustness but requires a system model. We employ the actor-critic method, a kind of reinforcement learning that handles continuous-valued actions with a minimal amount of computation, together with traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method (the actor-critic alone) through computer simulation of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.
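
    A generic sketch of the actor-critic component named above, with a Gaussian actor for a one-dimensional continuous action and a TD-error critic, is given below. The state representation, learning rates, and exploration noise are assumptions; the paper's contribution is the combination with H∞ control, which is not shown.

      # Generic actor-critic sketch for a 1-D continuous action (Python).
      import random

      class GaussianActorCritic:
          def __init__(self, alpha_v=0.1, alpha_mu=0.01, gamma=0.98, sigma=0.3):
              self.v = {}    # critic: state-value estimates
              self.mu = {}   # actor: mean of the Gaussian action policy per state
              self.alpha_v, self.alpha_mu, self.gamma, self.sigma = alpha_v, alpha_mu, gamma, sigma

          def act(self, state):
              """Sample a continuous action around the current policy mean."""
              return random.gauss(self.mu.get(state, 0.0), self.sigma)

          def update(self, state, action, reward, next_state):
              v_s = self.v.get(state, 0.0)
              v_next = self.v.get(next_state, 0.0)
              td_error = reward + self.gamma * v_next - v_s     # critic's TD error
              self.v[state] = v_s + self.alpha_v * td_error     # critic update
              mean = self.mu.get(state, 0.0)
              # actor update: move the action mean along the TD error times the
              # gradient of the Gaussian log-policy with respect to the mean
              self.mu[state] = mean + self.alpha_mu * td_error * (action - mean) / (self.sigma ** 2)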

  15. Associative symmetry in a spatial sample-response paradigm

    PubMed Central

    Vasconcelos, Marco; Urcuioli, Peter J.

    2011-01-01

    Symmetry has been difficult to observe in nonhumans mainly because they seem to perceive stimuli as a conjunction of visual, spatial, and temporal characteristics. When such characteristics are controlled, symmetry does emerge in nonhumans (cf. Frank and Wasserman 2005; Urcuioli 2008). Recently, however, Garcia and Benjumea (2006) reported symmetry in pigeons without controlling for temporal order. The present experiments explored their paradigm and the ingredients for their success. Experiments 1 and 2 sought to replicate their findings and to examine different symmetry measures. We found evidence for symmetry using non-reinforced choice probe tests, a latency-based test, and a reinforced consistent versus inconsistent manipulation. Experiment 3 adapted their procedure to successive matching to evaluate their contention that a choice between at least two comparisons is necessary for symmetry to emerge. Contrary to their prediction, symmetry was observed following go/no-go training. Our results confirm Garcia and Benjumea’s findings, extend them to other test and training procedures, and once again demonstrate symmetry in the absence of language. PMID:21238554

  16. Effects of intrinsic motivation on feedback processing during learning.

    PubMed

    DePasque, Samantha; Tricomi, Elizabeth

    2015-10-01

    Learning commonly requires feedback about the consequences of one's actions, which can drive learners to modify their behavior. Motivation may determine how sensitive an individual might be to such feedback, particularly in educational contexts where some students value academic achievement more than others. Thus, motivation for a task might influence the value placed on performance feedback and how effectively it is used to improve learning. To investigate the interplay between intrinsic motivation and feedback processing, we used functional magnetic resonance imaging (fMRI) during feedback-based learning before and after a novel manipulation based on motivational interviewing, a technique for enhancing treatment motivation in mental health settings. Because of its role in the reinforcement learning system, the striatum is situated to play a significant role in the modulation of learning based on motivation. Consistent with this idea, motivation levels during the task were associated with sensitivity to positive versus negative feedback in the striatum. Additionally, heightened motivation following a brief motivational interview was associated with increases in feedback sensitivity in the left medial temporal lobe. Our results suggest that motivation modulates neural responses to performance-related feedback, and furthermore that changes in motivation facilitate processing in areas that support learning and memory. Copyright © 2015. Published by Elsevier Inc.

  17. Learning alternative movement coordination patterns using reinforcement feedback.

    PubMed

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  18. Neurocomputational mechanisms of prosocial learning and links to empathy

    PubMed Central

    Apps, Matthew A. J.; Valton, Vincent; Viding, Essi; Roiser, Jonathan P.

    2016-01-01

    Reinforcement learning theory powerfully characterizes how we learn to benefit ourselves. In this theory, prediction errors—the difference between a predicted and actual outcome of a choice—drive learning. However, we do not operate in a social vacuum. To behave prosocially we must learn the consequences of our actions for other people. Empathy, the ability to vicariously experience and understand the affect of others, is hypothesized to be a critical facilitator of prosocial behaviors, but the link between empathy and prosocial behavior is still unclear. During functional magnetic resonance imaging (fMRI) participants chose between different stimuli that were probabilistically associated with rewards for themselves (self), another person (prosocial), or no one (control). Using computational modeling, we show that people can learn to obtain rewards for others but do so more slowly than when learning to obtain rewards for themselves. fMRI revealed that activity in a posterior portion of the subgenual anterior cingulate cortex/basal forebrain (sgACC) drives learning only when we are acting in a prosocial context and signals a prosocial prediction error conforming to classical principles of reinforcement learning theory. However, there is also substantial variability in the neural and behavioral efficiency of prosocial learning, which is predicted by trait empathy. More empathic people learn more quickly when benefitting others, and their sgACC response is the most selective for prosocial learning. We thus reveal a computational mechanism driving prosocial learning in humans. This framework could provide insights into atypical prosocial behavior in those with disorders of social cognition. PMID:27528669

  19. A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity.

    PubMed

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.

  20. Punishment insensitivity and impaired reinforcement learning in preschoolers.

    PubMed

    Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S

    2014-01-01

    Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.

  1. A Spiking Neural Network Model of Model-Free Reinforcement Learning with High-Dimensional Sensory Input and Perceptual Ambiguity

    PubMed Central

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662

  2. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    PubMed

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-04

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  3. Active Learning in Engineering Education: a (re)introduction

    NASA Astrophysics Data System (ADS)

    Lima, Rui M.; Andersson, Pernille Hammar; Saalman, Elisabeth

    2017-01-01

    The informal network 'Active Learning in Engineering Education' (ALE) has been promoting Active Learning since 2001. ALE creates opportunity for practitioners and researchers of engineering education to collaboratively learn how to foster learning of engineering students. The activities in ALE are centred on the vision that learners construct their knowledge based on meaningful activities and knowledge. In 2014, the steering committee of the ALE network reinforced the need to discuss the meaning of Active Learning and that was the base for this proposal for a special issue. More than 40 submissions were reviewed by the European Journal of Engineering Education community and this theme issue ended up with eight contributions, which are different both in their research and Active Learning approaches. These different Active Learning approaches are aligned with the different approaches that can be increasingly found in indexed journals.

  4. How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding

    PubMed Central

    Desantis, Andrea; Haggard, Patrick

    2016-01-01

    To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events. PMID:27982063

  5. How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding.

    PubMed

    Desantis, Andrea; Haggard, Patrick

    2016-12-16

    To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events.

  6. An applied test of the social learning theory of deviance to college alcohol use.

    PubMed

    DeMartino, Cynthia H; Rice, Ronald E; Saltz, Robert

    2015-04-01

    Several hypotheses about influences on college drinking derived from the social learning theory of deviance were tested and confirmed. The effect of ethnicity on alcohol use was completely mediated by differential association and differential reinforcement, whereas the effect of biological sex on alcohol use was partially mediated. Higher net positive reinforcements to costs for alcohol use predicted increased general use, more underage use, and more frequent binge drinking. Two unexpected findings were the negative relationship between negative expectations and negative experiences, and the substantive difference between nondrinkers and general drinkers compared with illegal or binge drinkers. The discussion considers implications for future campaigns based on Akers's deterrence theory.

  7. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    PubMed

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
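
    A minimal temporal-difference reading of the "propagation" result, with the two-step structure and all parameters assumed only for illustration: as the value of the chosen option is learned, the prediction error at reward delivery shrinks while a prediction-error-like signal at choice presentation grows.

      # Hedged sketch of prediction-error propagation with learning (Python).
      def simulate_propagation(n_trials=40, alpha=0.15, reward=1.0):
          v_cue = 0.0                      # learned value of the choice/cue state
          history = []
          for t in range(n_trials):
              pe_at_cue = v_cue - 0.0      # TD-like signal when the predictive cue appears
              pe_at_outcome = reward - v_cue   # TD error at reward delivery
              v_cue += alpha * pe_at_outcome   # value propagates back to the cue
              history.append((t, pe_at_cue, pe_at_outcome))
          return history

      hist = simulate_propagation()
      for t, cue_pe, outcome_pe in hist[:5] + hist[-3:]:
          print(f"trial {t:2d}: PE at cue = {cue_pe:.2f}, PE at outcome = {outcome_pe:.2f}")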

  8. Pedunculopontine tegmental nucleus lesions impair stimulus-reward learning in autoshaping and conditioned reinforcement paradigms.

    PubMed

    Inglis, W L; Olmstead, M C; Robbins, T W

    2000-04-01

    The role of the pedunculopontine tegmental nucleus (PPTg) in stimulus-reward learning was assessed by testing the effects of PPTg lesions on performance in visual autoshaping and conditioned reinforcement (CRf) paradigms. Rats with PPTg lesions were unable to learn an association between a conditioned stimulus (CS) and a primary reward in either paradigm. In the autoshaping experiment, PPTg-lesioned rats approached the CS+ and CS- with equal frequency, and the latencies to respond to the two stimuli did not differ. PPTg lesions also disrupted discriminated approaches to an appetitive CS in the CRf paradigm and completely abolished the acquisition of responding with CRf. These data are discussed in the context of a possible cognitive function of the PPTg, particularly in terms of lesion-induced disruptions of attentional processes that are mediated by the thalamus.

  9. ARI Basic Research Program FY 1999-2000

    DTIC Science & Technology

    1999-06-01

    visual cues, reinforcement, and instruction concerning abstract, general rules. In our future research, we plan to examine the learning of novel...Watch, • Graduate student apprenticeship program - Consortium Research Fellows Program - with the Consortium of Metropolitan Washington Universities...do learn complex rules involving different levels of abstraction when given sufficient specific examples but that they also benefit from explicit

  10. Comparing the Achievement Goal Orientation of Mathematics Learners with and without Attention-Deficit Hyperactivity Disorder

    ERIC Educational Resources Information Center

    Spangenberg, Erica Dorethea

    2017-01-01

    Many learners with different learning challenges are accommodated in the same classroom in South Africa, which could result in poor performance in mathematics. By reinforcing or disregarding certain goals, a teacher can influence the way in which learners learn mathematics. This study compared the achievement goal orientation of Grade Nine…

  11. Microstimulation of the human substantia nigra alters reinforcement learning.

    PubMed

    Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J

    2014-05-14

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors 0270-6474/14/346887-09$15.00/0.

  12. Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories

    PubMed Central

    Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien

    2013-01-01

    In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244
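
    One way to read the synthesis of artificial trajectories, sketched below under strong assumptions (scalar states and actions, a squared-distance metric, an undiscounted return, and invented names), is to stitch together one-step transitions from the batch by repeatedly selecting the sampled transition closest to the state-action pair the policy would visit next. This is an illustration of the idea only, not the authors' algorithm.

      # Hedged sketch: build an artificial trajectory for a given policy by
      # stitching nearest-neighbour one-step transitions from a batch (Python).
      def synthesize_trajectory(batch, policy, x0, horizon=10):
          """batch: list of one-step transitions (x, u, r, x_next) with scalar
          states/actions. Returns the synthetic trajectory and its return."""
          x, total_return, trajectory = x0, 0.0, []
          for _ in range(horizon):
              u = policy(x)
              # pick the sampled transition closest to the (state, action) we need
              x_s, u_s, r_s, x_next = min(
                  batch, key=lambda tr: (tr[0] - x) ** 2 + (tr[1] - u) ** 2)
              trajectory.append((x_s, u_s, r_s, x_next))
              total_return += r_s
              x = x_next                   # continue from the sampled successor state
          return trajectory, total_return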

  13. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    PubMed

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
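
    The prediction-error signals referred to above come from computational models fitted to trial-by-trial choices. As a minimal sketch (not the authors' fitted model), a delta-rule learner generates trialwise prediction errors that could then serve as regressors for neural data; the function and parameter names below are illustrative assumptions.

    ```python
    import numpy as np

    def simulate_prediction_errors(outcomes, alpha=0.3):
        """Delta-rule value learning for one stimulus; returns trialwise prediction errors.

        outcomes: array of outcomes per trial (e.g. +1 monetary gain, -1 loss, 0 neutral).
        The prediction error is the outcome minus the current expected value, the quantity
        typically regressed against BOLD activity in the ventral striatum or medial OFC.
        """
        value = 0.0
        pes = []
        for r in outcomes:
            pe = r - value          # positive values = reward PE, negative = punishment PE
            value += alpha * pe     # incremental value update
            pes.append(pe)
        return np.array(pes)

    outcomes = np.array([1.0, 1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 0.0])
    pes = simulate_prediction_errors(outcomes)
    reward_pe_regressor = np.where(pes > 0, pes, 0.0)      # candidate reward-PE regressor
    punishment_pe_regressor = np.where(pes < 0, pes, 0.0)  # candidate punishment-PE regressor
    print(pes)
    ```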

  14. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    PubMed Central

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and BOLD activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces. PMID:19864565
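
    The model comparison above (effector-decomposed versus unitary bimanual values) can be expressed as two alternative update rules. The code below is an illustrative reconstruction under assumed parameter names, not the authors' fitted model.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(q, beta=3.0):
        z = np.exp(beta * (q - q.max()))
        return z / z.sum()

    def decomposed_trial(q_left, q_right, rewards_left, rewards_right, alpha=0.2):
        """Effector-specific model: one value vector per hand, each updated only from that hand's feedback."""
        a_l = rng.choice(len(q_left), p=softmax(q_left))
        a_r = rng.choice(len(q_right), p=softmax(q_right))
        q_left[a_l] += alpha * (rewards_left[a_l] - q_left[a_l])
        q_right[a_r] += alpha * (rewards_right[a_r] - q_right[a_r])
        return a_l, a_r

    def unitary_trial(q_joint, rewards_left, rewards_right, alpha=0.2):
        """Unitary model: the bimanual pair is one action with a single value updated from the summed reward."""
        flat = q_joint.ravel()
        idx = rng.choice(flat.size, p=softmax(flat))
        a_l, a_r = np.unravel_index(idx, q_joint.shape)
        r_total = rewards_left[a_l] + rewards_right[a_r]
        q_joint[a_l, a_r] += alpha * (r_total - q_joint[a_l, a_r])
        return a_l, a_r

    # Model fitting would compare the likelihood of observed choices under the two rules,
    # e.g. with q_l, q_r = np.zeros(2), np.zeros(2) and q_joint = np.zeros((2, 2)).
    ```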

  15. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    PubMed

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.

  16. Avoidance-based human Pavlovian-to-instrumental transfer

    PubMed Central

    Lewis, Andrea H.; Niznikiewicz, Michael A.; Delamater, Andrew R.; Delgado, Mauricio R.

    2013-01-01

    The Pavlovian-to-instrumental transfer (PIT) paradigm probes the influence of Pavlovian cues over instrumentally learned behavior. The paradigm has been used extensively to probe basic cognitive and motivational processes in studies of animal learning but, more recently, PIT and its underlying neural basis have been extended to investigations in humans. These initial neuroimaging studies of PIT have focused on the influence of appetitively conditioned stimuli on instrumental responses maintained by positive reinforcement, and highlight the involvement of the striatum. In the current study, we sought to understand the neural correlates of PIT in an aversive Pavlovian learning situation when instrumental responding was maintained through negative reinforcement. Participants exhibited specific PIT, wherein selective increases in instrumental responding to conditioned stimuli occurred when the stimulus signaled a specific aversive outcome whose omission negatively reinforced the instrumental response. Additionally, a general PIT effect was observed such that when a stimulus was associated with a different aversive outcome than was used to negatively reinforce instrumental behavior, the presence of that stimulus caused a non-selective increase in overall instrumental responding. Both specific and general PIT behavioral effects correlated with increased activation in corticostriatal circuitry, particularly in the striatum, a region involved in cognitive and motivational processes. These results suggest that avoidance-based PIT utilizes a similar neural mechanism to that seen with PIT in an appetitive context, which has implications for understanding mechanisms of drug-seeking behavior during addiction and relapse. PMID:24118624

  17. Disrupted reinforcement signaling in the orbitofrontal cortex and caudate in youths with conduct disorder or oppositional defiant disorder and a high level of psychopathic traits.

    PubMed

    Finger, Elizabeth C; Marsh, Abigail A; Blair, Karina S; Reid, Marguerite E; Sims, Courtney; Ng, Pamela; Pine, Daniel S; Blair, R James R

    2011-02-01

    Dysfunction in the amygdala and orbitofrontal cortex has been reported in youths and adults with psychopathic traits. The specific nature of the functional irregularities within these structures remains poorly understood. The authors used a passive avoidance task to examine the responsiveness of these systems to early stimulus-reinforcement exposure, when prediction errors are greatest and learning maximized, and to reward in youths with psychopathic traits and comparison youths. While performing the passive avoidance learning task, 15 youths with conduct disorder or oppositional defiant disorder plus a high level of psychopathic traits and 15 healthy subjects completed a 3.0-T fMRI scan. Relative to the comparison youths, the youths with a disruptive behavior disorder plus psychopathic traits showed less orbitofrontal responsiveness both to early stimulus-reinforcement exposure and to rewards, as well as less caudate response to early stimulus-reinforcement exposure. There were no group differences in amygdala responsiveness to these two task measures, but amygdala responsiveness throughout the task was lower in the youths with psychopathic traits. Compromised sensitivity to early reinforcement information in the orbitofrontal cortex and caudate and to reward outcome information in the orbitofrontal cortex of youths with conduct disorder or oppositional defiant disorder plus psychopathic traits suggests that the integrated functioning of the amygdala, caudate, and orbitofrontal cortex may be disrupted. This provides a functional neural basis for why such youths are more likely to repeat disadvantageous decisions. New treatment possibilities are raised, as pharmacologic modulations of serotonin and dopamine can affect this form of learning.

  18. Recording single neurons' action potentials from freely moving pigeons across three stages of learning.

    PubMed

    Starosta, Sarah; Stüttgen, Maik C; Güntürkün, Onur

    2014-06-02

    While the subject of learning has attracted immense interest from both behavioral and neural scientists, only relatively few investigators have observed single-neuron activity while animals are acquiring an operantly conditioned response, or when that response is extinguished. But even in these cases, observation periods usually encompass only a single stage of learning, i.e. acquisition or extinction, but not both (exceptions include protocols employing reversal learning; see Bingman et al.(1) for an example). However, acquisition and extinction entail different learning mechanisms and are therefore expected to be accompanied by different types and/or loci of neural plasticity. Accordingly, we developed a behavioral paradigm which institutes three stages of learning in a single behavioral session and which is well suited for the simultaneous recording of single neurons' action potentials. Animals are trained on a single-interval forced choice task which requires mapping each of two possible choice responses to the presentation of different novel visual stimuli (acquisition). After having reached a predefined performance criterion, one of the two choice responses is no longer reinforced (extinction). Following a certain decrement in performance level, correct responses are reinforced again (reacquisition). By using a new set of stimuli in every session, animals can undergo the acquisition-extinction-reacquisition process repeatedly. Because all three stages of learning occur in a single behavioral session, the paradigm is ideal for the simultaneous observation of the spiking output of multiple single neurons. We use pigeons as model systems, but the task can easily be adapted to any other species capable of conditioned discrimination learning.

  19. Autonomous reinforcement learning with experience replay.

    PubMed

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on an actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed-point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and a half-cheetah demonstrates the ability of the proposed algorithm to solve difficult learning control problems autonomously within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
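
    The core mechanism named above, replaying previously collected samples to keep refining the policy, can be sketched with a simple replay buffer. The actor-critic details and the on-line step-size estimation from the paper are not reproduced, so the fragment below only illustrates the replay idea, and the environment and update routines named in the comments are hypothetical.

    ```python
    import random
    from collections import deque

    class ReplayBuffer:
        """Fixed-capacity store of (state, action, reward, next_state) samples."""

        def __init__(self, capacity=10000):
            self.buffer = deque(maxlen=capacity)

        def add(self, state, action, reward, next_state):
            self.buffer.append((state, action, reward, next_state))

        def sample(self, batch_size):
            # Each stored sample can be reused many times, which is what makes replay data-efficient.
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    # Sketch of a training loop: every real interaction step is followed by several
    # replayed updates of the actor and critic (hypothetical routines, not shown).
    # buffer = ReplayBuffer()
    # for step in range(num_steps):
    #     action = actor(state)
    #     next_state, reward = env.step(action)
    #     buffer.add(state, action, reward, next_state)
    #     for s, a, r, s2 in buffer.sample(batch_size=32):
    #         critic_update(s, a, r, s2)
    #         actor_update(s, a)
    #     state = next_state
    ```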

  20. Electrophysiological correlates of reinforcement learning in young people with Tourette syndrome with and without co-occurring ADHD symptoms.

    PubMed

    Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J

    2016-06-01

    Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (N=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.

  1. Utilising reinforcement learning to develop strategies for driving auditory neural implants.

    PubMed

    Lee, Geoffrey W; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G

    2016-08-01

    In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment that is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed-loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals, we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and recording of neural responses in the IC provide a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Reinforcement learning, utilising a modified n-armed bandit solution, is implemented to demonstrate the model's function. We show the ability to effectively learn stimulation patterns that mimic the cochlea's ability to convert acoustic frequencies to neural activity. Learning an effective replication using neural stimulation takes less than 20 min under continuous testing. These results show the utility of reinforcement learning in the field of neural stimulation. They can be coupled with existing sound-processing technologies to develop new auditory prosthetics that are adaptable to the recipient's current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
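
    The learning component is described as a modified n-armed bandit. As a hedged sketch, a standard epsilon-greedy bandit choosing among candidate electrical stimulation patterns and receiving a similarity-to-target reward could look like the following; the reward function and pattern set are placeholders, not the authors' implementation.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def epsilon_greedy_bandit(reward_fn, n_arms, n_trials=2000, epsilon=0.1):
        """Standard n-armed bandit with incremental (sample-average) value estimates.

        reward_fn(arm) should return how well stimulation pattern `arm` reproduces
        the neural response evoked by the target acoustic frequency (placeholder).
        """
        estimates = np.zeros(n_arms)
        counts = np.zeros(n_arms)
        for _ in range(n_trials):
            if rng.random() < epsilon:
                arm = int(rng.integers(n_arms))     # explore
            else:
                arm = int(np.argmax(estimates))     # exploit
            r = reward_fn(arm)
            counts[arm] += 1
            estimates[arm] += (r - estimates[arm]) / counts[arm]
        return int(np.argmax(estimates))

    # Placeholder reward: arm 7 best matches the target response, with noise.
    best = epsilon_greedy_bandit(lambda a: 1.0 - abs(a - 7) / 10 + rng.normal(0, 0.05), n_arms=10)
    print(best)
    ```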

  2. Embedded Incremental Feature Selection for Reinforcement Learning

    DTIC Science & Technology

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al... In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature

  3. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    ERIC Educational Resources Information Center

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  4. Learning with incomplete information and the mathematical structure behind it.

    PubMed

    Kühn, Reimer; Stamatescu, Ion-Olimpiu

    2007-07-01

    We investigate the problem of learning with incomplete information as exemplified by learning with delayed reinforcement. We study a two-phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase that depends on a graded reinforcement signal. The reinforcement signal quantifies the success rate globally over a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to the associations learnt in that phase. Learning according to this model is studied via simulations and analytically within a student-teacher scenario, both for single-layer networks and for a committee machine. Success and speed of learning depend on the ratio lambda of the learning rates used for the associative Hebbian learning phase and for the unlearning correction in response to the reinforcement signal. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value lambda_c, in which case the generalization error exhibits a power-law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter lambda. We find these features to be robust against a wide spectrum of modifications of the microscopic modelling details. Two illustrative applications, a robot learning to navigate a field containing obstacles and the identification of a specific component in a collection of stimuli, are also provided.
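
    A rough sketch of the two-phase scheme described above (Hebbian learning on the student's momentary outputs, followed by indiscriminate unlearning scaled by a graded reinforcement signal) is given below for a single-layer student-teacher setup. All quantities are illustrative assumptions; `lam` plays the role of the ratio lambda in the abstract.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    n = 50
    teacher = rng.normal(size=n)          # ground-truth rule (student-teacher scenario)
    w = np.zeros(n)                       # student weights
    eta_hebb = 0.05
    lam = 2.0                             # ratio of Hebbian to unlearning rate (assumed > lambda_c)
    eta_unlearn = eta_hebb / lam

    for block in range(200):
        xs = rng.normal(size=(20, n))
        increments = []
        correct = 0
        # Phase 1: Hebbian learning based on the student's own momentary outputs.
        for x in xs:
            s = np.sign(w @ x) or 1.0     # student's output for this example
            increments.append(eta_hebb * s * x)
            w += increments[-1]
            correct += int(s == np.sign(teacher @ x))
        # Phase 2: graded reinforcement signal = failure rate over the whole block;
        # unlearning removes a fraction of all phase-1 increments indiscriminately.
        failure_rate = 1.0 - correct / len(xs)
        w -= eta_unlearn * failure_rate * np.sum(increments, axis=0)
    ```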

  5. Neural Basis of Strategic Decision Making.

    PubMed

    Lee, Daeyeol; Seo, Hyojung

    2016-01-01

    Human choice behaviors during social interactions often deviate from the predictions of game theory. This might arise partly from the limitations in the cognitive abilities necessary for recursive reasoning about the behaviors of others. In addition, during iterative social interactions, choices might change dynamically as knowledge about the intentions of others and estimates for choice outcomes are incrementally updated via reinforcement learning. Some of the brain circuits utilized during social decision making might be general-purpose and contribute to isomorphic individual and social decision making. By contrast, regions in the medial prefrontal cortex (mPFC) and temporal parietal junction (TPJ) might be recruited for cognitive processes unique to social decision making. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Isolating the incentive salience of reward-associated stimuli: value, choice, and persistence

    PubMed Central

    Chow, Jonathan J.

    2015-01-01

    Sign- and goal-tracking are differentially associated with drug abuse-related behavior. Recently, it has been hypothesized that sign- and goal-tracking behavior are mediated by different neurobehavioral valuation systems, including differential incentive salience attribution. Herein, we used different conditioned stimuli to preferentially elicit different response types to study the different incentive valuation characteristics of stimuli associated with sign- and goal-tracking within individuals. The results demonstrate that all stimuli used were equally effective conditioned stimuli; however, only a lever stimulus associated with sign-tracking behavior served as a robust conditioned reinforcer and was preferred over a tone associated with goal-tracking. Moreover, the incentive value attributed to the lever stimulus was capable of promoting suboptimal choice, leading to a significant reduction in reinforcers (food) earned. Furthermore, sign-tracking to a lever was more persistent than goal-tracking to a tone under omission and extinction contingencies. Finally, a conditional discrimination procedure demonstrated that sign-tracking to a lever and goal-tracking to a tone were dependent on learned stimulus–reinforcer relations. Collectively, these results suggest that the different neurobehavioral valuation processes proposed to govern sign- and goal-tracking behavior are independent but parallel processes within individuals. Examining these systems within individuals will provide a better understanding of how one system comes to dominate stimulus–reward learning, thus leading to the differential role these systems play in abuse-related behavior. PMID:25593298

  7. Models, Entropy and Information of Temporal Social Networks

    NASA Astrophysics Data System (ADS)

    Zhao, Kun; Karsai, Márton; Bianconi, Ginestra

    Temporal social networks are characterized by heterogeneous duration of contacts, which can either follow a power-law distribution, such as in face-to-face interactions, or a Weibull distribution, such as in mobile-phone communication. Here we model the dynamics of face-to-face interaction and mobile phone communication by a reinforcement dynamics, which explains the data observed in these different types of social interactions. We quantify the information encoded in the dynamics of these networks by the entropy of temporal networks. Finally, we show evidence that human dynamics is able to modulate the information present in social network dynamics when it follows circadian rhythms and when it is interfacing with a new technology such as the mobile-phone communication technology.

  8. Spatiotemporal information during unsupervised learning enhances viewpoint invariant object recognition

    PubMed Central

    Tian, Moqian; Grill-Spector, Kalanit

    2015-01-01

    Recognizing objects is difficult because it requires both linking views of an object that can be different and distinguishing objects with similar appearance. Interestingly, people can learn to recognize objects across views in an unsupervised way, without feedback, just from the natural viewing statistics. However, there is intense debate regarding what information during unsupervised learning is used to link among object views. Specifically, researchers argue whether temporal proximity, motion, or spatiotemporal continuity among object views during unsupervised learning is beneficial. Here, we untangled the role of each of these factors in unsupervised learning of novel three-dimensional (3-D) objects. We found that after unsupervised training with 24 object views spanning a 180° view space, participants showed significant improvement in their ability to recognize 3-D objects across rotation. Surprisingly, there was no advantage to unsupervised learning with spatiotemporal continuity or motion information over training with temporal proximity. However, we discovered that when participants were trained with just a third of the views spanning the same view space, unsupervised learning via spatiotemporal continuity yielded significantly better recognition performance on novel views than learning via temporal proximity. These results suggest that while it is possible to obtain view-invariant recognition just from observing many views of an object presented in temporal proximity, spatiotemporal information enhances performance by producing representations with broader view tuning than learning via temporal association. Our findings have important implications for theories of object recognition and for the development of computational algorithms that learn from examples. PMID:26024454

  9. Value of Conditioned Reinforcers as a Function of Temporal Context

    ERIC Educational Resources Information Center

    O'Daly, Matthew; Meyer, Steven; Fantino, Edmund

    2005-01-01

    In two experiments, pigeons were trained on a multiple-chain schedule, in which the initial link for one chain was a variable-interval (VI) 100s schedule and for the other chain a VI 10s schedule. The terminal links were both fixed-time 30s schedules signaled by differently colored stimuli. Following training, the pigeons had their preference for…

  10. Reinforcement learning and decision making in monkeys during a competitive game.

    PubMed

    Lee, Daeyeol; Conroy, Michelle L; McGreevy, Benjamin P; Barraclough, Dominic J

    2004-12-01

    Animals living in a dynamic environment must adjust their decision-making strategies through experience. To gain insights into the neural basis of such adaptive decision-making processes, we trained monkeys to play a competitive game against a computer in an oculomotor free-choice task. The animal selected one of two visual targets in each trial and was rewarded only when it selected the same target as the computer opponent. To determine how the animal's decision-making strategy can be affected by the opponent's strategy, the computer opponent was programmed with three different algorithms that exploited different aspects of the animal's choice and reward history. When the computer selected its targets randomly with equal probabilities, animals selected one of the targets more often, violating the prediction of probability matching, and their choices were systematically influenced by the choice history of the two players. When the computer exploited only the animal's choice history but not its reward history, the animal's choice became more independent of its own choice history but was still related to the choice history of the opponent. This bias was substantially reduced, but not completely eliminated, when the computer used the choice history of both players in making its predictions. These biases were consistent with the predictions of reinforcement learning, suggesting that the animals sought optimal decision-making strategies using reinforcement learning algorithms.

  11. What you learn is more than what you see: what can sequencing effects tell us about inductive category learning?

    PubMed Central

    Carvalho, Paulo F.; Goldstone, Robert L.

    2015-01-01

    Inductive category learning takes place across time. As such, it is not surprising that the sequence in which information is studied has an impact in what is learned and how efficient learning is. In this paper we review research on different learning sequences and how this impacts learning. We analyze different aspects of interleaved (frequent alternation between categories during study) and blocked study (infrequent alternation between categories during study) that might explain how and when one sequence of study results in improved learning. While these different sequences of study differ in the amount of temporal spacing and temporal juxtaposition between items of different categories, these aspects do not seem to account for the majority of the results available in the literature. However, differences in the type of category being studied and the duration of the retention interval between study and test may play an important role. We conclude that there is no single aspect that is able to account for all the evidence available. Understanding learning as a process of sequential comparisons in time and how different sequences fundamentally alter the statistics of this experience offers a promising framework for understanding sequencing effects in category learning. We use this framework to present novel predictions and hypotheses for future research on sequencing effects in inductive category learning. PMID:25983699

  12. Effectiveness of an educational video as an instrument to refresh and reinforce the learning of a nursing technique: a randomized controlled trial.

    PubMed

    Salina, Loris; Ruffinengo, Carlo; Garrino, Lorenza; Massariello, Patrizia; Charrier, Lorena; Martin, Barbara; Favale, Maria Santina; Dimonte, Valerio

    2012-05-01

    The Undergraduate Nursing Course has been using videos for the past year or so. Videos are used for many different purposes such as during lessons, nurse refresher courses, reinforcement, and sharing and comparison of knowledge with the professional and scientific community. The purpose of this study was to estimate the efficacy of the video (moving an uncooperative patient from the supine to the lateral position) as an instrument to refresh and reinforce nursing techniques. A two-arm randomized controlled trial (RCT) design was chosen: both groups attended lessons in the classroom as well as in the laboratory; a month later while one group received written information as a refresher, the other group watched the video. Both groups were evaluated in a blinded fashion. A total of 223 students agreed to take part in the study. The difference observed between those who had seen the video and those who had read up on the technique turned out to be an average of 6.19 points in favour of the first (P < 0.05). The results of the RCT demonstrated that students who had seen the video were better able to apply the technique, resulting in a better performance. The video, therefore, represents an important tool to refresh and reinforce previous learning.

  13. Beyond Stimulus Cues and Reinforcement Signals: A New Approach to Animal Metacognition

    PubMed Central

    Couchman, Justin J.; Coutinho, Mariana V. C.; Beran, Michael J.; Smith, J. David

    2010-01-01

    Some metacognition paradigms for nonhuman animals encourage the alternative explanation that animals avoid difficult trials based only on reinforcement history and stimulus aversion. To explore this possibility, we placed humans and monkeys in successive uncertainty-monitoring tasks that were qualitatively different, eliminating many associative cues that might support transfer across tasks. In addition, task transfer occurred under conditions of deferred and rearranged feedback—both species completed blocks of trials followed by summary feedback. This ensured that animals received no trial-by-trial reinforcement. Despite distancing performance from associative cues, humans and monkeys still made adaptive uncertainty responses by declining the most difficult trials. These findings suggest that monkeys’ uncertainty responses could represent a higher-level, decisional process of cognitive monitoring, though that process need not involve full self-awareness or consciousness. The dissociation of performance from reinforcement has theoretical implications concerning the status of reinforcement as the critical binding force in animal learning. PMID:20836592

  14. "The stone which the builders rejected...": Delay of reinforcement and response rate on fixed-interval and related schedules.

    PubMed

    Wearden, J H; Lejeune, Helga

    2006-02-28

    The article deals with response rates (mainly running and peak or terminal rates) on simple and on some mixed-FI schedules and explores the idea that these rates are determined by the average delay of reinforcement for responses occurring during the response periods that the schedules generate. The effects of reinforcement delay are assumed to be mediated by a hyperbolic delay of reinforcement gradient. The account predicts that (a) running rates on simple FI schedules should increase with increasing rate of reinforcement, in a manner close to that required by Herrnstein's equation, (b) improving temporal control during acquisition should be associated with increasing running rates, (c) two-valued mixed-FI schedules with equiprobable components should produce complex results, with peak rates sometimes being higher on the longer component schedule, and (d) that effects of reinforcement probability on mixed-FI should affect the response rate at the time of the shorter component only. All these predictions were confirmed by data, although effects in some experiments remain outside the scope of the model. In general, delay of reinforcement as a determinant of response rate on FI and related schedules (rather than temporal control on such schedules) seems a useful starting point for a more thorough analysis of some neglected questions about performance on FI and related schedules.
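
    The two quantitative ingredients mentioned above, a hyperbolic delay-of-reinforcement gradient and Herrnstein's equation, are commonly written as follows; the exact parameterization used in the article may differ, so these are standard textbook forms rather than the authors' equations.

    ```latex
    % Hyperbolic delay-of-reinforcement gradient: effective value V of a reinforcer
    % of magnitude A delivered after delay D, with discounting parameter k.
    V = \frac{A}{1 + kD}

    % Herrnstein's equation: response rate B as a function of obtained reinforcement
    % rate R, with asymptotic rate k and background reinforcement rate R_e.
    B = \frac{kR}{R + R_e}
    ```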

  15. Fuzzy Q-Learning for Generalization of Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1996-01-01

    Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for incremental dynamic programming using a society of intelligent agents that are controlled at the top level by Fuzzy Q-Learning, while at the local level each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of the input space using fuzzy rules, and bridges the gap between Q-Learning and rule-based intelligent systems.
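
    For reference, the standard Q-Learning update that Fuzzy Q-Learning extends is the temporal-difference rule below; the fuzzy extension, which replaces the discrete state with fuzzy-rule activations, is not reproduced here.

    ```latex
    % Standard Q-Learning update: alpha is the learning rate, gamma the discount factor.
    Q(s_t, a_t) \leftarrow Q(s_t, a_t)
      + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
    ```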

  16. Intertrial-interval effects on sensitivity (A') and response bias (B") in a temporal discrimination by rats.

    PubMed Central

    Raslear, T G; Shurtleff, D; Simmons, L

    1992-01-01

    Killeen and Fetterman's (1988) behavioral theory of animal timing predicts that decreases in the rate of reinforcement should produce decreases in the sensitivity (A') of temporal discriminations and a decrease in miss and correct rejection rates (decrease in bias toward "long" responses). Eight rats were trained on a 10- versus 0.1-s temporal discrimination with an intertrial interval of 5 s and were subsequently tested on probe days on the same discrimination with intertrial intervals of 1, 2.5, 5, 10, or 20 s. The rate of reinforcement declined for all animals as intertrial interval increased. Although sensitivity (A') decreased with increasing intertrial interval, all rats showed an increase in bias to make long responses. PMID:1447544
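
    For reference, one common nonparametric formulation of the sensitivity and bias indices used here (A' and B''), written in terms of the hit rate H and false-alarm rate F for H >= F; the study may use an equivalent variant.

    ```latex
    A' = \frac{1}{2} + \frac{(H - F)(1 + H - F)}{4H(1 - F)}

    B'' = \frac{H(1 - H) - F(1 - F)}{H(1 - H) + F(1 - F)}
    ```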

  17. Basal ganglia and Dopamine Contributions to Probabilistic Category Learning

    PubMed Central

    Shohamy, D.; Myers, C.E.; Kalanithi, J.; Gluck, M.A.

    2009-01-01

    Studies of the medial temporal lobe and basal ganglia memory systems have recently been extended towards understanding the neural systems contributing to category learning. The basal ganglia, in particular, have been linked to probabilistic category learning in humans. A separate parallel literature in systems neuroscience has emerged, indicating a role for the basal ganglia and related dopamine inputs in reward prediction and feedback processing. Here, we review behavioral, neuropsychological, functional neuroimaging, and computational studies of basal ganglia and dopamine contributions to learning in humans. Collectively, these studies implicate the basal ganglia in incremental, feedback-based learning that involves integrating information across multiple experiences. The medial temporal lobes, by contrast, contribute to rapid encoding of relations between stimuli and support flexible generalization of learning to novel contexts and stimuli. By breaking down our understanding of the cognitive and neural mechanisms contributing to different aspects of learning, recent studies are providing insight into how, and when, these different processes support learning, how they may interact with each other, and the consequence of different forms of learning for the representation of knowledge. PMID:18061261

  18. Proactivity and Reinforcement: The Contingency of Social Behavior

    ERIC Educational Resources Information Center

    Williams, J. Sherwood; And Others

    1976-01-01

    This paper analyzes development of group structure in terms of the stimulus-sampling perspective. Learning is the continual sampling of possibilities, with those reinforced possibilities increasing in probability of occurrence. This contingency learning approach is tested experimentally. (NG)

  19. Scheduled power tracking control of the wind-storage hybrid system based on the reinforcement learning theory

    NASA Astrophysics Data System (ADS)

    Li, Ze

    2017-09-01

    To address the intermittency and uncertainty of wind power, energy storage and a wind generator are combined into a hybrid system to improve the controllability of the output power. A scheduled power tracking control method is proposed based on reinforcement learning theory and the Q-learning algorithm. In this method, the state space of the environment is formed from two key factors, namely the state of charge of the energy storage and the difference between the actual wind power and the scheduled power; the feasible action is the output power of the energy storage; and the corresponding immediate reward function is designed to reflect the rationality of the control action. By interacting with the environment and learning from the immediate reward, the optimal control strategy is gradually formed and can then be applied to the scheduled power tracking control of the hybrid system. Finally, the rationality and validity of the method are verified through simulation examples.
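
    The abstract specifies the Q-learning ingredients (state = storage state of charge plus the gap between actual and scheduled power, action = storage output, reward = tracking quality). A minimal tabular sketch with discretized states is given below; all numerical choices are illustrative assumptions rather than values from the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Discretized state: (SOC bin, power-difference bin); action: storage output bin.
    N_SOC, N_DIFF, N_ACT = 5, 7, 7
    q_table = np.zeros((N_SOC, N_DIFF, N_ACT))
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    def reward(diff_after):
        """Immediate reward: a smaller remaining gap between actual and scheduled power is better."""
        return -abs(diff_after)

    def choose_action(soc, diff):
        if rng.random() < epsilon:
            return int(rng.integers(N_ACT))
        return int(np.argmax(q_table[soc, diff]))

    def q_update(soc, diff, action, r, soc2, diff2):
        """Standard Q-learning temporal-difference update on the discretized grid."""
        target = r + gamma * q_table[soc2, diff2].max()
        q_table[soc, diff, action] += alpha * (target - q_table[soc, diff, action])

    # Training would interleave choose_action / q_update with a simulator of the storage
    # dynamics (not shown); after convergence, the greedy action in each state gives the
    # scheduled power tracking control strategy.
    ```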

  20. A Service Learning Project on Aluminum Recycling--Developing Soft Skills in a Material and Energy Balances Course

    ERIC Educational Resources Information Center

    West, Christy Wheeler

    2017-01-01

    This paper describes a project carried out in a sophomore chemical engineering course, in which students studied the energetic differences between refining and recycling aluminum. They worked in teams to prepare a presentation about the importance of aluminum recycling to a lay audience. The project reinforced classroom learning and provided an…

  1. Interactions of numerical and temporal stimulus characteristics on the control of response location by brief flashes of light.

    PubMed

    Fetterman, J Gregor; Killeen, P Richard

    2011-09-01

    Pigeons pecked on three keys, responses to one of which could be reinforced after 3 flashes of the houselight, to a second key after 6, and to a third key after 12. The flashes were arranged according to variable-interval schedules. Response allocation among the keys was a function of the number of flashes. When flashes were omitted, transitions occurred very late. Increasing flash duration produced a leftward shift in the transitions along a number axis. Increasing reinforcement probability produced a leftward shift, and decreasing reinforcement probability produced a rightward shift. Intermixing different flash rates within sessions separated allocations: Faster flash rates shifted the functions sooner in real time, but later in terms of flash count, and conversely for slower flash rates. A model of control by fading memories of number and time was proposed.

  2. Sensitivity to value-driven attention is predicted by how we learn from value.

    PubMed

    Jahfari, Sara; Theeuwes, Jan

    2017-04-01

    Reward learning is known to influence the automatic capture of attention. This study examined how the rate of learning, after high- or low-value reward outcomes, can influence future transfers into value-driven attentional capture. Participants performed an instrumental learning task that was directly followed by an attentional capture task. A hierarchical Bayesian reinforcement model was used to infer individual differences in learning from high or low reward. Results showed a strong relationship between high-reward learning rates (or the weight that is put on learning after a high reward) and the magnitude of attentional capture with high-reward colors. Individual differences in learning from high or low rewards were further related to performance differences when high- or low-value distractors were present. These findings provide novel insight into the development of value-driven attentional capture by showing how information updating after desired or undesired outcomes can influence future deployments of automatic attention.

  3. Automated Inattention and Fatigue Detection System in Distance Education for Elementary School Students

    ERIC Educational Resources Information Center

    Hwang, Kuo-An; Yang, Chia-Hao

    2009-01-01

    Most courses based on distance learning focus on the cognitive domain of learning. Because students are sometimes inattentive or tired, they may neglect the attention goal of learning. This study proposes an auto-detection and reinforcement mechanism for the distance-education system based on the reinforcement teaching strategy. If a student is…

  4. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    ERIC Educational Resources Information Center

    Janssen, Christian P.; Gray, Wayne D.

    2012-01-01

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…

  5. Altered Risk-Based Decision Making following Adolescent Alcohol Use Results from an Imbalance in Reinforcement Learning in Rats

    PubMed Central

    Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.

    2012-01-01

    Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989
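
    The reported 'over-fast learning for better-than-expected outcomes' corresponds to an asymmetric-learning-rate model in which positive and negative prediction errors are weighted differently. The sketch below illustrates that mechanism with assumed parameter values, not the fitted ones from the study.

    ```python
    import numpy as np

    def asymmetric_update(value, outcome, alpha_pos=0.6, alpha_neg=0.2):
        """Update an option's value with separate rates for positive and negative prediction errors.

        alpha_pos > alpha_neg reproduces the reported imbalance: better-than-expected outcomes
        pull the value up faster than worse-than-expected outcomes pull it down.
        """
        pe = outcome - value
        alpha = alpha_pos if pe > 0 else alpha_neg
        return value + alpha * pe

    # Risky option paying 4 with probability 0.5, else 0 (true expected value 2).
    rng = np.random.default_rng(3)
    v = 0.0
    for _ in range(500):
        outcome = 4.0 if rng.random() < 0.5 else 0.0
        v = asymmetric_update(v, outcome)
    print(v)   # on average settles above the true expected value, biasing choice toward risk
    ```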

  6. An analysis of intergroup rivalry using Ising model and reinforcement learning

    NASA Astrophysics Data System (ADS)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Different from other simulation studies using the Ising model, the evolution rules of each individual in our model are not static, but have the ability to learn from historical experience using reinforcement learning techniques, which makes the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.

  7. Play along: effects of music and social interaction on word learning.

    PubMed

    Verga, Laura; Bigand, Emmanuel; Kotz, Sonja A

    2015-01-01

    Learning new words is an increasingly common necessity in everyday life. External factors, among which music and social interaction are particularly debated, are claimed to facilitate this task. Due to their influence on the learner's temporal behavior, these stimuli are able to drive the learner's attention to the correct referent of new words at the correct point in time. However, do music and social interaction impact learning behavior in the same way? The current study aims to answer this question. Native German speakers (N = 80) were requested to learn new words (pseudo-words) during a contextual learning game. This learning task was performed alone with a computer or with a partner, with or without music. Results showed that music and social interaction had a different impact on the learner's behavior: Participants tended to temporally coordinate their behavior more with a partner than with music, and in both cases more than with a computer. However, when both music and social interaction were present, this temporal coordination was hindered. These results suggest that while music and social interaction do influence participants' learning behavior, they have a different impact. Moreover, impaired behavior when both music and a partner are present suggests that different mechanisms are employed to coordinate with the two types of stimuli. Whether one or the other approach is more efficient for word learning, however, is a question still requiring further investigation, as no differences were observed between conditions in a retrieval phase, which took place immediately after the learning session. This study contributes to the literature on word learning in adults by investigating two possible facilitating factors, and has important implications for situations such as music therapy, in which music and social interaction are present at the same time.

  8. Play along: effects of music and social interaction on word learning

    PubMed Central

    Verga, Laura; Bigand, Emmanuel; Kotz, Sonja A.

    2015-01-01

    Learning new words is an increasingly common necessity in everyday life. External factors, among which music and social interaction are particularly debated, are claimed to facilitate this task. Due to their influence on the learner’s temporal behavior, these stimuli are able to drive the learner’s attention to the correct referent of new words at the correct point in time. However, do music and social interaction impact learning behavior in the same way? The current study aims to answer this question. Native German speakers (N = 80) were requested to learn new words (pseudo-words) during a contextual learning game. This learning task was performed alone with a computer or with a partner, with or without music. Results showed that music and social interaction had a different impact on the learner’s behavior: Participants tended to temporally coordinate their behavior more with a partner than with music, and in both cases more than with a computer. However, when both music and social interaction were present, this temporal coordination was hindered. These results suggest that while music and social interaction do influence participants’ learning behavior, they have a different impact. Moreover, impaired behavior when both music and a partner are present suggests that different mechanisms are employed to coordinate with the two types of stimuli. Whether one or the other approach is more efficient for word learning, however, is a question still requiring further investigation, as no differences were observed between conditions in a retrieval phase, which took place immediately after the learning session. This study contributes to the literature on word learning in adults by investigating two possible facilitating factors, and has important implications for situations such as music therapy, in which music and social interaction are present at the same time. PMID:26388818

  9. Goal-directed EEG activity evoked by discriminative stimuli in reinforcement learning.

    PubMed

    Luque, David; Morís, Joaquín; Rushby, Jacqueline A; Le Pelley, Mike E

    2015-02-01

    In reinforcement learning (RL), discriminative stimuli (S) allow agents to anticipate the value of a future outcome, and the response that will produce that outcome. We examined this processing by recording EEG locked to S during RL. Incentive value of outcomes and predictive value of S were manipulated, allowing us to discriminate between outcome-related and response-related activity. S predicting the correct response differed from nonpredictive S in the P2. S paired with high-value outcomes differed from those paired with low-value outcomes in a frontocentral positivity and in the P3b. A slow negativity then distinguished between predictive and nonpredictive S. These results suggest that, first, attention prioritizes detection of informative S. Activation of mental representations of these informative S then retrieves representations of outcomes, which in turn retrieve representations of responses that previously produced those outcomes. © 2014 Society for Psychophysiological Research.

  10. Subjective and Real Time: Coding Under Different Drug States

    PubMed Central

    Sanchez-Castillo, Hugo; Taylor, Kathleen M.; Ward, Ryan D.; Paz-Trejo, Diana B.; Arroyo-Araujo, Maria; Castillo, Oscar Galicia; Balsam, Peter D.

    2016-01-01

    Organisms are constantly extracting information from the temporal structure of the environment, which allows them to select appropriate actions and predict impending changes. Several lines of research have suggested that interval timing is modulated by the dopaminergic system. It has been proposed that higher levels of dopamine cause an internal clock to speed up, whereas less dopamine causes a deceleration of the clock. In most experiments the subjects are first trained to perform a timing task while drug free. Consequently, most of what is known about the influence of dopaminergic modulation of timing concerns well-established timing performance. In the current study the impact of altered DA on the acquisition of temporal control was the focal question. Thirty male Sprague-Dawley rats were distributed randomly into three different groups (haloperidol, d-amphetamine or vehicle). Each animal received an injection 15 min prior to the start of every session from the beginning of interval training. The subjects were trained on a Fixed Interval (FI) 16s schedule followed by training on a peak procedure in which 64s non-reinforced peak trials were intermixed with FI trials. In a final test session all subjects were given vehicle injections and 10 consecutive non-reinforced peak trials to see if training under drug conditions altered the encoding of time. The current study suggests that administration of drugs that modulate dopamine does not alter the encoding of temporal durations but does acutely affect the initiation of responding. PMID:27087743

  11. Sweet Taste and Nutrient Value Subdivide Rewarding Dopaminergic Neurons in Drosophila

    PubMed Central

    Huetteroth, Wolf; Perisse, Emmanuel; Lin, Suewei; Klappenbach, Martín; Burke, Christopher; Waddell, Scott

    2015-01-01

    Summary Dopaminergic neurons provide reward learning signals in mammals and insects [1–4]. Recent work in Drosophila has demonstrated that water-reinforcing dopaminergic neurons are different to those for nutritious sugars [5]. Here, we tested whether the sweet taste and nutrient properties of sugar reinforcement further subdivide the fly reward system. We found that dopaminergic neurons expressing the OAMB octopamine receptor [6] specifically convey the short-term reinforcing effects of sweet taste [4]. These dopaminergic neurons project to the β′2 and γ4 regions of the mushroom body lobes. In contrast, nutrient-dependent long-term memory requires different dopaminergic neurons that project to the γ5b regions, and it can be artificially reinforced by those projecting to the β lobe and adjacent α1 region. Surprisingly, whereas artificial implantation and expression of short-term memory occur in satiated flies, formation and expression of artificial long-term memory require flies to be hungry. These studies suggest that short-term and long-term sugar memories have different physiological constraints. They also demonstrate further functional heterogeneity within the rewarding dopaminergic neuron population. PMID:25728694

  12. On the differentiation of N2 components in an appetitive choice task: evidence for the revised Reinforcement Sensitivity Theory.

    PubMed

    Leue, Anja; Chavanon, Mira-Lynn; Wacker, Jan; Stemmler, Gerhard

    2009-11-01

    Task- and personality-related modulations of the N2 were probed within the framework of the revised Reinforcement Sensitivity Theory (RST). Using an appetitive choice task, we investigated 58 students with extreme scores on the behavioral inhibition system and behavioral approach system (BIS/BAS) scales. The baseline-to-peak N2 amplitude was sensitive to the strength of decision conflict and demonstrated RST-related personality differences. In addition to the baseline N2 amplitude, temporal PCA results suggested two N2 components accounting for a laterality effect and capturing different N2 patterns for BIS/BAS groups with increasing conflict level. Evidence for RST-related personality differences was obtained for baseline-to-peak N2 and tPCA components in the present task. The results support the RST prediction that BAS sensitivity modulates conflict processing and confirm the cognitive-motivational conflict concept of RST.

  13. A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI

    NASA Astrophysics Data System (ADS)

    Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro

    Recently, brain-computer interfaces (BCI), which provide a direct pathway between the human brain and an external device such as a computer or a robot, have attracted considerable attention. Since a BCI can control machines such as robots using brain activity alone, without voluntary muscle movement, it may become a useful communication tool for handicapped persons, for instance amyotrophic lateral sclerosis patients. However, in order to realize a BCI system that can perform precise tasks in various environments, the control rules must be designed to adapt to dynamic environments. Reinforcement learning is one approach to designing such control rules. If reinforcement learning could be driven by brain activity itself, it would lead to a BCI with general versatility. In this research, we focused on the P300 event-related potential as an alternative reward signal for reinforcement learning. We discriminated between success and failure trials from single-trial EEG P300 responses using a proposed discrimination algorithm based on a support vector machine. The feasibility of reinforcement learning was examined in terms of the number of correctly discriminated trials, and the results showed that learning is likely to be possible in most subjects.
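
    The discrimination step described above, classifying single-trial EEG as a success or failure trial from the P300, can be sketched with a standard support vector machine. The feature extraction and data below are placeholders; the authors' actual algorithm is only summarized in the abstract.

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Placeholder data: one feature vector per trial (e.g. post-stimulus EEG samples from
    # centro-parietal channels), with labels 1 = success trial, 0 = failure trial.
    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 64))
    y = rng.integers(0, 2, size=200)
    X[y == 1, 20:30] += 1.0        # crude stand-in for a P300-like deflection on success trials

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X_train, y_train)
    print("single-trial accuracy:", clf.score(X_test, y_test))

    # The predicted label for each trial would then stand in for the reward signal
    # (+1 / 0) that drives the reinforcement learning update in the BCI.
    ```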

  14. Comparative learning theory and its application in the training of horses.

    PubMed

    Cooper, J J

    1998-11-01

    Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species, but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by removal of, or decrease in, the intensity of an unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.

  15. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    NASA Technical Reports Server (NTRS)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.
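
    The following is a loose sketch of the general idea of reinforcement-tuned rule consequents, not the SFLC described in the paper: a zero-order fuzzy controller whose crisp rule consequents are nudged by a scalar reinforcement signal in proportion to each rule's firing strength. The membership functions, the plant, and the credit-assignment rule are all invented for illustration.

    ```python
    # Minimal sketch (assumptions throughout): a zero-order fuzzy controller whose
    # rule consequents are adjusted by a reinforcement signal weighted by firing strength.
    import numpy as np

    rng = np.random.default_rng(0)
    centers = np.array([-1.0, 0.0, 1.0])   # triangular membership centers on the error signal
    width = 1.0
    consequents = np.zeros(3)              # crisp action suggested by each rule (to be learned)

    def firing_strengths(error):
        return np.clip(1.0 - np.abs(error - centers) / width, 0.0, 1.0)

    def control_output(error):
        w = firing_strengths(error)
        return float(np.dot(w, consequents) / (w.sum() + 1e-9))   # weighted-average defuzzification

    def reinforce(error, explored_u, reinforcement, lr=0.2):
        # move each firing rule's consequent toward the explored action when it improved things
        w = firing_strengths(error)
        consequents[:] += lr * reinforcement * w * (explored_u - consequents)

    error = 0.8
    for _ in range(300):
        u = control_output(error) + rng.normal(scale=0.2)   # explore around the current policy
        new_error = error - 0.5 * u                         # placeholder first-order plant
        reinforce(error, u, reinforcement=abs(error) - abs(new_error))
        error = new_error
    ```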

  16. Acquisition of choice in concurrent chains: Assessing the cumulative decision model.

    PubMed

    Grace, Randolph C

    2016-05-01

    Concurrent chains is widely used to study pigeons' choice between terminal links that can vary in delay, magnitude, or probability of reinforcement. We review research on the acquisition of choice in this procedure. Acquisition has been studied with a variety of research designs, and some studies have incorporated no-food trials to allow for timing and choice to be observed concurrently. Results show that: Choice can be acquired rapidly within sessions when terminal links change unpredictably; under steady-state conditions, acquisition depends on both initial- and terminal-link schedules; and initial-link responding is mediated by learning about the terminal-link stimulus-reinforcer relations. The cumulative decision model (CDM) proposed by Christensen and Grace (2010) and Grace and McLean (2006, 2015) provides a good description of within-session acquisition, and correctly predicts the effects of initial and terminal-link schedules in steady-state designs (Grace, 2002a). Questions for future research include how abrupt shifts in preference within individual sessions and temporal control of terminal-link responding can be modeled. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Optimisation of cognitive performance in rodent operant (touchscreen) testing: Evaluation and effects of reinforcer strength.

    PubMed

    Phillips, Benjamin U; Heath, Christopher J; Ossowska, Zofia; Bussey, Timothy J; Saksida, Lisa M

    2017-09-01

    Operant testing is a widely used and highly effective method of studying cognition in rodents. Performance on such tasks is sensitive to reinforcer strength. It is therefore advantageous to select effective reinforcers to minimize training times and maximize experimental throughput. To quantitatively investigate the control of behavior by different reinforcers, performance of mice was tested with either strawberry milkshake or a known powerful reinforcer, super saccharin (1.5% or 2% (w/v) saccharin/1.5% (w/v) glucose/water mixture). Mice were tested on fixed (FR)- and progressive-ratio (PR) schedules in the touchscreen-operant testing system. Under an FR schedule, both the rate of responding and number of trials completed were higher in animals responding for strawberry milkshake versus super saccharin. Under a PR schedule, mice were willing to emit similar numbers of responses for strawberry milkshake and super saccharin; however, analysis of the rate of responding revealed a significantly higher rate of responding by animals reinforced with milkshake versus super saccharin. To determine the impact of reinforcer strength on cognitive performance, strawberry milkshake and super saccharin-reinforced animals were compared on a touchscreen visual discrimination task. Animals reinforced by strawberry milkshake were significantly faster to acquire the discrimination than animals reinforced by super saccharin. Taken together, these results suggest that strawberry milkshake is superior to super saccharin for operant behavioral testing and further confirms that the application of response rate analysis to multiple ratio tasks is a highly sensitive method for the detection of behavioral differences relevant to learning and motivation.

  18. Mesolimbic confidence signals guide perceptual learning in the absence of external feedback

    PubMed Central

    Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp

    2016-01-01

    It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
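
    One way such a confidence-driven update could look is sketched below; this is not the authors' computational model, only an illustration of the core idea that a confidence prediction error can replace an external reward. The perceptual readout weight, the confidence mapping, and the learning rates are assumptions.

    ```python
    # Illustrative sketch: decision confidence acts as an internal reward, and its
    # prediction error drives learning of a perceptual readout weight in the absence
    # of external feedback. All quantities and parameters are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    w = 0.1          # perceptual readout weight (to be learned)
    V = 0.5          # expected confidence ("confidence anticipation")
    alpha = 0.05

    for trial in range(2000):
        signal = rng.choice([-1.0, 1.0])                  # true stimulus category
        evidence = w * signal + rng.normal(scale=1.0)     # noisy internal decision variable
        choice = np.sign(evidence)
        confidence = abs(np.tanh(evidence))               # monitored internal variable, in [0, 1)
        delta = confidence - V                            # confidence prediction error
        V += alpha * delta                                # update expected confidence
        w += alpha * delta * choice * signal              # reinforce the readout when confidence exceeds expectation
    ```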

  19. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning

    PubMed Central

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context-specific selections to drive responses. PMID:27064794

  20. Developing PFC representations using reinforcement learning.

    PubMed

    Reynolds, Jeremy R; O'Reilly, Randall C

    2009-12-01

    From both functional and biological considerations, it is widely believed that action production, planning, and goal-oriented behaviors supported by the frontal cortex are organized hierarchically [Fuster (1991); Koechlin, E., Ody, C., & Kouneiher, F. (2003). Neuroscience: The architecture of cognitive control in the human prefrontal cortex. Science, 424, 1181-1184; Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt]. However, the nature of the different levels of the hierarchy remains unclear, and little attention has been paid to the origins of such a hierarchy. We address these issues through biologically-inspired computational models that develop representations through reinforcement learning. We explore several different factors in these models that might plausibly give rise to a hierarchical organization of representations within the PFC, including an initial connectivity hierarchy within PFC, a hierarchical set of connections between PFC and subcortical structures controlling it, and differential synaptic plasticity schedules. Simulation results indicate that architectural constraints contribute to the segregation of different types of representations, and that this segregation facilitates learning. These findings are consistent with the idea that there is a functional hierarchy in PFC, as captured in our earlier computational models of PFC function and a growing body of empirical data.

  1. Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning

    PubMed Central

    Schad, Daniel J.; Jünger, Elisabeth; Sebold, Miriam; Garbusow, Maria; Bernhardt, Nadine; Javadi, Amir-Homayoun; Zimmermann, Ulrich S.; Smolka, Michael N.; Heinz, Andreas; Rapp, Michael A.; Huys, Quentin J. M.

    2014-01-01

    Theories of decision-making and its neural substrates have long assumed the existence of two distinct and competing valuation systems, variously described as goal-directed vs. habitual, or, more recently and based on statistical arguments, as model-free vs. model-based reinforcement-learning. Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to model-based choice control in the presence of above-average working memory function. This suggests shared cognitive and neural processes; provides a bridge between literatures on intelligence and valuation; and may guide the development of process models of different valuation components. Furthermore, it provides a rationale for individual differences in the tendency to deploy valuation systems, which may be important for understanding the manifold neuropsychiatric diseases associated with malfunctions of valuation. PMID:25566131

  2. A Neural Correlate of Predicted and Actual Reward-Value Information in Monkey Pedunculopontine Tegmental and Dorsal Raphe Nucleus during Saccade Tasks

    PubMed Central

    Okada, Ken-ichi; Nakamura, Kae; Kobayashi, Yasushi

    2011-01-01

    Dopamine, acetylcholine, and serotonin, the main modulators of the central nervous system, have been proposed to play important roles in the execution of movement, control of several forms of attentional behavior, and reinforcement learning. While the response pattern of midbrain dopaminergic neurons and its specific role in reinforcement learning have been revealed, the role of the other neuromodulators remains rather elusive. Here, we review our recent studies using extracellular recording from neurons in the pedunculopontine tegmental nucleus, where many cholinergic neurons exist, and the dorsal raphe nucleus, where many serotonergic neurons exist, while monkeys performed eye movement tasks to obtain different reward values. The firing patterns of these neurons are often tonic throughout the task period, while dopaminergic neurons exhibited a phasic activity pattern to the task event. The different modulation patterns, together with the activity of dopaminergic neurons, reveal dynamic information processing between these different neuromodulator systems. PMID:22013541

  3. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    PubMed

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  4. Learning and tuning fuzzy logic controllers through reinforcements.

    PubMed

    Berenji, H R; Khedkar, P

    1992-01-01

    A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that the generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; it introduces a new conjunction operator for computing the rule strengths of fuzzy control rules and a new localized mean of maximum (LMOM) method for combining the conclusions of several firing control rules; and it learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

  5. Impairments in action-outcome learning in schizophrenia.

    PubMed

    Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W

    2018-03-03

    Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.

  6. The Effects of a Token Reinforcement System on the Reading and Arithmetic Skills Learnings of Migrant Primary School Pupils.

    ERIC Educational Resources Information Center

    Heitzman, Andrew J.

    The New York State Center for Migrant Studies conducted this 1968 study which investigated effects of token reinforcers on reading and arithmetic skills learnings of migrant primary school students during a 6-week summer school session. Students (Negro and Caucasian) received plastic tokens to reward skills learning responses. Tokens were traded…

  7. The Effects of Observation of Learn Units during Reinforcement and Correction Conditions on the Rate of Learning Math Algorithms by Fifth Grade Students

    ERIC Educational Resources Information Center

    Neu, Jessica Adele

    2013-01-01

    I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…

  8. An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System: A Reinforcement Learning Approach

    ERIC Educational Resources Information Center

    Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela

    2011-01-01

    Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…

  9. The Identification and Establishment of Reinforcement for Collaboration in Elementary Students

    ERIC Educational Resources Information Center

    Darcy, Laura

    2017-01-01

    In Experiment 1, I conducted a functional analysis of student rate of learning with and without a peer-yoked contingency for 12 students in Kindergarten through 2nd grade in order to determine if they had conditioned reinforcement for collaboration. Using an ABAB reversal design, I compared rate of learning as measured by learn units to criterion…

  10. Improving the Science Excursion: An Educational Technologist's View

    ERIC Educational Resources Information Center

    Balson, M.

    1973-01-01

    Analyzes the nature of the learning process and attempts to show how the three components of a reinforcement contingency, the stimulus, the response and the reinforcement can be utilized to increase the efficiency of a typical science learning experience, the excursion. (JR)

  11. Cross-modal transfer of the conditioned eyeblink response during interstimulus interval discrimination training in young rats

    PubMed Central

    Brown, Kevin L.; Stanton, Mark E.

    2008-01-01

    Eyeblink classical conditioning (EBC) was observed across a broad developmental period with tasks utilizing two interstimulus intervals (ISIs). In ISI discrimination, two distinct conditioned stimuli (CSs; light and tone) are reinforced with a periocular shock unconditioned stimulus (US) at two different CS-US intervals. Temporal uncertainty is identical in design with the exception that the same CS is presented at both intervals. Developmental changes in conditioning have been reported in each task beyond ages when single-ISI learning is well developed. The present study sought to replicate and extend these previous findings by testing each task at four separate ages. Consistent with previous findings, younger rats (postnatal day [PD] 23 and 30) trained in ISI discrimination showed evidence of enhanced cross-modal influence of the short CS-US pairing upon long CS conditioning relative to older subjects. ISI discrimination training at PD43-47 yielded outcomes similar to those in adults (PD65-71). Cross-modal transfer effects in this task therefore appear to diminish between PD30 and PD43-47. Comparisons of ISI discrimination with temporal uncertainty indicated that cross-modal transfer in ISI discrimination at the youngest ages did not represent complete generalization across CSs. ISI discrimination undergoes a more protracted developmental emergence than single-cue EBC and may be a more sensitive indicator of developmental disorders involving cerebellar dysfunction. PMID:18726989

  12. Rapid Acquisition of Bias in Signal Detection: Dynamics of Effective Reinforcement Allocation

    ERIC Educational Resources Information Center

    Hutsell, Blake; Jacobs, Eric A.

    2012-01-01

    We investigated changes in bias (preference for one response alternative) in signal detection when relative reinforcer frequency for correct responses varied across sessions. In Experiment 1, 4 rats responded in a two-stimulus, two-response identification procedure employing temporal stimuli (short vs. long houselight presentations). Relative…

  13. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults

    PubMed Central

    Smith, Tim J.; Senju, Atsushi

    2017-01-01

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue–reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue–reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. PMID:28250186

  14. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    PubMed

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  15. Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation

    NASA Astrophysics Data System (ADS)

    Satoh, Hideki

    An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with multi-dimensional continuous state space. First, a basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method for adapting an orthonormal basis can modify a basis while holding the orthonormality in accordance with changes in the environment to improve the performance of reinforcement learning and to eliminate the adverse effects of redundant noisy states.
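
    A rough sketch of the basis-replacement idea follows, under stated assumptions: a cosine basis, TD(0) value estimation on a toy one-dimensional chain, and a simple running activity index. The paper's activity-oriented index-allocation rule is paraphrased here, not reproduced.

    ```python
    # Sketch: linear TD(0) with a cosine basis; elements with small running
    # activity are periodically replaced by candidate elements. Toy problem
    # and parameters are assumptions for illustration.
    import numpy as np

    rng = np.random.default_rng(2)
    freqs = np.array([0, 1, 2, 3])           # active basis elements: cos(pi * k * s)
    candidates = [4, 5, 6, 7]                # spare elements waiting to be swapped in
    w = np.zeros(len(freqs))
    activity = np.zeros(len(freqs))
    alpha, gamma = 0.05, 0.95

    def phi(s):
        return np.cos(np.pi * freqs * s)     # orthonormal (up to scaling) cosine features on [0, 1]

    s = rng.random()
    for step in range(5000):
        s_next = np.clip(s + rng.normal(scale=0.1), 0.0, 1.0)
        r = 1.0 if s_next > 0.9 else 0.0     # toy reward near the right edge
        f = phi(s)
        delta = r + gamma * np.dot(w, phi(s_next)) - np.dot(w, f)
        w += alpha * delta * f
        activity = 0.999 * activity + np.abs(w * f)     # running activity index per element

        if step % 1000 == 999 and candidates:           # replace the least active element
            i = int(np.argmin(activity))
            freqs[i] = candidates.pop(0)
            w[i], activity[i] = 0.0, 0.0
        s = s_next
    ```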

  16. Learning to Predict Consequences as a Method of Knowledge Transfer in Reinforcement Learning.

    PubMed

    Chalmers, Eric; Contreras, Edgar Bermudez; Robertson, Brandon; Luczak, Artur; Gruber, Aaron

    2017-04-17

    The reinforcement learning (RL) paradigm allows agents to solve tasks through trial-and-error learning. To be capable of efficient, long-term learning, RL agents should be able to apply knowledge gained in the past to new tasks they may encounter in the future. The ability to predict actions' consequences may facilitate such knowledge transfer. We consider here domains where an RL agent has access to two kinds of information: agent-centric information with constant semantics across tasks, and environment-centric information, which is necessary to solve the task, but with semantics that differ between tasks. For example, in robot navigation, environment-centric information may include the robot's geographic location, while agent-centric information may include sensor readings of various nearby obstacles. We propose that these situations provide an opportunity for a very natural style of knowledge transfer, in which the agent learns to predict actions' environmental consequences using agent-centric information. These predictions contain important information about the affordances and dangers present in a novel environment, and can effectively transfer knowledge from agent-centric to environment-centric learning systems. Using several example problems including spatial navigation and network routing, we show that our knowledge transfer approach can allow faster and lower cost learning than existing alternatives.
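
    The sketch below illustrates the general style of transfer the abstract describes; the sensor semantics, environment dynamics, and the way predictions are reused are all invented placeholders rather than the authors' method. Consequence predictions are learned from agent-centric features in one task and then consulted to pick actions in a new task.

    ```python
    # Hedged sketch: learn per-action predictors of an environmental consequence
    # from agent-centric sensor readings (Task A), then reuse the predictions to
    # choose actions in a new task (Task B). Dynamics are placeholders.
    import numpy as np

    rng = np.random.default_rng(3)
    n_actions, n_sensors = 4, 8
    # one linear predictor per action: sensors -> predicted change in distance-to-goal
    W = np.zeros((n_actions, n_sensors))
    lr = 0.01

    def sensors(state):                       # agent-centric readings (semantics shared across tasks)
        return np.tanh(state + rng.normal(scale=0.1, size=n_sensors))

    # --- Task A: learn consequence predictions by trial and error ---
    state = rng.normal(size=n_sensors)
    for _ in range(5000):
        a = rng.integers(n_actions)
        x = sensors(state)
        true_effect = np.dot(np.sin(a + np.arange(n_sensors)), x)    # placeholder environment
        W[a] += lr * (true_effect - np.dot(W[a], x)) * x              # delta-rule prediction update
        state = state + 0.1 * true_effect

    # --- Task B: transfer by choosing the action predicted to help most ---
    x_new = sensors(rng.normal(size=n_sensors))
    best_action = int(np.argmax(W @ x_new))
    ```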

  17. Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats

    PubMed Central

    Lloyd, Kevin; Becker, Nadine; Jones, Matthew W.; Bogacz, Rafal

    2012-01-01

    Learning to form appropriate, task-relevant working memory representations is a complex process central to cognition. Gating models frame working memory as a collection of past observations and use reinforcement learning (RL) to solve the problem of when to update these observations. Investigation of how gating models relate to brain and behavior remains, however, at an early stage. The current study sought to explore the ability of simple RL gating models to replicate rule learning behavior in rats. Rats were trained in a maze-based spatial learning task that required animals to make trial-by-trial choices contingent upon their previous experience. Using an abstract version of this task, we tested the ability of two gating algorithms, one based on the Actor-Critic and the other on the State-Action-Reward-State-Action (SARSA) algorithm, to generate behavior consistent with the rats'. Both models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal. We also found that both gating models learned multiple strategies in solving the initial task, a property which highlights the multi-agent nature of such models and which is of importance in considering the neural basis of individual differences in behavior. PMID:23115551
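
    A simplified sketch of a SARSA-based gating model is given below on a toy delayed-cue task; the maze task, state coding, and parameters used in the study are replaced by illustrative assumptions. The state is the pair (current input, working-memory content), and the learned actions are whether to gate the current input into memory.

    ```python
    # Simplified SARSA gating sketch on a toy delayed-cue task (all details are
    # illustrative assumptions, not the study's task or parameters).
    import numpy as np

    rng = np.random.default_rng(4)
    n_cues = 2
    BLANK = n_cues                               # index for "no input" / "empty memory"
    # state = (current input, working-memory content); gate actions: 0 = ignore, 1 = store input
    Q = np.zeros((n_cues + 1, n_cues + 1, 2))
    alpha, gamma, eps = 0.1, 0.9, 0.1

    def pick(q):
        return rng.integers(2) if rng.random() < eps else int(np.argmax(q))

    for episode in range(3000):
        cue = rng.integers(n_cues)
        wm = BLANK
        # step 1: cue is presented, agent decides whether to gate it into memory
        a1 = pick(Q[cue, wm])
        wm1 = cue if a1 == 1 else wm
        # step 2: delay (blank input); reward if memory still holds the cue
        a2 = pick(Q[BLANK, wm1])
        wm2 = BLANK if a2 == 1 else wm1          # storing the blank overwrites the cue
        r = 1.0 if wm2 == cue else 0.0
        # SARSA updates along the visited (state, action) pairs
        Q[cue, wm, a1] += alpha * (gamma * Q[BLANK, wm1, a2] - Q[cue, wm, a1])
        Q[BLANK, wm1, a2] += alpha * (r - Q[BLANK, wm1, a2])
    ```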

  18. Online Behavior Acquisition of an Agent based on Coaching as Learning Assistance

    NASA Astrophysics Data System (ADS)

    Hirokawa, Masakazu; Suzuki, Kenji

    This paper describes a novel methodology, namely ``Coaching'', which allows humans to give subjective evaluations to an agent in an iterative manner. It is an interactive learning method that improves reinforcement learning by dynamically modifying the reward function according to the evaluations given by a trainer and the agent's learning situation. Through several experiments we demonstrate that the agent can learn different reward functions from instructions such as ``good'' or ``bad'' given by a human observer, and can also acquire a set of behaviors based on the learnt reward functions.
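
    A loose sketch of the coaching idea is shown below; the interface, the environment, the feedback schedule, and the reward-update rule are placeholders invented for illustration, not the authors' method. A trainer's occasional "good"/"bad" labels adjust a learned reward function, which in turn drives a standard Q-learning agent.

    ```python
    # Hypothetical sketch: human "good"/"bad" evaluations shape a learned reward
    # function that a Q-learning agent then optimizes. Everything here is a
    # placeholder for illustration.
    import numpy as np

    rng = np.random.default_rng(5)
    n_states, n_actions = 5, 2
    Q = np.zeros((n_states, n_actions))
    R_learned = np.zeros((n_states, n_actions))     # reward function shaped by the coach
    alpha, gamma, eps, coach_lr = 0.1, 0.9, 0.1, 0.2

    def coach_feedback(state, action):
        # stand-in for a human observer: approves action 1 in the upper half of the state space
        return 1.0 if (action == 1) == (state >= n_states // 2) else -1.0

    state = 0
    for t in range(5000):
        action = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[state]))
        next_state = rng.integers(n_states)                      # placeholder dynamics
        if rng.random() < 0.3:                                   # coach only comments occasionally
            R_learned[state, action] += coach_lr * (coach_feedback(state, action) - R_learned[state, action])
        reward = R_learned[state, action]                        # learned reward drives the agent
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
    ```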

  19. Feeding behavior of Aplysia: a model system for comparing cellular mechanisms of classical and operant conditioning.

    PubMed

    Baxter, Douglas A; Byrne, John H

    2006-01-01

    Feeding behavior of Aplysia provides an excellent model system for analyzing and comparing mechanisms underlying appetitive classical conditioning and reward operant conditioning. Behavioral protocols have been developed for both forms of associative learning, both of which increase the occurrence of biting following training. Because the neural circuitry that mediates the behavior is well characterized and amenable to detailed cellular analyses, substantial progress has been made toward a comparative analysis of the cellular mechanisms underlying these two forms of associative learning. Both forms of associative learning use the same reinforcement pathway (the esophageal nerve, En) and the same reinforcement transmitter (dopamine, DA). In addition, at least one cellular locus of plasticity (cell B51) is modified by both forms of associative learning. However, the two forms of associative learning have opposite effects on B51. Classical conditioning decreases the excitability of B51, whereas operant conditioning increases the excitability of B51. Thus, the approach of using two forms of associative learning to modify a single behavior, which is mediated by an analytically tractable neural circuit, is revealing similarities and differences in the mechanisms that underlie classical and operant conditioning.

  20. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    NASA Astrophysics Data System (ADS)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
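
    A toy sketch in the spirit of this approach is given below: a swimmer observes a coarse local flow cue and learns by tabular Q-learning which preferred steering direction gains the most altitude. The flow field, the cue discretization, and the reward are stand-ins for the paper's hydrodynamic model, not a reproduction of it.

    ```python
    # Toy Q-learning "smart swimmer": discretized local flow cue -> preferred
    # steering direction, rewarded by altitude gained. Flow and kinematics are
    # illustrative placeholders.
    import numpy as np

    rng = np.random.default_rng(6)
    n_cues, n_actions = 3, 3          # cues: local vorticity sign (-, 0, +); actions: steer left/up/right
    Q = np.zeros((n_cues, n_actions))
    alpha, gamma, eps = 0.1, 0.95, 0.1

    def flow_cue(x):
        return int(np.sign(np.sin(2 * np.pi * x))) + 1          # crude "vorticity" indicator in {0, 1, 2}

    x, z = 0.0, 0.0
    cue = flow_cue(x)
    for step in range(20000):
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[cue]))
        # placeholder kinematics: horizontal drift from steering plus flow advection
        x = (x + 0.01 * (a - 1) + 0.005 * np.cos(2 * np.pi * x)) % 1.0
        dz = 0.01 if a == cue else -0.005                        # vertical gain depends on cue/action match
        z += dz
        next_cue = flow_cue(x)
        Q[cue, a] += alpha * (dz + gamma * Q[next_cue].max() - Q[cue, a])   # reward = altitude gained
        cue = next_cue
    ```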

  1. Reinforcement Learning Multi-Agent Modeling of Decision-Making Agents for the Study of Transboundary Surface Water Conflicts with Application to the Syr Darya River Basin

    NASA Astrophysics Data System (ADS)

    Riegels, N.; Siegfried, T.; Pereira Cardenal, S. J.; Jensen, R. A.; Bauer-Gottwein, P.

    2008-12-01

    In most economics-driven approaches to optimizing water use at the river basin scale, the system is modelled deterministically with the goal of maximizing overall benefits. However, actual operation and allocation decisions must be made under hydrologic and economic uncertainty. In addition, river basins often cross political boundaries, and different states may not be motivated to cooperate so as to maximize basin-scale benefits. Even within states, competing agents such as irrigation districts, municipal water agencies, and large industrial users may not have incentives to cooperate to realize efficiency gains identified in basin-level studies. More traditional simulation-optimization approaches assume pre-commitment by individual agents and stakeholders and unconditional compliance on each side. While this can help determine attainable gains and tradeoffs from efficient management, such hardwired policies do not account for dynamic feedback between agents themselves or between agents and their environments (e.g. due to climate change). In reality, however, we are dealing with an out-of-equilibrium multi-agent system, where there is neither global knowledge nor global control, but rather continuous strategic interaction between decision-making agents. Based on the theory of stochastic games, we present a computational framework that allows for studying the dynamic feedback between decision-making agents themselves and an inherently uncertain environment in a spatially and temporally distributed manner. Agents with decision-making control over water allocation, such as countries, irrigation districts, and municipalities, are represented by reinforcement learning agents and coupled to a detailed hydrologic-economic model. This approach emphasizes learning by agents from their continuous interaction with other agents and the environment. It provides a convenient framework for the solution of the problem of dynamic decision-making in a mixed cooperative/non-cooperative environment, with which different institutional setups and incentive systems can be studied so as to identify reasonable ways to reach desirable, Pareto-optimal allocation outcomes. Preliminary results from an application to the Syr Darya river basin in Central Asia will be presented and discussed. The Syr Darya River is a classic example of a transboundary river basin in which basin-wide efficiency gains identified in optimization studies have not been sufficient to induce cooperative management of the river by the riparian states.

  2. Unique characteristics of motor adaptation during walking in young children.

    PubMed

    Musselman, Kristin E; Patrick, Susan K; Vasudevan, Erin V L; Bastian, Amy J; Yang, Jaynie F

    2011-05-01

    Children show precocious ability in the learning of languages; is this the case with motor learning? We used split-belt walking to probe motor adaptation (a form of motor learning) in children. Data from 27 children (ages 8-36 mo) were compared with those from 10 adults. Children walked with the treadmill belts at the same speed (tied belt), followed by walking with the belts moving at different speeds (split belt) for 8-10 min, followed again by tied-belt walking (postsplit). Initial asymmetries in temporal coordination (i.e., double support time) induced by split-belt walking were slowly reduced, with most children showing an aftereffect (i.e., asymmetry in the opposite direction to the initial) in the early postsplit period, indicative of learning. In contrast, asymmetries in spatial coordination (i.e., center of oscillation) persisted during split-belt walking and no aftereffect was seen. Step length, a measure of both spatial and temporal coordination, showed intermediate effects. The time course of learning in double support and step length was slower in children than in adults. Moreover, there was a significant negative correlation between the size of the initial asymmetry during early split-belt walking (called error) and the aftereffect for step length. Hence, children may have more difficulty learning when the errors are large. The findings further suggest that the mechanisms controlling temporal and spatial adaptation are different and mature at different times.

  3. Can Service Learning Reinforce Social and Cultural Bias? Exploring a Popular Model of Family Involvement for Early Childhood Teacher Candidates

    ERIC Educational Resources Information Center

    Dunn-Kenney, Maylan

    2010-01-01

    Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…

  4. Deep Gate Recurrent Neural Network

    DTIC Science & Technology

    2016-11-22

    [Abstract not available in the retrieved record; the excerpt consists of fragmented reference text citing recurrent-network applications such as robotic knot tying in heart surgery, machine translation (Bahdanau et al., 2015), robot reinforcement learning (Bakker, 2001), and a survey of reinforcement learning in robotics (The International Journal of Robotics Research, 32:1238-1274, 2013).]

  5. Sex differences in verbal and nonverbal learning before and after temporal lobe epilepsy surgery.

    PubMed

    Berger, Justus; Oltmanns, Frank; Holtkamp, Martin; Bengner, Thomas

    2017-01-01

    Women outperform men in a host of episodic memory tasks, yet the neuroanatomical basis for this effect is unclear. It has been suggested that the anterior temporal lobe might be especially relevant for sex differences in memory. In the current study, we investigated whether temporal lobe epilepsy (TLE) has an influence on sex effects in learning and memory and whether women and men with TLE differ in their risk for memory deficits after epilepsy surgery. 177 patients (53 women and 41 men with left TLE, 42 women and 41 men with right TLE) were neuropsychologically tested before and one year after temporal lobe resection. We found that women with TLE had better verbal, but not figural, memory than men with TLE. The female advantage in verbal memory was not affected by temporal lobe resection. The same pattern of results was found in a more homogeneous subsample of 84 patients with only hippocampal sclerosis who were seizure-free after surgery. Our findings challenge the concept that the anterior temporal lobe plays a central role in the verbal memory advantage for women. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. The critical dimensions of the response-reinforcer contingency.

    PubMed

    Williams, B A.

    2001-05-03

    Two major dimensions of any contingency of reinforcement are the temporal relation between a response and its reinforcer, and the relative frequency of the reinforcer given the response versus when the response has not occurred. Previous data demonstrate that time, per se, is not sufficient to explain the effects of delay-of-reinforcement procedures; needed in addition is some account of the events occurring in the delay interval. Moreover, the effects of the same absolute time values vary greatly across situations, such that any notion of a standard delay-of-reinforcement gradient is simplistic. The effects of reinforcers occurring in the absence of a response depend critically upon the stimulus conditions paired with those reinforcers, in much the same manner as has been shown with Pavlovian contingency effects. However, it is unclear whether the underlying basis of such effects is response competition or changes in the calculus of causation.

  7. Conceptualizing withdrawal-induced escalation of alcohol self-administration as a learned, plasticity-dependent process

    PubMed Central

    Walker, Brendan M.

    2013-01-01

    This article represents one of five contributions focusing on the topic “Plasticity and neuroadaptive responses within the extended amygdala in response to chronic or excessive alcohol exposure” that were developed by awardees participating in the Young Investigator Award Symposium at the “Alcoholism and Stress: A Framework for Future Treatment Strategies” conference in Volterra, Italy on May 3–6, 2011 that was organized/chaired by Drs. Antonio Noronha and Fulton Crews and sponsored by the National Institute on Alcohol Abuse and Alcoholism. This review discusses the dependence-induced neuroadaptations in affective systems that provide a basis for negative reinforcement learning and presents evidence demonstrating that escalated alcohol consumption during withdrawal is a learned, plasticity-dependent process. The review concludes by identifying changes within extended amygdala dynorphin/kappa-opioid receptor systems that could serve as the foundation for the occurrence of negative reinforcement processes. While some evidence contained herein may be specific to alcohol dependence-related learning and plasticity, much of the information will be of relevance to any addictive disorder involving negative reinforcement mechanisms. Collectively, the information presented within this review provides a framework to assess the negative reinforcing effects of alcohol in a manner that distinguishes neuroadaptations produced by chronic alcohol exposure from the actual plasticity that is associated with negative reinforcement learning in dependent organisms. PMID:22459874

  8. Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer

    PubMed Central

    Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R.

    2010-01-01

    Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a “clinical reinforcement trial”) of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients. PMID:21385164
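
    The backward-induction structure of such a two-stage Q-learning design can be sketched as follows. This is purely illustrative: the data are simulated and uncensored, the outcome model is made up, and scikit-learn's plain SVR stands in for the paper's modified support vector regression for censored survival data.

    ```python
    # Schematic two-stage fitted Q-learning with an SVR-approximated Q-function,
    # on simulated (non-clinical, uncensored) data. Illustration only.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(7)
    n = 500
    x1 = rng.normal(size=(n, 2))                     # baseline prognostic factors
    a1 = rng.integers(0, 2, size=n)                  # first-line treatment
    x2 = x1 + rng.normal(scale=0.5, size=(n, 2))     # state before the second-line decision
    a2 = rng.integers(0, 2, size=n)                  # second-line treatment (timing simplified away)
    y = 12 + 3 * a1 * x1[:, 0] + 2 * a2 * x2[:, 1] + rng.normal(size=n)   # overall survival (months)

    # Stage 2: regress outcome on (state, action), then maximize over actions
    q2 = SVR().fit(np.column_stack([x2, a2]), y)
    v2 = np.maximum(q2.predict(np.column_stack([x2, np.zeros(n)])),
                    q2.predict(np.column_stack([x2, np.ones(n)])))

    # Stage 1: regress the stage-2 value backward onto (baseline state, first action)
    q1 = SVR().fit(np.column_stack([x1, a1]), v2)

    def recommend_first_line(x):
        x = np.atleast_2d(x)
        q_a0 = q1.predict(np.column_stack([x, np.zeros(len(x))]))
        q_a1 = q1.predict(np.column_stack([x, np.ones(len(x))]))
        return (q_a1 > q_a0).astype(int)
    ```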

  9. Human instrumental performance in ratio and interval contingencies: A challenge for associative theory.

    PubMed

    Pérez, Omar D; Aitken, Michael R F; Zhukovsky, Peter; Soto, Fabián A; Urcelay, Gonzalo P; Dickinson, Anthony

    2016-12-15

    Associative learning theories regard the probability of reinforcement as the critical factor determining responding. However, the role of this factor in instrumental conditioning is not completely clear. In fact, free-operant experiments show that participants respond at a higher rate on variable ratio than on variable interval schedules even though the reinforcement probability is matched between the schedules. This difference has been attributed to the differential reinforcement of long inter-response times (IRTs) by interval schedules, which acts to slow responding. In the present study, we used a novel experimental design to investigate human responding under random ratio (RR) and regulated probability interval (RPI) schedules, a type of interval schedule that sets a reinforcement probability independently of the IRT duration. Participants responded on each type of schedule before a final choice test in which they distributed responding between two schedules similar to those experienced during training. Although response rates did not differ during training, the participants responded at a lower rate on the RPI schedule than on the matched RR schedule during the choice test. This preference cannot be attributed to a higher probability of reinforcement for long IRTs and questions the idea that similar associative processes underlie classical and instrumental conditioning.

  10. Sleep to the beat: A nap favours consolidation of timing.

    PubMed

    Verweij, Ilse M; Onuki, Yoshiyuki; Van Someren, Eus J W; Van der Werf, Ysbrand D

    2016-06-01

    Growing evidence suggests that sleep is important for procedural learning, but few studies have investigated the effect of sleep on the temporal aspects of motor skill learning. We assessed the effect of a 90-min day-time nap on learning a motor timing task, using 2 adaptations of a serial interception sequence learning (SISL) task. Forty-two right-handed participants performed the task before and after a 90-min period of sleep or wake. Electroencephalography (EEG) was recorded throughout. The motor task consisted of a sequential spatial pattern and was performed according to 2 different timing conditions, that is, either following a sequential or a random temporal pattern. The increase in accuracy was compared between groups using a mixed linear regression model. Within the sleep group, performance improvement was modeled based on sleep characteristics, including spindle- and slow-wave density. The sleep group, but not the wake group, showed improvement in the random temporal, but especially and significantly more strongly in the sequential temporal condition. None of the sleep characteristics predicted improvement in either of the timing conditions. In conclusion, a daytime nap improves performance on a timing task. We show that performance on the task with a sequential temporal pattern benefits more from sleep than performance with a random temporal pattern. More importantly, the temporal sequence did not benefit initial learning, because differences arose only after an offline period and specifically when this period contained sleep. Sleep appears to aid in the extraction of regularities for optimal subsequent performance. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  11. Precise Synaptic Efficacy Alignment Suggests Potentiation Dominated Learning.

    PubMed

    Hartmann, Christoph; Miner, Daniel C; Triesch, Jochen

    2015-01-01

    Recent evidence suggests that parallel synapses from the same axonal branch onto the same dendritic branch have almost identical strength. It has been proposed that this alignment is only possible through learning rules that integrate activity over long time spans. However, learning mechanisms such as spike-timing-dependent plasticity (STDP) are commonly assumed to be temporally local. Here, we propose that the combination of temporally local STDP and a multiplicative synaptic normalization mechanism is sufficient to explain the alignment of parallel synapses. To address this issue, we introduce three increasingly complex models: First, we model the idealized interaction of STDP and synaptic normalization in a single neuron as a simple stochastic process and derive analytically that the alignment effect can be described by a so-called Kesten process. From this we can derive that synaptic efficacy alignment requires potentiation-dominated learning regimes. We verify these conditions in a single-neuron model with independent spiking activities but more realistic synapses. As expected, we only observe synaptic efficacy alignment for long-term potentiation-biased STDP. Finally, we explore how well the findings transfer to recurrent neural networks where the learning mechanisms interact with the correlated activity of the network. We find that due to the self-reinforcing correlations in recurrent circuits under STDP, alignment occurs for both long-term potentiation- and depression-biased STDP, because the learning will be potentiation dominated in both cases due to the potentiating events induced by correlated activity. This is in line with recent results demonstrating a dominance of potentiation over depression during waking and normalization during sleep. This leads us to predict that individual spine pairs will be more similar after sleep compared to after sleep deprivation. In conclusion, we show that synaptic normalization in conjunction with coordinated potentiation (in this case, from STDP in the presence of correlated pre- and post-synaptic activity) naturally leads to an alignment of parallel synapses.
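
    A back-of-the-envelope illustration of the Kesten-type alignment mechanism is given below, with toy parameters of our own choosing rather than the authors' model: two parallel synapses share the same potentiation events (same pre/post activity) while multiplicative normalization rescales all synapses, yielding a recursion of the form w <- a*w + b whose iterates align when potentiation dominates.

    ```python
    # Toy illustration (invented parameters): shared additive potentiation plus
    # multiplicative normalization contracts the difference between parallel
    # synapses, as in a Kesten process w <- a*w + b with shared a and b.
    import numpy as np

    rng = np.random.default_rng(8)
    n_other = 50                                   # other synapses on the same neuron
    w_pair = np.array([0.2, 1.0])                  # parallel synapses, deliberately mismatched
    w_rest = rng.uniform(0.1, 1.0, size=n_other)
    target_total = w_pair.sum() + w_rest.sum()     # total strength held fixed by normalization

    for t in range(2000):
        b = max(rng.normal(loc=0.02, scale=0.01), 0.0)              # shared potentiation event (LTP-dominated)
        w_pair += b                                                  # both parallel synapses potentiate together
        w_rest += np.maximum(rng.normal(0.0, 0.01, n_other), 0.0)    # independent events elsewhere
        scale = target_total / (w_pair.sum() + w_rest.sum())         # multiplicative normalization
        w_pair *= scale
        w_rest *= scale

    print("final parallel weights:", w_pair)        # nearly equal after potentiation-dominated learning
    ```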

  12. Monitoring Makes a Difference: Quality and Temporal Variation in Teacher Education Students' Collaborative Learning

    ERIC Educational Resources Information Center

    Näykki, Piia; Järvenoja, Hanna; Järvelä, Sanna; Kirschner, Paul

    2017-01-01

    The aim of this process-oriented video-observation study is to explore how groups that perform differently differ in terms of the number, quality, and temporal variation of their content-level (knowledge co-construction) and meta-level (monitoring) activities. Five groups of teacher education students (n = 22) were observed throughout a 3-month…

  13. The strength of aversive and appetitive associations and maladaptive behaviors.

    PubMed

    Itzhak, Yossef; Perez-Lanza, Daniel; Liddie, Shervin

    2014-08-01

    Certain maladaptive behaviors are thought to be acquired through classical Pavlovian conditioning. Exaggerated fear response, which can develop through Pavlovian conditioning, is associated with acquired anxiety disorders such as post-traumatic stress disorders (PTSDs). Inflated reward-seeking behavior, which develops through Pavlovian conditioning, underlies some types of addictive behavior (e.g., addiction to drugs, food, and gambling). These maladaptive behaviors are dependent on associative learning and the development of long-term memory (LTM). In animal models, an aversive reinforcer (fear conditioning) encodes an aversive contextual and cued LTM. On the other hand, an appetitive reinforcer results in conditioned place preference (CPP) that encodes an appetitive contextual LTM. The literature on weak and strong associative learning pertaining to the development of aversive and appetitive LTM is relatively scarce; thus, this review is particularly focused on the strength of associative learning. The strength of associative learning is dependent on the valence of the reinforcer and the salience of the conditioned stimulus that ultimately sways the strength of the memory trace. Our studies suggest that labile (weak) aversive and appetitive LTM may share similar signaling pathways, whereas stable (strong) aversive and appetitive LTM is mediated through different pathways. In addition, we provide some evidence suggesting that extinction of aversive fear memory and appetitive drug memory is likely to be mediated through different signaling molecules. We put forward the importance of studies aimed to investigate the molecular mechanisms underlying the development of weak and strong memories (aversive and appetitive), which would ultimately help in the development of targeted pharmacotherapies for the management of maladaptive behaviors that arise from classical Pavlovian conditioning. © 2014 International Union of Biochemistry and Molecular Biology.

  14. Optimizing occupational exposure measurement strategies when estimating the log-scale arithmetic mean value: an example from the reinforced plastics industry.

    PubMed

    Lampa, Erik G; Nilsson, Leif; Liljelind, Ingrid E; Bergdahl, Ingvar A

    2006-06-01

    When assessing occupational exposures, repeated measurements are in most cases required. Repeated measurements are more resource intensive than a single measurement, so careful planning of the measurement strategy is necessary to assure that resources are spent wisely. The optimal strategy depends on the objectives of the measurements. Here, two different models of random effects analysis of variance (ANOVA) are proposed for the optimization of measurement strategies by the minimization of the variance of the estimated log-transformed arithmetic mean value of a worker group, i.e. the strategies are optimized for precise estimation of that value. The first model is a one-way random effects ANOVA model. For that model it is shown that the best precision in the estimated mean value is always obtained by including as many workers as possible in the sample while restricting the number of replicates to two or at most three regardless of the size of the variance components. The second model introduces the 'shared temporal variation' which accounts for those random temporal fluctuations of the exposure that the workers have in common. It is shown for that model that the optimal sample allocation depends on the relative sizes of the between-worker component and the shared temporal component, so that if the between-worker component is larger than the shared temporal component more workers should be included in the sample and vice versa. The results are illustrated graphically with an example from the reinforced plastics industry. If there exists a shared temporal variation at a workplace, that variability needs to be accounted for in the sampling design and the more complex model is recommended.
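
    The one-way random-effects argument can be made concrete with the standard variance formula for the estimated group mean with k workers and n repeats each, Var = sigma_between^2 / k + sigma_within^2 / (k * n). The sketch below evaluates it for a fixed total number of measurements; the variance-component values are hypothetical, and the shared-temporal-variation model from the paper is not included.

    ```python
    # Numerical illustration of the one-way random-effects result (made-up variance
    # components): for a fixed measurement budget, the variance of the estimated
    # group mean shrinks fastest by adding workers rather than repeats per worker.
    import numpy as np

    sigma2_between, sigma2_within = 0.8, 0.4   # hypothetical log-scale variance components
    total_measurements = 24

    for n_reps in (2, 3, 4, 6, 8, 12):
        n_workers = total_measurements // n_reps
        var_mean = sigma2_between / n_workers + sigma2_within / (n_workers * n_reps)
        print(f"{n_workers:2d} workers x {n_reps:2d} repeats: Var(mean) = {var_mean:.3f}")
    ```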

  15. Neural Modularity Helps Organisms Evolve to Learn New Skills without Forgetting Old Skills

    PubMed Central

    Ellefsen, Kai Olav; Mouret, Jean-Baptiste; Clune, Jeff

    2015-01-01

    A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills more and because they have a separate reinforcement learning module. Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting. PMID:25837826

  16. Neural modularity helps organisms evolve to learn new skills without forgetting old skills.

    PubMed

    Ellefsen, Kai Olav; Mouret, Jean-Baptiste; Clune, Jeff

    2015-04-01

    A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills more and because they have a separate reinforcement learning module. Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting.
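
    To make the two mechanisms in this record concrete, the following is a minimal, hypothetical sketch (not the authors' code; the gating signal m, the learning rate eta, and the cost weight lam are illustrative assumptions): neuromodulation gates a per-connection Hebbian update, and a connection cost is subtracted from fitness so that evolution favors sparse, modular wiring.

        import numpy as np

        def neuromodulated_update(w, pre, post, m, eta=0.1):
            # Hebbian update gated by a modulatory signal m: m > 0 strengthens
            # co-active connections (reward), m < 0 weakens them (punishment),
            # and m = 0 freezes learning for these connections entirely.
            return np.clip(w + eta * m * np.outer(post, pre), -1.0, 1.0)

        def fitness(task_performance, weights, lam=0.01):
            # Task performance minus a cost per nonzero connection; the cost
            # is what pushes evolved networks toward sparse, modular wiring.
            return task_performance - lam * int(np.count_nonzero(weights))

        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.1, size=(2, 3))            # 3 inputs -> 2 outputs
        pre, post = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0])
        w = neuromodulated_update(w, pre, post, m=1.0)    # rewarded: connections change
        w = neuromodulated_update(w, pre, post, m=0.0)    # no modulation: no change
        print(fitness(task_performance=0.8, weights=w))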

  17. Depression, Activity, and Evaluation of Reinforcement

    ERIC Educational Resources Information Center

    Hammen, Constance L.; Glass, David R., Jr.

    1975-01-01

    This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)

  18. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    NASA Astrophysics Data System (ADS)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet our understanding of thermodynamic processes away from equilibrium remains largely incomplete. In this talk, I will explore what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities that easily defy intuition based on equilibrium physics.

  19. The Chronotron: A Neuron That Learns to Fire Temporally Precise Spike Patterns

    PubMed Central

    Florian, Răzvan V.

    2012-01-01

    In many cases, neurons process information carried by the precise timings of spikes. Here we show how neurons can learn to generate specific temporally precise output spikes in response to input patterns of spikes having precise timings, thus processing and memorizing information that is entirely temporally coded, both as input and as output. We introduce two new supervised learning rules for spiking neurons with temporal coding of information (chronotrons), one that provides high memory capacity (E-learning), and one that has a higher biological plausibility (I-learning). With I-learning, the neuron learns to fire the target spike trains through synaptic changes that are proportional to the synaptic currents at the timings of real and target output spikes. We study these learning rules in computer simulations where we train integrate-and-fire neurons. Both learning rules allow neurons to fire at the desired timings, with sub-millisecond precision. We show how chronotrons can learn to classify their inputs, by firing identical, temporally precise spike trains for different inputs belonging to the same class. When the input is noisy, the classification also leads to noise reduction. We compute lower bounds for the memory capacity of chronotrons and explore the influence of various parameters on chronotrons' performance. The chronotrons can model neurons that encode information in the time of the first spike relative to the onset of salient stimuli or neurons in oscillatory networks that encode information in the phases of spikes relative to the background oscillation. Our results show that firing one spike per cycle optimizes memory capacity in neurons encoding information in the phase of firing relative to a background rhythm. PMID:22879876
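
    Read literally, the I-learning rule summarized above amounts to a weight change for synapse j of roughly the following form (a paraphrase of the abstract's description, not the exact equation from the paper): potentiate in proportion to the synaptic current at each target spike time and depress in proportion to the current at each emitted spike time,

        \Delta w_j \propto \sum_{\tilde{t}\,\in\,\text{target spikes}} I_j(\tilde{t}) \;-\; \sum_{t\,\in\,\text{actual spikes}} I_j(t),

    so that the weights stop changing once the emitted spike train matches the target train.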

  20. Kinesthetic Reinforcement-Is It a Boon to Learning?

    ERIC Educational Resources Information Center

    Bohrer, Roxilu K.

    1970-01-01

    Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…

  1. Reinforcing Basic Skills Through Social Studies. Grades 4-7.

    ERIC Educational Resources Information Center

    Lewis, Teresa Marie

    Arranged into seven parts, this document provides a variety of games and activities, bulletin board ideas, overhead transparencies, student handouts, and learning station ideas to help reinforce basic social studies skills in the intermediate grades. In part 1, students learn about timelines, first constructing their own life timeline, then a…

  2. Effects of Reinforcement on Peer Imitation in a Small Group Play Context

    ERIC Educational Resources Information Center

    Barton, Erin E.; Ledford, Jennifer R.

    2018-01-01

    Children with disabilities often have deficits in imitation skills, particularly in imitating peers. Imitation is considered a behavioral cusp--which, once learned, allows a child to access additional and previously unavailable learning opportunities. In the current study, researchers examined the efficacy of contingent reinforcement delivered…

  3. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    PubMed

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activity. Neurofeedback (NFB) using an auditory stimulus as a reinforcer has proven to be a useful tool for treating LD children by positively reinforcing decreases in the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups showed signs consistent with EEG maturation, but only the Auditory group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio and improved cognitive abilities more than the visual reinforcer.

  4. Investigating the procedural variables that determine whether rats will display negative anticipatory contrast or positive induction.

    PubMed

    Weatherly, Jeffrey N; Nurnberger, Jeri T; Hanson, Brent C

    2005-08-31

    Previous studies have demonstrated that consumption of a low-valued food substance may decrease if access to a high-valued substance will soon be available (negative anticipatory contrast). Research has also demonstrated that responding for a low-valued reinforcer may increase if responding for a high-valued reinforcer will soon be possible (positive induction). In the present experiment, rats responded in a procedure similar to that typically used to produce negative anticipatory contrast. The goal was to determine what factors determine whether a contrast or an induction effect will occur. Based on previous research, the influence of auditory cues, temporal delays, food deprivation, and location of substance delivery was investigated. Auditory cues and temporal delays did little to influence whether subjects increased or decreased their consumption of 1% sucrose when access to 32% sucrose was upcoming. The appearance of contrast or induction was related to the level of deprivation, with deprivation promoting induction. Which effect occurred also depended on whether subjects consumed the two substances from one spout in one location (induction) or from two different spouts in two different locations (contrast). The present results help identify the procedural link(s) between these two effects. They also provide insight into why positive induction may occur (i.e., higher-order place conditioning).

  5. The Effect of Sample Duration and Cue on a Double Temporal Discrimination

    ERIC Educational Resources Information Center

    Oliveira, Luis; Machado, Armando

    2008-01-01

    To test the assumptions of two models of timing, Scalar Expectancy Theory (SET) and Learning to Time (LeT), nine pigeons were exposed to two temporal discriminations, each signaled by a different cue. On half of the trials, pigeons learned to choose a red key after a 1.5-s horizontal bar and a green key after a 6-s horizontal bar; on the other…

  6. Establishment and Maintenance of Socially Learned Conditioned Reinforcement in Young Children: Elimination of the Role of Adults and View of Peers' Faces

    ERIC Educational Resources Information Center

    Zrinzo, Michelle; Greer, R. Douglas

    2013-01-01

    Prior research has demonstrated the establishment of reinforcers for learning and maintenance with young children as a function of social learning where a peer and an adult experimenter were present. The presence of an adult experimenter was eliminated in the present study to test if the effect produced in the prior studies would occur with only…

  7. Multisensory perceptual learning is dependent upon task difficulty.

    PubMed

    De Niear, Matthew A; Koo, Bonhwang; Wallace, Mark T

    2016-11-01

    There has been a growing interest in developing behavioral tasks to enhance temporal acuity as recent findings have demonstrated changes in temporal processing in a number of clinical conditions. Prior research has demonstrated that perceptual training can enhance temporal acuity both within and across different sensory modalities. Although certain forms of unisensory perceptual learning have been shown to be dependent upon task difficulty, this relationship has not been explored for multisensory learning. The present study sought to determine the effects of task difficulty on multisensory perceptual learning. Prior to and following a single training session, participants completed a simultaneity judgment (SJ) task, which required them to judge whether a visual stimulus (flash) and auditory stimulus (beep) presented in synchrony or at various stimulus onset asynchronies (SOAs) occurred synchronously or asynchronously. During the training session, participants completed the same SJ task but received feedback regarding the accuracy of their responses. Participants were randomly assigned to one of three levels of difficulty during training: easy, moderate, and hard, which were distinguished based on the SOAs used during training. We report that only the most difficult (i.e., hard) training protocol enhanced temporal acuity. We conclude that perceptual training protocols for enhancing multisensory temporal acuity may be optimized by employing audiovisual stimuli for which it is difficult to discriminate temporal synchrony from asynchrony.

  8. Structure identification in fuzzy inference using reinforcement learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to perform structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used to describe the values of each control variable must be derived. In this process, splitting a label refers to creating new labels that are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden-layer nodes.
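
    A purely illustrative sketch of the splitting and merging operations (triangular labels and the halving rule are assumptions for illustration, not GARIC's actual operators):

        def triangle(a, b, c):
            # Triangular membership function with support [a, c] and peak at b.
            def mu(x):
                if x <= a or x >= c:
                    return 0.0
                return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
            return mu

        def split_label(a, b, c):
            # Replace one label with two more granular labels covering the same
            # support; in GARIC terms this adds hidden-layer nodes to the action
            # selection network.
            return [(a, (a + b) / 2.0, b), (b, (b + c) / 2.0, c)]

        def merge_labels(left, right):
            # Merge two adjacent labels into one more general label, which
            # correspondingly removes a hidden-layer node.
            a = left[0]
            c = right[2]
            return (a, (a + c) / 2.0, c)

        medium = (2.0, 5.0, 8.0)                 # a label on a [0, 10] control variable
        print(triangle(*medium)(5.0))            # membership of x = 5.0 in "medium" -> 1.0
        finer = split_label(*medium)             # two narrower labels
        print(finer, merge_labels(*finer))       # merging recovers a more general label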

  9. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    PubMed

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) has debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling, and event-related brain potential (ERP) data. MDD patients showed a learning rate comparable to that of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of the error-related negativity, ERN) but abnormal external (at the level of the feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional effort had to be made to establish learning. Collectively, these results support the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL when additional intrinsic motivational processes have to be engaged.

  10. Augmenting Learning in an Out-of-School Context: The Cognitive and Affective Impact of Two Cryogenics-Based Enrichment Programmes on Upper Primary Students

    ERIC Educational Resources Information Center

    Caleon, Imelda S.; Subramaniam, R.

    2007-01-01

    Concepts learned in the classroom were reinforced and augmented by presenting them in a different context using cryogenics-based enrichment programmes (CBEPs) held in an out-of-school setting. The effectiveness of two CBEPs, which involve the use of liquid nitrogen and liquid oxygen, was explored. Using a sample of 265 upper primary students, it…

  11. Relationship between Reinforcement and Eye Movements during Ocular Motor Training with Learning Disabled Children.

    ERIC Educational Resources Information Center

    Punnett, Audrey F.; Steinhauer, Gene D.

    1984-01-01

    Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…

  12. The evolution of continuous learning of the structure of the environment

    PubMed Central

    Kolodny, Oren; Edelman, Shimon; Lotem, Arnon

    2014-01-01

    Continuous, ‘always on’, learning of structure from a stream of data is studied mainly in the fields of machine learning or language acquisition, but its evolutionary roots may go back to the first organisms that were internally motivated to learn and represent their environment. Here, we study under what conditions such continuous learning (CL) may be more adaptive than simple reinforcement learning and examine how it could have evolved from the same basic associative elements. We use agent-based computer simulations to compare three learning strategies: simple reinforcement learning; reinforcement learning with chaining (RL-chain) and CL that applies the same associative mechanisms used by the other strategies, but also seeks statistical regularities in the relations among all items in the environment, regardless of the initial association with food. We show that a sufficiently structured environment favours the evolution of both RL-chain and CL and that CL outperforms the other strategies when food is relatively rare and the time for learning is limited. This advantage of internally motivated CL stems from its ability to capture statistical patterns in the environment even before they are associated with food, at which point they immediately become useful for planning. PMID:24402920

  13. The partial-reinforcement extinction effect and the contingent-sampling hypothesis.

    PubMed

    Hochman, Guy; Erev, Ido

    2013-12-01

    The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
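
    The contingent-sampling account described above can be made concrete with a few lines of hypothetical code (the function and variable names, and the exploration rule for unseen contexts, are assumptions): on each trial the agent looks back for past trials whose preceding four outcomes match the current four-outcome context and picks the option that paid best in those matching situations.

        import random
        from collections import defaultdict

        def contingent_sample(trials, outcomes, options, k=4, rng=random):
            # trials:   (context, option, payoff) triples from earlier trials, where
            #           context is the tuple of the k outcomes that preceded that trial.
            # outcomes: the full outcome sequence observed so far (e.g., 1 = reward,
            #           0 = no reward); its last k entries define the current context.
            context = tuple(outcomes[-k:])
            if len(context) < k:
                return rng.choice(options)            # not enough history yet: explore
            payoffs = defaultdict(list)
            for past_context, option, payoff in trials:
                if past_context == context:           # a "similar" past experience
                    payoffs[option].append(payoff)
            experienced = [o for o in options if payoffs[o]]
            if not experienced:
                return rng.choice(options)            # context never seen: explore
            return max(experienced, key=lambda o: sum(payoffs[o]) / len(payoffs[o]))

        history = [((1, 0, 1, 1), "promoted", 1.0), ((1, 0, 1, 1), "alternative", 0.4)]
        print(contingent_sample(history, outcomes=[0, 1, 0, 1, 1],
                                options=["promoted", "alternative"]))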

  14. Complementary roles for amygdala and periaqueductal gray in temporal-difference fear learning.

    PubMed

    Cole, Sindy; McNally, Gavan P

    2009-01-01

    Pavlovian fear conditioning is not a unitary process. At the neurobiological level multiple brain regions and neurotransmitters contribute to fear learning. At the behavioral level many variables contribute to fear learning including the physical salience of the events being learned about, the direction and magnitude of predictive error, and the rate at which these are learned about. These experiments used a serial compound conditioning design to determine the roles of basolateral amygdala (BLA) NMDA receptors and ventrolateral midbrain periaqueductal gray (vlPAG) mu-opioid receptors (MOR) in predictive fear learning. Rats received a three-stage design, which arranged for both positive and negative prediction errors producing bidirectional changes in fear learning within the same subjects during the test stage. Intra-BLA infusion of the NR2B receptor antagonist Ifenprodil prevented all learning. In contrast, intra-vlPAG infusion of the MOR antagonist CTAP enhanced learning in response to positive predictive error but impaired learning in response to negative predictive error--a pattern similar to Hebbian learning and an indication that fear learning had been divorced from predictive error. These findings identify complementary but dissociable roles for amygdala NMDA receptors and vlPAG MOR in temporal-difference predictive fear learning.

  15. Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.

    PubMed

    Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne

    2016-05-01

    We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to foreground stimuli over background features. Overall, these results suggest that salamanders learn to execute responses rather than to use visual cues, but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.

  16. Temporal-difference prediction errors and Pavlovian fear conditioning: role of NMDA and opioid receptors.

    PubMed

    Cole, Sindy; McNally, Gavan P

    2007-10-01

    Three experiments studied temporal-difference (TD) prediction errors during Pavlovian fear conditioning. In Stage I, rats received conditioned stimulus A (CSA) paired with shock. In Stage II, they received pairings of CSA and CSB with shock, which blocked learning to CSB. In Stage III, a serial overlapping compound, CSB --> CSA, was followed by shock. The change in intratrial durations supported fear learning to CSB but reduced fear of CSA, revealing the operation of TD prediction errors. N-methyl-D-aspartate (NMDA) receptor antagonism prior to Stage III prevented learning, whereas opioid receptor antagonism selectively affected predictive learning. These findings support a role for TD prediction errors in fear conditioning. They suggest that NMDA receptors contribute to fear learning by acting on the product of predictive error, whereas opioid receptors contribute to predictive error.
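
    In temporal-difference terms, the prediction error that these two fear-conditioning records appeal to is the standard textbook quantity (a generic form added here for orientation, not the specific model fit in either paper):

        \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \qquad \Delta V \propto \alpha\,\delta_t,

    where r is the aversive outcome (shock), V is the learned prediction, and \gamma discounts later predictions. A fully predicted shock gives \delta_t \approx 0 (blocking in Stage II), while rearranging the compound in Stage III yields a positive error for CSB and a negative error for CSA, consistent with the bidirectional changes described above.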

  17. Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond.

    PubMed

    Morita, Kenji; Jitsev, Jenia; Morrison, Abigail

    2016-09-15

    Value-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize "Winner-Take-All (WTA)" selection of the maximum-valued action (i.e., 'max' operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic "soft-max" selection. The striatal "max" circuit and the cortical "soft-max" circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate reward-prediction-error. Regarding the suggested more complex dynamics of striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action.
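
    As a concrete point of reference for the 'max' versus 'soft-max' distinction discussed above, here is a minimal tabular sketch (standard textbook Q-learning with a softmax policy, not a circuit model; states, actions, and parameter values are made up): hard selection corresponds to taking the argmax over action values, while soft selection samples an action from a softmax over the same values.

        import numpy as np

        def softmax(q, beta=3.0):
            # Soft-max ("cortical-style") action probabilities over the values q.
            z = np.exp(beta * (q - q.max()))
            return z / z.sum()

        def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
            # One tabular Q-learning update; the "max" operation enters through the
            # value of the best action in the successor state.  SARSA would replace
            # Q[s_next].max() with the value of the action actually chosen next.
            delta = r + gamma * Q[s_next].max() - Q[s, a]   # reward-prediction error
            Q[s, a] += alpha * delta
            return Q, delta

        rng = np.random.default_rng(0)
        Q = np.zeros((2, 2))                      # 2 states x 2 actions
        s = 0
        a = rng.choice(2, p=softmax(Q[s]))        # soft ("cortical") action selection
        Q, delta = q_learning_step(Q, s, a, r=1.0, s_next=1)
        print(Q, delta)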

  18. Behavioral research in pigeons with ARENA: An automated remote environmental navigation apparatus

    PubMed Central

    Leising, Kenneth J.; Garlick, Dennis; Parenteau, Michael; Blaisdell, Aaron P.

    2009-01-01

    Three experiments established the effectiveness of an Automated Remote Environmental Navigation Apparatus (ARENA) developed in our lab to study behavioral processes in pigeons. The technology utilizes one or more wireless modules, each capable of presenting colored lights as visual stimuli to signal reward and of detecting subject peck responses. In Experiment 1, subjects were instrumentally shaped to peck at a single ARENA module following an unsuccessful autoshaping procedure. In Experiment 2, pigeons were trained with a simultaneous discrimination procedure during which two modules were illuminated different colors; pecks to one color (S+) were reinforced while pecks to the other color (S−) were not. Pigeons learned to preferentially peck the module displaying the S+. In Experiment 3, two modules were lit the same color concurrently from a set of six colors in a conditional discrimination task. For three of the colors pecks to the module in one location (e.g., upper quadrant) were reinforced while for the remaining colors pecks at the other module (e.g., lower quadrant) were reinforced. After learning this discrimination, the color-reinforced location assignments were reversed. Pigeons successfully acquired the reversal. ARENA is an automated system for open-field studies and a more ecologically valid alternative to the touchscreen. PMID:19429204

  19. Learning about Chemiosmosis and ATP Synthesis with Animations Outside of the Classroom †

    PubMed Central

    Goff, Eric E.; Reindl, Katie M.; Johnson, Christina; McClean, Phillip; Offerdahl, Erika G.; Schroeder, Noah L.; White, Alan R.

    2017-01-01

    Many undergraduate biology courses have begun to implement instructional strategies aimed at increasing student interaction with course material outside of the classroom. Two examples of such practices are introducing students to concepts as preparation prior to instruction, and as conceptual reinforcement after the instructional period. Using a three-group design, we investigate the impact of an animation developed as part of the Virtual Cell Animation Collection on the topic of concentration gradients and their role in the actions of ATP synthase as a means of pre-class preparation or post-class reinforcement compared with a no-intervention control group. Results from seven sections of introductory biology (n = 732) randomized to treatments over two semesters show that students who viewed animation as preparation (d = 0.44, p < 0.001) or as reinforcement (d = 0.53, p < 0.001) both outperformed students in the control group on a follow-up assessment. Direct comparison of the preparation and reinforcement treatments shows no significant difference in student outcomes between the two treatment groups (p = 0.87). Results suggest that while student interaction with animations on the topic of concentration gradients outside of the classroom may lead to greater learning outcomes than the control group, in the traditional lecture-based course the timing of such interactions may not be as important. PMID:28512512

  20. Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.

    PubMed

    Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla

    2014-12-01

    This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new models of coordination for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and the robot soccer simulation. The results demonstrated that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.
