computational reinforcement learning: Topics by Science.gov

Sample records for computational reinforcement learning

Reinforcement learning in computer vision

NASA Astrophysics Data System (ADS)

Bernstein, A. V.; Burnaev, E. V.

2018-04-01

Machine learning has become one of the basic technologies for solving computer vision tasks such as feature detection, image segmentation, object recognition, and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. Solving computer vision tasks should therefore take into account the special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for applied tasks such as the processing and analysis of visual information, and for specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning and its use for solving computer vision problems.
Enhanced Experience Replay for Deep Reinforcement Learning

DTIC Science & Technology

2015-11-01

ARL-TR-7538, NOV 2015. US Army Research Laboratory. Enhanced Experience Replay for Deep Reinforcement Learning, by David Doria, Bryan Dawson, and Manuel Vindiola. Computational and Information Sciences Directorate...
Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective

PubMed Central

Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.

2009-01-01

Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527
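The idea of aggregating actions into reusable subroutines can be sketched as follows. This is a minimal illustration, not the formalism of the paper above: the `Option` class, the `run_option` helper, and the corridor task are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Option:
    """A reusable subroutine: a policy plus a rule for when it is finished."""
    name: str
    policy: Callable[[int], int]        # maps a state to a primitive action
    terminates: Callable[[int], bool]   # True once the subroutine is done


def run_option(option: Option, state: int,
               step: Callable[[int, int], int],
               max_steps: int = 100) -> Tuple[int, List[int]]:
    """Run an option until it terminates; return the final state and the actions taken."""
    actions: List[int] = []
    for _ in range(max_steps):
        if option.terminates(state):
            break
        action = option.policy(state)
        actions.append(action)
        state = step(state, action)
    return state, actions


# Hypothetical corridor: states are integers and action +1 moves one step right.
walk_to_door = Option("walk_to_door", policy=lambda s: 1, terminates=lambda s: s >= 3)
final_state, trace = run_option(walk_to_door, state=0, step=lambda s, a: s + a)
```

A higher-level learner could then choose among such options in the same way a flat learner chooses among primitive actions.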
Neural Basis of Reinforcement Learning and Decision Making

PubMed Central

Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

2012-01-01

Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543
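As an illustration of choosing actions according to their value functions, the sketch below pairs a simple value update with a softmax choice rule. Both rules are standard textbook forms assumed here for the example; the abstract does not commit to them, and the names `update_value`, `softmax_policy`, and `temperature` are hypothetical.

```python
import math


def update_value(value, reward, learning_rate=0.1):
    """Move one action value a fraction of the way toward the reward just received."""
    return value + learning_rate * (reward - value)


def softmax_policy(action_values, temperature=1.0):
    """Turn action values into choice probabilities (higher value, higher probability)."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(action_values)
    weights = [math.exp((v - top) / temperature) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]


values = [0.0, 0.0]
values[1] = update_value(values[1], reward=1.0)   # action 1 was rewarded once
probs = softmax_policy(values, temperature=0.5)
```

After a single rewarded trial, the probability of repeating action 1 rises slightly above chance; a lower temperature would make the preference sharper.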
The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.

PubMed

Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D

2013-05-01

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
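One way to picture the trade-off between the two systems is a weighted mixture of their value estimates. The sketch below assumes such a mixture for illustration only; the weight `w`, the value lists, and the helper names are hypothetical and are not taken from the study.

```python
def mixed_values(q_model_based, q_model_free, w):
    """Blend two value estimates; w = 1 is purely model-based, w = 0 purely model-free."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_model_based, q_model_free)]


def greedy_choice(values):
    """Index of the highest-valued action (the first one wins ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical values: the two systems disagree about which action is better.
q_mb = [0.8, 0.2]
q_mf = [0.3, 0.6]

choice_no_load = greedy_choice(mixed_values(q_mb, q_mf, w=0.7))  # mostly model-based
choice_load = greedy_choice(mixed_values(q_mb, q_mf, w=0.1))     # mostly model-free
```

Under this assumption, a demanding secondary task corresponds to a smaller `w`, so the choice shifts toward the action favored by the model-free estimate.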
The Curse of Planning: Dissecting multiple reinforcement learning systems by taxing the central executive

PubMed Central

Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.

2013-01-01

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545
Changes in corticostriatal connectivity during reinforcement learning in humans.

PubMed

Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S

2015-02-01

Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.
The Computational Development of Reinforcement Learning during Adolescence

PubMed Central

Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne

2016-01-01

Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574
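The difference between a basic learner and one with a counterfactual module can be sketched as below. This is an assumed, simplified form: the function name, the learning rates, and the use of `None` for an undisplayed outcome are illustrative choices, and the value contextualisation module is not modeled.

```python
def learn_trial(values, chosen, outcomes, alpha_chosen=0.3, alpha_unchosen=0.0):
    """Update option values after one trial.

    outcomes[i] is the outcome of option i, or None if it was not displayed
    (partial feedback). alpha_unchosen > 0 switches on counterfactual learning.
    """
    updated = list(values)
    for option, outcome in enumerate(outcomes):
        if outcome is None:
            continue
        alpha = alpha_chosen if option == chosen else alpha_unchosen
        updated[option] += alpha * (outcome - updated[option])
    return updated


start = [0.0, 0.0]
# Complete feedback: chosen option 0 paid 1.0, unchosen option 1 would have paid -1.0.
basic = learn_trial(start, chosen=0, outcomes=[1.0, -1.0])
counterfactual = learn_trial(start, chosen=0, outcomes=[1.0, -1.0], alpha_unchosen=0.3)
```

With complete feedback, only the counterfactual learner lowers its estimate of the unchosen option, which is why it can benefit from the extra information.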
The cerebellum: a neural system for the study of reinforcement learning.

PubMed

Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F

2011-01-01

In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated, and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed). Any neural system that supports reinforcement learning must therefore also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.
Computer Assisted Language Learning. Routledge Studies in Computer Assisted Language Learning

ERIC Educational Resources Information Center

Pennington, Martha

2011-01-01

Computer-assisted language learning (CALL) is an approach to language teaching and learning in which computer technology is used as an aid to the presentation, reinforcement and assessment of material to be learned, usually including a substantial interactive element. This book provides an up-to-date and comprehensive overview of…
How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

PubMed

Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

2014-03-01

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors, that is, discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a timing and topography similar to those of the feedback error-related negativity and that increased in amplitude with learning.
The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
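The shift of the prediction error from reward delivery to choice presentation can be reproduced with a very small temporal-difference sketch. This is not the computational model the authors implemented; it assumes a fixed reward, a single learned expectation (`cue_value`), and an unpredictable trial onset, and all names and parameter values are illustrative.

```python
def simulate_trials(n_trials, reward=1.0, alpha=0.2):
    """Return per-trial prediction errors at choice presentation and at reward delivery."""
    cue_value = 0.0   # learned expectation of reward once the choice appears
    errors_at_choice, errors_at_reward = [], []
    for _ in range(n_trials):
        # The choice display arrives unpredictably, so the prior expectation is 0.
        errors_at_choice.append(cue_value - 0.0)
        # At feedback, the error is the reward minus what was expected.
        delta = reward - cue_value
        errors_at_reward.append(delta)
        cue_value += alpha * delta
    return errors_at_choice, errors_at_reward


at_choice, at_reward = simulate_trials(30)
```

Across trials the error at reward delivery shrinks toward zero while the error at choice presentation grows, which is the qualitative pattern described in the abstract.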
A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

NASA Astrophysics Data System (ADS)

Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

This study proposes a robust cooperated control method that combines reinforcement learning with robust control. A notable characteristic of reinforcement learning is that it does not require an explicit model formula; however, it does not guarantee the stability of the system. Robust control, on the other hand, guarantees stability and robustness but requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, namely H∞ control. The proposed method was compared with the conventional control method, which uses the actor-critic alone, in a computer simulation of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.
Racial bias shapes social reinforcement learning.

PubMed

Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas

2014-03-01

Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial-in-group and out-group individuals.
A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.

PubMed

Murakoshi, Kazushi; Mizuno, Junya

2004-11-01

In order to rapidly follow unexpected environmental changes, we propose a parameter control method in reinforcement learning that changes each of the learning parameters in an appropriate direction. We determine each appropriate direction on the basis of relationships between behaviors and neuromodulators, taking emergency as a keyword. Computer experiments show that agents using our proposed method could rapidly respond to unexpected environmental changes, regardless of which of two reinforcement learning algorithms (Q-learning and actor-critic (AC) architecture) and which of two learning problems (discontinuous and continuous state-action problems) were used.
Mastery Learning through Individualized Instruction: A Reinforcement Strategy

ERIC Educational Resources Information Center

Sagy, John; Ravi, R.; Ananthasayanam, R.

2009-01-01

The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…
Reinforcement Learning and Dopamine in Schizophrenia: Dimensions of Symptoms or Specific Features of a Disease Group?

PubMed Central

Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian

2013-01-01

Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603
Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders

PubMed Central

Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.

2017-01-01

Background: Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning; however, there have been few experimental studies taking this perspective. Methods: We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state–space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results: Both groups learned the task after training. However, there were group differences in early learning in the first task block where individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state–space learning curves; and outperformed typically developing individuals on the near chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions: Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243
Computer-Assisted Language Learning: Diversity in Research and Practice

ERIC Educational Resources Information Center

Stockwell, Glenn, Ed.

2012-01-01

Computer-assisted language learning (CALL) is an approach to teaching and learning languages that uses computers and other technologies to present, reinforce, and assess material to be learned, or to create environments where teachers and learners can interact with one another and the outside world. This book provides a much-needed overview of the…
A computational psychiatry approach identifies how alpha-2A noradrenergic agonist Guanfacine affects feature-based reinforcement learning in the macaque

PubMed Central

Hassani, S. A.; Oemisch, M.; Balcarras, M.; Westendorff, S.; Ardid, S.; van der Meer, M. A.; Tiesinga, P.; Womelsdorf, T.

2017-01-01

Noradrenaline is believed to support cognitive flexibility through the alpha 2A noradrenergic receptor (a2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the a2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task evaluating performance using Bayesian and Reinforcement learning models. In an initial dose-testing phase we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose we examined the faster feature-based reversal learning with Guanfacine with single-subject computational modeling. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework. PMID:28091572
Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models

PubMed Central

Najnin, Shamima; Banerjee, Bonny

2018-01-01

Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027
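Of the table-based algorithms named above, Q-learning and SARSA differ only in the target they learn toward. The sketch below shows the two standard update rules on a generic table; the states, actions, and parameter values are hypothetical and unrelated to the word-learning setup or the CHILDES data.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Off-policy target: reward plus the best value available in the next state."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


def sarsa_update(q, state, action, reward, next_state, next_action, alpha=0.5, gamma=0.9):
    """On-policy target: reward plus the value of the action actually taken next."""
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])


def fresh_table():
    # Hypothetical two-state table; the keys are illustrative only.
    return {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 1.0, "b": 0.0}}


q_ql, q_sarsa = fresh_table(), fresh_table()
q_learning_update(q_ql, "s0", "a", reward=0.0, next_state="s1")
sarsa_update(q_sarsa, "s0", "a", reward=0.0, next_state="s1", next_action="b")
```

Here Q-learning credits the first state with the best follow-up action, whereas SARSA credits it with the poorer action that was actually taken next, so the two tables diverge after one step.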

Generalization of value in reinforcement learning by humans.

PubMed

Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna

2012-04-01

Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. 
Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
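The contrast between the basic model and one augmented to let feedback generalize between correlated options can be sketched as follows. The specific form is an assumption made for illustration: the `share` parameter, the `partner` mapping, and the choice to reuse the chosen option's prediction error are not taken from the paper.

```python
def update_with_generalization(values, chosen, reward, partner, alpha=0.2, share=0.5):
    """Update the chosen option and pass a fraction of the same error to its partner.

    partner maps each option to its correlated option; share = 0 recovers the
    basic model with no generalization.
    """
    updated = list(values)
    error = reward - updated[chosen]
    updated[chosen] += alpha * error
    # Assumption of this sketch: the partner is nudged by the chosen option's error.
    updated[partner[chosen]] += share * alpha * error
    return updated


partner = {0: 1, 1: 0, 2: 3, 3: 2}   # four options in two correlated pairs
basic = update_with_generalization([0.5] * 4, chosen=0, reward=1.0,
                                   partner=partner, share=0.0)
augmented = update_with_generalization([0.5] * 4, chosen=0, reward=1.0, partner=partner)
```

Only the augmented learner raises its estimate of the unchosen partner option, so comparing the two fits to choice data tests whether participants generalized.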
Effect of reinforcement learning on coordination of multiagent systems

NASA Astrophysics Data System (ADS)

Bukkapatnam, Satish T. S.; Gao, Greg

2000-12-01

For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.
A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI

NASA Astrophysics Data System (ADS)

Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro

Recently, the brain-computer interface (BCI), a direct pathway connecting a human brain and an external device such as a computer or a robot, has attracted a lot of attention. Because a BCI can control machines such as robots through brain activity without using voluntary muscles, it may become a useful communication tool for people with disabilities, for instance, patients with amyotrophic lateral sclerosis. However, to realize a BCI system that can perform precise tasks in various environments, the control rules must be designed to adapt to dynamic environments. Reinforcement learning is one approach to designing such control rules. If reinforcement learning can be driven by brain activity, it would lead to a BCI with general versatility. In this research, we focused on the P300 event-related potential as an alternative reward signal for reinforcement learning. We discriminated between success and failure trials from the single-trial P300 of the EEG by using the proposed discrimination algorithm based on a support vector machine. The possibility of reinforcement learning was examined from the viewpoint of the number of discriminated trials. The results showed that learning may be possible in most subjects.
Reinforcement learning in multidimensional environments relies on attention mechanisms.

PubMed

Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C

2015-05-27

In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors.
The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

PubMed

Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

2016-02-01

Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction error) and model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.
Developmental Changes in Learning: Computational Mechanisms and Social Influences

PubMed Central

Bolenz, Florian; Reiter, Andrea M. F.; Eppinger, Ben

2017-01-01

Our ability to learn from the outcomes of our actions and to adapt our decisions accordingly changes over the course of the human lifespan. In recent years, there has been an increasing interest in using computational models to understand developmental changes in learning and decision-making. Moreover, extensions of these models are currently applied to study socio-emotional influences on learning in different age groups, a topic that is of great relevance for applications in education and health psychology. In this article, we aim to provide an introduction to basic ideas underlying computational models of reinforcement learning and focus on parameters and model variants that might be of interest to developmental scientists. We then highlight recent attempts to use reinforcement learning models to study the influence of social information on learning across development. The aim of this review is to illustrate how computational models can be applied in developmental science, what they can add to our understanding of developmental mechanisms and how they can be used to bridge the gap between psychological and neurobiological theories of development. PMID:29250006
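As a rough illustration of the kind of basic reinforcement learning model this review introduces, the sketch below simulates a delta-rule learner with a softmax choice rule on a bandit task. It is not taken from the article: the function names, the task, and the choice of a learning rate `alpha` and an inverse temperature `beta` as the two free parameters are assumptions made for illustration only.

```python
import math
import random


def softmax(values, beta):
    """Turn action values into choice probabilities (beta = inverse temperature)."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_bandit(alpha, beta, reward_probs, n_trials, seed=0):
    """Simulate a delta-rule learner (hypothetical example) and return its final values."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    for _ in range(n_trials):
        probs = softmax(q, beta)
        action = rng.choices(range(len(q)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Prediction-error update: move the chosen value toward the outcome.
        q[action] += alpha * (reward - q[action])
    return q


if __name__ == "__main__":
    print(simulate_bandit(alpha=0.1, beta=3.0, reward_probs=[0.8, 0.2], n_trials=200))
```

In a sketch like this, a larger `alpha` makes value estimates track recent outcomes more closely, while a larger `beta` makes choices more deterministic; these are the kinds of parameters that might be compared across age groups.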
Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms

PubMed Central

Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.

2015-01-01

In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331
Predicting psychosis across diagnostic boundaries: Behavioral and computational modeling evidence for impaired reinforcement learning in schizophrenia and bipolar disorder with a history of psychosis.

PubMed

Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N

2015-08-01

There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. 
These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis. (c) 2015 APA, all rights reserved.
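The three model classes named in this abstract can be sketched for a single-step choice task as follows. This is a minimal illustration, not the authors' fitted models: the update equations, the parameter names (`alpha_c`, `alpha_a`, `alpha_q`, `mix`), and the linear mixing of actor weights with Q-values in the hybrid are assumptions.

```python
def update_trial(state, action, reward, critic, actor, q,
                 alpha_c=0.1, alpha_a=0.1, alpha_q=0.1):
    """Apply one trial of learning to both systems; return the critic's prediction error."""
    # Actor-critic: the critic learns a state value, and its prediction
    # error trains both the critic and the actor's action weights.
    delta = reward - critic[state]
    critic[state] += alpha_c * delta
    actor[state][action] += alpha_a * delta
    # Q-learning: each action keeps its own expected reward value.
    q[state][action] += alpha_q * (reward - q[state][action])
    return delta


def hybrid_preference(state, actor, q, mix):
    """Blend actor weights with Q-values (mix=0: pure actor-critic, mix=1: pure Q-learning)."""
    return [(1.0 - mix) * w + mix * v for w, v in zip(actor[state], q[state])]


if __name__ == "__main__":
    critic = {"AB": 0.0}
    actor = {"AB": [0.0, 0.0]}
    q = {"AB": [0.0, 0.0]}
    update_trial("AB", 0, 1.0, critic, actor, q)
    print(hybrid_preference("AB", actor, q, mix=0.5))
```

The resulting preferences would typically be passed through a softmax to obtain choice probabilities, and `mix` would be fitted per participant.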
Reinforcement learning in depression: A review of computational research.

PubMed

Chen, Chong; Takahashi, Taiki; Nakagawa, Shin; Inoue, Takeshi; Kusumi, Ichiro

2015-08-01

Despite being considered primarily a mood disorder, major depressive disorder (MDD) is characterized by cognitive and decision making deficits. Recent research has employed computational models of reinforcement learning (RL) to address these deficits. The computational approach has the advantage of making explicit predictions about learning and behavior, specifying the process parameters of RL, differentiating between model-free and model-based RL, and enabling computational model-based functional magnetic resonance imaging and electroencephalography. With these merits, computational psychiatry has emerged as a field, and here we review specific studies that focused on MDD. Considerable evidence suggests that MDD is associated with impaired brain signals of reward prediction error and expected value ('wanting'), decreased reward sensitivity ('liking') and/or learning (be it model-free or model-based), etc., although the causality remains unclear. These parameters may serve as valuable intermediate phenotypes of MDD, linking general clinical symptoms to underlying molecular dysfunctions. We believe future computational research at clinical, systems, and cellular/molecular/genetic levels will propel us toward a better understanding of the disease. Copyright © 2015 Elsevier Ltd. All rights reserved.
Fast and Epsilon-Optimal Discretized Pursuit Learning Automata.

PubMed

Zhang, JunQi; Wang, Cheng; Zhou, MengChu

2015-10-01

Learning automata (LA) are powerful tools for reinforcement learning. A discretized pursuit LA is the most popular one among them. During an iteration its operation consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probability. However, when the number of actions is large, the learning becomes extremely slow because there are too many updates to be made at each iteration. The increased updates are mostly from phases 1 and 3. A new fast discretized pursuit LA with assured ε-optimality is proposed to perform both phases 1 and 3 with the computational complexity independent of the number of actions. Apart from its low computational complexity, it achieves faster convergence speed than the classical one when operating in stationary environments. This paper can promote the applications of LA toward the large-scale-action oriented area that requires efficient reinforcement learning tools with assured ε-optimality, fast convergence speed, and low computational complexity for each iteration.
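The three phases listed above can be sketched for a classical discretized pursuit automaton as follows. This is a plain reward-inaction variant written for illustration, not the fast algorithm proposed in the paper: phases 1 and 3 below still cost time linear in the number of actions, which is the cost the paper aims to remove. The function name, the initial sampling of each action, and the default parameters are assumptions.

```python
import random


def discretized_pursuit(reward_probs, resolution=100, n_iters=5000,
                        init_samples=10, seed=0):
    """Classical discretized pursuit LA (reward-inaction); returns action probabilities."""
    rng = random.Random(seed)
    r = len(reward_probs)
    step = 1.0 / (r * resolution)  # smallest probability change per update
    p = [1.0 / r] * r
    wins = [0] * r
    pulls = [0] * r

    def pull(i):
        """Query the environment and update the reward estimate of action i."""
        rewarded = rng.random() < reward_probs[i]
        pulls[i] += 1
        wins[i] += int(rewarded)
        return rewarded

    # Initialise the reward estimates by sampling every action a few times.
    for i in range(r):
        for _ in range(init_samples):
            pull(i)

    for _ in range(n_iters):
        # Phase 1: select the next action from the probability vector.
        action = rng.choices(range(r), weights=p)[0]
        rewarded = pull(action)
        # Phase 2: find the action with the highest estimated reward.
        best = max(range(r), key=lambda i: wins[i] / pulls[i])
        # Phase 3: on reward, shift probability mass toward the best action.
        if rewarded:
            for j in range(r):
                if j != best:
                    p[j] = max(p[j] - step, 0.0)
            p[best] = 1.0 - sum(p[j] for j in range(r) if j != best)
    return p


if __name__ == "__main__":
    print(discretized_pursuit([0.9, 0.6, 0.3]))
```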
Reinforcement learning and decision making in monkeys during a competitive game.

PubMed

Lee, Daeyeol; Conroy, Michelle L; McGreevy, Benjamin P; Barraclough, Dominic J

2004-12-01

Animals living in a dynamic environment must adjust their decision-making strategies through experience. To gain insights into the neural basis of such adaptive decision-making processes, we trained monkeys to play a competitive game against a computer in an oculomotor free-choice task. The animal selected one of two visual targets in each trial and was rewarded only when it selected the same target as the computer opponent. To determine how the animal's decision-making strategy can be affected by the opponent's strategy, the computer opponent was programmed with three different algorithms that exploited different aspects of the animal's choice and reward history. When the computer selected its targets randomly with equal probabilities, animals selected one of the targets more often, violating the prediction of probability matching, and their choices were systematically influenced by the choice history of the two players. When the computer exploited only the animal's choice history but not its reward history, the animal's choice became more independent of its own choice history but was still related to the choice history of the opponent. This bias was substantially reduced, but not completely eliminated, when the computer used the choice history of both players in making its predictions. These biases were consistent with the predictions of reinforcement learning, suggesting that the animals sought optimal decision-making strategies using reinforcement learning algorithms.
Affect and the computer game player: the effect of gender, personality, and game reinforcement structure on affective responses to computer game-play.

PubMed

Chumbley, Justin; Griffiths, Mark

2006-06-01

Previous research on computer games has tended to concentrate on their more negative effects (e.g., addiction, increased aggression). This study departs from the traditional clinical and social learning explanations for these behavioral phenomena and examines the effect of personality, in-game reinforcement characteristics, gender, and skill on the emotional state of the game-player. Results demonstrated that in-game reinforcement characteristics and skill significantly affect a number of affective measures (most notably excitement and frustration). The implications of the impact of game-play on affect are discussed with reference to the concepts of "addiction" and "aggression."
The drift diffusion model as the choice rule in reinforcement learning.

PubMed

Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

2017-08-01

Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
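One way to picture the combination described in this abstract is a simulation in which learned values set the drift rate of a diffusion process on each trial. The sketch below is an assumption-laden illustration, not the authors' implementation: it assumes the drift rate is a scaling factor times the difference between two learned values, uses a simple Euler simulation of the diffusion process, and all names and default parameters are hypothetical.

```python
import math
import random


def ddm_trial(drift, boundary, non_decision, rng,
              dt=0.001, noise=1.0, max_time=5.0):
    """Simulate one drift diffusion trial; return (choice, response time in seconds)."""
    x = boundary / 2.0  # unbiased starting point between the two boundaries
    t = 0.0
    while 0.0 < x < boundary and t < max_time:
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    # Upper boundary -> option 0, lower boundary -> option 1.
    # If the time limit is hit, the nearer boundary decides.
    choice = 0 if x >= boundary / 2.0 else 1
    return choice, t + non_decision


def simulate_rlddm(reward_probs, alpha=0.1, scaling=3.0, boundary=1.5,
                   non_decision=0.3, n_trials=100, seed=0):
    """Two-option learner whose choices and response times come from a DDM."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    trials = []
    for _ in range(n_trials):
        # Assumed linking function: drift grows with the learned value difference.
        drift = scaling * (q[0] - q[1])
        choice, rt = ddm_trial(drift, boundary, non_decision, rng)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])  # delta-rule value update
        trials.append((choice, rt, reward))
    return q, trials


if __name__ == "__main__":
    values, data = simulate_rlddm([0.8, 0.2], n_trials=50)
    print(values, data[:3])
```

Under these assumptions, response times tend to shorten as the value difference grows over learning, which is the across-trial dynamic a plain softmax choice rule does not describe.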
The drift diffusion model as the choice rule in reinforcement learning

PubMed Central

Frank, Michael J.

2017-01-01

Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups. PMID:27966103
Reinforcement learning with Marr.

PubMed

Niv, Yael; Langdon, Angela

2016-10-01

To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization, which readily prescribes algorithmic solutions that evidence striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

PubMed

Collins, Anne G E; Frank, Michael J

2018-03-06

Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electroencephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.
Learning and tuning fuzzy logic controllers through reinforcements.

PubMed

Berenji, H R; Khedkar, P

1992-01-01

A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that: the generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.

PubMed

Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E

2017-03-15

Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. 
Copyright © 2017 the authors.
Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia

PubMed Central

Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.

2014-01-01

Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
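The separation of capacity-limited working memory from incremental RL described above can be illustrated with a mixture policy. The sketch below is an assumption for illustration, not the authors' fitted model: it assumes the WM weight is the reliability parameter scaled by how much of the stimulus set fits within capacity, and that both systems use the same softmax choice rule. All function and parameter names are hypothetical.

```python
import math


def softmax(values, beta):
    """Turn action values into choice probabilities (beta = inverse temperature)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def wm_weight(capacity, reliability, set_size):
    """Assumed WM contribution: full reliability only while the set fits in capacity."""
    return reliability * min(1.0, capacity / set_size)


def mixture_policy(q_values, wm_values, capacity, reliability, set_size, beta=8.0):
    """Blend the WM policy and the RL policy for one stimulus."""
    w = wm_weight(capacity, reliability, set_size)
    p_rl = softmax(q_values, beta)
    p_wm = softmax(wm_values, beta)
    return [w * pw + (1.0 - w) * pr for pw, pr in zip(p_wm, p_rl)]


if __name__ == "__main__":
    # WM remembers that action 0 was correct; RL has only partly learned it.
    print(mixture_policy([0.4, 0.3, 0.3], [1.0, 0.0, 0.0],
                         capacity=3, reliability=0.9, set_size=6))
```

In a model of this shape, lowering `capacity` or `reliability` reduces performance most at large set sizes while leaving the RL learning rate untouched, which is the kind of dissociation the abstract reports.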
Social stress reactivity alters reward and punishment learning

PubMed Central

Frank, Michael J.; Allen, John J. B.

2011-01-01

To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems. PMID:20453038

Social stress reactivity alters reward and punishment learning.

PubMed

Cavanagh, James F; Frank, Michael J; Allen, John J B

2011-06-01

To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.
Mesolimbic confidence signals guide perceptual learning in the absence of external feedback

PubMed Central

Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp

2016-01-01

It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations

NASA Technical Reports Server (NTRS)

Jani, Yashvant

1992-01-01

As part of the Research Institute for Computing and Information Systems (RICIS) activity, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Max satellite simulation. This activity is carried out in the software technology laboratory utilizing the Orbital Operations Simulator (OOS). This interim report provides the status of the project and outlines the future plans.
Quantum reinforcement learning.

PubMed

Dong, Daoyi; Chen, Chunlin; Li, Hanxiong; Tarn, Tzyh-Jong

2008-10-01

The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity.

PubMed

Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

2015-01-01

A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though these are inevitable and constraining features of learning in real environments. Problems of this class are formally known as partially observable reinforcement learning (PORL) problems, which generalize reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.
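The quantity being approximated here, the free energy of a restricted Boltzmann machine, can be written down directly for binary hidden units. The sketch below computes it in plain Python and uses its negative as an action value for a concatenated observation and action vector; that use as a value estimate is a common free-energy-based RL construction assumed here for illustration, and the spiking approximation from the paper is not reproduced. All names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rbm_free_energy(visible, weights, visible_bias, hidden_bias):
    """Free energy F(v) = -b.v - sum_j softplus(c_j + sum_i v_i W_ij), binary hidden units."""
    linear = sum(b * v for b, v in zip(visible_bias, visible))
    hidden = 0.0
    for j, c in enumerate(hidden_bias):
        pre_activation = c + sum(weights[i][j] * v for i, v in enumerate(visible))
        hidden += softplus(pre_activation)
    return -linear - hidden


def q_value(observation, action_one_hot, weights, visible_bias, hidden_bias):
    """Assumed value estimate: negative free energy of the (observation, action) vector."""
    visible = list(observation) + list(action_one_hot)
    return -rbm_free_energy(visible, weights, visible_bias, hidden_bias)


if __name__ == "__main__":
    w = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.4]]  # 3 visible units x 2 hidden units
    print(q_value([1, 0], [1], w, [0.0, 0.0, 0.0], [0.0, 0.0]))
```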
A Spiking Neural Network Model of Model-Free Reinforcement Learning with High-Dimensional Sensory Input and Perceptual Ambiguity

PubMed Central

Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

2015-01-01

A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662
A neural model of hierarchical reinforcement learning.

PubMed

Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris

2017-01-01

We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.
Deficient reinforcement learning in medial frontal cortex as a model of dopamine-related motivational deficits in ADHD.

PubMed

Silvetti, Massimo; Wiersema, Jan R; Sonuga-Barke, Edmund; Verguts, Tom

2013-10-01

Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts: a probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood.
Working memory contributions to reinforcement learning impairments in schizophrenia.

PubMed

Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J

2014-10-08

Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia.
Hybrid computing using a neural network with dynamic external memory.

PubMed

Graves, Alex; Wayne, Greg; Reynolds, Malcolm; Harley, Tim; Danihelka, Ivo; Grabska-Barwińska, Agnieszka; Colmenarejo, Sergio Gómez; Grefenstette, Edward; Ramalho, Tiago; Agapiou, John; Badia, Adrià Puigdomènech; Hermann, Karl Moritz; Zwols, Yori; Ostrovski, Georg; Cain, Adam; King, Helen; Summerfield, Christopher; Blunsom, Phil; Kavukcuoglu, Koray; Hassabis, Demis

2016-10-27

Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.
Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

PubMed Central

Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.

2011-01-01

Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
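The "confirmation bias" idea in this abstract, amplifying instruction-consistent outcomes and diminishing inconsistent ones, can be sketched as a modified value update. This is a minimal illustration, not the authors' fitted model: the function name `biased_value_update` and the `amplify` and `diminish` scale factors are hypothetical, and the simple delta-rule form is an assumption.

```python
def biased_value_update(value, reward, instructed, alpha=0.2, amplify=2.0, diminish=0.5):
    """One delta-rule update for the chosen stimulus.

    Assumption: for a stimulus the participant was told is best, better-than-expected
    outcomes (consistent with the instruction) are scaled up and worse-than-expected
    outcomes (inconsistent with it) are scaled down. Uninstructed stimuli learn normally.
    """
    prediction_error = reward - value
    if instructed:
        scale = amplify if prediction_error > 0 else diminish
    else:
        scale = 1.0
    return value + alpha * scale * prediction_error


# The same outcomes move the instructed stimulus up faster and down slower.
after_win = biased_value_update(0.5, 1.0, instructed=True)    # amplified gain
after_loss = biased_value_update(0.5, 0.0, instructed=True)   # diminished loss
```

Under these assumptions an instructed stimulus keeps an inflated value even when its true reinforcement probability is low, which matches the qualitative pattern of instructions driving choice despite contrary experience.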
Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework.

PubMed

Gershman, Samuel J; Daw, Nathaniel D

2017-01-03

We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
Learning and tuning fuzzy logic controllers through reinforcements

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.; Khedkar, Pratap

1992-01-01

A new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. In particular, our Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture: (1) learns and tunes a fuzzy logic controller even when only weak reinforcements, such as a binary failure signal, are available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto, Sutton, and Anderson to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and has demonstrated significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Reinforcement learning for resource allocation in LEO satellite networks.

PubMed

Usaha, Wipawee; Barria, Javier A

2007-06-01

In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms enhance performance in terms of requirements in storage, computational complexity and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements.
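An actor-critic method with TD learning, the first approach named in this abstract, can be sketched for a small tabular problem. The sketch below is a generic one-step, discounted-reward version with a softmax actor. The paper works with a semi-Markov, average-revenue formulation, so the discount factor `gamma`, the step sizes, and the function names here are assumptions made only for illustration.

```python
import math


def softmax(preferences):
    """Convert action preferences into selection probabilities."""
    largest = max(preferences)
    exps = [math.exp(p - largest) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(values, prefs, state, action, reward, next_state,
                      alpha_critic=0.1, alpha_actor=0.1, gamma=0.9):
    """Apply one TD(0) critic update and one actor update in place.

    values: list of state values (the critic).
    prefs:  list of per-state action-preference lists (the actor).
    Returns the TD error that drove both updates.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    # Critic: move the state value toward the one-step bootstrapped target.
    values[state] += alpha_critic * td_error
    # Actor: shift preferences along the gradient of log softmax, scaled by the TD error.
    probs = softmax(prefs[state])
    for b in range(len(prefs[state])):
        indicator = 1.0 if b == action else 0.0
        prefs[state][b] += alpha_actor * td_error * (indicator - probs[b])
    return td_error


values = [0.0, 0.0]
prefs = [[0.0, 0.0], [0.0, 0.0]]
# Hypothetical transition: in state 0, action 1 (say, "admit the call") earned revenue 1.
actor_critic_step(values, prefs, state=0, action=1, reward=1.0, next_state=1)
```

Only the visited state's value and preferences are touched on each step, which is why this family of methods avoids the full sweeps over the state space that make DP costly for large problems.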
Applications of Deep Learning and Reinforcement Learning to Biological Data.

PubMed

Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

2018-06-01

Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize the future of artificial intelligence. The growth in computational power, accompanied by faster and increased data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques to data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.
Kernel Temporal Differences for Neural Decoding

PubMed Central

Bae, Jihye; Sanchez Giraldo, Luis G.; Pohlmeyer, Eric A.; Francis, Joseph T.; Sanchez, Justin C.; Príncipe, José C.

2015-01-01

We study the feasibility and capability of the kernel temporal difference (KTD)(λ) algorithm for neural decoding. KTD(λ) is an online, kernel-based learning algorithm, which has been introduced to estimate value functions in reinforcement learning. This algorithm combines kernel-based representations with the temporal difference approach to learning. One of our key observations is that by using strictly positive definite kernels, the algorithm's convergence can be guaranteed for policy evaluation. The algorithm's nonlinear functional approximation capabilities are shown in both simulations of policy evaluation and neural decoding problems (policy improvement). KTD can handle high-dimensional neural states containing spatial-temporal information at a reasonable computational complexity, allowing real-time applications. When the algorithm seeks a proper mapping between a monkey's neural states and desired positions of a computer cursor or a robot arm, in both open-loop and closed-loop experiments, it can effectively learn the neural state to action mapping. Finally, a visualization of the coadaptation process between the decoder and the subject shows the algorithm's capabilities in reinforcement learning brain machine interfaces. PMID:25866504
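Combining a kernel representation with temporal-difference learning can be sketched for the simplest case, λ = 0. In this illustration the value estimate is a weighted sum of kernels centered on previously visited states, and each update adds the current state as a new center weighted by the step size times the TD error. The Gaussian kernel, the class name `KernelTD0`, and all parameter values are assumptions, and eligibility traces (λ > 0) are omitted.

```python
import math


def gaussian_kernel(x, y, width=1.0):
    """Strictly positive definite Gaussian kernel between two state vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * width ** 2))


class KernelTD0:
    """Online kernel TD(0) value estimator (a sketch, not the paper's implementation)."""

    def __init__(self, step_size=0.5, gamma=0.9, width=1.0):
        self.step_size = step_size
        self.gamma = gamma
        self.width = width
        self.centers = []   # states seen so far
        self.coeffs = []    # one weight per stored state

    def value(self, x):
        """Value estimate: weighted sum of kernels centered on stored states."""
        return sum(c * gaussian_kernel(z, x, self.width)
                   for z, c in zip(self.centers, self.coeffs))

    def update(self, x, reward, x_next=None):
        """TD(0) update; x_next=None marks a terminal transition."""
        next_value = 0.0 if x_next is None else self.value(x_next)
        td_error = reward + self.gamma * next_value - self.value(x)
        self.centers.append(tuple(x))
        self.coeffs.append(self.step_size * td_error)
        return td_error


model = KernelTD0()
model.update((0.0,), reward=1.0)   # terminal transition with reward 1
model.update((0.0,), reward=1.0)   # the estimate moves closer to 1 on each visit
```

The stored set of centers grows by one with every update in this sketch, so a practical decoder would need some form of sparsification or budget to keep the per-step cost bounded.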
Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning

PubMed Central

Zhu, Lusha; Mathewson, Kyle E.; Hsu, Ming

2012-01-01

Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents’ beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs. PMID:22307594
Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

PubMed

Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

2012-01-31

Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.
Chronic Exposure to Methamphetamine Disrupts Reinforcement-Based Decision Making in Rats.

PubMed

Groman, Stephanie M; Rich, Katherine M; Smith, Nathaniel J; Lee, Daeyeol; Taylor, Jane R

2018-03-01

The persistent use of psychostimulant drugs, despite the detrimental outcomes associated with continued drug use, may be because of disruptions in reinforcement-learning processes that enable behavior to remain flexible and goal directed in dynamic environments. To identify the reinforcement-learning processes that are affected by chronic exposure to the psychostimulant methamphetamine (MA), the current study sought to use computational and biochemical analyses to characterize decision-making processes, assessed by probabilistic reversal learning, in rats before and after they were exposed to an escalating dose regimen of MA (or saline control). The ability of rats to use flexible and adaptive decision-making strategies following changes in stimulus-reward contingencies was significantly impaired following exposure to MA. Computational analyses of parameters that track choice and outcome behavior indicated that exposure to MA significantly impaired the ability of rats to use negative outcomes effectively. These MA-induced changes in decision making were similar to those observed in rats following administration of a dopamine D2/3 receptor antagonist. These data use computational models to provide insight into drug-induced maladaptive decision making that may ultimately identify novel targets for the treatment of psychostimulant addiction. We suggest that the disruption in utilization of negative outcomes to adaptively guide dynamic decision making is a new behavioral mechanism by which MA rigidly biases choice behavior.
A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning.

PubMed

Franklin, Nicholas T; Frank, Michael J

2015-12-25

Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.

Neural computations underlying inverse reinforcement learning in the human brain

PubMed Central

Pauli, Wolfgang M; Bossaerts, Peter; O'Doherty, John

2017-01-01

In inverse reinforcement learning an observer infers the reward distribution available for actions in the environment solely through observing the actions implemented by another agent. To address whether this computational process is implemented in the human brain, participants underwent fMRI while learning about slot machines yielding hidden preferred and non-preferred food outcomes with varying probabilities, through observing the repeated slot choices of agents with similar and dissimilar food preferences. Using formal model comparison, we found that participants implemented inverse RL as opposed to a simple imitation strategy, in which the actions of the other agent are copied instead of inferring the underlying reward structure of the decision problem. Our computational fMRI analysis revealed that anterior dorsomedial prefrontal cortex encoded inferences about action-values within the value space of the agent as opposed to that of the observer, demonstrating that inverse RL is an abstract cognitive process divorceable from the values and concerns of the observer him/herself. PMID:29083301
A Neurocomputational Model of Dopamine and Prefrontal-Striatal Interactions during Multicue Category Learning by Parkinson Patients

ERIC Educational Resources Information Center

Moustafa, Ahmed A.; Gluck, Mark A.

2011-01-01

Most existing models of dopamine and learning in Parkinson disease (PD) focus on simulating the role of basal ganglia dopamine in reinforcement learning. Much data argue, however, for a critical role for prefrontal cortex (PFC) dopamine in stimulus selection in attentional learning. Here, we present a new computational model that simulates…
Computer-Assisted Instruction: One Aid for Teachers of Reading.

ERIC Educational Resources Information Center

Rauch, Margaret; Samojeden, Elizabeth

Computer assisted instruction (CAI), an instructional system with direct interaction between the student and the computer, can be a valuable aid for presenting new concepts, for reinforcing of selective skills, and for individualizing instruction. The advantages CAI provides include self-paced learning, more efficient allocation of classroom time,…
Identifying Cognitive Remediation Change Through Computational Modelling—Effects on Reinforcement Learning in Schizophrenia

PubMed Central

Cella, Matteo; Bishara, Anthony J.; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til

2014-01-01

Objective: Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention—cognitive remediation. Method: Using computational modelling, 3 reinforcement learning parameters based on the Wisconsin Card Sorting Test (WCST) trial-by-trial performance were estimated: R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). Results: In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. R and P parameter change correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Conclusion: Schizophrenia reinforcement learning difficulties negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. PMID:24214932
A neural model of hierarchical reinforcement learning

PubMed Central

Rasmussen, Daniel; Eliasmith, Chris

2017-01-01

We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions. PMID:28683111
Reinforcement Learning Trees

PubMed Central

Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

2015-01-01

In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
Modeling the behavioral substrates of associate learning and memory - Adaptive neural models

NASA Technical Reports Server (NTRS)

Lee, Chuen-Chien

1991-01-01

Three adaptive single-neuron models based on neural analogies of behavior modification episodes are proposed, which attempt to bridge the gap between psychology and neurophysiology. The proposed models capture the predictive nature of Pavlovian conditioning, which is essential to the theory of adaptive/learning systems. The models learn to anticipate the occurrence of a conditioned response before the presence of a reinforcing stimulus when training is complete. Furthermore, each model can find the most nonredundant and earliest predictor of reinforcement. The behavior of the models accounts for several aspects of basic animal learning phenomena in Pavlovian conditioning beyond previous related models. Computer simulations show how well the models fit empirical data from various animal learning paradigms.
Impact of Computer Animations in Cognitive Learning: Differentiation

ERIC Educational Resources Information Center

Altiparmak, Kemal

2014-01-01

In mathematics courses, it may be difficult for students to construct some concepts in a meaningful way. In such circumstances, applying suitable technologies to make the concepts concrete may reinforce the learning process. Starting the learning process from daily-life events in the students' environment may attract their attention and may…
Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.

PubMed

Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M

2018-05-12

Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside of the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. Twenty-nine adult females (15 PTSD with exposure to assaultive violence, 14 controls) underwent a neutral reinforcement-learning task (i.e., a two-armed bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify timecourses for large-scale a priori brain networks. We found that an anticorrelated and risk-sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group demonstrated significantly less neural encoding of prediction errors in both a ventral striatum/mPFC and an anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside of the context of fear stimuli, suggests the possibility of a broader conceptualization of learning differences in PTSD than is proposed in current neurocircuitry models of PTSD. Copyright © 2018 Elsevier Ltd. All rights reserved.
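The Rescorla-Wagner update that underlies the behavioral modeling described above can be sketched as follows. This is only a minimal illustration of the basic rule on a two-armed bandit, not the fitted model from the study: the learning rate `alpha`, the softmax inverse temperature `beta`, the reward probabilities, and all function names are assumptions chosen for the example, and the anticorrelated and risk-sensitive variants are not shown.

```python
import math
import random

def rescorla_wagner_update(value, reward, alpha):
    """Return the updated value and the prediction error for one trial."""
    prediction_error = reward - value
    return value + alpha * prediction_error, prediction_error

def softmax_choice(values, beta, rng):
    """Pick an arm with probability proportional to exp(beta * value)."""
    weights = [math.exp(beta * v) for v in values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for arm, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return arm
    return len(values) - 1

def run_bandit(reward_probs, n_trials=500, alpha=0.1, beta=3.0, seed=0):
    """Simulate a learner on a bandit; only the chosen arm is updated."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    for _ in range(n_trials):
        arm = softmax_choice(values, beta, rng)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        values[arm], _ = rescorla_wagner_update(values[arm], reward, alpha)
    return values

# Hypothetical task: arm 0 pays off 80% of the time, arm 1 only 20%.
values = run_bandit([0.8, 0.2])
```

The prediction error returned by `rescorla_wagner_update` is the quantity whose neural encoding the abstract reports as weakened in the PTSD group.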
Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

PubMed

Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

2013-01-01

Bio-robots based on brain-computer interfaces (BCI) suffer from a failure to take the characteristics of the animals into account during navigation. This paper proposes a new method for automatic bio-robot navigation that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of animals. Given graded electrical reward, the animal (e.g., the rat) seeks the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge to an optimal route by co-learning. This work offers significant inspiration for the practical development of bio-robot navigation with hybrid intelligence.
Seizure Control in a Computational Model Using a Reinforcement Learning Stimulation Paradigm.

PubMed

Nagaraj, Vivek; Lamperski, Andrew; Netoff, Theoden I

2017-11-01

Neuromodulation technologies such as vagus nerve stimulation and deep brain stimulation have shown some efficacy in controlling seizures in medically intractable patients. However, inherent patient-to-patient variability of seizure disorders leads to a wide range of therapeutic efficacy. A patient-specific approach to determining stimulation parameters may lead to increased therapeutic efficacy while minimizing stimulation energy and side effects. This paper presents a reinforcement learning algorithm that optimizes stimulation frequency for controlling seizures with minimum stimulation energy. We apply our method to a computational model called the Epileptor. The Epileptor model simulates inter-ictal and ictal local field potential data. In order to apply reinforcement learning to the Epileptor, we introduce a specialized reward function and state-space discretization. With the reward function and discretization fixed, we test the effectiveness of the temporal difference reinforcement learning algorithm (TD(0)). For periodic pulsatile stimulation, we derive a relation that describes, for any stimulation frequency, the minimal pulse amplitude required to suppress seizures. The TD(0) algorithm is able to identify parameters that control seizures quickly. Additionally, our results show that the TD(0) algorithm refines the stimulation frequency to minimize stimulation energy, thereby converging to optimal parameters reliably. An advantage of the TD(0) algorithm is that it is adaptive, so that the parameters necessary to control the seizures can change over time. We show that the algorithm can converge on the optimal solution in simulation with slow and fast inter-seizure intervals.
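For readers unfamiliar with TD(0), the tabular update it refers to can be sketched as below. This is a generic illustration on a toy three-state chain, not the Epileptor model, reward function, or discretization used in the study; the state layout, `alpha`, `gamma`, and the function name are assumptions made for the example.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to values[state] and return the TD error."""
    # Bootstrapped target: immediate reward plus discounted next-state value.
    target = reward if terminal else reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error

# Toy deterministic chain: 0 -> 1 -> 2 (terminal), reward 1.0 on the last step.
values = [0.0, 0.0, 0.0]
for _ in range(2000):
    td0_update(values, 0, 0.0, 1)
    td0_update(values, 1, 1.0, 2, terminal=True)
```

In this toy chain the estimates settle at `values[1]` near 1.0 and `values[0]` near `gamma * 1.0`. Because every update uses only the most recent transition, the estimates keep tracking the environment if it drifts, which is the adaptivity the abstract highlights.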
The evolution of continuous learning of the structure of the environment

PubMed Central

Kolodny, Oren; Edelman, Shimon; Lotem, Arnon

2014-01-01

Continuous, ‘always on’, learning of structure from a stream of data is studied mainly in the fields of machine learning or language acquisition, but its evolutionary roots may go back to the first organisms that were internally motivated to learn and represent their environment. Here, we study under what conditions such continuous learning (CL) may be more adaptive than simple reinforcement learning and examine how it could have evolved from the same basic associative elements. We use agent-based computer simulations to compare three learning strategies: simple reinforcement learning; reinforcement learning with chaining (RL-chain) and CL that applies the same associative mechanisms used by the other strategies, but also seeks statistical regularities in the relations among all items in the environment, regardless of the initial association with food. We show that a sufficiently structured environment favours the evolution of both RL-chain and CL and that CL outperforms the other strategies when food is relatively rare and the time for learning is limited. This advantage of internally motivated CL stems from its ability to capture statistical patterns in the environment even before they are associated with food, at which point they immediately become useful for planning. PMID:24402920
Predictive representations can link model-based reinforcement learning to model-free mechanisms.

PubMed

Russek, Evan M; Momennejad, Ida; Botvinick, Matthew M; Gershman, Samuel J; Daw, Nathaniel D

2017-09-01

Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
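The successor representation mentioned above can itself be learned with a TD-style rule, after which state values follow from a single weighted sum with the reward vector. The sketch below illustrates that idea on a toy three-state ring; it is not one of the algorithms introduced in the paper, and the environment, `alpha`, `gamma`, and all names are assumptions for illustration.

```python
def sr_td_update(M, state, next_state, alpha=0.1, gamma=0.9):
    """TD update of row `state` of the successor matrix M (list of lists)."""
    for j in range(len(M)):
        indicator = 1.0 if j == state else 0.0
        # Target: occupancy of j now, plus discounted occupancy from next_state.
        target = indicator + gamma * M[next_state][j]
        M[state][j] += alpha * (target - M[state][j])

n_states = 3
M = [[0.0] * n_states for _ in range(n_states)]

# Toy deterministic ring: 0 -> 1 -> 2 -> 0 -> ...
for _ in range(5000):
    for s in range(n_states):
        sr_td_update(M, s, (s + 1) % n_states)

# Values are a dot product of expected future occupancies and rewards.
rewards = [0.0, 0.0, 1.0]
values = [sum(M[s][j] * rewards[j] for j in range(n_states))
          for s in range(n_states)]
```

If the reward vector changes, the values can be recomputed from the learned `M` without relearning the transition structure, which is one way this representation reproduces a subset of model-based behaviors at low decision-time cost.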
Predictive representations can link model-based reinforcement learning to model-free mechanisms

PubMed Central

Botvinick, Matthew M.

2017-01-01

Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation. PMID:28945743
From Reinforcement Learning Models of the Basal Ganglia to the Pathophysiology of Psychiatric and Neurological Disorders

PubMed Central

Maia, Tiago V.; Frank, Michael J.

2013-01-01

Over the last decade and a half, reinforcement learning models have fostered an increasingly sophisticated understanding of the functions of dopamine and cortico-basal ganglia-thalamo-cortical (CBGTC) circuits. More recently, these models, and the insights that they afford, have started to be used to understand key aspects of several psychiatric and neurological disorders that involve disturbances of the dopaminergic system and CBGTC circuits. We review this approach and its existing and potential applications to Parkinson’s disease, Tourette’s syndrome, attention-deficit/hyperactivity disorder, addiction, schizophrenia, and preclinical animal models used to screen novel antipsychotic drugs. The approach’s proven explanatory and predictive power bodes well for the continued growth of computational psychiatry and computational neurology. PMID:21270784
Asynchronous Gossip for Averaging and Spectral Ranking

NASA Astrophysics Data System (ADS)

Borkar, Vivek S.; Makhijani, Rahul; Sundaresan, Rajesh

2014-08-01

We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning

PubMed Central

Carl Aberg, Kristoffer; Doell, Kimberly C.; Schwartz, Sophie

2016-01-01

Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedback, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depends on associated reward values and internal motivational drives, possibly determined by personality traits. PMID:27851807
A System for Generating Instructional Computer Graphics.

ERIC Educational Resources Information Center

Nygard, Kendall E.; Ranganathan, Babusankar

1983-01-01

Description of the Tektronix-Based Interactive Graphics System for Instruction (TIGSI), which was developed for generating graphics displays in computer-assisted instruction materials, discusses several applications (e.g., reinforcing learning of concepts, principles, rules, and problem-solving techniques) and presents advantages of the TIGSI…
Computational Properties of the Hippocampus Increase the Efficiency of Goal-Directed Foraging through Hierarchical Reinforcement Learning

PubMed Central

Chalmers, Eric; Luczak, Artur; Gruber, Aaron J.

2016-01-01

The mammalian brain is thought to use a version of Model-based Reinforcement Learning (MBRL) to guide “goal-directed” behavior, wherein animals consider goals and make plans to acquire desired outcomes. However, conventional MBRL algorithms do not fully explain animals' ability to rapidly adapt to environmental changes, or learn multiple complex tasks. They also require extensive computation, suggesting that goal-directed behavior is cognitively expensive. We propose here that key features of processing in the hippocampus support a flexible MBRL mechanism for spatial navigation that is computationally efficient and can adapt quickly to change. We investigate this idea by implementing a computational MBRL framework that incorporates features inspired by computational properties of the hippocampus: a hierarchical representation of space, “forward sweeps” through future spatial trajectories, and context-driven remapping of place cells. We find that a hierarchical abstraction of space greatly reduces the computational load (mental effort) required for adaptation to changing environmental conditions, and allows efficient scaling to large problems. It also allows abstract knowledge gained at high levels to guide adaptation to new obstacles. Moreover, a context-driven remapping mechanism allows learning and memory of multiple tasks. Simulating dorsal or ventral hippocampal lesions in our computational framework qualitatively reproduces behavioral deficits observed in rodents with analogous lesions. The framework may thus embody key features of how the brain organizes model-based RL to efficiently solve navigation and other difficult tasks. PMID:28018203
Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.

PubMed

Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca

2013-03-01

An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
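One way to picture a TD-like learning signal driven by both extrinsic and intrinsic reinforcements is sketched below. The additive combination of the two terms and the `FadingSurprise` class, which models a temporary intrinsic signal that shrinks as an event becomes predictable, are hypothetical simplifications introduced here for illustration; they are not the simulated robotic system or the exact signal used in the study.

```python
def td_error(extrinsic, intrinsic, value, next_value, gamma=0.9):
    """TD-like error in which the reinforcement is the sum of both sources."""
    return (extrinsic + intrinsic) + gamma * next_value - value

class FadingSurprise:
    """Hypothetical intrinsic signal: surprise at an event, fading with learning."""

    def __init__(self, rate=0.2):
        self.prediction = 0.0  # current expectation that the event occurs
        self.rate = rate       # how fast the expectation is updated

    def signal(self, event_occurred):
        outcome = 1.0 if event_occurred else 0.0
        surprise = abs(outcome - self.prediction)
        self.prediction += self.rate * (outcome - self.prediction)
        return surprise

# The same environmental change observed 30 times: the intrinsic term decays.
surprise = FadingSurprise()
intrinsic_trace = [surprise.signal(True) for _ in range(30)]
```

In this sketch the intrinsic term is large the first time an action changes the environment and decays toward zero with repetition, whereas an extrinsic reward passed to `td_error` would remain as long as the environment delivers it.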

Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

PubMed

Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

2017-01-01

Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling, and event-related brain potential (ERP) data. MDD patients showed a learning rate comparable to that of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.
Learning and tuning fuzzy logic controllers through reinforcements

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.; Khedkar, Pratap

1992-01-01

This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Interactions Among Working Memory, Reinforcement Learning, and Effort in Value-Based Choice: A New Paradigm and Selective Deficits in Schizophrenia.

PubMed

Collins, Anne G E; Albrecht, Matthew A; Waltz, James A; Gold, James M; Frank, Michael J

2017-09-15

When studying learning, researchers directly observe only the participants' choices, which are often assumed to arise from a unitary learning process. However, a number of separable systems, such as working memory (WM) and reinforcement learning (RL), contribute simultaneously to human learning. Identifying each system's contributions is essential for mapping the neural substrates contributing in parallel to behavior; computational modeling can help to design tasks that allow such a separable identification of processes and infer their contributions in individuals. We present a new experimental protocol that separately identifies the contributions of RL and WM to learning, is sensitive to parametric variations in both, and allows us to investigate whether the processes interact. In experiments 1 and 2, we tested this protocol with healthy young adults (n = 29 and n = 52, respectively). In experiment 3, we used it to investigate learning deficits in medicated individuals with schizophrenia (n = 49 patients, n = 32 control subjects). Experiments 1 and 2 established WM and RL contributions to learning, as evidenced by parametric modulations of choice by load and delay and reward history, respectively. They also showed interactions between WM and RL, where RL was enhanced under high WM load. Moreover, we observed a cost of mental effort when controlling for reinforcement history: participants preferred stimuli they encountered under low WM load. Experiment 3 revealed selective deficits in WM contributions and preserved RL value learning in individuals with schizophrenia compared with control subjects. Computational approaches allow us to disentangle contributions of multiple systems to learning and, consequently, to further our understanding of psychiatric diseases. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis

PubMed Central

Collins, Anne G. E.; Frank, Michael J.

2012-01-01

Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report effects of two genetic polymorphisms having relative specificity for prefrontal and basal ganglia functions. Whereas the COMT gene coding for catechol-O-methyl transferase selectively influenced model estimates of WM capacity, the GPR6 gene coding for G-protein-coupled receptor 6 influenced the RL learning rate. Thus, this study allowed us to specify distinct influences of the high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models. PMID:22487033
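The idea of mixing a capacity-limited working memory policy with an incremental RL policy can be sketched as follows. The weighting rule (reliance on WM scaled by capacity relative to set size), the parameter values, and all names are assumptions made for this illustration and should not be read as the fitted model from the paper.

```python
import math

def softmax(values, beta):
    """Convert action values into choice probabilities."""
    top = max(values)
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def wm_weight(capacity, set_size, reliance=1.0):
    """Assumed mixture weight: WM dominates until set size exceeds capacity."""
    return reliance * min(1.0, capacity / set_size)

def mixed_policy(q_rl, q_wm, capacity, set_size, beta=10.0, reliance=1.0):
    """Weighted mixture of the WM policy and the RL policy."""
    w = wm_weight(capacity, set_size, reliance)
    p_rl = softmax(q_rl, beta)
    p_wm = softmax(q_wm, beta)
    return [w * pw + (1.0 - w) * pr for pw, pr in zip(p_wm, p_rl)]

# Hypothetical state after one rewarded trial on action 0:
q_rl = [0.4, 0.3, 0.3]  # RL values move slowly
q_wm = [1.0, 0.0, 0.0]  # WM stores the last outcome in one shot

policy_small = mixed_policy(q_rl, q_wm, capacity=3, set_size=2)
policy_large = mixed_policy(q_rl, q_wm, capacity=3, set_size=6)
```

With a small set size the choice is governed almost entirely by the one-shot WM trace, while a larger set size shifts weight to the slower RL values. This is the kind of load effect that a pure RL model would otherwise have to absorb into its learning rate.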
Reinforcement learning and episodic memory in humans and animals: an integrative framework

PubMed Central

Gershman, Samuel J.; Daw, Nathaniel D.

2018-01-01

We review the psychology and neuroscience of reinforcement learning (RL), which has witnessed significant progress in the last two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, the simplicity of these tasks misses important aspects of reinforcement learning in the real world: (i) State spaces are high-dimensional, continuous, and partially observable; this implies that (ii) data are relatively sparse: indeed precisely the same situation may never be encountered twice; and also that (iii) rewards depend on long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, these theories have largely connected with procedural and semantic memory: how knowledge about action values or world models extracted gradually from many experiences can drive choice. This misses many aspects of memory related to traces of individual events, such as episodic memory. We suggest that these two gaps are related. In particular, the computational challenges can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (i) efficiently approximate value functions over complex state spaces, (ii) learn with very little data, and (iii) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system. PMID:27618944
Negative symptoms in schizophrenia result from a failure to represent the expected value of rewards: Behavioral and computational modeling evidence

PubMed Central

Gold, James M.; Waltz, James A.; Matveeva, Tatyana M.; Kasanova, Zuzana; Strauss, Gregory P.; Herbener, Ellen S.; Collins, Anne G.E.; Frank, Michael J.

2015-01-01

Context: Negative symptoms are a core feature of schizophrenia, but their pathophysiology remains unclear. Objective: Negative symptoms are defined by the absence of normal function. However, there must be a productive mechanism that leads to this absence. Here, we test a reinforcement learning account suggesting that negative symptoms result from a failure to represent the expected value of rewards coupled with preserved loss avoidance learning. Design: Subjects performed a probabilistic reinforcement learning paradigm involving stimulus pairs in which choices resulted in either reward or avoidance of loss. Following training, subjects indicated their valuation of the stimuli in a transfer task. Computational modeling was used to distinguish between alternative accounts of the data. Setting: A tertiary care research outpatient clinic. Patients: A total of 47 clinically stable patients with a diagnosis of schizophrenia or schizoaffective disorder and 28 healthy volunteers participated. Patients were divided into high and low negative symptom groups. Main Outcome Measures: (1) The number of choices leading to reward or loss avoidance and (2) performance in the transfer phase. Quantitative fits from three different models were examined. Results: High negative symptom patients demonstrated impaired learning from rewards but intact loss avoidance learning, and failed to distinguish rewarding stimuli from loss-avoiding stimuli in the transfer phase. Model fits revealed that high negative symptom patients were better characterized by an “actor-critic” model, learning stimulus-response associations, whereas controls and low negative symptom patients incorporated expected value of their actions (“Q-learning”) into the selection process. Conclusions: Negative symptoms are associated with a specific reinforcement learning abnormality: high negative symptom patients do not represent the expected value of rewards when making decisions but learn to avoid punishments through the use of prediction errors. This computational framework offers the potential to understand negative symptoms at a mechanistic level. PMID:22310503
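The contrast between the two model classes can be made concrete with a small sketch. It shows why an actor-critic learner may fail to separate a rewarding stimulus from a loss-avoiding one: both produce the same positive prediction error relative to their context. A Q-learner, by contrast, stores the expected outcome itself. The context values of +0.5 and -0.5, the learning rates, and the function names are assumptions for illustration, not the models fitted in the study.

```python
def q_update(q_value, outcome, alpha=0.1):
    """Q-learning for a one-step task: move the value toward the outcome."""
    return q_value + alpha * (outcome - q_value)

def actor_critic_update(state_value, preference, outcome,
                        alpha_critic=0.1, alpha_actor=0.1):
    """Critic tracks the context value; actor preference is driven by the error."""
    delta = outcome - state_value
    return (state_value + alpha_critic * delta,
            preference + alpha_actor * delta,
            delta)

# Gain context (outcomes +1 or 0), critic value assumed to sit near +0.5.
_, pref_reward, delta_gain = actor_critic_update(0.5, 0.0, outcome=1.0)
# Loss context (outcomes 0 or -1), critic value assumed to sit near -0.5.
_, pref_avoid, delta_loss = actor_critic_update(-0.5, 0.0, outcome=0.0)

# Q-learning keeps the two stimuli apart because it stores expected outcomes.
q_reward, q_avoid = 0.0, 0.0
for _ in range(100):
    q_reward = q_update(q_reward, 1.0)  # rewarding stimulus
    q_avoid = q_update(q_avoid, 0.0)    # loss-avoiding stimulus
```

Here `pref_reward` and `pref_avoid` receive identical increments, whereas `q_reward` approaches 1 and `q_avoid` stays at 0, mirroring the transfer-phase distinction described in the Results.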
Teachers' Support in Using Computers for Developing Students' Listening and Speaking Skills in Pre-Sessional English Courses

ERIC Educational Resources Information Center

Zou, Bin

2013-01-01

Many computer-assisted language learning (CALL) studies have found that teacher direction can help learners develop language skills at their own pace on computers. However, many teachers still do not know how to provide support for students to use computers to reinforce the development of their language skills. Hence, more examples of CALL…
Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

PubMed

Taylor, Jordan A; Ivry, Richard B

2014-01-01

Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.
Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning

PubMed Central

Taylor, Jordan A.; Ivry, Richard B.

2014-01-01

Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model

PubMed Central

Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P.; Terrace, Herbert S.

2015-01-01

Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort’s success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models. PMID:26407227
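The three ingredients listed in the abstract can be illustrated with a small Python sketch. This is not the published betasort algorithm: the class name `BetasortSketch`, the pseudo-count fields `upper` and `lower`, and the exact update rules are assumptions chosen only to show beta-distributed positions, asymmetric handling of feedback, and an update applied to every stimulus on every trial.

```python
import random


class BetasortSketch:
    """Illustrative only: each stimulus position is Beta(upper, lower) on [0, 1]."""

    def __init__(self, n_items, seed=0):
        self.upper = [1.0] * n_items  # pseudo-counts pulling a position toward 1
        self.lower = [1.0] * n_items  # pseudo-counts pulling a position toward 0
        self.rng = random.Random(seed)

    def position(self, k):
        """Mean of the beta distribution for stimulus k."""
        return self.upper[k] / (self.upper[k] + self.lower[k])

    def choose(self, i, j):
        """Sample a position for each presented stimulus and pick the larger."""
        xi = self.rng.betavariate(self.upper[i], self.lower[i])
        xj = self.rng.betavariate(self.upper[j], self.lower[j])
        return i if xi >= xj else j

    def update(self, chosen, unchosen, rewarded):
        """Update every stimulus, including those not shown on this trial."""
        pos = [self.position(k) for k in range(len(self.upper))]
        if rewarded:
            # Assumed rule for positive feedback: consolidate all positions.
            # Adding (p, 1 - p) keeps each mean fixed and narrows the distribution.
            for k, p in enumerate(pos):
                self.upper[k] += p
                self.lower[k] += 1.0 - p
            return
        # Assumed rule for negative feedback: push the presented pair apart
        # and move the absent stimuli along with whichever side they sit on.
        lo, hi = sorted((pos[chosen], pos[unchosen]))
        for k, p in enumerate(pos):
            if k == chosen:
                self.lower[k] += 1.0
            elif k == unchosen:
                self.upper[k] += 1.0
            elif p > hi:
                self.upper[k] += 1.0
            elif p < lo:
                self.lower[k] += 1.0
            else:
                self.upper[k] += p
                self.lower[k] += 1.0 - p
```

In this sketch a rewarded trial leaves every estimated position where it was and only increases confidence, whereas an unrewarded trial moves positions, which is one simple way to treat the two kinds of feedback asymmetrically.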
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.

PubMed

Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P; Terrace, Herbert S

2015-01-01

Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
Identifying cognitive remediation change through computational modelling--effects on reinforcement learning in schizophrenia.

PubMed

Cella, Matteo; Bishara, Anthony J; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til

2014-11-01

Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention, cognitive remediation. Using computational modelling, 3 reinforcement learning parameters based on the Wisconsin Card Sorting Test (WCST) trial-by-trial performance were estimated: R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. R and P parameter change correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Schizophrenia reinforcement learning difficulties negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. © The Author 2013. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Hierarchical extreme learning machine based reinforcement learning for goal localization

NASA Astrophysics Data System (ADS)

AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini

2017-03-01

The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. The agent should process only a few regions in order to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning (RL) method was used to find an optimal series of actions to localize the goal region. The visual data, a set of images, is high-dimensional unstructured data and needs to be represented efficiently to obtain a robust detector. Various deep reinforcement learning models have already been used to localize a goal, but most of them take a long time to learn the model. This long learning time results from the weight fine-tuning stage, which is applied iteratively to find an accurate model. The Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that does not fine-tune the weights. In other words, hidden weights are generated randomly and output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of the Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed using MATLAB.
A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning

PubMed Central

Franklin, Nicholas T; Frank, Michael J

2015-01-01

Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments. DOI: http://dx.doi.org/10.7554/eLife.12029.001 PMID:26705698
Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

PubMed

Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

2018-06-01

Microgrids that incorporate distributed generation (DG) units and energy storage (ES) devices are expected to play increasingly important roles in future power systems. Yet achieving efficient distributed economic dispatch in microgrids is challenging because of the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Using the learning algorithm avoids the difficulty of stochastic modeling and its high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is used to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in a microgrid only needs to communicate with its local neighbors, without relying on any centralized controller. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
The Novelty Exploration Bonus and Its Attentional Modulation

ERIC Educational Resources Information Center

Krebs, Ruth M.; Schott, Bjorn H.; Schutze, Hartmut; Duzel, Emrah

2009-01-01

We hypothesized that novel stimuli represent salient learning signals that can motivate "exploration" in search for potential rewards. In computational theories of reinforcement learning, this is referred to as the novelty "exploration bonus" for rewards. If true, stimulus novelty should enhance the reward anticipation signals in brain areas that…
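As a rough illustration of what a novelty "exploration bonus" means in computational reinforcement learning, the Python sketch below adds a bonus to a stimulus's learned value, and the bonus shrinks as the stimulus is encountered more often. The function names, the `scale` parameter, and the inverse-square-root decay are assumptions made for this example; they are not taken from the study.

```python
import math


def novelty_bonus(count, scale=1.0):
    """Bonus that is largest for a never-seen stimulus and decays with exposure."""
    return scale / math.sqrt(1 + count)


def pick(values, counts, scale=1.0):
    """Choose the stimulus with the highest value plus novelty bonus.

    values: dict mapping stimulus name to its learned reward estimate
    counts: dict mapping stimulus name to how often it has been seen
    """
    return max(
        values,
        key=lambda s: values[s] + novelty_bonus(counts.get(s, 0), scale),
    )
```

With equal learned values, the less familiar stimulus wins the comparison; a sufficiently large value advantage for the familiar stimulus outweighs the bonus.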
Cocaine addiction as a homeostatic reinforcement learning disorder.

PubMed

Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H

2017-03-01

Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

PubMed Central

Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram

2013-01-01

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. PMID:23592970
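For reference, the continuous-time TD error of Doya (2000) that the abstract builds on can be written as below. The symbols are chosen here for illustration: $V(t)$ is the critic's value estimate, $r(t)$ the reward rate, and $\tau$ the time constant of discounting. The spiking-neuron implementation described in the abstract is not shown.

```latex
V(t) = \int_{t}^{\infty} e^{-(s-t)/\tau}\, r(s)\, ds ,
\qquad
\delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)
```

Differentiating the integral gives $\dot{V}(t) = \frac{1}{\tau}V(t) - r(t)$, so $\delta(t) = 0$ when the value estimate is exact; a nonzero $\delta(t)$ signals a misprediction that can drive learning.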
Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

PubMed

Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram

2013-04-01

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
The effects of aging on the interaction between reinforcement learning and attention.

PubMed

Radulescu, Angela; Daniel, Reka; Niv, Yael

2016-11-01

Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

More Than the Sum of Its Parts: A Role for the Hippocampus in Configural Reinforcement Learning.

PubMed

Duncan, Katherine; Doll, Bradley B; Daw, Nathaniel D; Shohamy, Daphna

2018-05-02

People often perceive configurations rather than the elements they comprise, a bias that may emerge because configurations often predict outcomes. But how does the brain learn to associate configurations with outcomes and how does this learning differ from learning about individual elements? We combined behavior, reinforcement learning models, and functional imaging to understand how people learn to associate configurations of cues with outcomes. We found that configural learning depended on the relative predictive strength of elements versus configurations and was related to both the strength of BOLD activity and patterns of BOLD activity in the hippocampus. Configural learning was further related to functional connectivity between the hippocampus and nucleus accumbens. Moreover, configural learning was associated with flexible knowledge about associations and differential eye movements during choice. Together, this suggests that configural learning is associated with a distinct computational, cognitive, and neural profile that is well suited to support flexible and adaptive behavior. Copyright © 2018 Elsevier Inc. All rights reserved.
Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

PubMed

Hu, Yujing; Gao, Yang; An, Bo

2015-07-01

An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
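The key idea, reusing a previously computed equilibrium when each agent has only a small incentive to deviate, can be sketched for a two-player matrix game in Python. The paper's actual definitions of transfer loss and transfer condition are not given in the abstract, so the functions `deviation_gain` and `can_transfer` and the tolerance `eps` are assumptions for illustration only.

```python
def deviation_gain(payoff, own, other):
    """Largest gain a player can get by unilaterally switching strategy.

    payoff[a][b]: this player's payoff when it plays action a and the
                  opponent plays action b
    own, other:   mixed strategies (probability lists) of player and opponent
    """
    per_action = [
        sum(payoff[a][b] * other[b] for b in range(len(other)))
        for a in range(len(own))
    ]
    current = sum(own[a] * per_action[a] for a in range(len(own)))
    return max(per_action) - current


def can_transfer(payoff_row, payoff_col, x, y, eps):
    """Reuse the old profile (x, y) in a new game if no player gains more than eps.

    payoff_col is indexed by the column player's own action first.
    """
    return (
        deviation_gain(payoff_row, x, y) <= eps
        and deviation_gain(payoff_col, y, x) <= eps
    )
```

When the check passes, the learner would skip the expensive equilibrium computation for that state visit and accept a bounded loss instead; when it fails, the equilibrium is recomputed.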
Deep Direct Reinforcement Learning for Financial Signal Representation and Trading.

PubMed

Deng, Yue; Bao, Feng; Kong, Youyong; Ren, Zhiquan; Dai, Qionghai

2017-03-01

Can we train the computer to beat experienced traders for financial asset trading? In this paper, we try to address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model is inspired by two biologically related learning concepts: deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. Then, the RL module interacts with deep representations and makes trading decisions to accumulate the ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both the deep and recurrent structures. Hence, we propose a task-aware backpropagation through time method to cope with the gradient vanishing issue in deep training. The robustness of the neural system is verified on both the stock and the commodity futures markets under broad testing conditions.
The Effect of Subliminal HELP Presentations on Learning a Text Editor.

ERIC Educational Resources Information Center

Wallace, F. Layne; And Others

1991-01-01

Discussion of subliminal stimuli focuses on a study of undergraduates that was conducted to determine the feasibility of presenting subliminal information in a passive manner to reinforce the learning process involved in computer-based text editing. The Texas Instruments microcomputers used in the study are described, and further research is…
Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

PubMed

Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

2007-11-21

The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
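A reward prediction error in a bandit task of this kind is commonly modeled with a delta-rule value update and a softmax choice rule. The Python sketch below shows that generic model for a four-armed bandit. The learning rate `alpha`, inverse temperature `beta`, reward probabilities, and binary rewards are assumptions for illustration, not the parameters or the model fitted in the study.

```python
import math
import random


def softmax(values, beta):
    """Convert action values into choice probabilities."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def run_bandit(reward_probs, n_trials=500, alpha=0.1, beta=5.0, seed=0):
    """Simulate a learner and return final values and per-trial prediction errors."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    errors = []
    for _ in range(n_trials):
        probs = softmax(q, beta)
        arm = rng.choices(range(len(q)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        delta = reward - q[arm]  # reward prediction error
        q[arm] += alpha * delta  # move the value toward the outcome
        errors.append(delta)
    return q, errors
```

In model-based imaging analyses, a per-trial series such as `errors` is the kind of quantity that is typically regressed against the neural signal.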
Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions

PubMed Central

Fee, Michale S.

2012-01-01

In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources. PMID:22754501
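The gating idea in the proposed learning rule can be sketched as a three-factor update in Python: a context-to-neuron weight is potentiated only when the context input was active, the efference copy for that neuron's action was active, and the action was rewarded. The function name, the weight layout `w[j][i]`, and the multiplicative form with learning rate `lr` are assumptions for illustration; the paper's actual rule may differ in detail.

```python
def update_context_weights(w, context, efference, reward, lr=0.1):
    """Potentiate context inputs, gated by efference copy and reward.

    w[j][i]:   weight from context input i onto medium spiny neuron j
    context:   activity of each context input on this trial
    efference: efference copy activity, one entry per neuron/action j
    reward:    1.0 if the action was rewarded, else 0.0
    Only context weights change; efference copy inputs are not potentiated.
    """
    for j, e in enumerate(efference):
        for i, c in enumerate(context):
            w[j][i] += lr * reward * e * c
    return w
```

For example, with context input 0 active and the efference copy signalling action 1, a rewarded trial strengthens only `w[1][0]`, so the same context later biases the circuit toward the action that was rewarded.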
Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions.

PubMed

Fee, Michale S

2012-01-01

In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current "time" in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.
Reinforced Adversarial Neural Computer for de Novo Molecular Design.

PubMed

Putin, Evgeny; Asadulaev, Arip; Ivanenkov, Yan; Aladinskiy, Vladimir; Sanchez-Lengeling, Benjamin; Aspuru-Guzik, Alán; Zhavoronkov, Alex

2018-06-12

In silico modeling is a crucial milestone in modern drug design and development. Although computer-aided approaches in this field are well-studied, the application of deep learning methods in this research area is still in its early stages. In this work, we present an original deep neural network (DNN) architecture named RANC (Reinforced Adversarial Neural Computer) for the de novo design of novel small-molecule organic structures based on the generative adversarial network (GAN) paradigm and reinforcement learning (RL). As a generator RANC uses a differentiable neural computer (DNC), a category of neural networks, with increased generation capabilities due to the addition of an explicit memory bank, which can mitigate common problems found in adversarial settings. The comparative results have shown that RANC trained on the SMILES string representation of the molecules outperforms its first DNN-based counterpart ORGANIC by several metrics relevant to drug discovery: the number of unique structures, passing medicinal chemistry filters (MCFs), Muegge criteria, and high QED scores. RANC is able to generate structures that match the distributions of the key chemical features/descriptors (e.g., MW, logP, TPSA) and lengths of the SMILES strings in the training data set. Therefore, RANC can be reasonably regarded as a promising starting point to develop novel molecules with activity against different biological targets or pathways. In addition, this approach allows scientists to save time and covers a broad chemical space populated with novel and diverse compounds.
Developing PFC representations using reinforcement learning

PubMed Central

Reynolds, Jeremy R.; O'Reilly, Randall C.

2009-01-01

From both functional and biological considerations, it is widely believed that action production, planning, and goal-oriented behaviors supported by the frontal cortex are organized hierarchically (Fuster, 1990; Koechlin, Ody, & Kouneiher, 2003; Miller, Galanter, & Pribram, 1960). However, the nature of the different levels of the hierarchy remains unclear, and little attention has been paid to the origins of such a hierarchy. We address these issues through biologically-inspired computational models that develop representations through reinforcement learning. We explore several different factors in these models that might plausibly give rise to a hierarchical organization of representations within the PFC, including an initial connectivity hierarchy within PFC, a hierarchical set of connections between PFC and subcortical structures controlling it, and differential synaptic plasticity schedules. Simulation results indicate that architectural constraints contribute to the segregation of different types of representations, and that this segregation facilitates learning. These findings are consistent with the idea that there is a functional hierarchy in PFC, as captured in our earlier computational models of PFC function and a growing body of empirical data. PMID:19591977
Switching Reinforcement Learning for Continuous Action Space

NASA Astrophysics Data System (ADS)

Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements that trade off against each other: (i) to keep the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. Then, a method for switching controllers is constructed by introducing and referring to the “entropy”. Further, through computational experiments using robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method has been confirmed.
Model-free and model-based reward prediction errors in EEG.

PubMed

Sambrook, Thomas D; Hardwick, Ben; Wills, Andy J; Goslin, Jeremy

2018-05-24

Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign candidate actions with an expected value. Model-free learning is ignorant of the world's structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has been previously assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors questions a long-held assumption that model-free and model-based learning are dissociated in the brain. Copyright © 2018 Elsevier Inc. All rights reserved.
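As a minimal sketch of the model-free update this abstract describes (not the authors' analysis code), the Python snippet below adjusts an action value by a reward prediction error. The function name `update_value` and the learning rate `alpha` are illustrative assumptions, not taken from the study.

```python
def update_value(value, reward, alpha=0.1):
    """Delta-rule update: move the value toward the reward by a fraction alpha.

    alpha is a hypothetical learning rate chosen for illustration.
    """
    prediction_error = reward - value  # reward prediction error (expectancy violation)
    new_value = value + alpha * prediction_error
    return new_value, prediction_error


# Usage: an action with no reward history (value 0.0) is rewarded once.
value, rpe = update_value(0.0, reward=1.0, alpha=0.5)
```

A model-based learner would instead compute the expected value from knowledge of the task structure; the sketch covers only the model-free case.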
Reinforced dynamics for enhanced sampling in large atomic and molecular systems

NASA Astrophysics Data System (ADS)

Zhang, Linfeng; Wang, Han; E, Weinan

2018-03-01

A new approach for efficiently exploring the configuration space and computing the free energy of large atomic and molecular systems is proposed, motivated by an analogy with reinforcement learning. There are two major components in this new approach. Like metadynamics, it allows for an efficient exploration of the configuration space by adding an adaptively computed biasing potential to the original dynamics. Like deep reinforcement learning, this biasing potential is trained on the fly using deep neural networks, with data collected judiciously from the exploration and an uncertainty indicator from the neural network model playing the role of the reward function. Parameterization using neural networks makes it feasible to handle cases with a large set of collective variables. This has the potential advantage that selecting precisely the right set of collective variables has now become less critical for capturing the structural transformations of the system. The method is illustrated by studying the full-atom explicit solvent models of alanine dipeptide and tripeptide, as well as the system of a polyalanine-10 molecule with 20 collective variables.
Dissociable Learning Processes Underlie Human Pain Conditioning

PubMed Central

Zhang, Suyi; Mano, Hiroaki; Ganesh, Gowrishankar; Robbins, Trevor; Seymour, Ben

2016-01-01

Summary: Pavlovian conditioning underlies many aspects of pain behavior, including fear and threat detection [1], escape and avoidance learning [2], and endogenous analgesia [3]. Although a central role for the amygdala is well established [4], both human and animal studies implicate other brain regions in learning, notably ventral striatum and cerebellum [5]. It remains unclear whether these regions make different contributions to a single aversive learning process or represent independent learning mechanisms that interact to generate the expression of pain-related behavior. We designed a human parallel aversive conditioning paradigm in which different Pavlovian visual cues probabilistically predicted thermal pain primarily to either the left or right arm and studied the acquisition of conditioned Pavlovian responses using combined physiological recordings and fMRI. Using computational modeling based on reinforcement learning theory, we found that conditioning involves two distinct types of learning process. First, a non-specific “preparatory” system learns aversive facial expressions and autonomic responses such as skin conductance. The associated learning signals (the learned associability and prediction error) were correlated with fMRI brain responses in amygdala-striatal regions, corresponding to the classic aversive (fear) learning circuit. Second, a specific lateralized system learns “consummatory” limb-withdrawal responses, detectable with electromyography of the arm to which pain is predicted. Its related learned associability was correlated with responses in ipsilateral cerebellar cortex, suggesting a novel computational role for the cerebellum in pain. In conclusion, our results show that the overall phenotype of conditioned pain behavior depends on two dissociable reinforcement learning circuits. PMID:26711494
The Relationship between Software Design and Children's Engagement

ERIC Educational Resources Information Center

Buckleitner, Warren

2006-01-01

This study was an attempt to measure the effects of praise and reinforcement on children in a computer learning setting. A sorting game was designed to simulate 2 interaction styles. One style, called high computer control, provided frequent praise and coaching. The other, called high child control, had narration and praise toggled off. A…
Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration

PubMed Central

Liu, Bo; Chen, Sanfeng; Li, Shuai; Liang, Yongsheng

2012-01-01

In this paper a new framework, called Compressive Kernelized Reinforcement Learning (CKRL), is proposed for computing near-optimal policies in sequential decision making under uncertainty by combining non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction framework in which high-dimensional data is projected onto a random lower-dimensional subspace via spherically random rotation and coordination sampling. KLSPI introduces the kernel trick into the LSPI framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In this approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a random basis. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of Singular Value Decomposition (SVD). Finally, simulation results are presented on benchmark MDP domains, which confirm gains both in computation time and in performance in large feature spaces. PMID:22736969
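To make the projection step concrete, here is a rough Python sketch of a data-independent random projection of a feature vector. It uses a dense Gaussian random matrix, a common textbook construction swapped in for illustration; it is not the spherically random rotation with coordination sampling named in the abstract, and the names `random_projection_matrix` and `project` are hypothetical.

```python
import math
import random


def random_projection_matrix(high_dim, low_dim, seed=0):
    """Gaussian random matrix, scaled so squared norms are preserved in expectation.

    Assumption: a dense Gaussian matrix stands in for the paper's construction.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(low_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(high_dim)]
            for _ in range(low_dim)]


def project(matrix, features):
    """Map a high-dimensional feature vector into the low-dimensional subspace."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


# Usage: compress 1000 hypothetical state-action features down to 20.
P = random_projection_matrix(high_dim=1000, low_dim=20)
phi = [1.0] * 1000
phi_low = project(P, phi)
```

Because the matrix is drawn without looking at the data, it can be generated once and reused for every sample; a policy-iteration routine would then operate on `phi_low` rather than `phi`.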
Neurocomputational mechanisms of prosocial learning and links to empathy.

PubMed

Lockwood, Patricia L; Apps, Matthew A J; Valton, Vincent; Viding, Essi; Roiser, Jonathan P

2016-08-30

Reinforcement learning theory powerfully characterizes how we learn to benefit ourselves. In this theory, prediction errors (the difference between the predicted and actual outcome of a choice) drive learning. However, we do not operate in a social vacuum. To behave prosocially we must learn the consequences of our actions for other people. Empathy, the ability to vicariously experience and understand the affect of others, is hypothesized to be a critical facilitator of prosocial behaviors, but the link between empathy and prosocial behavior is still unclear. During functional magnetic resonance imaging (fMRI) participants chose between different stimuli that were probabilistically associated with rewards for themselves (self), another person (prosocial), or no one (control). Using computational modeling, we show that people can learn to obtain rewards for others but do so more slowly than when learning to obtain rewards for themselves. fMRI revealed that activity in a posterior portion of the subgenual anterior cingulate cortex/basal forebrain (sgACC) drives learning only when we are acting in a prosocial context and signals a prosocial prediction error conforming to classical principles of reinforcement learning theory. However, there is also substantial variability in the neural and behavioral efficiency of prosocial learning, which is predicted by trait empathy. More empathic people learn more quickly when benefitting others, and their sgACC response is the most selective for prosocial learning. We thus reveal a computational mechanism driving prosocial learning in humans. This framework could provide insights into atypical prosocial behavior in those with disorders of social cognition.
Social learning through prediction error in the brain

NASA Astrophysics Data System (ADS)

Joiner, Jessica; Piva, Matthew; Turrin, Courtney; Chang, Steve W. C.

2017-06-01

Learning about the world is critical to survival and success. In social animals, learning about others is a necessary component of navigating the social world, ultimately contributing to increasing evolutionary fitness. How humans and nonhuman animals represent the internal states and experiences of others has long been a subject of intense interest in the developmental psychology tradition, and, more recently, in studies of learning and decision making involving self and other. In this review, we explore how psychology conceptualizes the process of representing others, and how neuroscience has uncovered correlates of reinforcement learning signals to explore the neural mechanisms underlying social learning from the perspective of representing reward-related information about self and other. In particular, we discuss self-referenced and other-referenced types of reward prediction errors across multiple brain structures that effectively allow reinforcement learning algorithms to mediate social learning. Prediction-based computational principles in the brain may be strikingly conserved between self-referenced and other-referenced information.
Model-based predictions for dopamine.

PubMed

Langdon, Angela J; Sharpe, Melissa J; Schoenbaum, Geoffrey; Niv, Yael

2018-04-01

Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning. Copyright © 2017. Published by Elsevier Ltd.
Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

NASA Technical Reports Server (NTRS)

Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

2015-01-01

We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data driven approach to autonomous obstacle avoidance is explored for simulated forest environments furthering autonomous flight under the tree canopy research. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.
Computational Psychiatry and the Challenge of Schizophrenia.

PubMed

Krystal, John H; Murray, John D; Chekroud, Adam M; Corlett, Philip R; Yang, Genevieve; Wang, Xiao-Jing; Anticevic, Alan

2017-05-01

Schizophrenia research is plagued by enormous challenges in integrating and analyzing large datasets and difficulties developing formal theories related to the etiology, pathophysiology, and treatment of this disorder. Computational psychiatry provides a path to enhance analyses of these large and complex datasets and to promote the development and refinement of formal models for features of this disorder. This presentation introduces the reader to the notion of computational psychiatry and describes discovery-oriented and theory-driven applications to schizophrenia involving machine learning, reinforcement learning theory, and biophysically-informed neural circuit models. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center 2017.

Contextual modulation of value signals in reward and punishment learning

PubMed Central

Palminteri, Stefano; Khamassi, Mehdi; Joffily, Mateus; Coricelli, Giorgio

2015-01-01

Compared with reward seeking, punishment avoidance learning is less clearly understood at both the computational and neurobiological levels. Here we demonstrate, using computational modelling and fMRI in humans, that learning option values in a relative—context-dependent—scale offers a simple computational solution for avoidance learning. The context (or state) value sets the reference point to which an outcome should be compared before updating the option value. Consequently, in contexts with an overall negative expected value, successful punishment avoidance acquires a positive value, thus reinforcing the response. As revealed by post-learning assessment of options values, contextual influences are enhanced when subjects are informed about the result of the forgone alternative (counterfactual information). This is mirrored at the neural level by a shift in negative outcome encoding from the anterior insula to the ventral striatum, suggesting that value contextualization also limits the need to mobilize an opponent punishment learning system. PMID:26302782
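One way to read the relative-value idea in this abstract is sketched below in Python: the context value is learned first and then subtracted from the outcome before the option value is updated. The function name, the update order, and the learning rates `alpha_q` and `alpha_v` are assumptions made for illustration, not the authors' fitted model.

```python
def relative_value_update(q, v_context, reward, alpha_q=0.3, alpha_v=0.3):
    """One trial of context-relative value learning (illustrative sketch).

    q          -- current value of the chosen option
    v_context  -- current value of the context (state) containing the option
    """
    # The context value tracks the average outcome obtained in this context.
    v_context = v_context + alpha_v * (reward - v_context)
    # The outcome is re-expressed relative to the context value before updating q.
    relative_outcome = reward - v_context
    q = q + alpha_q * (relative_outcome - q)
    return q, v_context


# Usage: in a punishment context whose value is already -0.5, the loss is avoided
# (reward 0). alpha_v=0.0 holds the context value fixed to isolate the effect.
q, v = relative_value_update(q=0.0, v_context=-0.5, reward=0.0, alpha_q=0.5, alpha_v=0.0)
```

In this example the neutral outcome is better than the context's reference point, so the avoided-loss option ends the trial with a positive value, which is the reinforcing effect the abstract describes.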
Contextual modulation of value signals in reward and punishment learning.

PubMed

Palminteri, Stefano; Khamassi, Mehdi; Joffily, Mateus; Coricelli, Giorgio

2015-08-25

Compared with reward seeking, punishment avoidance learning is less clearly understood at both the computational and neurobiological levels. Here we demonstrate, using computational modelling and fMRI in humans, that learning option values on a relative (context-dependent) scale offers a simple computational solution for avoidance learning. The context (or state) value sets the reference point to which an outcome should be compared before updating the option value. Consequently, in contexts with an overall negative expected value, successful punishment avoidance acquires a positive value, thus reinforcing the response. As revealed by post-learning assessment of option values, contextual influences are enhanced when subjects are informed about the result of the forgone alternative (counterfactual information). This is mirrored at the neural level by a shift in negative outcome encoding from the anterior insula to the ventral striatum, suggesting that value contextualization also limits the need to mobilize an opponent punishment learning system.
Associability-modulated loss learning is increased in posttraumatic stress disorder

PubMed Central

Brown, Vanessa M; Zhu, Lusha; Wang, John M; Frueh, B Christopher

2018-01-01

Disproportionate reactions to unexpected stimuli in the environment are a cardinal symptom of posttraumatic stress disorder (PTSD). Here, we test whether these heightened responses are associated with disruptions in distinct components of reinforcement learning. Specifically, using functional neuroimaging, a loss-learning task, and a computational model-based approach, we assessed the mechanistic hypothesis that overreactions to stimuli in PTSD arise from anomalous gating of attention during learning (i.e., associability). Behavioral choices of combat-deployed veterans with and without PTSD were fit to a reinforcement learning model, generating trial-by-trial prediction errors (signaling unexpected outcomes) and associability values (signaling attention allocation to the unexpected outcomes). Neural substrates of associability value and behavioral parameter estimates of associability updating, but not prediction error, increased with PTSD during loss learning. Moreover, the interaction of PTSD severity with neural markers of associability value predicted behavioral choices. These results indicate that increased attention-based learning may underlie aspects of PTSD and suggest potential neuromechanistic treatment targets. PMID:29313489
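The two trial-by-trial quantities named here, prediction error and associability, can be illustrated with a small Python sketch of a hybrid update in which associability scales the learning rate. The exact equations and parameter values used in the study are not given in the abstract, so `kappa`, `eta`, and the update form below are assumptions.

```python
def associability_update(value, associability, outcome, kappa=0.5, eta=0.3):
    """One trial of a hybrid prediction-error / associability model (sketch).

    kappa -- fixed learning-rate scale (hypothetical value)
    eta   -- how quickly associability tracks recent surprise (hypothetical value)
    """
    prediction_error = outcome - value
    # The effective learning rate is scaled by the current associability.
    new_value = value + kappa * associability * prediction_error
    # Associability drifts toward the size of the latest unsigned surprise.
    new_associability = (1 - eta) * associability + eta * abs(prediction_error)
    return new_value, new_associability, prediction_error


# Usage: an unexpected loss of -1 when nothing was expected.
v, a, pe = associability_update(value=0.0, associability=1.0, outcome=-1.0,
                                kappa=0.5, eta=0.5)
```

Under this reading, a larger `eta` means attention is reallocated more sharply after surprising outcomes, which is the kind of parameter the abstract reports as elevated in PTSD.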
Deficits in Positive Reinforcement Learning and Uncertainty-Driven Exploration are Associated with Distinct Aspects of Negative Symptoms in Schizophrenia

PubMed Central

Strauss, Gregory P.; Frank, Michael J.; Waltz, James A.; Kasanova, Zuzana; Herbener, Ellen S.; Gold, James M.

2011-01-01

Background: Negative symptoms are core features of schizophrenia; however, the cognitive and neural basis for individual negative symptom domains remains unclear. Converging evidence suggests a role for striatal and prefrontal dopamine in reward learning and the exploration of actions that might produce outcomes that are better than the status quo. The current study examines whether deficits in reinforcement learning and uncertainty-driven exploration predict specific negative symptom domains. Methods: We administered a temporal decision-making task, which required trial-by-trial adjustment of reaction time (RT) to maximize reward receipt, to 51 patients with schizophrenia and 39 age-matched healthy controls. Task conditions were designed such that expected value (probability * magnitude) increased (IEV), decreased (DEV), or remained constant (CEV) with increasing response times. Computational analyses were applied to estimate the degree to which trial-by-trial responses are influenced by reinforcement history. Results: Individuals with schizophrenia showed impaired Go learning, but intact NoGo learning relative to controls. These effects were pronounced as a function of global measures of negative symptoms. Uncertainty-based exploration was substantially reduced in individuals with schizophrenia, and selectively correlated with clinical ratings of anhedonia. Conclusions: Schizophrenia patients, particularly those with high negative symptoms, failed to speed RTs to increase positive outcomes and showed a reduced tendency to explore when alternative actions could lead to better outcomes than the status quo. Results are interpreted in the context of current computational, genetic, and pharmacological data supporting the roles of striatal and prefrontal dopamine in these processes. PMID:21168124
Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems.

PubMed

Kool, Wouter; Gershman, Samuel J; Cushman, Fiery A

2017-09-01

Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits. To investigate this proposal, we conducted two experiments showing that people increase model-based control when it achieves greater accuracy than model-free control, and especially when the rewards of accurate performance are amplified. In contrast, they are insensitive to reward amplification when model-based and model-free control yield equivalent accuracy. This suggests that humans adaptively balance habitual and planned action through on-line cost-benefit analysis.
Somatosensory Contribution to the Initial Stages of Human Motor Learning

PubMed Central

Bernardi, Nicolò F.; Darainy, Mohammad

2015-01-01

The early stages of motor skill acquisition are often marked by uncertainty about the sensory and motor goals of the task, as is the case in learning to speak or learning the feel of a good tennis serve. Here we present an experimental model of this early learning process, in which targets are acquired by exploration and reinforcement rather than sensory error. We use this model to investigate the relative contribution of motor and sensory factors to human motor learning. Participants make active reaching movements or matched passive movements to an unseen target using a robot arm. We find that learning through passive movements paired with reinforcement is comparable with learning associated with active movement, both in terms of magnitude and durability, with improvements due to training still observable at a 1 week retest. Motor learning is also accompanied by changes in somatosensory perceptual acuity. No stable changes in motor performance are observed for participants who train, actively or passively, in the absence of reinforcement, or for participants who are given explicit information about target position in the absence of somatosensory experience. These findings indicate that the somatosensory system dominates learning in the early stages of motor skill acquisition. SIGNIFICANCE STATEMENT: The research focuses on the initial stages of human motor learning, introducing a new experimental model that closely approximates the key features of motor learning outside of the laboratory. The findings indicate that it is the somatosensory system rather than the motor system that dominates learning in the early stages of motor skill acquisition. This is important given that most of our computational models of motor learning are based on the idea that learning is motoric in origin. This is also a valuable finding for rehabilitation of patients with limited mobility as it shows that reinforcement in conjunction with passive movement results in benefits to motor learning that are as great as those observed for active movement training. PMID:26490869
The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice.

PubMed

Dunne, Simon; D'Souza, Arun; O'Doherty, John P

2016-06-01

A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning. Copyright © 2016 the American Physiological Society.
Dissociable Learning Processes Underlie Human Pain Conditioning.

PubMed

Zhang, Suyi; Mano, Hiroaki; Ganesh, Gowrishankar; Robbins, Trevor; Seymour, Ben

2016-01-11

Pavlovian conditioning underlies many aspects of pain behavior, including fear and threat detection [1], escape and avoidance learning [2], and endogenous analgesia [3]. Although a central role for the amygdala is well established [4], both human and animal studies implicate other brain regions in learning, notably ventral striatum and cerebellum [5]. It remains unclear whether these regions make different contributions to a single aversive learning process or represent independent learning mechanisms that interact to generate the expression of pain-related behavior. We designed a human parallel aversive conditioning paradigm in which different Pavlovian visual cues probabilistically predicted thermal pain primarily to either the left or right arm and studied the acquisition of conditioned Pavlovian responses using combined physiological recordings and fMRI. Using computational modeling based on reinforcement learning theory, we found that conditioning involves two distinct types of learning process. First, a non-specific "preparatory" system learns aversive facial expressions and autonomic responses such as skin conductance. The associated learning signals (the learned associability and prediction error) were correlated with fMRI brain responses in amygdala-striatal regions, corresponding to the classic aversive (fear) learning circuit. Second, a specific lateralized system learns "consummatory" limb-withdrawal responses, detectable with electromyography of the arm to which pain is predicted. Its related learned associability was correlated with responses in ipsilateral cerebellar cortex, suggesting a novel computational role for the cerebellum in pain. In conclusion, our results show that the overall phenotype of conditioned pain behavior depends on two dissociable reinforcement learning circuits. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
[The Meaning of "Understanding the Brain": Peeking into the Brain of a Computational Neuroscientist].

PubMed

Tanaka, Hirokazu

2016-11-01

What does "understanding the brain" mean? Here, I review how computational neuroscience, a theoretical approach to the brain, can aid our understanding of the brain. First, I illustrate the study of reinforcement learning and dopamine neurons and argue its success in the light of Marr's three levels of computation. Second, I discuss how Marr's program has led to a computational understanding of the brain, and present computational models of the motor cortex and of a spiking neural network as illustrative examples.
Extinction from a Rationalist Perspective

PubMed Central

Gallistel, C. R.

2012-01-01

The merging of the computational theory of mind and evolutionary thinking leads to a kind of rationalism, in which enduring truths about the world have become implicit in the computations that enable the brain to cope with the experienced world. The dead reckoning computation, for example, is implemented within the brains of animals as one of the mechanisms that enables them to learn where they are (Gallistel, 1990, 1995). It integrates a velocity signal with respect to a time signal. Thus, the manner in which position and velocity relate to one another in the world is reflected in the manner in which signals representing those variables are processed in the brain. I use principles of information theory and Bayesian inference to derive from other simple principles explanations for: 1) the failure of partial reinforcement to increase reinforcements to acquisition; 2) the partial reinforcement extinction effect; 3) spontaneous recovery; 4) renewal; 5) reinstatement; 6) resurgence (aka facilitated reacquisition). Like the principle underlying dead-reckoning, these principles are grounded in analytic considerations. They are the kind of enduring truths about the world that are likely to have shaped the brain's computations. PMID:22391153
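The dead reckoning computation mentioned above, integrating a velocity signal over time to estimate position, can be sketched in a few lines of Python. The function name `dead_reckon` and the fixed time step `dt` are assumptions for illustration.

```python
def dead_reckon(start, velocities, dt):
    """Integrate a sequence of (vx, vy) velocity samples over steps of length dt.

    Returns every intermediate position estimate, starting with `start`.
    """
    x, y = start
    path = [(x, y)]
    for vx, vy in velocities:
        x += vx * dt  # displacement = velocity * elapsed time
        y += vy * dt
        path.append((x, y))
    return path


# Usage: move east at 2 units/s for two steps, then north at 1 unit/s for one step.
path = dead_reckon(start=(0.0, 0.0),
                   velocities=[(2.0, 0.0), (2.0, 0.0), (0.0, 1.0)],
                   dt=0.5)
```

The estimate depends only on the starting point and the history of velocity signals, which is why the relation between position and velocity in the world is mirrored in how the signals are processed.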
Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise.

PubMed

Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J

2016-01-01

Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
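The balance between exploration variability and motor noise can be illustrated with a toy Python simulation. This is a simplified sketch under stated assumptions, not the authors' mechanistic model: the learner is assumed to know its own exploration but not its motor noise, and a reach is rewarded when its error beats the average error of the last `window` reaches. All names and parameter values are hypothetical.

```python
import random
from collections import deque


def simulate_reaching(target, trials, explore_sd, motor_sd, window=10, seed=0):
    """Simulate reward-based learning of a reach angle; returns the final planned angle."""
    rng = random.Random(seed)
    planned = 0.0
    recent_errors = deque(maxlen=window)
    for _ in range(trials):
        exploration = rng.gauss(0.0, explore_sd)  # variability the learner can account for
        noise = rng.gauss(0.0, motor_sd)          # variability the learner cannot account for
        error = abs(planned + exploration + noise - target)
        # Closed-loop criterion: reward reaches that beat the recent average error.
        rewarded = bool(recent_errors) and error < sum(recent_errors) / len(recent_errors)
        if rewarded:
            # Only the known (exploration) part of the change is retained.
            planned += exploration
        recent_errors.append(error)
    return planned


# Usage: same exploration variability, low versus high motor noise.
low_noise = simulate_reaching(target=10.0, trials=200, explore_sd=1.0, motor_sd=0.5)
high_noise = simulate_reaching(target=10.0, trials=200, explore_sd=1.0, motor_sd=3.0)
```

When motor noise is large relative to exploration, reward becomes a less reliable signal about which exploratory change helped, so the plan is updated less usefully; comparing `low_noise` and `high_noise` across several seeds gives a feel for this effect.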
Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise

PubMed Central

Therrien, Amanda S.; Wolpert, Daniel M.

2016-01-01

Abstract: See Miall and Galea (doi: 10.1093/awv343) for a scientific commentary on this article. Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. PMID:26626368
Apprenticeship Learning: Learning to Schedule from Human Experts

DTIC Science & Technology

2016-06-09

approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning (Busoniu, Babuska, and De...via inverse reinforcement learning. In ICML. Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete...of tasks with temporal constraints. In Proc. AAAI, 2110–2116. Odom, P., and Natarajan, S. 2015. Active advice seeking for inverse reinforcement
An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals.

PubMed

Iigaya, Kiyohito; Fonseca, Madalena S; Murakami, Masayoshi; Mainen, Zachary F; Dayan, Peter

2018-06-26

Serotonin has widespread, but computationally obscure, modulatory effects on learning and cognition. Here, we studied the impact of optogenetic stimulation of dorsal raphe serotonin neurons in mice performing a non-stationary, reward-driven decision-making task. Animals showed two distinct choice strategies. Choices after short inter-trial-intervals (ITIs) depended only on the last trial outcome and followed a win-stay-lose-switch pattern. In contrast, choices after long ITIs reflected outcome history over multiple trials, as described by reinforcement learning models. We found that optogenetic stimulation during a trial significantly boosted the rate of learning that occurred due to the outcome of that trial, but these effects were only exhibited on choices after long ITIs. This suggests that serotonin neurons modulate reinforcement learning rates, and that this influence is masked by alternate, unaffected, decision mechanisms. These results provide insight into the role of serotonin in treating psychiatric disorders, particularly its modulation of neural plasticity and learning.
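The contrast between the two choice strategies described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' fitted model: the function names, the two-option coding (0 and 1), the outcome sequence, and the learning rates are all assumptions chosen for illustration. Win-stay-lose-switch looks only at the last trial, whereas a simple delta-rule learner carries an exponentially weighted history of outcomes, and its learning rate sets how strongly the latest outcome moves the estimate.

```python
def win_stay_lose_switch(last_choice: int, last_reward: int) -> int:
    """Repeat the last choice after a reward, switch after no reward (options are 0 and 1)."""
    return last_choice if last_reward else 1 - last_choice


def delta_rule_update(value: float, reward: float, learning_rate: float) -> float:
    """Move the value estimate toward the outcome by a fraction of the prediction error."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error


# Hypothetical outcome history for one option (1 = rewarded, 0 = not rewarded).
outcomes = [1, 1, 0]

# Moderate learning rate: the estimate reflects several past outcomes.
value = 0.0
for reward in outcomes:
    value = delta_rule_update(value, reward, learning_rate=0.5)

# Higher learning rate: the most recent outcome dominates the estimate.
value_fast = 0.0
for reward in outcomes:
    value_fast = delta_rule_update(value_fast, reward, learning_rate=0.9)
```

In this sketch, a boosted learning rate on a given trial would correspond to passing a larger `learning_rate` for that trial's update only.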
Neurocomputational mechanisms of prosocial learning and links to empathy

PubMed Central

Apps, Matthew A. J.; Valton, Vincent; Viding, Essi; Roiser, Jonathan P.

2016-01-01

Reinforcement learning theory powerfully characterizes how we learn to benefit ourselves. In this theory, prediction errors—the difference between a predicted and actual outcome of a choice—drive learning. However, we do not operate in a social vacuum. To behave prosocially we must learn the consequences of our actions for other people. Empathy, the ability to vicariously experience and understand the affect of others, is hypothesized to be a critical facilitator of prosocial behaviors, but the link between empathy and prosocial behavior is still unclear. During functional magnetic resonance imaging (fMRI) participants chose between different stimuli that were probabilistically associated with rewards for themselves (self), another person (prosocial), or no one (control). Using computational modeling, we show that people can learn to obtain rewards for others but do so more slowly than when learning to obtain rewards for themselves. fMRI revealed that activity in a posterior portion of the subgenual anterior cingulate cortex/basal forebrain (sgACC) drives learning only when we are acting in a prosocial context and signals a prosocial prediction error conforming to classical principles of reinforcement learning theory. However, there is also substantial variability in the neural and behavioral efficiency of prosocial learning, which is predicted by trait empathy. More empathic people learn more quickly when benefitting others, and their sgACC response is the most selective for prosocial learning. We thus reveal a computational mechanism driving prosocial learning in humans. This framework could provide insights into atypical prosocial behavior in those with disorders of social cognition. PMID:27528669
Quantum machine learning: a classical perspective

NASA Astrophysics Data System (ADS)

Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

2018-01-01

Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.
Quantum machine learning: a classical perspective

PubMed Central

Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Severini, Simone; Wossnig, Leonard

2018-01-01

Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed. PMID:29434508
Quantum machine learning: a classical perspective.

PubMed

Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

2018-01-01

Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.
A simple computational algorithm of model-based choice preference.

PubMed

Toyama, Asako; Katahira, Kentaro; Ohira, Hideki

2017-08-01

A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences-namely, the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, can explain choices better under both model-free and model-based controls and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variation, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters used in this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.
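The idea behind a forgetting learning model, in which the values of unchosen actions also change, can be sketched as follows. The abstract does not give the model equations, so this is only an illustration under stated assumptions: the function name `update_with_forgetting`, the decay of unchosen values toward zero, and all parameter values are hypothetical.

```python
def update_with_forgetting(q_values, chosen, reward, learning_rate, forgetting_rate):
    """Return new action values after one trial.

    The chosen action learns from the prediction error; every unchosen
    action decays toward zero (an assumed form of forgetting).
    """
    updated = []
    for action, q in enumerate(q_values):
        if action == chosen:
            updated.append(q + learning_rate * (reward - q))
        else:
            updated.append((1.0 - forgetting_rate) * q)
    return updated


# Hypothetical starting values for two actions.
q = [0.5, 0.8]
q = update_with_forgetting(q, chosen=0, reward=1.0,
                           learning_rate=0.2, forgetting_rate=0.1)
```

Setting `forgetting_rate=0.0` recovers a standard update in which unchosen values stay fixed.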
Modelling ADHD: A review of ADHD theories through their predictions for computational models of decision-making and reinforcement learning.

PubMed

Ziegler, Sigurd; Pedersen, Mads L; Mowinckel, Athanasia M; Biele, Guido

2016-12-01

Attention deficit hyperactivity disorder (ADHD) is characterized by altered decision-making (DM) and reinforcement learning (RL), for which competing theories propose alternative explanations. Computational modelling contributes to understanding DM and RL by integrating behavioural and neurobiological findings, and could elucidate pathogenic mechanisms behind ADHD. This review of neurobiological theories of ADHD describes predictions for the effect of ADHD on DM and RL as described by the drift-diffusion model of DM (DDM) and a basic RL model. Empirical studies employing these models are also reviewed. While theories often agree on how ADHD should be reflected in model parameters, each theory implies a unique combination of predictions. Empirical studies agree with the theories' assumptions of a lowered DDM drift rate in ADHD, while findings are less conclusive for boundary separation. The few studies employing RL models support a lower choice sensitivity in ADHD, but not an altered learning rate. The discussion outlines research areas for further theoretical refinement in the ADHD field. Copyright © 2016 Elsevier Ltd. All rights reserved.
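The distinction between choice sensitivity and learning rate can be made concrete with a short Python sketch. The abstract does not specify the choice rule, so the softmax rule used here is an assumption (it is a common choice in basic RL models), and the function name and parameter values are hypothetical. Choice sensitivity scales how deterministically learned values are turned into choices; it does not change how the values themselves are learned.

```python
import math


def softmax_choice_probabilities(values, choice_sensitivity):
    """Convert option values into choice probabilities with a softmax rule."""
    scaled = [choice_sensitivity * v for v in values]
    max_scaled = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical learned values: option 0 is clearly better than option 1.
values = [1.0, 0.0]

# High sensitivity: choices follow the value difference closely.
p_high = softmax_choice_probabilities(values, choice_sensitivity=5.0)

# Low sensitivity: choices are closer to random despite identical values.
p_low = softmax_choice_probabilities(values, choice_sensitivity=0.5)
```

With identical values, the lower sensitivity yields a weaker preference for the better option, which is one way a lower choice sensitivity could appear in behaviour without any change in learning rate.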

Modeling Humans as Reinforcement Learners: How to Predict Human Behavior in Multi-Stage Games

NASA Technical Reports Server (NTRS)

Lee, Ritchie; Wolpert, David H.; Backhaus, Scott; Bent, Russell; Bono, James; Tracey, Brendan

2011-01-01

This paper introduces a novel framework for modeling interacting humans in a multi-stage game environment by combining concepts from game theory and reinforcement learning. The proposed model has the following desirable characteristics: (1) players are boundedly rational, (2) players are strategic (i.e., they account for one another's reward functions), and (3) the model is computationally feasible even on moderately large real-world systems. To do this we extend level-K reasoning to policy space to, for the first time, be able to handle multiple time steps. This allows us to decompose the problem into a series of smaller ones where we can apply standard reinforcement learning algorithms. We investigate these ideas in a cyber-battle scenario over a smart power grid and discuss the relationship between the behavior predicted by our model and what one might expect of real human defenders and attackers.
Cognitive control predicts use of model-based reinforcement learning.

PubMed

Otto, A Ross; Skatova, Anya; Madlon-Kay, Seth; Daw, Nathaniel D

2015-02-01

Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggests that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior are a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information--in the service of overcoming habitual, stimulus-driven responses--in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior.
Model-based hierarchical reinforcement learning and human action control

PubMed Central

Botvinick, Matthew; Weinstein, Ari

2014-01-01

Recent work has reawakened interest in goal-directed or ‘model-based’ choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings. The resulting picture reveals how hierarchical model-based mechanisms might play a special and pivotal role in human decision-making, dramatically extending the scope and complexity of human behaviour. PMID:25267822
Reinforcement learning signals in the anterior cingulate cortex code for others' false beliefs.

PubMed

Apps, M A J; Green, R; Ramnani, N

2013-01-01

The ability to recognise that another's belief is false is a hallmark of our capacity to understand others' mental states. It has been suggested that the computational and neural mechanisms that underpin learning about others' mental states may be similar to those that underpin first-person Reinforcement Learning (RL). In RL, unexpected decision-making outcomes constitute prediction errors (PE), which are coded for by neurons in the Anterior Cingulate Cortex (ACC). Does the ACC signal the PEs (false beliefs) of others about the outcomes of their decisions? We scanned subjects using fMRI while they monitored a third-person's decisions and similar responses made by a computer. The outcomes of the trials were manipulated, such that the actual outcome was unexpectedly different from the predicted outcome on 1/3 of trials. We examined activity time-locked to privileged information which indicated the actual outcomes only to subjects. Activity in the gyral ACC was found when the outcomes of the third-person's decisions were unexpectedly positive. Activity in the sulcal ACC was found when the third-person's or computer's outcomes were unexpectedly positive. We suggest that a property of the ACC is that it codes PEs, with a portion of the gyral ACC specialised for processing the PEs of others. Copyright © 2012 Elsevier Inc. All rights reserved.
Learning tactile skills through curious exploration

PubMed Central

Pape, Leo; Oddo, Calogero M.; Controzzi, Marco; Cipriani, Christian; Förster, Alexander; Carrozza, Maria C.; Schmidhuber, Jürgen

2012-01-01

We present curiosity-driven, autonomous acquisition of tactile exploratory skills on a biomimetic robot finger equipped with an array of microelectromechanical touch sensors. Instead of building tailored algorithms for solving a specific tactile task, we employ a more general curiosity-driven reinforcement learning approach that autonomously learns a set of motor skills in absence of an explicit teacher signal. In this approach, the acquisition of skills is driven by the information content of the sensory input signals relative to a learner that aims at representing sensory inputs using fewer and fewer computational resources. We show that, from initially random exploration of its environment, the robotic system autonomously develops a small set of basic motor skills that lead to different kinds of tactile input. Next, the system learns how to exploit the learned motor skills to solve supervised texture classification tasks. Our approach demonstrates the feasibility of autonomous acquisition of tactile skills on physical robotic platforms through curiosity-driven reinforcement learning, overcomes typical difficulties of engineered solutions for active tactile exploration and underactuated control, and provides a basis for studying developmental learning through intrinsic motivation in robots. PMID:22837748
Believer-Skeptic Meets Actor-Critic: Rethinking the Role of Basal Ganglia Pathways during Decision-Making and Reinforcement Learning

PubMed Central

Dunovan, Kyle; Verstynen, Timothy

2016-01-01

The flexibility of behavioral control is a testament to the brain's capacity for dynamically resolving uncertainty during goal-directed actions. This ability to select actions and learn from immediate feedback is driven by the dynamics of basal ganglia (BG) pathways. A growing body of empirical evidence conflicts with the traditional view that these pathways act as independent levers for facilitating (i.e., direct pathway) or suppressing (i.e., indirect pathway) motor output, suggesting instead that they engage in a dynamic competition during action decisions that computationally captures action uncertainty. Here we discuss the utility of encoding action uncertainty as a dynamic competition between opposing control pathways and provide evidence that this simple mechanism may have powerful implications for bridging neurocomputational theories of decision making and reinforcement learning. PMID:27047328
Reinforcement learning for a biped robot based on a CPG-actor-critic method.

PubMed

Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

2007-08-01

Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.
Believer-Skeptic Meets Actor-Critic: Rethinking the Role of Basal Ganglia Pathways during Decision-Making and Reinforcement Learning.

PubMed

Dunovan, Kyle; Verstynen, Timothy

2016-01-01

The flexibility of behavioral control is a testament to the brain's capacity for dynamically resolving uncertainty during goal-directed actions. This ability to select actions and learn from immediate feedback is driven by the dynamics of basal ganglia (BG) pathways. A growing body of empirical evidence conflicts with the traditional view that these pathways act as independent levers for facilitating (i.e., direct pathway) or suppressing (i.e., indirect pathway) motor output, suggesting instead that they engage in a dynamic competition during action decisions that computationally captures action uncertainty. Here we discuss the utility of encoding action uncertainty as a dynamic competition between opposing control pathways and provide evidence that this simple mechanism may have powerful implications for bridging neurocomputational theories of decision making and reinforcement learning.
The neuroscience of learning: beyond the Hebbian synapse.

PubMed

Gallistel, C R; Matzel, Louis D

2013-01-01

From the traditional perspective of associative learning theory, the hypothesis linking modifications of synaptic transmission to learning and memory is plausible. It is less so from an information-processing perspective, in which learning is mediated by computations that make implicit commitments to physical and mathematical principles governing the domains where domain-specific cognitive mechanisms operate. We compare the properties of associative learning and memory to the properties of long-term potentiation, concluding that the properties of the latter do not explain the fundamental properties of the former. We briefly review the neuroscience of reinforcement learning, emphasizing the representational implications of the neuroscientific findings. We then review more extensively findings that confirm the existence of complex computations in three information-processing domains: probabilistic inference, the representation of uncertainty, and the representation of space. We argue for a change in the conceptual framework within which neuroscientists approach the study of learning mechanisms in the brain.
Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

PubMed Central

Lefebvre, Germain; Blakemore, Sarah-Jayne

2017-01-01

Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597
Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

PubMed

Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

2017-08-01

Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
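The valence-induced bias described in this abstract can be illustrated with a value update that uses different learning rates for positive and negative prediction errors. This is a minimal sketch, not the authors' model: the function name and all numerical values are assumptions. For the chosen option the rate for positive prediction errors is set higher; for the unchosen option the asymmetry is reversed, so that in both cases the learner weights information confirming its choice more heavily.

```python
def valence_dependent_update(value, outcome, rate_positive, rate_negative):
    """Update a value estimate with a learning rate that depends on the sign of the prediction error."""
    prediction_error = outcome - value
    rate = rate_positive if prediction_error > 0 else rate_negative
    return value + rate * prediction_error


# Factual learning (chosen option): good news is weighted more than bad news.
chosen_after_gain = valence_dependent_update(0.5, 1.0, rate_positive=0.4, rate_negative=0.1)
chosen_after_loss = valence_dependent_update(0.5, 0.0, rate_positive=0.4, rate_negative=0.1)

# Counterfactual learning (unchosen option): bad news is weighted more than good news.
unchosen_after_gain = valence_dependent_update(0.5, 1.0, rate_positive=0.1, rate_negative=0.4)
unchosen_after_loss = valence_dependent_update(0.5, 0.0, rate_positive=0.1, rate_negative=0.4)
```

In this example the chosen option's value rises by 0.2 after a gain but falls by only 0.05 after a loss, while the unchosen option's value rises by 0.05 after a gain and falls by 0.2 after a loss.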
Geometry and Op Art.

ERIC Educational Resources Information Center

Brewer, Evelyn J.

1999-01-01

Describes an activity in which students use computers and techniques from Op Art to learn various geometric concepts. Allows them to see the distinct connection between art and mathematics from a personal perspective. Reinforces writing, speaking, and drawing skills while creating slide shows related to the project. (ASK)
Extinction from a rationalist perspective.

PubMed

Gallistel, C R

2012-05-01

The merging of the computational theory of mind and evolutionary thinking leads to a kind of rationalism, in which enduring truths about the world have become implicit in the computations that enable the brain to cope with the experienced world. The dead reckoning computation, for example, is implemented within the brains of animals as one of the mechanisms that enables them to learn where they are (Gallistel, 1990, 1995). It integrates a velocity signal with respect to a time signal. Thus, the manner in which position and velocity relate to one another in the world is reflected in the manner in which signals representing those variables are processed in the brain. I use principles of information theory and Bayesian inference to derive from other simple principles explanations for: (1) the failure of partial reinforcement to increase reinforcements to acquisition; (2) the partial reinforcement extinction effect; (3) spontaneous recovery; (4) renewal; (5) reinstatement; (6) resurgence (aka facilitated reacquisition). Like the principle underlying dead-reckoning, these principles are grounded in analytic considerations. They are the kind of enduring truths about the world that are likely to have shaped the brain's computations. Copyright © 2012 Elsevier B.V. All rights reserved.
Stuttering Thoughts: Negative Self-Referent Thinking Is Less Sensitive to Aversive Outcomes in People with Higher Levels of Depressive Symptoms

PubMed Central

Iijima, Yudai; Takano, Keisuke; Boddez, Yannick; Raes, Filip; Tanno, Yoshihiko

2017-01-01

Learning theories of depression have proposed that depressive cognitions, such as negative thoughts with reference to oneself, can develop through a reinforcement learning mechanism. This negative self-reference is considered to be positively reinforced by rewarding experiences such as genuine support from others after negative self-disclosure, and negatively reinforced by avoidance of potential aversive situations. The learning account additionally predicts that negative self-reference would be maintained by an inability to adjust one’s behavior when negative self-reference no longer leads to such reward. To test this prediction, we designed an adapted version of the reversal-learning task. In this task, participants were reinforced to choose and engage in either negative or positive self-reference by probabilistic economic reward and punishment. Although participants were initially trained to choose negative self-reference, the stimulus-reward contingencies were reversed to prompt a shift toward positive self-reference (Study 1) and a further shift toward negative self-reference (Study 2). Model-based computational analyses showed that depressive symptoms were associated with a low learning rate of negative self-reference, indicating a high level of reward expectancy for negative self-reference even after the contingency reversal. Furthermore, the difficulty in updating outcome predictions of negative self-reference was significantly associated with the extent to which one possesses negative self-images. These results suggest that difficulty in adjusting action-outcome estimates for negative self-reference increases the chance to be faced with negative aspects of self, which may result in depressive symptoms. PMID:28824511
Stuttering Thoughts: Negative Self-Referent Thinking Is Less Sensitive to Aversive Outcomes in People with Higher Levels of Depressive Symptoms.

PubMed

Iijima, Yudai; Takano, Keisuke; Boddez, Yannick; Raes, Filip; Tanno, Yoshihiko

2017-01-01

Learning theories of depression have proposed that depressive cognitions, such as negative thoughts with reference to oneself, can develop through a reinforcement learning mechanism. This negative self-reference is considered to be positively reinforced by rewarding experiences such as genuine support from others after negative self-disclosure, and negatively reinforced by avoidance of potential aversive situations. The learning account additionally predicts that negative self-reference would be maintained by an inability to adjust one's behavior when negative self-reference no longer leads to such reward. To test this prediction, we designed an adapted version of the reversal-learning task. In this task, participants were reinforced to choose and engage in either negative or positive self-reference by probabilistic economic reward and punishment. Although participants were initially trained to choose negative self-reference, the stimulus-reward contingencies were reversed to prompt a shift toward positive self-reference (Study 1) and a further shift toward negative self-reference (Study 2). Model-based computational analyses showed that depressive symptoms were associated with a low learning rate of negative self-reference, indicating a high level of reward expectancy for negative self-reference even after the contingency reversal. Furthermore, the difficulty in updating outcome predictions of negative self-reference was significantly associated with the extent to which one possesses negative self-images. These results suggest that difficulty in adjusting action-outcome estimates for negative self-reference increases the chance to be faced with negative aspects of self, which may result in depressive symptoms.
Valence-Dependent Belief Updating: Computational Validation

PubMed Central

Kuzmanovic, Bojana; Rigoux, Lionel

2017-01-01

People tend to update beliefs about their future outcomes in a valence-dependent way: they are likely to incorporate good news and to neglect bad news. However, belief formation is a complex process which depends not only on motivational factors such as the desire for favorable conclusions, but also on multiple cognitive variables such as prior beliefs, knowledge about personal vulnerabilities and resources, and the size of the probabilities and estimation errors. Thus, we applied computational modeling in order to test for valence-induced biases in updating while formally controlling for relevant cognitive factors. We compared biased and unbiased Bayesian models of belief updating, and specified alternative models based on reinforcement learning. The experiment consisted of 80 trials with 80 different adverse future life events. In each trial, participants estimated the base rate of one of these events and estimated their own risk of experiencing the event before and after being confronted with the actual base rate. Belief updates corresponded to the difference between the two self-risk estimates. Valence-dependent updating was assessed by comparing trials with good news (better-than-expected base rates) with trials with bad news (worse-than-expected base rates). After receiving bad relative to good news, participants' updates were smaller and deviated more strongly from rational Bayesian predictions, indicating a valence-induced bias. Model comparison revealed that the biased (i.e., optimistic) Bayesian model of belief updating better accounted for data than the unbiased (i.e., rational) Bayesian model, confirming that the valence of the new information influenced the amount of updating. Moreover, alternative computational modeling based on reinforcement learning demonstrated higher learning rates for good than for bad news, as well as a moderating role of personal knowledge. 
Finally, in this specific experimental context, the approach based on reinforcement learning was superior to the Bayesian approach. The computational validation of valence-dependent belief updating represents a novel support for a genuine optimism bias in human belief formation. Moreover, the precise control of relevant cognitive variables justifies the conclusion that the motivation to adopt the most favorable self-referential conclusions biases human judgments. PMID:28706499
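The reinforcement-learning account summarized above, with a higher learning rate for good news than for bad news, can be sketched roughly as follows. This is a minimal illustration and not the authors' fitted model: the function name `update_self_risk`, the default learning rates, and the definition of the estimation error as estimated minus actual base rate are assumptions made for the example.

```python
def update_self_risk(self_risk, estimated_base_rate, actual_base_rate,
                     alpha_good=0.6, alpha_bad=0.3):
    """Return an updated self-risk estimate (all quantities in percent).

    Hypothetical sketch: the learning rates are illustrative values,
    not parameters reported in the abstract.
    """
    # Estimation error: positive when the actual base rate of the adverse
    # event is lower than the participant expected (good news).
    error = estimated_base_rate - actual_base_rate
    # Valence-dependent learning rate: good news is weighted more strongly.
    alpha = alpha_good if error > 0 else alpha_bad
    # Move the self-risk estimate in the direction of the news.
    return self_risk - alpha * error


if __name__ == "__main__":
    # Good news: the event is rarer than expected, so the estimate drops a lot.
    print(update_self_risk(30, estimated_base_rate=40, actual_base_rate=20))
    # Bad news: the event is more common than expected, so it rises only a little.
    print(update_self_risk(30, estimated_base_rate=20, actual_base_rate=40))
```

With equal learning rates the two updates would have the same size; an asymmetry between `alpha_good` and `alpha_bad` is what produces the optimistic bias in this sketch.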
Rational and Mechanistic Perspectives on Reinforcement Learning

ERIC Educational Resources Information Center

Chater, Nick

2009-01-01

This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…
Mechanisms of Hierarchical Reinforcement Learning in Corticostriatal Circuits 1: Computational Analysis

PubMed Central

Badre, David

2012-01-01

Growing evidence suggests that the prefrontal cortex (PFC) is organized hierarchically, with more anterior regions having increasingly abstract representations. How does this organization support hierarchical cognitive control and the rapid discovery of abstract action rules? We present computational models at different levels of description. A neural circuit model simulates interacting corticostriatal circuits organized hierarchically. In each circuit, the basal ganglia gate frontal actions, with some striatal units gating the inputs to PFC and others gating the outputs to influence response selection. Learning at all of these levels is accomplished via dopaminergic reward prediction error signals in each corticostriatal circuit. This functionality allows the system to exhibit conditional if–then hypothesis testing and to learn rapidly in environments with hierarchical structure. We also develop a hybrid Bayesian-reinforcement learning mixture of experts (MoE) model, which can estimate the most likely hypothesis state of individual participants based on their observed sequence of choices and rewards. This model yields accurate probabilistic estimates about which hypotheses are attended by manipulating attentional states in the generative neural model and recovering them with the MoE model. This 2-pronged modeling approach leads to multiple quantitative predictions that are tested with functional magnetic resonance imaging in the companion paper. PMID:21693490
Working Memory Load Strengthens Reward Prediction Errors.

PubMed

Collins, Anne G E; Ciullo, Brittany; Frank, Michael J; Badre, David

2017-04-19

Reinforcement learning (RL) in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors (RPEs) are used to update expected values of choice options. This modeling ignores the different contributions of different memory and decision-making systems thought to contribute even to simple learning. In an fMRI experiment, we investigated how working memory (WM) and incremental RL processes interact to guide human learning. WM load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive WM process together with slower RL. Model-based analysis of fMRI data showed that striatum and lateral prefrontal cortex were sensitive to RPE, as shown previously, but, critically, these signals were reduced when the learning problem was within capacity of WM. The degree of this neural interaction related to individual differences in the use of WM to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning. SIGNIFICANCE STATEMENT Reinforcement learning (RL) theory has been remarkably productive at improving our understanding of instrumental learning as well as dopaminergic and striatal network function across many mammalian species. However, this neural network is only one contributor to human learning and other mechanisms such as prefrontal cortex working memory also play a key role. Our results also show that these other players interact with the dopaminergic RL system, interfering with its key computation of reward prediction errors.
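The two ingredients described in this abstract, an RPE-driven value update and a choice policy that mixes a working-memory process with slower RL, can be sketched as below. This is an illustrative sketch only: the function names, the softmax choice rule, and the fixed mixture weight `w_wm` are assumptions for the example, not the authors' published model, which also includes capacity limits and delay sensitivity.

```python
import math


def softmax(values, beta):
    """Convert action values into choice probabilities (inverse temperature beta)."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rl_update(q, action, reward, alpha):
    """Incremental RL: move the chosen action's value toward the reward.

    Returns the new value list and the reward prediction error (RPE).
    """
    rpe = reward - q[action]
    new_q = list(q)
    new_q[action] += alpha * rpe
    return new_q, rpe


def mixture_policy(q_rl, q_wm, w_wm, beta):
    """Weighted mixture of the WM policy and the RL policy (w_wm is hypothetical)."""
    p_rl = softmax(q_rl, beta)
    p_wm = softmax(q_wm, beta)
    return [w_wm * pw + (1.0 - w_wm) * pr for pw, pr in zip(p_wm, p_rl)]


if __name__ == "__main__":
    q_rl, rpe = rl_update([0.0, 0.0, 0.0], action=1, reward=1.0, alpha=0.5)
    q_wm = [0.0, 1.0, 0.0]  # WM is assumed to store the last outcome in one shot
    print(rpe, q_rl, mixture_policy(q_rl, q_wm, w_wm=0.8, beta=5.0))
```

In a sketch like this, lowering `w_wm` as the number of stimuli grows would be one simple way to express the load manipulation.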

Developing PFC representations using reinforcement learning.

PubMed

Reynolds, Jeremy R; O'Reilly, Randall C

2009-12-01

From both functional and biological considerations, it is widely believed that action production, planning, and goal-oriented behaviors supported by the frontal cortex are organized hierarchically [Fuster (1991); Koechlin, E., Ody, C., & Kouneiher, F. (2003). Neuroscience: The architecture of cognitive control in the human prefrontal cortex. Science, 424, 1181-1184; Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt]. However, the nature of the different levels of the hierarchy remains unclear, and little attention has been paid to the origins of such a hierarchy. We address these issues through biologically-inspired computational models that develop representations through reinforcement learning. We explore several different factors in these models that might plausibly give rise to a hierarchical organization of representations within the PFC, including an initial connectivity hierarchy within PFC, a hierarchical set of connections between PFC and subcortical structures controlling it, and differential synaptic plasticity schedules. Simulation results indicate that architectural constraints contribute to the segregation of different types of representations, and that this segregation facilitates learning. These findings are consistent with the idea that there is a functional hierarchy in PFC, as captured in our earlier computational models of PFC function and a growing body of empirical data.
An analysis of intergroup rivalry using Ising model and reinforcement learning

NASA Astrophysics Data System (ADS)

Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

2014-01-01

Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Unlike other simulation studies using the Ising model, the evolution rules of each individual in our model are not static, but can learn from historical experience using a reinforcement learning technique, which brings the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.
Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

PubMed

Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

2015-11-01

The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations.
Learning, epigenetics, and computation: An extension on Fitch's proposal. Comment on “Toward a computational framework for cognitive biology: Unifying approaches from cognitive neuroscience and comparative cognition” by W. Tecumseh Fitch

NASA Astrophysics Data System (ADS)

Okanoya, Kazuo

2014-09-01

The comparative computational approach of Fitch [1] attempts to renew the classical David Marr paradigm of computation, algorithm, and implementation by introducing an evolutionary view of the relationship between neural architecture and cognition. This comparative evolutionary view provides constraints useful in narrowing down the problem space for both cognition and neural mechanisms. I will provide two examples from our own studies that reinforce and extend Fitch's proposal.
Somato-dendritic Synaptic Plasticity and Error-backpropagation in Active Dendrites

PubMed Central

Schiess, Mathieu; Urbanczik, Robert; Senn, Walter

2016-01-01

In the last decade dendrites of cortical neurons have been shown to nonlinearly combine synaptic inputs by evoking local dendritic spikes. It has been suggested that these nonlinearities raise the computational power of a single neuron, making it comparable to a 2-layer network of point neurons. But how these nonlinearities can be incorporated into the synaptic plasticity to optimally support learning remains unclear. We present a theoretically derived synaptic plasticity rule for supervised and reinforcement learning that depends on the timing of the presynaptic, the dendritic and the postsynaptic spikes. For supervised learning, the rule can be seen as a biological version of the classical error-backpropagation algorithm applied to the dendritic case. When modulated by a delayed reward signal, the same plasticity is shown to maximize the expected reward in reinforcement learning for various coding scenarios. Our framework makes specific experimental predictions and highlights the unique advantage of active dendrites for implementing powerful synaptic plasticity rules that have access to downstream information via backpropagation of action potentials. PMID:26841235
Negative reinforcement learning is affected in substance dependence.

PubMed

Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody

2012-06-01

Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode.
An Upside to Reward Sensitivity: The Hippocampus Supports Enhanced Reinforcement Learning in Adolescence.

PubMed

Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna

2016-10-05

Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience.
Cognitive Control Predicts Use of Model-Based Reinforcement-Learning

PubMed Central

Otto, A. Ross; Skatova, Anya; Madlon-Kay, Seth; Daw, Nathaniel D.

2015-01-01

Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggests that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior are a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information—in the service of overcoming habitual, stimulus-driven responses—in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior. PMID:25170791
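The contrast between model-based and model-free valuation in a sequential choice task can be sketched as below. This is a hedged illustration, not the authors' analysis: the function names, the two-stage structure with known transition probabilities, and the single weighting parameter `w` are assumptions chosen for the example. A hybrid weighting of this kind is a common way to quantify reliance on model-based control, but the abstract does not specify the exact model used.

```python
def model_based_values(transition_probs, second_stage_values):
    """Model-based first-stage values from a (hypothetical) known task model.

    transition_probs[a][s] is the probability that first-stage action a
    leads to second-stage state s; second_stage_values[s] lists the action
    values available in state s.
    """
    return [
        sum(p * max(second_stage_values[s]) for s, p in enumerate(probs))
        for probs in transition_probs
    ]


def hybrid_values(q_mb, q_mf, w):
    """Blend model-based and model-free values; w = 1 is purely model-based."""
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


if __name__ == "__main__":
    transitions = [[0.7, 0.3], [0.3, 0.7]]   # illustrative common/rare transitions
    stage2 = [[1.0, 0.0], [0.0, 0.5]]        # illustrative second-stage values
    q_mb = model_based_values(transitions, stage2)
    q_mf = [0.2, 0.9]                        # cached values from past reward alone
    print(q_mb, hybrid_values(q_mb, q_mf, w=0.5))
```

Under this sketch, an individual with a larger `w` relies more on the task model and less on cached reward history when choosing.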
Co-Learning and the Evolution of Social Activity,

DTIC Science & Technology

1994-03-01

section; here we will develop the material in a self-contained fashion. 2.1 Social Games We start by defining the standard notion of a (one-shot...combined body of material is so rich that neither we, nor anyone else with whom we have discussed these matters (and we have discussed them...importing computer-sciency elements in the spirit of reinforcement learning. More specifically on the latter, since our perspective is that of system
Combining Offline and Online Computation for Solving Partially Observable Markov Decision Process

DTIC Science & Technology

2015-03-06

David Hsu and Wee Sun Lee, Monte Carlo Bayesian Reinforcement Learning, International Conference on Machine Learning (ICML), 2012. • Haoyu Bai, David...and Automation (ICRA), 2015. • Zhan Wei Lim, David Hsu, and Wee Sun Lee, Adaptive Informative Path Planning in Metric Spaces. Submitted to Int. J... Automation (ICRA), 2015. 2. Bai, H., Hsu, D., Kochenderfer, M. J., and Lee, W. S., Unmanned aircraft collision avoidance using continuous state POMDPs
Incorporating a Collaborative Web-Based Virtual Laboratory in an Undergraduate Bioinformatics Course

ERIC Educational Resources Information Center

Weisman, David

2010-01-01

Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a…
Learning Gains and Response to Digital Lessons on Soil Genesis and Development

USDA-ARS's Scientific Manuscript database

Evolving computer technology offers opportunities for new online approaches in teaching methods and delivery. Well-designed online lessons should reinforce the critical need of the soil science discipline in today’s food, energy, and environmental issues, as well as meet the needs of the diverse cli...
The use of computer-aided learning in chemistry laboratory instruction

NASA Astrophysics Data System (ADS)

Allred, Brian Robert Tracy

This research involves developing and implementing computer software for chemistry laboratory instruction. The specific goal is to design the software and investigate whether it can be used to introduce concepts and laboratory procedures without a lecture format. This would allow students to conduct an experiment even though they may not have been introduced to the chemical concept in their lecture course. This would also allow for another type of interaction for those students who respond more positively to a visual approach to instruction. The first module developed was devoted to using computer software to help introduce students to the concepts related to thin-layer chromatography and setting up and running an experiment. This was achieved through the use of digitized pictures and digitized video clips along with written information. A review quiz was used to help reinforce the learned information. The second module was devoted to the concept of the "dry lab". This module presented students with relevant information regarding the chemical concepts and then showed them the outcome of mixing solutions. By these observations, they were to determine the composition of unknown solutions based on provided descriptions and comparison with their written observations. The third piece of the software designed was a computer game. This program followed the first two modules in providing information the students were to learn. The difference here, though, was incorporating a game scenario for students to use to help reinforce the learning. Students were then assessed to see how much information they retained after playing the game. In each of the three cases, a control group exposed to the traditional lecture format was used. Their results were compared to the experimental group using the computer modules. 
Based upon the findings, it can be concluded that using technology to aid in the instructional process is definitely of benefit and students were more successful in learning. It is important to note, though, that one single type of instructional method is not the best way to inspire learning. It seems multiple methods provide the best educational experience for all.
Spore: Spawning Evolutionary Misconceptions?

NASA Astrophysics Data System (ADS)

Bean, Thomas E.; Sinatra, Gale M.; Schrader, P. G.

2010-10-01

The use of computer simulations as educational tools may afford the means to develop understanding of evolution as a natural, emergent, and decentralized process. However, special consideration of developmental constraints on learning may be necessary when using these technologies. Specifically, the essentialist (biological forms possess an immutable essence), teleological (assignment of purpose to living things and/or parts of living things that may not be purposeful), and intentionality (assumption that events are caused by an intelligent agent) biases may be reinforced through the use of computer simulations, rather than addressed with instruction. We examine the video game Spore for its depiction of evolutionary content and its potential to reinforce these cognitive biases. In particular, we discuss three pedagogical strategies to mitigate weaknesses of Spore and other computer simulations: directly targeting misconceptions through refutational approaches, targeting specific principles of scientific inquiry, and directly addressing issues related to models as cognitive tools.
The left hemisphere learns what is right: Hemispatial reward learning depends on reinforcement learning processes in the contralateral hemisphere.

PubMed

Aberg, Kristoffer Carl; Doell, Kimberly Crystal; Schwartz, Sophie

2016-08-01

Orienting biases refer to consistent, trait-like direction of attention or locomotion toward one side of space. Recent studies suggest that such hemispatial biases may determine how well people memorize information presented in the left or right hemifield. Moreover, lesion studies indicate that learning rewarded stimuli in one hemispace depends on the integrity of the contralateral striatum. However, the exact neural and computational mechanisms underlying the influence of individual orienting biases on reward learning remain unclear. Because reward-based behavioural adaptation depends on the dopaminergic system and prediction error (PE) encoding in the ventral striatum, we hypothesized that hemispheric asymmetries in dopamine (DA) function may determine individual spatial biases in reward learning. To test this prediction, we acquired fMRI in 33 healthy human participants while they performed a lateralized reward task. Learning differences between hemispaces were assessed by presenting stimuli, assigned to different reward probabilities, to the left or right of central fixation, i.e. presented in the left or right visual hemifield. Hemispheric differences in DA function were estimated through differential fMRI responses to positive vs. negative feedback in the left vs. right ventral striatum, and a computational approach was used to identify the neural correlates of PEs. Our results show that spatial biases favoring reward learning in the right (vs. left) hemifield were associated with increased reward responses in the left hemisphere and relatively better neural encoding of PEs for stimuli presented in the right (vs. left) hemifield. These findings demonstrate that trait-like spatial biases implicate hemisphere-specific learning mechanisms, with individual differences between hemispheres contributing to reinforcing spatial biases.
Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies

PubMed Central

Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

2016-01-01

Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401
Roles of OA1 octopamine receptor and Dop1 dopamine receptor in mediating appetitive and aversive reinforcement revealed by RNAi studies.

PubMed

Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto

2016-07-14

Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects.
Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis.

PubMed

Glimcher, Paul W

2011-09-13

A number of recent advances have been achieved in the study of midbrain dopaminergic neurons. Understanding these advances and how they relate to one another requires a deep understanding of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. This intertwining of theory and experiment now suggests very clearly that the phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings that are at the root of this conclusion and the fantastic theoretical advances from which this conclusion is drawn.
Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis

PubMed Central

Glimcher, Paul W.

2011-01-01

A number of recent advances have been achieved in the study of midbrain dopaminergic neurons. Understanding these advances and how they relate to one another requires a deep understanding of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. This intertwining of theory and experiment now suggests very clearly that the phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings that are at the root of this conclusion and the fantastic theoretical advances from which this conclusion is drawn. PMID:21389268
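The reward prediction error at the heart of this hypothesis is usually formalized as a temporal-difference (TD) error. The sketch below shows one TD(0) value update; it is a minimal illustration under stated assumptions, not a model taken from the review. The function name `td_update`, the state labels, and the default learning rate and discount factor are all hypothetical.

```python
def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step: return updated state values and the prediction error.

    delta = reward + gamma * V(next_state) - V(state)
    A positive delta means the outcome was better than predicted, a negative
    delta means it was worse, and zero means it was fully predicted.
    """
    delta = reward + gamma * values[next_state] - values[state]
    new_values = dict(values)          # leave the caller's table untouched
    new_values[state] += alpha * delta
    return new_values, delta


if __name__ == "__main__":
    values = {"cue": 0.0, "outcome": 0.0, "end": 0.0}
    # An unexpected reward produces a positive prediction error.
    values, delta = td_update(values, "outcome", "end", reward=1.0,
                              alpha=0.5, gamma=1.0)
    print(delta, values)
```

Repeating such updates over trials shrinks the error at the time of a predictable reward, which is the qualitative pattern the hypothesis attributes to phasic dopamine activity.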
Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

PubMed

Oliveira, Emileane C; Hunziker, Maria Helena

2014-07-01

In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases, the (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement.
Adaptive Fuzzy Systems in Computational Intelligence

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.

1996-01-01

In recent years, interest in computational intelligence techniques, which currently include neural networks, fuzzy systems, and evolutionary programming, has grown significantly, and a number of their applications have been developed in government and industry. In the future, an essential element in these systems will be fuzzy systems that can learn from experience by using neural networks to refine their performance. The GARIC architecture, introduced earlier, is an example of a fuzzy reinforcement learning system which has been applied in several control domains such as cart-pole balancing, simulation of Space Shuttle orbital operations, and tether control. A number of examples from GARIC's applications in these domains will be demonstrated.
Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

PubMed

Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

2017-04-01

According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). 
Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Rational and mechanistic perspectives on reinforcement learning.

PubMed

Chater, Nick

2009-12-01

This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms--as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.
Stress affects instrumental learning based on positive or negative reinforcement in interaction with personality in domestic horses

PubMed Central

Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa

2017-01-01

The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities. PMID:28475581
Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer.

PubMed

Zhou, Luowei; Yang, Pei; Chen, Chunlin; Gao, Yang

2017-05-01

Reinforcement learning has significant applications for multiagent systems, especially in unknown dynamic environments. However, most multiagent reinforcement learning (MARL) algorithms suffer from such problems as exponential computational complexity in the joint state-action space, which makes it difficult to scale up to realistic multiagent problems. In this paper, a novel algorithm named negotiation-based MARL with sparse interactions (NegoSI) is presented. In contrast to traditional sparse-interaction-based MARL algorithms, NegoSI adopts the equilibrium concept and makes it possible for agents to select the nonstrict equilibrium-dominating strategy profile (nonstrict EDSP) or meta equilibrium for their joint actions. The presented NegoSI algorithm consists of four parts: 1) the equilibrium-based framework for sparse interactions; 2) the negotiation for the equilibrium set; 3) the minimum variance method for selecting one joint action; and 4) the knowledge transfer of local Q-values. In this integrated algorithm, three techniques, i.e., unshared value functions, equilibrium solutions, and sparse interactions, are adopted to achieve privacy protection, better coordination, and lower computational complexity, respectively. To evaluate the performance of the presented NegoSI algorithm, two groups of experiments are carried out regarding three criteria: 1) steps of each episode; 2) rewards of each episode; and 3) average runtime. The first group of experiments is conducted using six grid world games and shows fast convergence and high scalability of the presented algorithm. Then in the second group of experiments NegoSI is applied to an intelligent warehouse problem, and simulated results demonstrate the effectiveness of the presented NegoSI algorithm compared with other state-of-the-art MARL algorithms.
Melding Environmental Education and Creative Learning in Elementary and Middle-school Settings

NASA Astrophysics Data System (ADS)

Jain, S.; Baker, T.; Crofton-Macdonald, J.; Scott, M.

2017-12-01

Teaching environmental topics, such as sustainability and ecosystem management, to students through the lens of computational thinking provides unique educational opportunities. Environmental topics are an excellent source for multidisciplinary learning, as questions concerning human well-being, environmental policy, science, and mathematics can naturally be incorporated into educational discussions and activities. The use of computational modeling allows students to critically reason about and explore environmental concepts by envisioning complexity, and by asking and investigating a series of "what if" questions. Students can furthermore reflect on their own relationship with their local ecology. For the past five years, we have developed and tested activities for middle school students. We have explored these ideas through in-class activities, workshops, and summer clubs. We plan to present examples from our work and a tentative framework for a new approach to environmental education, one reinforced by computational thinking and creative learning.
Dimensional psychiatry: mental disorders as dysfunctions of basic learning mechanisms.

PubMed

Heinz, Andreas; Schlagenhauf, Florian; Beck, Anne; Wackerhagen, Carolin

2016-08-01

It has been questioned whether the more than 300 mental disorders currently listed in international disease classification systems all have a distinct neurobiological correlate. Here, we support the idea that basic dimensions of mental dysfunctions, such as alterations in reinforcement learning, can be identified, which interact with individual vulnerability and psychosocial stress factors and, thus, contribute to syndromes of distress across traditional nosological boundaries. We further suggest that computational modeling of learning behavior can help to identify specific alterations in reinforcement-based decision-making and their associated neurobiological correlates. For example, attribution of salience to drug-related cues associated with dopamine dysfunction in addiction can increase habitual decision-making via promotion of Pavlovian-to-instrumental transfer as indicated by computational modeling of the effect of Pavlovian-conditioned stimuli (here affectively positive or alcohol-related cues) on instrumental approach and avoidance behavior. In schizophrenia, reward prediction errors can be modeled computationally and associated with functional brain activation, thus revealing reduced encoding of such learning signals in the ventral striatum and compensatory activation in the frontal cortex.
With respect to negative mood states, it has been shown that both reduced functional activation of the ventral striatum elicited by reward-predicting stimuli and stress-associated activation of the hypothalamic-pituitary-adrenal axis in interaction with reduced serotonin transporter availability and increased amygdala activation by aversive cues contribute to clinical depression; altogether these observations support the notion that basic learning mechanisms, such as Pavlovian and instrumental conditioning and Pavlovian-to-instrumental transfer, represent a basic dimension of mental disorders that can be mechanistically characterized using computational modeling and associated with specific clinical syndromes across established nosological boundaries. Instead of pursuing a narrow focus on single disorders defined by clinical tradition, we suggest that neurobiological research should focus on such basic dimensions, which can be studied in and compared among several mental disorders.
A new computational account of cognitive control over reinforcement-based decision-making: Modeling of a probabilistic learning task.

PubMed

Zendehrouh, Sareh

2015-11-01

Recent work in the field of decision-making offers a dual-system account of the decision-making process. This theory holds that the process is conducted by two main controllers: a goal-directed system and a habitual system. In the reinforcement learning (RL) domain, habitual behaviors are connected with model-free methods, in which appropriate actions are learned through trial-and-error experience. Goal-directed behaviors, in contrast, are associated with model-based RL methods, in which actions are selected using a model of the environment. Studies on cognitive control also suggest that during processes like decision-making, some cortical and subcortical structures work in concert to monitor the consequences of decisions and to adjust control according to current task demands. Here a computational model is presented based on dual-system theory and the cognitive control perspective of decision-making. The proposed model is used to simulate human performance on a variant of a probabilistic learning task. The basic proposal is that the brain implements a dual controller, while an accompanying monitoring system detects several kinds of conflict, including a hypothetical cost-conflict. The simulation results address existing theories about two event-related potentials, namely error-related negativity (ERN) and feedback-related negativity (FRN), and explore the best account of them. Based on the results, some testable predictions are also presented. Copyright © 2015 Elsevier Ltd. All rights reserved.
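The distinction between model-free and model-based control drawn in this abstract can be illustrated with a minimal sketch. It is not the model from the paper: the function names, the two-action task, and the outcome probabilities below are all hypothetical, chosen only to show that a model-free learner caches values from sampled outcomes while a model-based one computes them from an explicit model.

```python
def model_free_update(q, action, reward, alpha=0.1):
    """Habitual (model-free) learning: nudge the cached value of the chosen
    action toward the reward that was actually experienced."""
    updated = dict(q)
    updated[action] += alpha * (reward - updated[action])
    return updated


def model_based_values(model):
    """Goal-directed (model-based) evaluation: compute expected reward for
    each action from an explicit model, given as action -> [(prob, reward)]."""
    return {a: sum(p * r for p, r in outcomes) for a, outcomes in model.items()}


# Hypothetical probabilistic task: "left" pays off 80% of the time, "right" 30%.
task_model = {
    "left": [(0.8, 1.0), (0.2, 0.0)],
    "right": [(0.3, 1.0), (0.7, 0.0)],
}

planned = model_based_values(task_model)          # available immediately
cached = model_free_update({"left": 0.0, "right": 0.0}, "left", reward=1.0)
print(planned, cached)
```

The model-based values are correct as soon as the model is known, whereas the cached values need many trials to approach them; that trade-off between computation and experience is what dual-system accounts build on.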
Computational Dysfunctions in Anxiety: Failure to Differentiate Signal From Noise.

PubMed

Huang, He; Thompson, Wesley; Paulus, Martin P

2017-09-15

Differentiating whether an action leads to an outcome by chance or by an underlying statistical regularity that signals environmental change profoundly affects adaptive behavior. Previous studies have shown that anxious individuals may not appropriately differentiate between these situations. This investigation aims to precisely quantify the process deficit in anxious individuals and determine the degree to which these process dysfunctions are specific to anxiety. One hundred twenty-two subjects recruited as part of an ongoing large clinical population study completed a change point detection task. Reinforcement learning models were used to explicate observed behavioral differences in low anxiety (Overall Anxiety Severity and Impairment Scale score ≤ 8) and high anxiety (Overall Anxiety Severity and Impairment Scale score ≥ 9) groups. High anxiety individuals used a suboptimal decision strategy characterized by a higher lose-shift rate. Computational models and simulations revealed that this difference was related to a higher base learning rate. These findings are better explained in a context-dependent reinforcement learning model. Anxious subjects' exaggerated response to uncertainty leads to a suboptimal decision strategy that makes it difficult for these individuals to determine whether an action is associated with an outcome by chance or by some statistical regularity. These findings have important implications for developing new behavioral intervention strategies using learning models. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
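To make the two quantities in this abstract concrete, the sketch below computes a lose-shift rate from a choice sequence and shows how a larger learning rate in a simple delta-rule update produces a larger drop in value after a single loss. The function names and the toy data are assumptions for illustration; the study's own models (including the context-dependent one) are not reproduced here.

```python
def lose_shift_rate(choices, rewards):
    """Fraction of unrewarded trials that are followed by a different choice."""
    losses = 0
    shifts = 0
    for t in range(len(choices) - 1):
        if rewards[t] == 0:
            losses += 1
            if choices[t + 1] != choices[t]:
                shifts += 1
    return shifts / losses if losses else float("nan")


def delta_rule(value, reward, learning_rate):
    """Move the value estimate toward the observed reward."""
    return value + learning_rate * (reward - value)


# Made-up sequence of binary choices and outcomes (1 = rewarded, 0 = not).
toy_choices = [0, 0, 1, 1, 0, 1]
toy_rewards = [1, 0, 0, 1, 0, 1]
rate = lose_shift_rate(toy_choices, toy_rewards)

# After one loss, a high learning rate discards most of the accumulated value.
low_lr_value = delta_rule(0.5, 0.0, learning_rate=0.2)
high_lr_value = delta_rule(0.5, 0.0, learning_rate=0.8)
print(rate, low_lr_value, high_lr_value)
```

A learner whose value estimate collapses after one unrewarded trial is more likely to switch, which is one way a higher base learning rate can show up behaviorally as a higher lose-shift rate.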
"Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

PubMed

Liu, Chunming; Xu, Xin; Hu, Dewen

2013-04-29

Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.
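The abstract does not say which naive solutions it covers. One commonly used baseline, shown here purely as an assumed example, is linear scalarization: vector-valued estimates are collapsed into one number with a weight per objective, and the agent acts greedily on the result. All names and numbers below are hypothetical.

```python
def scalarize(reward_vector, weights):
    """Collapse a vector of per-objective values into a single weighted sum."""
    return sum(w * r for w, r in zip(weights, reward_vector))


def greedy_action(q_vectors, weights):
    """Pick the action whose scalarized value is largest."""
    return max(q_vectors, key=lambda a: scalarize(q_vectors[a], weights))


# Hypothetical two-objective estimates: (progress, negative energy cost).
q_vectors = {
    "fast": (1.0, -0.8),
    "safe": (0.6, -0.1),
}

speed_only = greedy_action(q_vectors, weights=(1.0, 0.0))
balanced = greedy_action(q_vectors, weights=(0.5, 0.5))
print(speed_only, balanced)
```

The preferred action changes with the weights, which is why the choice of weights, and the conflicts between objectives that it hides, is a central issue in multiobjective settings.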
Learning Gains and Response to Digital Lessons on Soil Genesis and Development

ERIC Educational Resources Information Center

Mamo, Martha; Ippolito, James A.; Kettler, Timothy A.; Reuter, Ronald; McCallister, Dennis; Morner, Patricia; Husmann, Dann; Blankenship, Erin

2011-01-01

Evolving computer technology is offering opportunities for new online approaches in teaching methods and delivery. Well-designed web-based (online) lessons should reinforce the critical need of the soil science discipline in today's food, energy, and environmental issues, as well as meet the needs of the diverse clientele with interest in…
Pour une pedagogie integree du code oral et du code ecrit (Toward a Pedagogy Integrating Oral and Written Codes).

ERIC Educational Resources Information Center

Guillen-Diaz, Carmen

1990-01-01

A classroom approach that brings oral and written language learning closer together is outlined. The strategy focuses on proper pronunciation using minimal pairs and uses exercises designed for listening and visualization, production, discrimination, re-use and reinforcement, and computer-assisted instruction. (MSE)
Early Childhood Educational Software: Specific Features and Issues of Localization

ERIC Educational Resources Information Center

Nikolopoulou, Kleopatra

2007-01-01

The computer has now become a recognized tool in the education of young children and when used appropriately can reinforce their learning experiences. This paper reviews specific features (relating to pedagogic design, software content and user-interface design) of early childhood educational software and discusses issues in favor of its…
Unified-theory-of-reinforcement neural networks do not simulate the blocking effect.

PubMed

Calvin, Nicholas T; McDowell, J J

2015-11-01

For the last 20 years the unified theory of reinforcement (Donahoe et al., 1993) has been used to develop computer simulations to evaluate its plausibility as an account for behavior. The unified theory of reinforcement states that operant and respondent learning occurs via the same neural mechanisms. As part of a larger project to evaluate the operant behavior predicted by the theory, this project was the first replication of neural network models based on the unified theory of reinforcement. In the process of replicating these neural network models it became apparent that a previously published finding, namely, that the networks simulate the blocking phenomenon (Donahoe et al., 1993), was a misinterpretation of the data. We show that the apparent blocking produced by these networks is an artifact of the inability of these networks to generate the same conditioned response to multiple stimuli. The piecemeal approach to evaluate the unified theory of reinforcement via simulation is critiqued and alternatives are discussed. Copyright © 2015 Elsevier B.V. All rights reserved.
Navigating complex decision spaces: Problems and paradigms in sequential choice

PubMed Central

Walsh, Matthew M.; Anderson, John R.

2015-01-01

To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192
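The temporal credit assignment problem described in this abstract can be seen in a small model-free example. The sketch below applies tabular TD(0) to a deterministic chain in which only the final step is rewarded; the chain, the function name, and the parameter values are assumptions for illustration and are not taken from the review.

```python
def td0_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Learn state values on a deterministic chain with reward only at the end.

    The agent always moves from state s to s + 1; leaving the last state
    yields a reward of 1 and ends the episode.
    """
    values = [0.0] * (n_states + 1)  # final entry is the terminal state, fixed at 0
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # TD error: reward plus discounted next value, minus current estimate
            delta = reward + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta
    return values[:n_states]


chain_values = td0_chain()
print([round(v, 3) for v in chain_values])
```

Although reward arrives only after the last step, the learned values of earlier states rise toward the discounted reward (gamma raised to the number of steps remaining before the rewarded one), so credit propagates backward along the sequence one state at a time.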
Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents

ERIC Educational Resources Information Center

Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

2009-01-01

Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…
Spatial-Temporal Reasoning Applications of Computational Intelligence in the Game of Go and Computer Networks

DTIC Science & Technology

2012-01-01

dimensionality, Tesauro used a backpropagation-based, three-layer neural network and implemented the outcome from a self-play game as the reinforcement signal...a school of fish, flock of birds, and colony of ants. Our literature review reveals that no one has used PSO to train the neural network...trained with a variant of PSO called cellular PSO (CPSO). CSRN is a supervised learning neural network (SLNN). The proposed algorithm for the
Computational psychiatry

PubMed Central

Montague, P. Read; Dolan, Raymond J.; Friston, Karl J.; Dayan, Peter

2013-01-01

Computational ideas pervade many areas of science and have an integrative explanatory role in neuroscience and cognitive science. However, computational depictions of cognitive function have had surprisingly little impact on the way we assess mental illness because diseases of the mind have not been systematically conceptualized in computational terms. Here, we outline goals and nascent efforts in the new field of computational psychiatry, which seeks to characterize mental dysfunction in terms of aberrant computations over multiple scales. We highlight early efforts in this area that employ reinforcement learning and game theoretic frameworks to elucidate decision-making in health and disease. Looking forwards, we emphasize a need for theory development and large-scale computational phenotyping in human subjects. PMID:22177032
GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

PubMed

Lin, C T; Jou, C P

2000-01-01

This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
Neural correlates of strategic reasoning during competitive games.

PubMed

Seo, Hyojung; Cai, Xinying; Donahue, Christopher H; Lee, Daeyeol

2014-10-17

Although human and animal behaviors are largely shaped by reinforcement and punishment, choices in social settings are also influenced by information about the knowledge and experience of other decision-makers. During competitive games, monkeys increased their payoffs by systematically deviating from a simple heuristic learning algorithm and thereby countering the predictable exploitation by their computer opponent. Neurons in the dorsomedial prefrontal cortex (dmPFC) signaled the animal's recent choice and reward history that reflected the computer's exploitative strategy. The strength of switching signals in the dmPFC also correlated with the animal's tendency to deviate from the heuristic learning algorithm. Therefore, the dmPFC might provide control signals for overriding simple heuristic learning algorithms based on the inferred strategies of the opponent. Copyright © 2014, American Association for the Advancement of Science.

Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.

PubMed

Fernandez-Gauna, Borja; Etxeberria-Agiriano, Ismael; Graña, Manuel

2015-01-01

Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
Computational Psychiatry: towards a mathematically informed understanding of mental illness

PubMed Central

Huys, Quentin J M; Roiser, Jonathan P

2016-01-01

Computational Psychiatry aims to describe the relationship between the brain's neurobiology, its environment and mental symptoms in computational terms. In so doing, it may improve psychiatric classification and the diagnosis and treatment of mental illness. It can unite many levels of description in a mechanistic and rigorous fashion, while avoiding biological reductionism and artificial categorisation. We describe how computational models of cognition can infer the current state of the environment and weigh up future actions, and how these models provide new perspectives on two example disorders, depression and schizophrenia. Reinforcement learning describes how the brain can choose and value courses of actions according to their long-term future value. Some depressive symptoms may result from aberrant valuations, which could arise from prior beliefs about the loss of agency (‘helplessness’), or from an inability to inhibit the mental exploration of aversive events. Predictive coding explains how the brain might perform Bayesian inference about the state of its environment by combining sensory data with prior beliefs, each weighted according to their certainty (or precision). Several cortical abnormalities in schizophrenia might reduce precision at higher levels of the inferential hierarchy, biasing inference towards sensory data and away from prior beliefs. We discuss whether striatal hyperdopaminergia might have an adaptive function in this context, and also how reinforcement learning and incentive salience models may shed light on the disorder. Finally, we review some of Computational Psychiatry's applications to neurological disorders, such as Parkinson's disease, and some pitfalls to avoid when applying its methods. PMID:26157034
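The precision-weighted combination of prior beliefs and sensory data mentioned in this abstract has a simple closed form when both are Gaussian. The sketch below shows that standard result; the function name and the numbers are assumptions for illustration, and the full hierarchical predictive coding scheme discussed in the paper is not modeled.

```python
def combine_gaussian(prior_mean, prior_precision, observation, obs_precision):
    """Posterior mean and precision for a Gaussian prior and Gaussian observation.

    Precision is the inverse of variance; each source is weighted by its precision.
    """
    posterior_precision = prior_precision + obs_precision
    posterior_mean = (
        prior_precision * prior_mean + obs_precision * observation
    ) / posterior_precision
    return posterior_mean, posterior_precision


# Equal confidence in prior and data: the estimate lands halfway between them.
balanced_mean, balanced_precision = combine_gaussian(0.0, 1.0, 10.0, 1.0)

# Reduced prior precision: the estimate is pulled toward the sensory data.
weak_prior_mean, weak_prior_precision = combine_gaussian(0.0, 0.25, 10.0, 1.0)
print(balanced_mean, weak_prior_mean)
```

Lowering the prior precision while holding the sensory precision fixed moves the posterior toward the observation, which is the kind of bias toward sensory data and away from prior beliefs that the abstract describes.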
11.2 YIP Human In the Loop Statistical Relational Learners

DTIC Science & Technology

2017-10-23

learning formalisms including inverse reinforcement learning [4] and statistical relational learning [7, 5, 8]. We have also applied our algorithms in...one introduced for label preferences. 4 Figure 2: Active Advice Seeking for Inverse Reinforcement Learning. active advice seeking is in selecting the...learning tasks. 1.2.1 Sequential Decision-Making Our previous work on advice for inverse reinforcement learning (IRL) defined advice as action
Learning to Obtain Reward, but Not Avoid Punishment, Is Affected by Presence of PTSD Symptoms in Male Veterans: Empirical Data and Computational Model

PubMed Central

Myers, Catherine E.; Moustafa, Ahmed A.; Sheynin, Jony; VanMeenen, Kirsten M.; Gilbertson, Mark W.; Orr, Scott P.; Beck, Kevin D.; Pang, Kevin C. H.; Servatius, Richard J.

2013-01-01

Post-traumatic stress disorder (PTSD) symptoms include behavioral avoidance, which is acquired and tends to increase with time. This avoidance may represent a general learning bias; indeed, individuals with PTSD are often faster than controls at acquiring conditioned responses based on physiologically aversive feedback. However, it is not clear whether this learning bias extends to cognitive feedback, or to learning from both reward and punishment. Here, male veterans with self-reported current, severe PTSD symptoms (PTSS group) or with few or no PTSD symptoms (control group) completed a probabilistic classification task that included both reward-based and punishment-based trials, where feedback could take the form of reward, punishment, or an ambiguous “no-feedback” outcome that could signal either successful avoidance of punishment or failure to obtain reward. The PTSS group outperformed the control group in total points obtained; the PTSS group specifically performed better than the control group on reward-based trials, with no difference on punishment-based trials. To better understand possible mechanisms underlying observed performance, we used a reinforcement learning model of the task, and applied maximum likelihood estimation techniques to derive estimated parameters describing individual participants’ behavior. Estimates of the reinforcement value of the no-feedback outcome were significantly greater in the control group than the PTSS group, suggesting that the control group was more likely to value this outcome as positively reinforcing (i.e., signaling successful avoidance of punishment). This is consistent with the control group’s generally poorer performance on reward trials, where reward feedback was to be obtained in preference to the no-feedback outcome. Differences in the interpretation of ambiguous feedback may contribute to the facilitated reinforcement learning often observed in PTSD patients, and may in turn provide new insight into how pathological behaviors are acquired and maintained in PTSD. PMID:24015254
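The parameter-estimation step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the model, so everything below is an assumption made for illustration: a two-choice Q-learning rule with a softmax choice function, and the hypothetical parameter names `alpha` (learning rate), `beta` (inverse temperature), and `r0` (the subjective value assigned to the no-feedback outcome). The grid search stands in for whatever optimizer the authors actually used.

```python
import math

def negative_log_likelihood(trials, alpha, beta, r0):
    """Negative log-likelihood of observed choices under a simple Q-learning model.

    trials: list of (stimulus, choice, feedback) tuples, where choice is 0 or 1
            and feedback is +1 (reward), -1 (punishment) or None (no feedback).
    alpha:  learning rate in [0, 1] (assumed parameter)
    beta:   softmax inverse temperature (assumed parameter)
    r0:     subjective reinforcement value of the no-feedback outcome
    """
    q = {}  # maps stimulus -> [Q(choice 0), Q(choice 1)]
    nll = 0.0
    for stimulus, choice, feedback in trials:
        values = q.setdefault(stimulus, [0.0, 0.0])
        # Two-option softmax probability of the choice that was actually made.
        diff = beta * (values[1 - choice] - values[choice])
        p_choice = 1.0 / (1.0 + math.exp(diff))
        nll -= math.log(p_choice)
        # A missing feedback signal is treated as having subjective value r0.
        reward = r0 if feedback is None else feedback
        values[choice] += alpha * (reward - values[choice])
    return nll

def fit_r0(trials, alpha=0.3, beta=2.0):
    """Grid search for r0 in [-1, 1] (step 0.1) with alpha and beta held fixed."""
    grid = [i / 10.0 for i in range(-10, 11)]
    return min(grid, key=lambda r0: negative_log_likelihood(trials, alpha, beta, r0))
```

Under these assumptions, a participant who keeps repeating a response after no feedback is best fit by a positive `r0` (no feedback treated as avoided punishment), while one who switches after no feedback is best fit by a negative `r0` (no feedback treated as missed reward).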
Adaptive critic autopilot design of bank-to-turn missiles using fuzzy basis function networks.

PubMed

Lin, Chuan-Kai

2005-04-01

A new adaptive critic autopilot design for bank-to-turn missiles is presented. In this paper, the architecture of the adaptive critic learning scheme contains a fuzzy-basis-function-network-based associative search element (ASE), which is employed to approximate nonlinear and complex functions of bank-to-turn missiles, and an adaptive critic element (ACE), which generates the reinforcement signal used to tune the ASE. In the design of the adaptive critic autopilot, the control law receives signals from a fixed-gain controller, an ASE, and an adaptive robust element, which can eliminate approximation errors and disturbances. Traditional adaptive critic reinforcement learning addresses the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment; the proposed tuning algorithm, however, can significantly shorten the learning time by tuning online all parameters of the fuzzy basis functions and the weights of the ASE and ACE. Moreover, the weight-updating law derived from Lyapunov stability theory is capable of guaranteeing both tracking performance and stability. Computer simulation results confirm the effectiveness of the proposed adaptive critic autopilot.
A Biologically Plausible Architecture of the Striatum to Solve Context-Dependent Reinforcement Learning Tasks.

PubMed

Shivkumar, Sabyasachi; Muralidharan, Vignesh; Chakravarthy, V Srinivasa

2017-01-01

The basal ganglia circuit is an important subcortical system of the brain thought to be responsible for reward-based learning. The striatum, the largest nucleus of the basal ganglia, serves as an input port that maps cortical information. Microanatomical studies show that the striatum is a mosaic of specialized input-output structures called striosomes and regions of the surrounding matrix called matrisomes. We have developed a computational model of the striatum using layered self-organizing maps to capture the center-surround structure seen experimentally and explain its functional significance. We believe that these structural components could build representations of state and action spaces in different environments. The striatum model is then integrated with other components of the basal ganglia, making it capable of solving reinforcement learning tasks. We have proposed a biologically plausible mechanism of action-based learning where the striosome biases the matrisome activity toward a preferred action. Several studies indicate that the striatum is critical in solving context-dependent problems. We build on this hypothesis, and the proposed model exploits the modularity of the striatum to efficiently solve such tasks.
A Biologically Plausible Architecture of the Striatum to Solve Context-Dependent Reinforcement Learning Tasks

PubMed Central

Shivkumar, Sabyasachi; Muralidharan, Vignesh; Chakravarthy, V. Srinivasa

2017-01-01

The basal ganglia circuit is an important subcortical system of the brain thought to be responsible for reward-based learning. The striatum, the largest nucleus of the basal ganglia, serves as an input port that maps cortical information. Microanatomical studies show that the striatum is a mosaic of specialized input-output structures called striosomes and regions of the surrounding matrix called matrisomes. We have developed a computational model of the striatum using layered self-organizing maps to capture the center-surround structure seen experimentally and explain its functional significance. We believe that these structural components could build representations of state and action spaces in different environments. The striatum model is then integrated with other components of the basal ganglia, making it capable of solving reinforcement learning tasks. We have proposed a biologically plausible mechanism of action-based learning where the striosome biases the matrisome activity toward a preferred action. Several studies indicate that the striatum is critical in solving context-dependent problems. We build on this hypothesis, and the proposed model exploits the modularity of the striatum to efficiently solve such tasks. PMID:28680395
Alterations in choice behavior by manipulations of world model.

PubMed

Green, C S; Benson, C; Kersten, D; Schrater, P

2010-09-14

How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) "probability matching", a consistent example of suboptimal choice behavior seen in humans, occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning.
Alterations in choice behavior by manipulations of world model

PubMed Central

Green, C. S.; Benson, C.; Kersten, D.; Schrater, P.

2010-01-01

How to compute initially unknown reward values makes up one of the key problems in reinforcement learning theory, with two basic approaches being used. Model-free algorithms rely on the accumulation of substantial amounts of experience to compute the value of actions, whereas in model-based learning, the agent seeks to learn the generative process for outcomes from which the value of actions can be predicted. Here we show that (i) “probability matching”—a consistent example of suboptimal choice behavior seen in humans—occurs in an optimal Bayesian model-based learner using a max decision rule that is initialized with ecologically plausible, but incorrect beliefs about the generative process for outcomes and (ii) human behavior can be strongly and predictably altered by the presence of cues suggestive of various generative processes, despite statistically identical outcome generation. These results suggest human decision making is rational and model based and not consistent with model-free learning. PMID:20805507
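Why probability matching counts as suboptimal can be shown with a short calculation. The sketch below assumes a two-option task in which option A is correct with probability `p`, independently on each trial; the function names are illustrative and do not come from the paper. A matcher chooses A with probability `p`, whereas a maximizer always chooses the more likely option.

```python
def matching_accuracy(p):
    """Expected accuracy when A is chosen with probability p and A is correct with probability p."""
    return p * p + (1.0 - p) * (1.0 - p)

def maximizing_accuracy(p):
    """Expected accuracy when the more likely option is always chosen."""
    return max(p, 1.0 - p)

if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7, 0.9):
        print(p, round(matching_accuracy(p), 3), maximizing_accuracy(p))
```

For any `p` other than 0.5 or 1, matching yields lower expected accuracy than maximizing, which is why the abstract treats a learner that matches despite a max decision rule as evidence of mistaken beliefs about the generative process rather than of a faulty decision rule.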
Homeostatic reinforcement learning for integrating reward collection and physiological stability.

PubMed

Keramati, Mehdi; Gutkin, Boris

2014-12-02

Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system.
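One way to make the link between reward and physiological stability concrete is to define reward as the reduction in a "drive" that measures distance from the homeostatic setpoint. The sketch below is a simplified illustration under stated assumptions: drive is taken to be plain Euclidean distance (the abstract does not give the functional form), and the names `drive`, `homeostatic_reward`, `state`, `outcome`, and `setpoint` are hypothetical.

```python
import math

def drive(state, setpoint):
    """Distance of the internal state from the setpoint (Euclidean, by assumption)."""
    return math.sqrt(sum((target - h) ** 2 for target, h in zip(setpoint, state)))

def homeostatic_reward(state, outcome, setpoint):
    """Reward of an outcome = how much it reduces drive.

    state:    current values of the physiological variables
    outcome:  change the outcome produces in each variable
    setpoint: desired value of each variable
    """
    next_state = [h + k for h, k in zip(state, outcome)]
    return drive(state, setpoint) - drive(next_state, setpoint)
```

With this definition the same outcome is rewarding when it moves the state toward the setpoint and punishing when it overshoots, so maximizing reward and minimizing deviation from the setpoint become the same objective.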
Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning

PubMed Central

Konovalov, Arkady; Krajbich, Ian

2016-01-01

Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process between two differentially valued items, consistent with previous work on sequential-sampling models of decision making. These findings illustrate a problem with assuming that experimental subjects make their decisions at the same prescribed time. PMID:27511383
A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

PubMed

Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

2014-12-01

To deal with sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximate policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
A reinforcement learning model of joy, distress, hope and fear

NASA Astrophysics Data System (ADS)

Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

2015-07-01

In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V(s), models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework, coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
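The mapping proposed in this abstract (learned state value as hope or fear, the error signal as joy or distress) can be sketched with a tabular TD(0) learner. This is an illustrative assumption, not the authors' implementation: the abstract does not name the learning rule, and `td_episode`, `label`, and the parameter values are hypothetical.

```python
def td_episode(values, transitions, alpha=0.5, gamma=0.9):
    """Run TD(0) over one episode, updating `values` in place; return the TD errors.

    transitions: list of (state, reward, next_state); next_state is None at the end.
    """
    errors = []
    for state, reward, next_state in transitions:
        next_value = 0.0 if next_state is None else values.get(next_state, 0.0)
        delta = reward + gamma * next_value - values.get(state, 0.0)
        values[state] = values.get(state, 0.0) + alpha * delta
        errors.append(delta)
    return errors

def label(value, delta):
    """Read a state value as hope/fear and a TD error as joy/distress."""
    anticipation = "hope" if value > 0 else "fear" if value < 0 else "neutral"
    outcome = "joy" if delta > 0 else "distress" if delta < 0 else "neutral"
    return anticipation, outcome

if __name__ == "__main__":
    values = {}
    episode = [("cue", 0.0, "goal"), ("goal", 1.0, None)]
    for _ in range(3):
        print(td_episode(values, episode), values)
```

Repeating the same rewarded episode makes the error at the goal shrink while the value of the cue grows, which in this reading corresponds to habituation of joy alongside growing hope.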
Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.

PubMed

Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

2017-07-10

Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.
Prespeech motor learning in a neural network using reinforcement.

PubMed

Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough

2013-02-01

Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.
Social reinforcement can regulate localized brain activity.

PubMed

Mathiak, Krystyna A; Koush, Yury; Dyck, Miriam; Gaber, Tilman J; Alawi, Eliza; Zepf, Florian D; Zvyagintsev, Mikhail; Mathiak, Klaus

2010-11-01

Social learning is essential for adaptive behavior in humans. Neurofeedback based on functional magnetic resonance imaging (fMRI) trains control over localized brain activity. It can disentangle learning processes at the neural level and thus investigate the mechanisms of operant conditioning with explicit social reinforcers. In a pilot study, a computer-generated face provided positive feedback (smiling) when activity in the anterior cingulate cortex (ACC) increased and gradually returned to a neutral expression when the activity dropped. One female volunteer without previous experience in fMRI underwent training based on a social reinforcer. Directly before and after the neurofeedback runs, neural responses to a cognitive interference task (Simon task) were recorded. We observed a significant increase in activity within the ACC during the neurofeedback blocks, corresponding to the a priori defined anatomical region of interest. In the course of the neurofeedback training, the subject learned to regulate ACC activity and could maintain the control even without direct feedback. Moreover, the ACC was activated significantly more strongly during the Simon task after the neurofeedback training than before. Localized brain activity can be controlled by social reward. The increased ACC activity transferred to a cognitive task with the potential to reduce cognitive interference. Systematic studies are required to explore long-term effects on social behavior and clinical applications.
Reconciling Reinforcement Learning Models with Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

ERIC Educational Resources Information Center

Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb

2007-01-01

Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…
Intrinsic motivation, curiosity, and learning: Theory and applications in educational technologies.

PubMed

Oudeyer, P-Y; Gottlieb, J; Lopes, M

2016-01-01

This chapter studies the bidirectional causal interactions between curiosity and learning and discusses how understanding these interactions can be leveraged in educational technology applications. First, we review recent results showing how state curiosity, and more generally the experience of novelty and surprise, can enhance learning and memory retention. Then, we discuss how psychology and neuroscience have conceptualized curiosity and intrinsic motivation, studying how the brain can be intrinsically rewarded by novelty, complexity, or other measures of information. We explain how the framework of computational reinforcement learning can be used to model such mechanisms of curiosity. Then, we discuss the learning progress (LP) hypothesis, which posits a positive feedback loop between curiosity and learning. We outline experiments with robots that show how LP-driven attention and exploration can self-organize a developmental learning curriculum scaffolding efficient acquisition of multiple skills/tasks. Finally, we discuss recent work exploiting these conceptual and computational models in educational technologies, showing in particular how intelligent tutoring systems can be designed to foster curiosity and learning. © 2016 Elsevier B.V. All rights reserved.
The Effect of Context Change on Simple Acquisition Disappears with Increased Training

ERIC Educational Resources Information Center

Leon, Samuel P.; Abad, Maria J. F.; Rosas, Juan M.

2010-01-01

The goal of this experiment was to assess the impact that experience with a task has on the context specificity of the learning that occurs. Participants performed an instrumental task within a computer game where different responses were performed in the presence of discriminative stimuli to obtain reinforcers. The number of training trials (3,…
Use of Short Podcasts to Reinforce Learning Outcomes in Biology

ERIC Educational Resources Information Center

Aguiar, Cristina; Carvalho, Ana Amelia; Carvalho, Carla Joana

2009-01-01

Podcasts are audio or video files which can be automatically downloaded to one's computer when the episodes become available, then later transferred to a portable player for listening. The technology thereby enables the user to listen to and/or watch the content anywhere at any time. Formerly popular as radio shows, podcasting was rapidly explored…

Use of Frontal Lobe Hemodynamics as Reinforcement Signals to an Adaptive Controller

PubMed Central

DiStasio, Marcello M.; Francis, Joseph T.

2013-01-01

Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment. Then higher valued states can be pursued and lower (or negative) valued states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain, and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptable artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrate that both oxy- and deoxyhemoglobin concentrations in the frontal lobe show differences in response to both primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy. We also show that decoding state desirabilities is more accurate when using relative concentrations of both oxyhemoglobin and deoxyhemoglobin, rather than either species alone. PMID:23894500
Orbitofrontal Dopamine Depletion Upregulates Caudate Dopamine and Alters Behavior via Changes in Reinforcement Sensitivity

PubMed Central

Cardinal, R. N.; Rygula, R.; Hong, Y. T.; Fryer, T. D.; Sawiak, S. J.; Ferrari, V.; Cockcroft, G.; Aigbirhio, F. I.; Robbins, T. W.; Roberts, A. C.

2014-01-01

Schizophrenia is associated with upregulation of dopamine (DA) release in the caudate nucleus. The caudate has dense connections with the orbitofrontal cortex (OFC) via the frontostriatal loops, and both areas exhibit pathophysiological change in schizophrenia. Despite evidence that abnormalities in dopaminergic neurotransmission and prefrontal cortex function co-occur in schizophrenia, the influence of OFC DA on caudate DA and reinforcement processing is poorly understood. To test the hypothesis that OFC dopaminergic dysfunction disrupts caudate dopamine function, we selectively depleted dopamine from the OFC of marmoset monkeys and measured striatal extracellular dopamine levels (using microdialysis) and dopamine D2/D3 receptor binding (using positron emission tomography), while modeling reinforcement-related behavior in a discrimination learning paradigm. OFC dopamine depletion caused an increase in tonic dopamine levels in the caudate nucleus and a corresponding reduction in D2/D3 receptor binding. Computational modeling of behavior showed that the lesion increased response exploration, reducing the tendency to persist with a recently chosen response side. This effect is akin to increased response switching previously seen in schizophrenia and was correlated with striatal but not OFC D2/D3 receptor binding. These results demonstrate that OFC dopamine depletion is sufficient to induce striatal hyperdopaminergia and changes in reinforcement learning relevant to schizophrenia. PMID:24872570
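The behavioral effect reported here, a reduced tendency to persist with a recently chosen side, is commonly captured in choice models by a perseveration ("stickiness") term added to a softmax rule. The sketch below is a generic illustration under that assumption; the abstract does not give the model equations, and `choice_probabilities`, `beta`, and `stickiness` are hypothetical names.

```python
import math

def choice_probabilities(values, previous_side, beta, stickiness):
    """Softmax over response sides with a bonus for repeating the previous side.

    values:        learned value of each response side
    previous_side: index of the side chosen on the last trial (or None)
    beta:          inverse temperature (reinforcement sensitivity)
    stickiness:    bonus added to the previously chosen side
    """
    logits = [
        beta * v + (stickiness if side == previous_side else 0.0)
        for side, v in enumerate(values)
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

In such a model, lowering `stickiness` while holding the learned values fixed makes the agent more likely to switch sides from trial to trial, which is one way to express increased response exploration.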
Psychopathy-related traits and the use of reward and social information: a computational approach

PubMed Central

Brazil, Inti A.; Hunt, Laurence T.; Bulten, Berend H.; Kessels, Roy P. C.; de Bruijn, Ellen R. A.; Mars, Rogier B.

2013-01-01

Psychopathy is often linked to disturbed reinforcement-guided adaptation of behavior in both clinical and non-clinical populations. Recent work suggests that these disturbances might be due to a deficit in actively using information to guide changes in behavior. However, how much information is actually used to guide behavior is difficult to observe directly. Therefore, we used a computational model to estimate the use of information during learning. Thirty-six female subjects were recruited based on their total scores on the Psychopathic Personality Inventory (PPI), a self-report psychopathy list, and performed a task involving simultaneous learning of reward-based and social information. A Bayesian reinforcement-learning model was used to parameterize the use of each source of information during learning. Subsequently, we used the subscales of the PPI to assess psychopathy-related traits, and the traits that were strongly related to the model's parameters were isolated through a formal variable selection procedure. Finally, we assessed how these covaried with model parameters. We succeeded in isolating key personality traits believed to be relevant for psychopathy that can be related to model-based descriptions of subject behavior. Use of reward-history information was negatively related to levels of trait anxiety and fearlessness, whereas use of social advice decreased as the perceived ability to manipulate others and lack of anxiety increased. These results corroborate previous findings suggesting that sub-optimal use of different types of information might be implicated in psychopathy. They also further highlight the importance of considering the potential of computational modeling to understand the role of latent variables, such as the weight people give to various sources of information during goal-directed behavior, when conducting research on psychopathy-related traits and in the field of forensic psychiatry. PMID:24391615
Mobile game development: improving student engagement and motivation in introductory computing courses

NASA Astrophysics Data System (ADS)

Kurkovsky, Stan

2013-06-01

Computer games have been accepted as an engaging and motivating tool in the computer science (CS) curriculum. However, designing and implementing a playable game is challenging, and is best done in advanced courses. Games for mobile devices, on the other hand, offer the advantage of being simpler and, thus, easier to program for lower-level students. The learning context of mobile game development can be used to reinforce many core programming topics, such as loops, classes, and arrays. Furthermore, it can also be used to expose students in introductory computing courses to a wide range of advanced topics in order to illustrate that CS can be much more than coding. This paper describes the author's experience with using mobile game development projects in CS I and II, how these projects were integrated into existing courses at several universities, and the lessons learned from this experience.
Interactive computer simulations of knee-replacement surgery.

PubMed

Gunther, Stephen B; Soto, Gabriel E; Colman, William W

2002-07-01

Current surgical training programs in the United States are based on an apprenticeship model. This model is outdated because it does not provide conceptual scaffolding, promote collaborative learning, or offer constructive reinforcement. Our objective was to create a more useful approach by preparing students and residents for operative cases using interactive computer simulations of surgery. Total-knee-replacement surgery (TKR) is an ideal procedure to model on the computer because there is a systematic protocol for the procedure. Also, this protocol is difficult to learn by the apprenticeship model because of the multiple instruments that must be used in a specific order. We designed an interactive computer tutorial to teach medical students and residents how to perform knee-replacement surgery. We also aimed to reinforce the specific protocol of the operative procedure. Our final goal was to provide immediate, constructive feedback. We created a computer tutorial by generating three-dimensional wire-frame models of the surgical instruments. Next, we applied a surface to the wire-frame models using three-dimensional modeling. Finally, the three-dimensional models were animated to simulate the motions of an actual TKR. The tutorial is a step-by-step tutorial that teaches and tests the correct sequence of steps in a TKR. The student or resident must select the correct instruments in the correct order. The learner is encouraged to learn the stepwise surgical protocol through repetitive use of the computer simulation. Constructive feedback is acquired through a grading system, which rates the student's or resident's ability to perform the task in the correct order. The grading system also accounts for the time required to perform the simulated procedure. We evaluated the efficacy of this teaching technique by testing medical students who learned by the computer simulation and those who learned by reading the surgical protocol manual. 
Both groups then performed TKR on manufactured bone models using real instruments. Their technique was graded with the standard protocol. The students who learned on the computer simulation performed the task in a shorter time and with fewer errors than the control group. They were also more engaged in the learning process. Surgical training programs generally lack a consistent approach to preoperative education related to surgical procedures. This interactive computer tutorial has allowed us to make a quantum leap in medical student and resident teaching in our orthopedic department because the students actually participate in the entire process. Our technique provides a linear, sequential method of skill acquisition and direct feedback, which is ideally suited for learning stepwise surgical protocols. Since our initial evaluation has shown the efficacy of this program, we have implemented this teaching tool into our orthopedic curriculum. Our plans for future work with this simulator include modeling procedures involving other anatomic areas of interest, such as the hip and shoulder.
Behavioral and neural properties of social reinforcement learning

PubMed Central

Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ

2011-01-01

Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis (social preferences, response latencies and modeling neural responses) are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787
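The trial-by-trial expectation of social reinforcement described in this abstract can be sketched with a Rescorla-Wagner style update applied per peer. This is an assumption for illustration only: the abstract does not name the learning rule, and `update_expectation`, `alpha`, and the peer labels are hypothetical.

```python
def update_expectation(expected, peer, outcome, alpha=0.2):
    """Update the expected value of a peer's feedback and return the prediction error.

    expected: dict mapping peer -> current expectation of positive feedback
    outcome:  1.0 if the peer gave positive feedback on this trial, else 0.0
    alpha:    learning rate (assumed value)
    """
    error = outcome - expected[peer]
    expected[peer] += alpha * error
    return error

if __name__ == "__main__":
    expected = {"peer_often": 0.0, "peer_rarely": 0.0}
    # Hypothetical feedback sequences: one peer responds often, the other rarely.
    for outcome_often, outcome_rarely in [(1, 0), (1, 0), (0, 0), (1, 1), (1, 0)]:
        update_expectation(expected, "peer_often", float(outcome_often))
        update_expectation(expected, "peer_rarely", float(outcome_rarely))
    print(expected)
```

The running expectation per peer is the kind of quantity that can be regressed against trial-wise brain activity, and the returned prediction error is the corresponding teaching signal.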
The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

PubMed

Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

2010-03-31

Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question, we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.
Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

ERIC Educational Resources Information Center

Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

2014-01-01

Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…
Projective simulation for artificial intelligence

NASA Astrophysics Data System (ADS)

Briegel, Hans J.; de las Cuevas, Gemma

2012-05-01

We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
Projective simulation for artificial intelligence

PubMed Central

Briegel, Hans J.; de las Cuevas, Gemma

2012-01-01

We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation. PMID:22590690
Habitual control of goal selection in humans

PubMed Central

Cushman, Fiery; Morris, Adam

2015-01-01

Humans choose actions based on both habit and planning. Habitual control is computationally frugal but adapts slowly to novel circumstances, whereas planning is computationally expensive but can adapt swiftly. Current research emphasizes the competition between habits and plans for behavioral control, yet many complex tasks instead favor their integration. We consider a hierarchical architecture that exploits the computational efficiency of habitual control to select goals while preserving the flexibility of planning to achieve those goals. We formalize this mechanism in a reinforcement learning setting, illustrate its costs and benefits, and experimentally demonstrate its spontaneous application in a sequential decision-making task. PMID:26460050
The role of GABAB receptors in human reinforcement learning.

PubMed

Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X

2014-10-01

Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well-established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50 mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and, furthermore, to test this effect in patients with addiction spectrum disorder. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.
Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease

PubMed Central

Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J

2017-01-01

Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson’s disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905
Designing computer learning environments for engineering and computer science: The scaffolded knowledge integration framework

NASA Astrophysics Data System (ADS)

Linn, Marcia C.

1995-06-01

Designing effective curricula for complex topics and incorporating technological tools is an evolving process. One important way to foster effective design is to synthesize successful practices. This paper describes a framework called scaffolded knowledge integration and illustrates how it guided the design of two successful course enhancements in the field of computer science and engineering. One course enhancement, the LISP Knowledge Integration Environment, improved learning and resulted in more gender-equitable outcomes. The second course enhancement, the spatial reasoning environment, addressed spatial reasoning in an introductory engineering course. This enhancement minimized the importance of prior knowledge of spatial reasoning and helped students develop a more comprehensive repertoire of spatial reasoning strategies. Taken together, the instructional research programs reinforce the value of the scaffolded knowledge integration framework and suggest directions for future curriculum reformers.
Fear of losing money? Aversive conditioning with secondary reinforcers.

PubMed

Delgado, M R; Labouliere, C D; Phelps, E A

2006-12-01

Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.
Application of a model of instrumental conditioning to mobile robot control

NASA Astrophysics Data System (ADS)

Saksida, Lisa M.; Touretzky, D. S.

1997-09-01

Instrumental conditioning is a psychological process whereby an animal learns to associate its actions with their consequences. This type of learning is exploited in animal training techniques such as 'shaping by successive approximations,' which enables trainers to gradually adjust the animal's behavior by giving strategically timed reinforcements. While this is similar in principle to reinforcement learning, the real phenomenon includes many subtle effects not considered in the machine learning literature. In addition, a good deal of domain information is utilized by an animal learning a new task; it does not start from scratch every time it learns a new behavior. For these reasons, it is not surprising that mobile robot learning algorithms have yet to approach the sophistication and robustness of animal learning. A serious attempt to model instrumental learning could prove fruitful for improving machine learning techniques. In the present paper, we develop a computational theory of shaping at a level appropriate for controlling mobile robots. The theory is based on a series of mechanisms for 'behavior editing,' in which pre-existing behaviors, either innate or previously learned, can be dramatically changed in magnitude, shifted in direction, or otherwise manipulated so as to produce new behavioral routines. We have implemented our theory on Amelia, an RWI B21 mobile robot equipped with a gripper and color video camera. We provide results from training Amelia on several tasks, all of which were constructed as variations of one innate behavior, object-pursuit.
Reinforcement learning and Tourette syndrome.

PubMed

Palminteri, Stefano; Pessiglione, Mathias

2013-01-01

In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report will be preceded by an introduction aimed at providing the reader with the state of the art of the knowledge concerning the neural bases of reinforcement learning at the moment of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that the dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics, despite their evident inadaptive (negative) value. This idea, together with the implications of these results in Tourette therapy and the future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.
On the integration of reinforcement learning and approximate reasoning for control

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.

1991-01-01

The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.
Generalization of value in reinforcement learning by humans

PubMed Central

Wimmer, G. Elliott; Daw, Nathaniel D.; Shohamy, Daphna

2012-01-01

Research in decision making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well-described by reinforcement learning (RL) theories. However, basic RL is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used fMRI and computational model-based analyses to examine the joint contributions of these mechanisms to RL. Humans performed an RL task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about options’ values based on experience with the other options and to generalize across them. We observed BOLD activity related to learning in the striatum and also in the hippocampus. By comparing a basic RL model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of RL and striatal BOLD, both choices and striatal BOLD were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants’ choice. Our results thus point toward an interactive model in which striatal RL systems may employ relational representations typically associated with the hippocampus. PMID:22487039
Frontal Theta Links Prediction Errors to Behavioral Adaptation in Reinforcement Learning

PubMed Central

Cavanagh, James F.; Frank, Michael J.; Klein, Theresa J.; Allen, John J.B.

2009-01-01

Investigations into action monitoring have consistently detailed a fronto-central voltage deflection in the Event-Related Potential (ERP) following the presentation of negatively valenced feedback, sometimes termed the Feedback Related Negativity (FRN). The FRN has been proposed to reflect a neural response to prediction errors during reinforcement learning, yet the single trial relationship between neural activity and the quanta of expectation violation remains untested. Although ERP methods are not well suited to single trial analyses, the FRN has been associated with theta band oscillatory perturbations in the medial prefrontal cortex. Medio-frontal theta oscillations have been previously associated with expectation violation and behavioral adaptation and are well suited to single trial analysis. Here, we recorded EEG activity during a probabilistic reinforcement learning task and fit the performance data to an abstract computational model (Q-learning) for calculation of single-trial reward prediction errors. Single-trial theta oscillatory activities following feedback were investigated within the context of expectation (prediction error) and adaptation (subsequent reaction time change). Results indicate that interactive medial and lateral frontal theta activities reflect the degree of negative and positive reward prediction error in the service of behavioral adaptation. These different brain areas use prediction error calculations for different behavioral adaptations: with medial frontal theta reflecting the utilization of prediction errors for reaction time slowing (specifically following errors), but lateral frontal theta reflecting prediction errors leading to working memory-related reaction time speeding for the correct choice. PMID:19969093
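As a rough illustration of how single-trial reward prediction errors can be obtained from a Q-learning fit of the kind the abstract above mentions, the sketch below replays a sequence of choices and outcomes through a delta-rule update. The function name, the learning rate, and the toy data are illustrative assumptions, not the authors' fitted model.

```python
def single_trial_prediction_errors(choices, rewards, n_options, alpha=0.1):
    """Replay observed choices and rewards through a delta-rule Q-learning update.

    Returns one reward prediction error per trial.
    alpha (the learning rate) is a hypothetical value; in practice it would be
    estimated from each participant's behavior.
    """
    q = [0.0] * n_options  # expected value of each option, initialised at zero
    errors = []
    for choice, reward in zip(choices, rewards):
        delta = reward - q[choice]   # prediction error: outcome minus expectation
        q[choice] += alpha * delta   # move the expectation toward the outcome
        errors.append(delta)
    return errors


# Toy data (hypothetical): two options, four trials.
errors = single_trial_prediction_errors(
    choices=[0, 0, 1, 0], rewards=[1, 1, 0, 0], n_options=2, alpha=0.5
)
print(errors)  # [1.0, 0.5, 0.0, -0.75]
```

The per-trial values in `errors` are the kind of regressor that could then be related to single-trial EEG measures.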

When Does Model-Based Control Pay Off?

PubMed Central

2016-01-01

Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to “model-free” and “model-based” strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, do not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand. PMID:27564094
When Does Model-Based Control Pay Off?

PubMed

Kool, Wouter; Cushman, Fiery A; Gershman, Samuel J

2016-08-01

Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to "model-free" and "model-based" strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, do not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.
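The contrast between the two strategies described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two-stage structure, the transition probabilities, and all values below are hypothetical and are not taken from the paper's task.

```python
def model_free_value(q_table, action):
    """Model-free: read a cached value from a look-up table built by trial and error."""
    return q_table[action]


def model_based_value(transition_probs, second_stage_values, action):
    """Model-based: plan through a model of the environment.

    The value of a first-stage action is the probability-weighted value of the
    best option available in each state that the action can lead to.
    """
    return sum(
        prob * max(second_stage_values[state])
        for state, prob in transition_probs[action].items()
    )


# Hypothetical environment model and cached values.
transition_probs = {
    "left": {"A": 0.7, "B": 0.3},
    "right": {"A": 0.3, "B": 0.7},
}
second_stage_values = {"A": [0.8, 0.2], "B": [0.4, 0.1]}
q_table = {"left": 0.5, "right": 0.6}

mf_left = model_free_value(q_table, "left")
mb_left = model_based_value(transition_probs, second_stage_values, "left")
mb_right = model_based_value(transition_probs, second_stage_values, "right")
```

The model-free call is a single look-up, whereas the model-based call iterates over every reachable state, which is one way to picture the difference in computational demand.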
Intelligence moderates reinforcement learning: a mini-review of the neural evidence

PubMed Central

2014-01-01

Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. PMID:25185818
Intelligence moderates reinforcement learning: a mini-review of the neural evidence.

PubMed

Chen, Chong

2015-06-01

Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.
Prespeech motor learning in a neural network using reinforcement

PubMed Central

Warlaumont, Anne S.; Westermann, Gert; Buder, Eugene H.; Oller, D. Kimbrough

2012-01-01

Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one’s language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network’s post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network’s post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model’s post-learning productions were more likely to resemble the English vowels and vice versa. PMID:23275137
The Vesalius Project: Interactive Computers in Anatomical Instruction.

ERIC Educational Resources Information Center

McCracken, Thomas O.; Spurgeon, Thomas L.

1991-01-01

Described is a high-resolution, interactive 3-D atlas of human/animal anatomy that students will use to learn the structure of the body and to understand their own bodies in health and disease. This system can be used to reinforce cadaver study or to serve as a substitute for institutions where it is not practical to use cadavers. (KR)
Dopamine selectively remediates ‘model-based’ reward learning: a computational approach

PubMed Central

Sharp, Madeleine E.; Foerde, Karin; Daw, Nathaniel D.

2016-01-01

Patients with loss of dopamine due to Parkinson’s disease are impaired at learning from reward. However, it remains unknown precisely which aspect of learning is impaired. In particular, learning from reward, or reinforcement learning, can be driven by two distinct computational processes. One involves habitual stamping-in of stimulus-response associations, hypothesized to arise computationally from ‘model-free’ learning. The other, ‘model-based’ learning, involves learning a model of the world that is believed to support goal-directed behaviour. Much work has pointed to a role for dopamine in model-free learning. But recent work suggests model-based learning may also involve dopamine modulation, raising the possibility that model-based learning may contribute to the learning impairment in Parkinson’s disease. To directly test this, we used a two-step reward-learning task which dissociates model-free versus model-based learning. We evaluated learning in patients with Parkinson’s disease tested ON versus OFF their dopamine replacement medication and in healthy controls. Surprisingly, we found no effect of disease or medication on model-free learning. Instead, we found that patients tested OFF medication showed a marked impairment in model-based learning, and that this impairment was remediated by dopaminergic medication. Moreover, model-based learning was positively correlated with a separate measure of working memory performance, raising the possibility of common neural substrates. Our results suggest that some learning deficits in Parkinson’s disease may be related to an inability to pursue reward based on complete representations of the environment. PMID:26685155
A new consequence of Simpson's paradox: stable cooperation in one-shot prisoner's dilemma from populations of individualistic learners.

PubMed

Chater, Nick; Vlaev, Ivo; Grinberg, Maurice

2008-08-01

Theories of choice in economics typically assume that interacting agents act individualistically and maximize their own utility. Specifically, game theory proposes that rational players should defect in one-shot prisoners' dilemmas (PD). Defection also appears to be the inevitable outcome for agents who learn by reinforcement of past choices, because whatever the other player does, defection leads to greater reinforcement on each trial. In a computer simulation and 4 experiments, the authors show that, apparently paradoxically, when players' choices are correlated by an exogenous factor (here, the cooperativeness of the specific PD chosen), people obtain greater average reinforcement for cooperating, which can sustain cooperation. This effect arises from a well-known statistical paradox, Simpson's paradox. The authors speculate that this effect may be relevant to aspects of real-world human cooperative behavior.
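The statistical effect described above can be reproduced with a small worked example. The sketch assumes two equally likely prisoner's dilemma games and assumes that both players cooperate more often in the more cooperative game; all payoffs and cooperation rates are hypothetical numbers chosen for illustration, not values from the study.

```python
# Hypothetical payoffs (T > R > P > S in each game) and cooperation rates.
GAMES = {
    "friendly": {"T": 12, "R": 10, "P": 6, "S": 4, "p_coop": 0.9},
    "harsh": {"T": 6, "R": 5, "P": 1, "S": 0, "p_coop": 0.1},
}


def expected_payoffs(game):
    """Expected payoff for cooperating and for defecting within one game."""
    p = game["p_coop"]  # probability that the other player cooperates
    coop = p * game["R"] + (1 - p) * game["S"]
    defect = p * game["T"] + (1 - p) * game["P"]
    return coop, defect


def pooled_averages(games):
    """Average payoff per choice, pooled over equally likely games.

    Each game is weighted by how often the choice is actually made in it
    (assumption: a player's own cooperation rate equals game["p_coop"]).
    """
    coop_sum = defect_sum = coop_weight = defect_weight = 0.0
    for game in games.values():
        p = game["p_coop"]
        coop, defect = expected_payoffs(game)
        coop_sum += p * coop
        coop_weight += p
        defect_sum += (1 - p) * defect
        defect_weight += 1 - p
    return coop_sum / coop_weight, defect_sum / defect_weight


pooled_coop, pooled_defect = pooled_averages(GAMES)
```

Within each game defection pays more (about 11.4 versus 9.4 in the friendly game, 1.5 versus 0.5 in the harsh game), yet pooled over games cooperation earns roughly 8.51 on average against roughly 2.49 for defection, because cooperative choices occur mostly in the high-payoff game.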
Reinforcement learning in complementarity game and population dynamics

NASA Astrophysics Data System (ADS)

Jost, Jürgen; Li, Wei

2014-02-01

We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
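For orientation, the three families of schemes named above can be sketched in their textbook forms. These are generic versions, not the exact variants tested in the paper; in particular, applying the power exponent to the propensities when forming choice probabilities is an assumption made here, and all parameter names and values are hypothetical.

```python
import math


def roth_erev_probs(propensities, exponent=1.0):
    """Roth-Erev style choice: probability proportional to propensity ** exponent.

    exponent=1.0 gives the standard rule; where the exponent enters in the
    modified version is an assumption of this sketch.
    """
    powered = [q ** exponent for q in propensities]
    total = sum(powered)
    return [x / total for x in powered]


def roth_erev_update(propensities, chosen, payoff):
    """Add the received payoff to the propensity of the chosen action."""
    updated = list(propensities)
    updated[chosen] += payoff
    return updated


def bush_mosteller_update(probs, chosen, rate=0.1):
    """Bush-Mosteller linear update after a rewarded choice.

    The chosen action's probability moves toward 1; the others shrink
    proportionally, so the probabilities still sum to 1.
    """
    return [
        p + rate * (1 - p) if i == chosen else p * (1 - rate)
        for i, p in enumerate(probs)
    ]


def softmax_probs(values, temperature=1.0):
    """SoftMax choice: probability proportional to exp(value / temperature)."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

A lower `temperature` or a larger `exponent` makes choices concentrate on the currently best action, which relates to the trade-off between quick adaptation and systematic exploration mentioned in the abstract.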
Dopamine D3 Receptor Availability Is Associated with Inflexible Decision Making.

PubMed

Groman, Stephanie M; Smith, Nathaniel J; Petrulli, J Ryan; Massi, Bart; Chen, Lihui; Ropchan, Jim; Huang, Yiyun; Lee, Daeyeol; Morris, Evan D; Taylor, Jane R

2016-06-22

Dopamine D2/3 receptor signaling is critical for flexible adaptive behavior; however, it is unclear whether D2, D3, or both receptor subtypes modulate precise signals of feedback and reward history that underlie optimal decision making. Here, PET with the radioligand [(11)C]-(+)-PHNO was used to quantify individual differences in putative D3 receptor availability in rodents trained on a novel three-choice spatial acquisition and reversal-learning task with probabilistic reinforcement. Binding of [(11)C]-(+)-PHNO in the midbrain was negatively related to the ability of rats to adapt to changes in rewarded locations, but not to the initial learning. Computational modeling of choice behavior in the reversal phase indicated that [(11)C]-(+)-PHNO binding in the midbrain was related to the learning rate and sensitivity to positive, but not negative, feedback. Administration of a D3-preferring agonist likewise impaired reversal performance by reducing the learning rate and sensitivity to positive feedback. These results demonstrate a previously unrecognized role for D3 receptors in select aspects of reinforcement learning and suggest that individual variation in midbrain D3 receptors influences flexible behavior. Our combined neuroimaging, behavioral, pharmacological, and computational approach implicates the dopamine D3 receptor in decision-making processes that are altered in psychiatric disorders. Flexible decision-making behavior is dependent upon dopamine D2/3 signaling in corticostriatal brain regions. However, the role of D3 receptors in adaptive, goal-directed behavior has not been thoroughly investigated. By combining PET imaging with the D3-preferring radioligand [(11)C]-(+)-PHNO, pharmacology, a novel three-choice probabilistic discrimination and reversal task and computational modeling of behavior in rats, we report that naturally occurring variation in [(11)C]-(+)-PHNO receptor availability relates to specific aspects of flexible decision making. We confirm these relationships using a D3-preferring agonist, thus identifying a unique role of midbrain D3 receptors in decision-making processes. Copyright © 2016 the authors.
The Roles of Phasic and Tonic Dopamine in Tic Learning and Expression.

PubMed

Maia, Tiago V; Conceição, Vasco A

2017-09-15

Tourette syndrome (TS) prominently involves dopaminergic disturbances, but the precise nature of those disturbances has remained elusive. A substantial body of empirical work and recent computational models have characterized the specific roles of phasic and tonic dopamine (DA) in action learning and selection, respectively. Using insights from this work and models, we suggest that TS involves increases in both phasic and tonic DA, which produce increased propensities for tic learning and expression, respectively. We review the evidence from reinforcement-learning and habit-learning studies in TS, which supports the idea that TS involves increased phasic DA responses; we also review the evidence that tics engage the habit-learning circuitry. On the basis of these findings, we suggest that tics are exaggerated, maladaptive, and persistent motor habits reinforced by aberrant, increased phasic DA responses. Increased tonic DA amplifies the tendency to execute learned tics and also provides a fertile ground of motor hyperactivity for tic learning. We review evidence suggesting that antipsychotics may counter both the increased propensity for tic expression, by increasing excitability in the indirect pathway, and the increased propensity for tic learning, by shifting plasticity in the indirect pathway toward long-term potentiation (and possibly also through more complex mechanisms). Finally, we review evidence suggesting that low doses of DA agonists that effectively treat TS decrease both phasic and tonic DA, thereby also reducing the propensity for both tic learning and tic expression, respectively. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
The prefrontal cortex and hybrid learning during iterative competitive games.

PubMed

Abe, Hiroshi; Seo, Hyojung; Lee, Daeyeol

2011-12-01

Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning. © 2011 New York Academy of Sciences.
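The idea of folding hypothetical reward signals into action values can be sketched as follows. This is a minimal illustration only, not the model fitted in the study: the function name `hybrid_update`, the two learning rates, and the assumption that the payoff of every unchosen action is revealed after each trial are all hypothetical choices made for the example.

```python
def hybrid_update(values, chosen, actual_reward, hypothetical_rewards,
                  alpha_actual=0.3, alpha_hypo=0.1):
    """Return new action values after one trial of a hybrid learner.

    The chosen action is updated from the payoff actually received
    (model-free part). Every unchosen action is updated from the payoff
    it would have produced (hypothetical, model-based part).
    Both learning rates are illustrative assumptions.
    """
    new_values = list(values)
    for action, value in enumerate(values):
        if action == chosen:
            new_values[action] = value + alpha_actual * (actual_reward - value)
        else:
            target = hypothetical_rewards[action]
            new_values[action] = value + alpha_hypo * (target - value)
    return new_values


# One trial with three actions: action 0 was chosen and paid 1.0;
# action 2 would have paid 2.0, action 1 would have paid nothing.
updated = hybrid_update([0.0, 0.0, 0.0], chosen=0, actual_reward=1.0,
                        hypothetical_rewards=[1.0, 0.0, 2.0])
```

Setting `alpha_hypo` to zero recovers a purely model-free learner, which is one way the two kinds of learning could be compared in a fit.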
Instrumental learning and relearning in individuals with psychopathy and in patients with lesions involving the amygdala or orbitofrontal cortex.

PubMed

Mitchell, D G V; Fine, C; Richell, R A; Newman, C; Lumsden, J; Blair, K S; Blair, R J R

2006-05-01

Previous work has shown that individuals with psychopathy are impaired on some forms of associative learning, particularly stimulus-reinforcement learning (Blair et al., 2004; Newman & Kosson, 1986). Animal work suggests that the acquisition of stimulus-reinforcement associations requires the amygdala (Baxter & Murray, 2002). Individuals with psychopathy also show impoverished reversal learning (Mitchell, Colledge, Leonard, & Blair, 2002). Reversal learning is supported by the ventrolateral and orbitofrontal cortex (Rolls, 2004). In this paper we present experiments investigating stimulus-reinforcement learning and relearning in patients with lesions of the orbitofrontal cortex or amygdala, and individuals with developmental psychopathy without known trauma. The results are interpreted with reference to current neurocognitive models of stimulus-reinforcement learning, relearning, and developmental psychopathy. Copyright (c) 2006 APA, all rights reserved.
Model-based reinforcement learning with dimension reduction.

PubMed

Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi

2016-12-01

The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.
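The two-stage structure described above (fit a transition model from data, then derive a policy from it) can be sketched in a small tabular setting. This is a stand-in under stated assumptions: it uses simple count-based estimation and value iteration rather than the LSCE method and policy search of the paper, and the names `estimate_model` and `value_iteration`, as well as the self-loop default for unseen state-action pairs, are hypothetical.

```python
def estimate_model(transitions, n_states, n_actions):
    """Estimate P(s' | s, a) and mean reward R(s, a) from (s, a, r, s') samples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in transitions:
        counts[s][a][s_next] += 1
        reward_sum[s][a] += r

    P = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    R = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            n = sum(counts[s][a])
            if n == 0:
                # Assumption: unseen pairs self-loop with zero reward.
                P[s][a][s] = 1.0
                continue
            R[s][a] = reward_sum[s][a] / n
            for s_next in range(n_states):
                P[s][a][s_next] = counts[s][a][s_next] / n
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Plan in the learned model; return state values and a greedy policy."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        Q = [[R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        new_V = [max(q) for q in Q]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            policy = [max(range(n_actions), key=lambda a: Q[s][a])
                      for s in range(n_states)]
            return V, policy
```

In a table like this the number of parameters grows with the square of the number of states, which illustrates why high-dimensional environments call for some form of dimension reduction in the model.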
Reinforcement of Science Learning through Local Culture: A Delphi Study

ERIC Educational Resources Information Center

Nuangchalerm, Prasart

2008-01-01

This study aims to explore ways to reinforce science learning through local culture using the Delphi technique. Twenty-four participants from various fields of study were selected. The results of the study provide a framework for reinforcement of science learning through local culture on the theme of life and environment. (Contains 1 table.)
The control of tonic pain by active relief learning

PubMed Central

Mano, Hiroaki; Lee, Michael; Yoshida, Wako; Kawato, Mitsuo; Robbins, Trevor W

2018-01-01

Tonic pain after injury characterises a behavioural state that prioritises recovery. Although generally suppressing cognition and attention, tonic pain needs to allow effective relief learning to reduce the cause of the pain. Here, we describe a central learning circuit that supports learning of relief and concurrently suppresses the level of ongoing pain. We used computational modelling of behavioural, physiological and neuroimaging data in two experiments in which subjects learned to terminate tonic pain in static and dynamic escape-learning paradigms. In both studies, we show that active relief-seeking involves a reinforcement learning process manifest by error signals observed in the dorsal putamen. Critically, this system uses an uncertainty (‘associability’) signal detected in pregenual anterior cingulate cortex that both controls the relief learning rate, and endogenously and parametrically modulates the level of tonic pain. The results define a self-organising learning circuit that reduces ongoing pain when learning about potential relief. PMID:29482716
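An associability signal that controls the learning rate can be illustrated with a hybrid update of the Pearce-Hall type. This is a sketch under assumptions, not the model reported in the paper: the function name and the parameters `kappa` and `eta` are hypothetical, and the abstract does not state the exact update equations.

```python
def associability_update(value, associability, outcome, kappa=0.5, eta=0.3):
    """One trial of an associability-gated value update.

    value:          current expectation of relief
    associability:  running estimate of recent surprise (uncertainty)
    outcome:        relief actually obtained on this trial
    kappa, eta:     illustrative scaling and decay parameters
    """
    delta = outcome - value                        # prediction error
    new_value = value + kappa * associability * delta
    # Associability tracks the recent magnitude of prediction errors,
    # so learning speeds up when outcomes are surprising.
    new_associability = (1 - eta) * associability + eta * abs(delta)
    return new_value, new_associability
```

When outcomes become predictable, `delta` shrinks, associability decays, and the effective learning rate falls with it.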
Attitudes of health care students about computer-aided neuroanatomy instruction.

PubMed

McKeough, D Michael; Bagatell, Nancy

2009-01-01

This study examined students' attitudes toward computer-aided instruction (CAI), specifically neuroanatomy learning modules, to assess which components were primary in establishing these attitudes and to discuss the implications of these attitudes for successfully incorporating CAI in the preparation of health care providers. Seventy-seven master's degree, entry-level, health care professional students matriculated in an introductory neuroanatomy course volunteered as subjects for this study. Students independently reviewed the modules as supplements to lecture and completed a survey to evaluate teaching effectiveness. Responses to survey statements were compared across the learning modules to determine if students viewed the modules differently. Responses to individual survey statements were averaged to measure the strength of agreement or disagreement with the statement. Responses to open-ended questions were theme coded, and frequencies and percentages were calculated for each. Students saw no differences between the learning modules. Students perceived the learning modules as valuable; they enjoyed using the modules but did not prefer CAI over traditional lecture format. The modules were useful in learning or reinforcing neuroanatomical concepts and improving clinical problem-solving skills. Students reported that the visual representation of the neuroanatomical systems, computer animation, ability to control the use of the modules, and navigational fidelity were key factors in determining attitudes. The computer-based learning modules examined in this study were effective as adjuncts to lecture in helping entry-level health care students learn and make clinical applications of neuroanatomy information.
Affective bias as a rational response to the statistics of rewards and punishments.

PubMed

Pulcu, Erdem; Browning, Michael

2017-10-04

Affective bias, the tendency to differentially prioritise the processing of negative relative to positive events, is commonly observed in clinical and non-clinical populations. However, why such biases develop is not known. Using a computational framework, we investigated whether affective biases may reflect individuals' estimates of the information content of negative relative to positive events. During a reinforcement learning task, the information content of positive and negative outcomes was manipulated independently by varying the volatility of their occurrence. Human participants altered the learning rates used for the outcomes selectively, preferentially learning from the most informative. This behaviour was associated with activity of the central norepinephrine system, estimated using pupillometry, for loss outcomes. Humans maintain independent estimates of the information content of distinct positive and negative outcomes which may bias their processing of affective events. Normalising affective biases using computationally inspired interventions may represent a novel approach to treatment development.
Affective bias as a rational response to the statistics of rewards and punishments

PubMed Central

Pulcu, Erdem; Browning, Michael

2017-01-01

Affective bias, the tendency to differentially prioritise the processing of negative relative to positive events, is commonly observed in clinical and non-clinical populations. However, why such biases develop is not known. Using a computational framework, we investigated whether affective biases may reflect individuals’ estimates of the information content of negative relative to positive events. During a reinforcement learning task, the information content of positive and negative outcomes was manipulated independently by varying the volatility of their occurrence. Human participants altered the learning rates used for the outcomes selectively, preferentially learning from the most informative. This behaviour was associated with activity of the central norepinephrine system, estimated using pupillometry, for loss outcomes. Humans maintain independent estimates of the information content of distinct positive and negative outcomes which may bias their processing of affective events. Normalising affective biases using computationally inspired interventions may represent a novel approach to treatment development. PMID:28976304
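Separate learning rates for positive and negative outcomes can be sketched with two independent delta-rule trackers. This is a minimal illustration, assuming that wins and losses occur independently on each trial and that each is tracked as a probability; the function name and default starting estimates are hypothetical and are not taken from the study.

```python
def track_outcomes(win_events, loss_events, alpha_win, alpha_loss,
                   p_win=0.5, p_loss=0.5):
    """Track win and loss probabilities with separate learning rates.

    win_events, loss_events: sequences of 0/1 flags, one pair per trial.
    alpha_win, alpha_loss:   learning rates; a learner adapting to the
                             statistics would raise the rate for whichever
                             outcome is currently more volatile.
    """
    for win, loss in zip(win_events, loss_events):
        p_win += alpha_win * (win - p_win)      # update from the win signal
        p_loss += alpha_loss * (loss - p_loss)  # update from the loss signal
    return p_win, p_loss


# A learner that treats wins as more informative than losses:
estimates = track_outcomes([1, 1, 0], [0, 1, 1], alpha_win=0.5, alpha_loss=0.1)
```

An affective bias in this sketch corresponds to a persistent asymmetry between `alpha_loss` and `alpha_win` that does not follow the actual volatility of the outcomes.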
Goal-directed, habitual and Pavlovian prosocial behavior

PubMed Central

Gęsiarz, Filip; Crockett, Molly J.

2015-01-01

Although prosocial behaviors have been widely studied across disciplines, the mechanisms underlying them are not fully understood. Evidence from psychology, biology and economics suggests that prosocial behaviors can be driven by a variety of seemingly opposing factors: altruism or egoism, intuition or deliberation, inborn instincts or learned dispositions, and utility derived from actions or their outcomes. Here we propose a framework inspired by research on reinforcement learning and decision making that links these processes and explains characteristics of prosocial behaviors in different contexts. More specifically, we suggest that prosocial behaviors inherit features of up to three decision-making systems employed to choose between self- and other- regarding acts: a goal-directed system that selects actions based on their predicted consequences, a habitual system that selects actions based on their reinforcement history, and a Pavlovian system that emits reflexive responses based on evolutionarily prescribed priors. This framework, initially described in the field of cognitive neuroscience and machine learning, provides insight into the potential neural circuits and computations shaping prosocial behaviors. Furthermore, it identifies specific conditions in which each of these three systems should dominate and promote other- or self- regarding behavior. PMID:26074797

Homeostatic reinforcement learning for integrating reward collection and physiological stability

PubMed Central

Keramati, Mehdi; Gutkin, Boris

2014-01-01

Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system. DOI: http://dx.doi.org/10.7554/eLife.04811.001 PMID:25457346
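The claimed equivalence between reward seeking and physiological stability can be made concrete with a short worked equation. The notation here is an assumption introduced for illustration (internal state $H_t$, setpoint $H^*$, drive $D$ as a distance from the setpoint, reward as drive reduction); the abstract does not give the paper's own symbols.

```latex
% Assumed definitions: drive is the distance of the internal state from
% the setpoint, and reward is the reduction in drive across one step.
\[
  D(H_t) = \lVert H^* - H_t \rVert ,
  \qquad
  r_t = D(H_t) - D(H_{t+1}) .
\]
% Summing rewards over a horizon of T steps telescopes:
\[
  \sum_{t=0}^{T-1} r_t
  = \sum_{t=0}^{T-1} \bigl( D(H_t) - D(H_{t+1}) \bigr)
  = D(H_0) - D(H_T) .
\]
% With a discount factor 0 < \gamma < 1 the return becomes
\[
  \sum_{t=0}^{T-1} \gamma^{t} r_t ,
\]
% so a given reduction in drive is worth more the earlier it occurs.
```

Under these assumptions, maximizing undiscounted cumulative reward from a fixed starting state is the same as minimizing the final deviation $D(H_T)$ from the setpoint, and discounting favors trajectories that reduce the deviation sooner.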
Partial Planning Reinforcement Learning

DTIC Science & Technology

2012-08-31

Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 15. SUBJECT TERMS Reinforcement Learning, Bayesian Optimization, Active ... Learning, Action Model Learning, Decision Theoretic Assistance Prasad Tadepalli, Alan Fern Oregon State University Office of Sponsored Programs Oregon State
Five-Year-Olds' Systematic Errors in Second-Order False Belief Tasks Are Due to First-Order Theory of Mind Strategy Selection: A Computational Modeling Study.

PubMed

Arslan, Burcu; Taatgen, Niels A; Verbrugge, Rineke

2017-01-01

Studies on second-order false belief reasoning have generally focused on investigating the roles of executive functions and language with correlational studies. In contrast to those studies, we focus on the question of how 5-year-olds select and revise reasoning strategies in second-order false belief tasks by constructing two computational cognitive models of this process: an instance-based learning model and a reinforcement learning model. Unlike the reinforcement learning model, the instance-based learning model predicted that children who fail second-order false belief tasks would give answers based on first-order theory of mind (ToM) reasoning as opposed to zero-order reasoning. This prediction was confirmed with an empirical study that we conducted with 72 5- to 6-year-old children. The results showed that 17% of the answers were correct and 83% of the answers were wrong. In line with our prediction, 65% of the wrong answers were based on a first-order ToM strategy, while only 29% of them were based on a zero-order strategy (the remaining 6% of subjects did not provide any answer). Based on our instance-based learning model, we propose that when children get feedback "Wrong," they explicitly revise their strategy to a higher level instead of implicitly selecting one of the available ToM strategies. Moreover, we predict that children's failures are due to lack of experience and that with exposure to second-order false belief reasoning, children can revise their wrong first-order reasoning strategy to a correct second-order reasoning strategy.
Five-Year-Olds’ Systematic Errors in Second-Order False Belief Tasks Are Due to First-Order Theory of Mind Strategy Selection: A Computational Modeling Study

PubMed Central

Arslan, Burcu; Taatgen, Niels A.; Verbrugge, Rineke

2017-01-01

Studies on second-order false belief reasoning have generally focused on investigating the roles of executive functions and language with correlational studies. In contrast to those studies, we focus on the question of how 5-year-olds select and revise reasoning strategies in second-order false belief tasks by constructing two computational cognitive models of this process: an instance-based learning model and a reinforcement learning model. Unlike the reinforcement learning model, the instance-based learning model predicted that children who fail second-order false belief tasks would give answers based on first-order theory of mind (ToM) reasoning as opposed to zero-order reasoning. This prediction was confirmed with an empirical study that we conducted with 72 5- to 6-year-old children. The results showed that 17% of the answers were correct and 83% of the answers were wrong. In line with our prediction, 65% of the wrong answers were based on a first-order ToM strategy, while only 29% of them were based on a zero-order strategy (the remaining 6% of subjects did not provide any answer). Based on our instance-based learning model, we propose that when children get feedback “Wrong,” they explicitly revise their strategy to a higher level instead of implicitly selecting one of the available ToM strategies. Moreover, we predict that children’s failures are due to lack of experience and that with exposure to second-order false belief reasoning, children can revise their wrong first-order reasoning strategy to a correct second-order reasoning strategy. PMID:28293206
Reinforcement learning in supply chains.

PubMed

Valluri, Annapurna; North, Michael J; Macal, Charles M

2009-10-01

Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real world decision makers are unlikely to be using strict reinforcement learning in practice.
Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations.

PubMed

Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni

2011-05-04

High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and, moreover, architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons "vote" independently ("democratic") for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5x up to 42x is provided versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.
Reinforcement learning in scheduling

NASA Technical Reports Server (NTRS)

Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad

1994-01-01

The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.
Computational neuroscience across the lifespan: Promises and pitfalls.

PubMed

van den Bos, Wouter; Bruckner, Rasmus; Nassar, Matthew R; Mata, Rui; Eppinger, Ben

2017-10-13

In recent years, the application of computational modeling in studies on age-related changes in decision making and learning has gained in popularity. One advantage of computational models is that they provide access to latent variables that cannot be directly observed from behavior. In combination with experimental manipulations, these latent variables can help to test hypotheses about age-related changes in behavioral and neurobiological measures at a level of specificity that is not achievable with descriptive analysis approaches alone. This level of specificity can in turn be beneficial to establish the identity of the corresponding behavioral and neurobiological mechanisms. In this paper, we will illustrate applications of computational methods using examples of lifespan research on risk taking, strategy selection and reinforcement learning. We will elaborate on problems that can occur when computational neuroscience methods are applied to data of different age groups. Finally, we will discuss potential targets for future applications and outline general shortcomings of computational neuroscience methods for research on human lifespan development. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Attention Cueing and Activity Equally Reduce False Alarm Rate in Visual-Auditory Associative Learning through Improving Memory.

PubMed

Nikouei Mahani, Mohammad-Ali; Haghgoo, Hojjat Allah; Azizi, Solmaz; Nili Ahmadabadi, Majid

2016-01-01

In our daily life, we continually exploit already learned multisensory associations and form new ones when facing novel situations. Improving our associative learning results in higher cognitive capabilities. We experimentally and computationally studied the learning performance of healthy subjects in a visual-auditory sensory associative learning task across active learning, attention cueing learning, and passive learning modes. According to our results, the learning mode had no significant effect on learning association of congruent pairs. In addition, subjects' performance in learning congruent samples was not correlated with their vigilance score. Nevertheless, vigilance score was significantly correlated with the learning performance of the non-congruent pairs. Moreover, in the last block of the passive learning mode, subjects made significantly more mistakes in taking non-congruent pairs as associated and consciously reported lower confidence. These results indicate that attention and activity equally enhanced visual-auditory associative learning for non-congruent pairs, while false alarm rate in the passive learning mode did not decrease after the second block. We investigated the cause of higher false alarm rate in the passive learning mode by using a computational model, composed of a reinforcement learning module and a memory-decay module. The results suggest that the higher rate of memory decay is the source of making more mistakes and reporting lower confidence in non-congruent pairs in the passive learning mode.
Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

PubMed

Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

2018-06-01

In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) makes the most of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. The DCRL algorithm is evaluated against deep Q network (DQN) and prioritized experience replay (PER) methods on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
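How a self-paced priority and a coverage penalty might combine into replay sampling weights can be sketched as follows. The functional forms here are hypothetical, chosen only to illustrate the two criteria named in the abstract: a Gaussian bump that favors transitions whose absolute temporal-difference error is close to the current curriculum difficulty, and a penalty that shrinks with the number of times a transition has already been replayed. The abstract does not give the paper's actual definitions, and the names `selection_weights`, `width`, and `penalty` are assumptions.

```python
import math


def selection_weights(td_errors, replay_counts, difficulty,
                      width=1.0, penalty=0.1):
    """Return normalized sampling weights for stored transitions.

    td_errors:     temporal-difference error of each stored transition
    replay_counts: how many times each transition was already replayed
    difficulty:    current curriculum difficulty on the |TD error| scale
    width, penalty: illustrative shape parameters (assumptions)
    """
    weights = []
    for err, count in zip(td_errors, replay_counts):
        # Self-paced priority: highest when |TD error| matches the difficulty.
        self_paced = math.exp(-((abs(err) - difficulty) ** 2) / (2 * width ** 2))
        # Coverage penalty: down-weight transitions replayed many times.
        coverage = 1.0 / (1.0 + penalty * count)
        weights.append(self_paced * coverage)
    total = sum(weights)
    return [w / total for w in weights]


# Three stored transitions; the second matches the difficulty and is fresh.
probs = selection_weights([0.1, 1.0, 1.0], [0, 0, 10], difficulty=1.0)
```

Raising `difficulty` over the course of training would shift sampling toward transitions with larger errors, which is the curriculum aspect of the scheme.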
Dopamine selectively remediates 'model-based' reward learning: a computational approach.

PubMed

Sharp, Madeleine E; Foerde, Karin; Daw, Nathaniel D; Shohamy, Daphna

2016-02-01

Patients with loss of dopamine due to Parkinson's disease are impaired at learning from reward. However, it remains unknown precisely which aspect of learning is impaired. In particular, learning from reward, or reinforcement learning, can be driven by two distinct computational processes. One involves habitual stamping-in of stimulus-response associations, hypothesized to arise computationally from 'model-free' learning. The other, 'model-based' learning, involves learning a model of the world that is believed to support goal-directed behaviour. Much work has pointed to a role for dopamine in model-free learning. But recent work suggests model-based learning may also involve dopamine modulation, raising the possibility that model-based learning may contribute to the learning impairment in Parkinson's disease. To directly test this, we used a two-step reward-learning task which dissociates model-free versus model-based learning. We evaluated learning in patients with Parkinson's disease tested ON versus OFF their dopamine replacement medication and in healthy controls. Surprisingly, we found no effect of disease or medication on model-free learning. Instead, we found that patients tested OFF medication showed a marked impairment in model-based learning, and that this impairment was remediated by dopaminergic medication. Moreover, model-based learning was positively correlated with a separate measure of working memory performance, raising the possibility of common neural substrates. Our results suggest that some learning deficits in Parkinson's disease may be related to an inability to pursue reward based on complete representations of the environment. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved.
B-tree search reinforcement learning for model based intelligent agent

NASA Astrophysics Data System (ADS)

Bhuvaneswari, S.; Vignashwaran, R.

2013-03-01

Agents trained by learning techniques provide a powerful approximation of active solutions for naive approaches. In this study, B-trees are used together with reinforcement learning to moderate data search for information retrieval, so as to achieve accuracy with minimum search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques achieve a satisfactory baseline and act as finite agents based on the predetermined model against competitors from the course.
Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.; Khedkar, Pratap S.

1992-01-01

Current reinforcement learning algorithms require long training periods which generally limit their applicability to small size problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to control of dynamic systems and it is demonstrated that it is possible to start with an approximate prior knowledge and learn to refine it through experiments using reinforcement learning.
A common neural circuit mechanism for internally guided and externally reinforced forms of motor learning.

PubMed

Hisey, Erin; Kearney, Matthew Gene; Mooney, Richard

2018-04-01

The complex skills underlying verbal and musical expression can be learned without external punishment or reward, indicating their learning is internally guided. The neural mechanisms that mediate internally guided learning are poorly understood, but a circuit comprising dopamine-releasing neurons in the midbrain ventral tegmental area (VTA) and their targets in the basal ganglia is important to externally reinforced learning. Juvenile zebra finches copy a tutor song in a process that is internally guided and, in adulthood, can learn to modify the fundamental frequency (pitch) of a target syllable in response to external reinforcement with white noise. Here we combined intersectional genetic ablation of VTA neurons, reversible blockade of dopamine receptors in the basal ganglia, and singing-triggered optogenetic stimulation of VTA terminals to establish that a common VTA-basal ganglia circuit enables internally guided song copying and externally reinforced syllable pitch learning.
Knockout crickets for the study of learning and memory: Dopamine receptor Dop1 mediates aversive but not appetitive reinforcement in crickets.

PubMed

Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto

2015-11-02

Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit-flies concluded that dopamine neurons mediate both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in neurotransmitters mediating appetitive reinforcement between crickets and fruit-flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory.
Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

PubMed

Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

2016-09-01

Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals (estimated using two reinforcement learning models) tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.
UIMX: A User Interface Management System For Scientific Computing With X Windows

NASA Astrophysics Data System (ADS)

Foody, Michael

1989-09-01

Applications with iconic user interfaces, (for example, interfaces with pulldown menus, radio buttons, and scroll bars), such as those found on Apple's Macintosh computer and the IBM PC under Microsoft's Presentation Manager, have become very popular, and for good reason. They are much easier to use than applications with traditional keyboard-oriented interfaces, so training costs are much lower and just about anyone can use them. They are standardized between applications, so once you learn one application you are well along the way to learning another. The use of one reinforces the common elements between applications of the interface, and, as a result, you remember how to use them longer. Finally, for the developer, their support costs can be much lower because of their ease of use.
Human-level control through deep reinforcement learning.

PubMed

Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

2015-02-26

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
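The temporal difference learning mentioned in the abstract above can be illustrated with a minimal tabular sketch. This is not the authors' deep Q-network: a small table of values stands in for the neural network, and the function name, the learning rate `alpha` and the discount factor `gamma` are assumptions chosen for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step in place and return the TD error.

    q is a list of rows, one per state, each holding one value per action.
    alpha and gamma are illustrative defaults, not values from the paper.
    """
    # Bootstrapped target: reward plus the discounted best next-state value
    # (no bootstrapping when the episode has ended).
    target = reward if done else reward + gamma * max(q[next_state])
    td_error = target - q[state][action]
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * td_error
    return td_error


# Two states, two actions, all values initialised to zero.
q = [[0.0, 0.0], [0.0, 0.0]]
delta = q_learning_update(q, state=0, action=1, reward=1.0,
                          next_state=1, done=False)
```

A deep Q-network replaces the table lookup with a network that maps raw inputs to one value per action, but the target it is trained toward has the same reward-plus-discounted-maximum form.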
Human-level control through deep reinforcement learning

NASA Astrophysics Data System (ADS)

Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

2015-02-01

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Machine learning in cardiovascular medicine: are we there yet?

PubMed

Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

2018-01-19

Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine.

Role of dopamine D2 receptors in human reinforcement learning.

PubMed

Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

2014-09-01

Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.
Role of Dopamine D2 Receptors in Human Reinforcement Learning

PubMed Central

Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

2014-01-01

Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well. PMID:24713613
Network congestion control algorithm based on Actor-Critic reinforcement learning model

NASA Astrophysics Data System (ADS)

Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

2018-04-01

Aiming at the network congestion control problem, a congestion control algorithm based on Actor-Critic reinforcement learning model is designed. Through the genetic algorithm in the congestion control strategy, the network congestion problems can be better found and prevented. According to Actor-Critic reinforcement learning, the simulation experiment of network congestion control algorithm is designed. The simulation experiments verify that the AQM controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is adopted to optimize the network performance, and the dropping probability of packets is adaptively adjusted so as to improve the network performance and avoid congestion. Based on the above finding, it is concluded that the network congestion control algorithm based on Actor-Critic reinforcement learning model can effectively avoid the occurrence of TCP network congestion.
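As a rough illustration of the Actor-Critic scheme named in the abstract above, the sketch below performs one update for a single-state problem with two discrete actions. It is not the paper's AQM controller: the mapping of actions to packet-dropping decisions, the reward, and both step sizes are hypothetical.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, value, action, reward,
                      alpha_actor=0.1, alpha_critic=0.1):
    """One update of a single-state actor-critic (step sizes are assumptions)."""
    # Critic: prediction error between the received reward and its estimate.
    td_error = reward - value
    new_value = value + alpha_critic * td_error
    # Actor: policy-gradient step for a softmax policy, using
    # d log pi(action) / d pref_b = 1[b == action] - pi(b).
    probs = softmax(prefs)
    new_prefs = [
        p + alpha_actor * td_error * ((1.0 if b == action else 0.0) - probs[b])
        for b, p in enumerate(prefs)
    ]
    return new_prefs, new_value, td_error


# Hypothetical example: action 1 was taken and earned a reward of 1.
prefs, value, delta = actor_critic_step([0.0, 0.0], 0.0, action=1, reward=1.0)
```

The critic's error signal drives both updates: it corrects the value estimate and raises the preference for an action that turned out better than expected.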
From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model

ERIC Educational Resources Information Center

Fu, Wai-Tat; Anderson, John R.

2006-01-01

The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…
Impact of Cognitive Architectures on Human-Computer Interaction

DTIC Science & Technology

2014-09-01

activation, reinforced learning, emotion, semantic memory, episodic memory, and visual imagery.12 In 2010 Rosenbloom created a variant of the Soar...being added to almost every new version. In 2004 Nuxoll and Laird added episodic memory to the Soar architecture.11 In 2008 Laird presented...York (NY): Psychology Press; 2014; p. 1–50. 11. Nuxoll A, Laird JE. A cognitive model of episodic memory integrated with a general cognitive
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

PubMed Central

Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

2014-01-01

Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior. PMID:24550063
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

PubMed

Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

2014-06-01

Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
Benchmarking for Bayesian Reinforcement Learning

PubMed Central

Ernst, Damien; Couëtoux, Adrien

2016-01-01

In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open source library. In this methodology, a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions is defined. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms and the results are discussed. PMID:27304891
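One common way to encode prior knowledge about an unknown Markov Decision Process is a Dirichlet prior over the next-state distribution of each state-action pair. The sketch below shows only the resulting posterior-mean computation. It is an assumption made for illustration, not a description of the benchmark library or of any of the algorithms it compares, and the function name and counts are hypothetical.

```python
def posterior_transition_probs(prior_counts, observed_counts):
    """Posterior mean of next-state probabilities under a Dirichlet prior.

    prior_counts    -- Dirichlet pseudo-counts, one per possible next state
    observed_counts -- how often each next state was actually observed
    """
    total = sum(prior_counts) + sum(observed_counts)
    return [(a + n) / total for a, n in zip(prior_counts, observed_counts)]


# Uniform prior over three next states, then four observed transitions.
probs = posterior_transition_probs([1.0, 1.0, 1.0], [3, 0, 1])
```

With few observations the prior dominates the estimate; as counts accumulate, the posterior mean approaches the empirical transition frequencies.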
Benchmarking for Bayesian Reinforcement Learning.

PubMed

Castronovo, Michael; Ernst, Damien; Couëtoux, Adrien; Fonteneau, Raphael

2016-01-01

In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open source library. In this methodology, a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions is defined. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms and the results are discussed.
A common neural network differentially mediates direct and social fear learning.

PubMed

Lindström, Björn; Haaker, Jan; Olsson, Andreas

2018-02-15

Across species, fears often spread between individuals through social learning. Yet, little is known about the neural and computational mechanisms underlying social learning. Addressing this question, we compared social and direct (Pavlovian) fear learning and found that they produced indistinguishable behavioral effects and involved the same cross-modal (self/other) aversive learning network, centered on the amygdala, the anterior insula (AI), and the anterior cingulate cortex (ACC). Crucially, the information flow within this network differed between social and direct fear learning. Dynamic causal modeling combined with reinforcement learning modeling revealed that the amygdala and AI provided input to this network during direct and social learning, respectively. Furthermore, the AI gated learning signals based on surprise (associability), which were conveyed to the ACC, in both learning modalities. Our findings provide insights into the mechanisms underlying social fear learning, with implications for understanding common psychological dysfunctions, such as phobias and other anxiety disorders.
Vicarious reinforcement learning signals when instructing others.

PubMed

Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

2015-02-18

Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted an RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors.
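The prediction errors described in the abstract above are commonly formalised with a delta rule of the Rescorla-Wagner type. The sketch below is a generic version of that rule, not the model fitted in the study: the learning rate, the starting value and the outcome sequence are assumptions.

```python
def rescorla_wagner(outcomes, alpha=0.2, v0=0.0):
    """Track a predicted value across trials and record each prediction error.

    outcomes -- observed outcomes, e.g. 1 for positive and 0 for negative feedback
    alpha    -- learning rate (illustrative value)
    v0       -- initial predicted value (illustrative value)
    """
    value = v0
    errors = []
    for outcome in outcomes:
        pe = outcome - value   # prediction error: actual minus predicted
        errors.append(pe)
        value += alpha * pe    # shift the prediction toward the outcome
    return value, errors


# Hypothetical feedback sequence: two positive outcomes, then a negative one.
value, errors = rescorla_wagner([1, 1, 0])
```

In model-based fMRI analyses of this kind, the per-trial entries of `errors` (or the running `value`) are the quantities typically entered as regressors against brain activity.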
Stochastic Reinforcement Benefits Skill Acquisition

ERIC Educational Resources Information Center

Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.

2014-01-01

Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…
A reward optimization method based on action subrewards in hierarchical reinforcement learning.

PubMed

Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

2014-01-01

Reinforcement learning (RL) is a class of interactive learning methods whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, and the slow convergence that results. The method can greatly reduce the state space and choose actions purposefully and efficiently, so as to optimize the reward function and speed up convergence. The method is applied to online learning in the game Tetris, and the experimental results show that combining the hierarchical reinforcement learning algorithm with action subrewards markedly improves convergence speed. The hierarchical method also alleviates the "curse of dimensionality" to a certain extent. Performance under different parameter settings is compared and analyzed as well.
Multi-agent Reinforcement Learning Model for Effective Action Selection

NASA Astrophysics Data System (ADS)

Youk, Sang Jo; Lee, Bong Keun

Reinforcement learning is a subarea of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the multi-agent case especially, the state space and action space become very large compared with the single-agent case, so an effective action-selection strategy is needed for reinforcement learning to be effective. This paper proposes a multi-agent reinforcement learning model based on a fuzzy inference system in order to improve learning speed and select effective actions in multi-agent settings. The action-selection strategy is verified through evaluation tests on RoboCup Keepaway, a useful test bed for multi-agent systems. The proposed model can be applied to evaluate the efficiency of various intelligent multi-agent systems and to the strategy and tactics of robot soccer systems.
Revealing Neurocomputational Mechanisms of Reinforcement Learning and Decision-Making With the hBayesDM Package

PubMed Central

Ahn, Woo-Young; Haines, Nathaniel; Zhang, Lei

2017-01-01

Reinforcement learning and decision-making (RLDM) provide a quantitative framework and computational theories with which we can disentangle psychiatric conditions into the basic dimensions of neurocognitive functioning. RLDM offer a novel approach to assessing and potentially diagnosing psychiatric patients, and there is growing enthusiasm for both RLDM and computational psychiatry among clinical researchers. Such a framework can also provide insights into the brain substrates of particular RLDM processes, as exemplified by model-based analysis of data from functional magnetic resonance imaging (fMRI) or electroencephalography (EEG). However, researchers often find the approach too technical and have difficulty adopting it for their research. Thus, a critical need remains to develop a user-friendly tool for the wide dissemination of computational psychiatric methods. We introduce an R package called hBayesDM (hierarchical Bayesian modeling of Decision-Making tasks), which offers computational modeling of an array of RLDM tasks and social exchange games. The hBayesDM package offers state-of-the-art hierarchical Bayesian modeling, in which both individual and group parameters (i.e., posterior distributions) are estimated simultaneously in a mutually constraining fashion. At the same time, the package is extremely user-friendly: users can perform computational modeling, output visualization, and Bayesian model comparisons, each with a single line of coding. Users can also extract the trial-by-trial latent variables (e.g., prediction errors) required for model-based fMRI/EEG. With the hBayesDM package, we anticipate that anyone with minimal knowledge of programming can take advantage of cutting-edge computational-modeling approaches to investigate the underlying processes of and interactions between multiple decision-making (e.g., goal-directed, habitual, and Pavlovian) systems. 
In this way, we expect that the hBayesDM package will contribute to the dissemination of advanced modeling approaches and enable a wide range of researchers to easily perform computational psychiatric research within different populations. PMID:29601060
Reinforcement learning improves behaviour from evaluative feedback

NASA Astrophysics Data System (ADS)

Littman, Michael L.

2015-05-01

Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
Reinforcement learning improves behaviour from evaluative feedback.

PubMed

Littman, Michael L

2015-05-28

Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
Program Helps Simulate Neural Networks

NASA Technical Reports Server (NTRS)

Villarreal, James; Mcintire, Gary

1993-01-01

Neural Network Environment on Transputer System (NNETS) computer program provides users high degree of flexibility in creating and manipulating wide variety of neural-network topologies at processing speeds not found in conventional computing environments. Supports back-propagation and back-propagation-related algorithms. Back-propagation algorithm used is implementation of Rumelhart's generalized delta rule. NNETS developed on INMOS Transputer(R). Predefines back-propagation network, Jordan network, and reinforcement network to assist users in learning and defining own networks. Also enables users to configure other neural-network paradigms from NNETS basic architecture. Small portion of software written in OCCAM(R) language.
Quantum-Enhanced Machine Learning

NASA Astrophysics Data System (ADS)

Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

2016-09-01

The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.
Role of the medial prefrontal cortex in impaired decision making in juvenile attention-deficit/hyperactivity disorder.

PubMed

Hauser, Tobias U; Iannaccone, Reto; Ball, Juliane; Mathys, Christoph; Brandeis, Daniel; Walitza, Susanne; Brem, Silvia

2014-10-01

Attention-deficit/hyperactivity disorder (ADHD) has been associated with deficient decision making and learning. Models of ADHD have suggested that these deficits could be caused by impaired reward prediction errors (RPEs). Reward prediction errors are signals that indicate violations of expectations and are known to be encoded by the dopaminergic system. However, the precise learning and decision-making deficits and their neurobiological correlates in ADHD are not well known. To determine the impaired decision-making and learning mechanisms in juvenile ADHD using advanced computational models, as well as the related neural RPE processes using multimodal neuroimaging. Twenty adolescents with ADHD and 20 healthy adolescents serving as controls (aged 12-16 years) were examined using a probabilistic reversal learning task while simultaneous functional magnetic resonance imaging and electroencephalogram were recorded. Learning and decision making were investigated by contrasting a hierarchical Bayesian model with an advanced reinforcement learning model and by comparing the model parameters. The neural correlates of RPEs were studied in functional magnetic resonance imaging and electroencephalogram. Adolescents with ADHD showed more simplistic learning as reflected by the reinforcement learning model (exceedance probability, Px = .92) and had increased exploratory behavior compared with healthy controls (mean [SD] decision steepness parameter β: ADHD, 4.83 [2.97]; controls, 6.04 [2.53]; P = .02). The functional magnetic resonance imaging analysis revealed impaired RPE processing in the medial prefrontal cortex during cue as well as during outcome presentation (P < .05, family-wise error correction). The outcome-related impairment in the medial prefrontal cortex could be attributed to deficient processing at 200 to 400 milliseconds after feedback presentation as reflected by reduced feedback-related negativity (ADHD, 0.61 [3.90] μV; controls, -1.68 [2.52] μV; P = .04). 
The combination of computational modeling of behavior and multimodal neuroimaging revealed that impaired decision making and learning mechanisms in adolescents with ADHD are driven by impaired RPE processing in the medial prefrontal cortex. This novel, combined approach furthers the understanding of the pathomechanisms in ADHD and may advance treatment strategies.
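As a rough illustration of the quantities named in this abstract, the sketch below pairs a simple Rescorla-Wagner value update (whose error term is the reward prediction error) with a softmax choice rule governed by a decision steepness parameter β. This is not the study's model: the function names, the learning rate, and the option values are assumptions chosen for illustration, and only the two β values are taken from the group means reported above.

```python
import math


def softmax_choice_probs(values, beta):
    """Choice probabilities under a softmax rule; beta is the decision steepness."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rescorla_wagner_update(value, reward, alpha):
    """One value update; returns the new value and the reward prediction error."""
    rpe = reward - value  # reward prediction error (RPE)
    return value + alpha * rpe, rpe


# Hypothetical learned values for two options (assumed, not from the study).
values = [0.7, 0.3]

# Group-mean beta values reported in the abstract.
p_adhd = softmax_choice_probs(values, beta=4.83)
p_ctrl = softmax_choice_probs(values, beta=6.04)

# Hypothetical learning rate of 0.2 (assumed).
new_value, rpe = rescorla_wagner_update(0.5, reward=1.0, alpha=0.2)
```

With the same values, the lower β yields choice probabilities closer to chance, which is one way to read "increased exploratory behavior" in a model of this kind.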

Toward cognitive robotics

NASA Astrophysics Data System (ADS)

Laird, John E.

2009-05-01

Our long-term goal is to develop autonomous robotic systems that have the cognitive abilities of humans, including communication, coordination, adapting to novel situations, and learning through experience. Our approach rests on the recent integration of the Soar cognitive architecture with both virtual and physical robotic systems. Soar has been used to develop a wide variety of knowledge-rich agents for complex virtual environments, including distributed training environments and interactive computer games. For development and testing in robotic virtual environments, Soar interfaces to a variety of robotic simulators and a simple mobile robot. We have recently made significant extensions to Soar that add new memories and new non-symbolic reasoning to Soar's original symbolic processing, which should significantly improve Soar's abilities for control of robots. These extensions include episodic memory, semantic memory, reinforcement learning, and mental imagery. Episodic memory and semantic memory support the learning and recalling of prior events and situations as well as facts about the world. Reinforcement learning gives the system the ability to tune its procedural knowledge, that is, knowledge about how to do things. Mental imagery supports the use of diagrammatic and visual representations that are critical to support spatial reasoning. We speculate on the future of unmanned systems and the need for cognitive robotics to support dynamic instruction and taskability.
Can model-free reinforcement learning explain deontological moral judgments?

PubMed

Ayars, Alisabeth

2016-05-01

Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.
General functioning predicts reward and punishment learning in schizophrenia.

PubMed

Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A

2011-04-01

Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performances. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.
Agent-based traffic management and reinforcement learning in congested intersection network.

DOT National Transportation Integrated Search

2012-08-01

This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programmi...
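For readers unfamiliar with the first algorithm named above, here is a minimal sketch of the standard tabular Q-learning update. The record is truncated, so nothing here reflects the study's actual state, action, or reward design: the state labels, the two signal actions, the reward value, and the step-size and discount parameters are all hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical signal-control actions and queue states (assumed for illustration).
ACTIONS = ("keep_phase", "switch_phase")
q_table = defaultdict(float)  # unseen state-action pairs start at 0.0

# Hypothetical reward: negative delay experienced after the action.
td = q_learning_update(q_table, "queue_long", "switch_phase",
                       reward=-3.0, next_state="queue_short", actions=ACTIONS)
```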
Reinforcement Learning in a Nonstationary Environment: The El Farol Problem

NASA Technical Reports Server (NTRS)

Bell, Ann Maria

1999-01-01

This paper examines the performance of simple learning rules in a complex adaptive system based on a coordination problem modeled on the El Farol problem. The key features of the El Farol problem are that it typically involves a medium number of agents and that agents' pay-off functions have a discontinuous response to increased congestion. First we consider a single adaptive agent facing a stationary environment. We demonstrate that the simple learning rules proposed by Roth and Er'ev can be extremely sensitive to small changes in the initial conditions and that events early in a simulation can affect the performance of the rule over a relatively long time horizon. In contrast, a reinforcement learning rule based on standard practice in the computer science literature converges rapidly and robustly. The situation is reversed when multiple adaptive agents interact: the RE algorithms often converge rapidly to a stable average aggregate attendance despite the slow and erratic behavior of individual learners, while the CS-based learners frequently over-attend in the early and intermediate terms. The symmetric mixed strategy equilibrium is unstable: all three learning rules ultimately tend towards pure strategies or stabilize in the medium term at non-equilibrium probabilities of attendance. The brittleness of the algorithms in different contexts emphasizes the importance of thorough and thoughtful examination of simulation-based results.
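The basic form of a Roth-Erev style rule can be sketched as follows: each action carries a propensity, actions are chosen with probability proportional to propensity, and the chosen action's propensity grows by the payoff received. This is only the simplest textbook variant, not necessarily the version tested in the paper. The two-action setup, the fixed payoffs, the initial propensities, and the random seed are assumptions for illustration.

```python
import random


def roth_erev_step(propensities, payoff_fn, rng):
    """Choose an action in proportion to propensity, then reinforce it by its payoff."""
    total = sum(propensities)
    weights = [q / total for q in propensities]
    action = rng.choices(range(len(propensities)), weights=weights)[0]
    payoff = payoff_fn(action)  # payoffs are assumed non-negative
    propensities[action] += payoff
    return action, payoff


rng = random.Random(0)          # fixed seed so the run is repeatable
propensities = [1.0, 1.0]       # action 0 = stay home, action 1 = attend (assumed)
fixed_payoffs = [1.0, 2.0]      # stationary environment with assumed payoffs

total_payoff = 0.0
for _ in range(500):
    _, payoff = roth_erev_step(propensities, lambda a: fixed_payoffs[a], rng)
    total_payoff += payoff

attend_prob = propensities[1] / sum(propensities)
```

Because early draws enlarge the propensities that later draws are sampled from, different seeds or initial propensities can lead to noticeably different trajectories, in line with the sensitivity the abstract describes.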
Operant conditioning of enhanced pain sensitivity by heat-pain titration.

PubMed

Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert

2008-11-15

Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior and not verbal pain report was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.
Bayesian Cue Integration as a Developmental Outcome of Reward Mediated Learning

PubMed Central

Weisswange, Thomas H.; Rothkopf, Constantin A.; Rodemann, Tobias; Triesch, Jochen

2011-01-01

Average human behavior in cue combination tasks is well predicted by Bayesian inference models. As this capability is acquired over developmental timescales, the question arises of how it is learned. Here we investigated whether reward-dependent learning, which is well established at the computational, behavioral, and neuronal levels, could contribute to this development. It is shown that a model-free reinforcement learning algorithm can indeed learn to do cue integration, i.e., weight uncertain cues according to their respective reliabilities, and even do so if reliabilities are changing. We also consider the case of causal inference where multimodal signals can originate from one or multiple separate objects and should not always be integrated. In this case, the learner is shown to develop a behavior that is closest to Bayesian model averaging. We conclude that reward-mediated learning could be a driving force for the development of cue integration and causal inference. PMID:21750717
Does arousal interfere with operant conditioning of spike-wave discharges in genetic epileptic rats?

PubMed

Osterhagen, Lasse; Breteler, Marinus; van Luijtelaar, Gilles

2010-06-01

One of the ways in which brain-computer interfaces can be used is neurofeedback (NF). Subjects use their brain activation to control an external device, and with this technique it is also possible to learn to control aspects of the brain activity by operant conditioning. Beneficial effects of NF training on seizure occurrence have been described in epileptic patients. Little research has been done on differentiating NF effectiveness by type of epilepsy, in particular on whether idiopathic generalized seizures are susceptible to NF. In this experiment, seizures that manifest themselves as spike-wave discharges (SWDs) in the EEG were reinforced during 10 sessions in 6 rats of the WAG/Rij strain, an animal model for absence epilepsy. EEGs were recorded before and after the training sessions. Reinforcing SWDs led to decreased SWD occurrences during training; however, the changes during training were not persistent in the post-training sessions. Because behavioural states are known to have an influence on the occurrence of SWDs, it is proposed that the reinforcement situation increased arousal, which resulted in fewer SWDs. Additional tests supported this hypothesis. The outcomes have implications for the possibility of training SWDs with operant learning techniques. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals

PubMed Central

Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan

2017-01-01

Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976
Stress attenuates the flexible updating of aversive value

PubMed Central

Raio, Candace M.; Hartley, Catherine A.; Orederu, Temidayo A.; Li, Jian; Phelps, Elizabeth A.

2017-01-01

In a dynamic environment, sources of threat or safety can unexpectedly change, requiring the flexible updating of stimulus−outcome associations that promote adaptive behavior. However, aversive contexts in which we are required to update predictions of threat are often marked by stress. Acute stress is thought to reduce behavioral flexibility, yet its influence on the modulation of aversive value has not been well characterized. Given that stress exposure is a prominent risk factor for anxiety and trauma-related disorders marked by persistent, inflexible responses to threat, here we examined how acute stress affects the flexible updating of threat responses. Participants completed an aversive learning task, in which one stimulus was probabilistically associated with an electric shock, while the other stimulus signaled safety. A day later, participants underwent an acute stress or control manipulation before completing a reversal learning task during which the original stimulus−outcome contingencies switched. Skin conductance and neuroendocrine responses provided indices of sympathetic arousal and stress responses, respectively. Despite equivalent initial learning, stressed participants showed marked impairments in reversal learning relative to controls. Additionally, reversal learning deficits across participants were related to heightened levels of alpha-amylase, a marker of noradrenergic activity. Finally, fitting arousal data to a computational reinforcement learning model revealed that stress-induced reversal learning deficits emerged from stress-specific changes in the weight assigned to prediction error signals, disrupting the adaptive adjustment of learning rates. Our findings provide insight into how stress renders individuals less sensitive to changes in aversive reinforcement and have implications for understanding clinical conditions marked by stress-related psychopathology. PMID:28973957
Place preference and vocal learning rely on distinct reinforcers in songbirds.

PubMed

Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

2018-04-30

In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.
Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

ERIC Educational Resources Information Center

Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

2011-01-01

Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…
Machine Learning Control For Highly Reconfigurable High-Order Systems

DTIC Science & Technology

2015-01-02

develop and flight test a Reinforcement Learning based approach for autonomous tracking of ground targets using a fixed wing Unmanned...Reinforcement Learning-based algorithms are developed for learning agents’ time dependent dynamics while also learning to control them. Three algorithms...to a wide range of engineering-based problems. Implementation of these solutions, however, is often complicated by the hysteretic, non-linear,
Reinforcement and inference in cross-situational word learning.

PubMed

Tilles, Paulo F C; Fontanari, José F

2013-01-01

Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
Reinforcement Learning of Two-Joint Virtual Arm Reaching in a Computer Model of Sensorimotor Cortex

PubMed Central

Neymotin, Samuel A.; Chadderdon, George L.; Kerr, Cliff C.; Francis, Joseph T.; Lytton, William W.

2014-01-01

Neocortical mechanisms of learning sensorimotor control involve a complex series of interactions at multiple levels, from synaptic mechanisms to cellular dynamics to network connectomics. We developed a model of sensory and motor neocortex consisting of 704 spiking model neurons. Sensory and motor populations included excitatory cells and two types of interneurons. Neurons were interconnected with AMPA/NMDA and GABAA synapses. We trained our model using spike-timing-dependent reinforcement learning to control a two-joint virtual arm to reach to a fixed target. For each of 125 trained networks, we used 200 training sessions, each involving 15 s reaches to the target from 16 starting positions. Learning altered network dynamics, with enhancements to neuronal synchrony and behaviorally relevant information flow between neurons. After learning, networks demonstrated retention of behaviorally relevant memories by using proprioceptive information to perform reach-to-target from multiple starting positions. Networks dynamically controlled which joint rotations to use to reach a target, depending on current arm position. Learning-dependent network reorganization was evident in both sensory and motor populations: learned synaptic weights showed target-specific patterning optimized for particular reach movements. Our model embodies an integrative hypothesis of sensorimotor cortical learning that could be used to interpret future electrophysiological data recorded in vivo from sensorimotor learning experiments. We used our model to make the following predictions: learning enhances synchrony in neuronal populations and behaviorally relevant information flow across neuronal populations, enhanced sensory processing aids task-relevant motor performance and the relative ease of a particular movement in vivo depends on the amount of sensory information required to complete the movement. PMID:24047323
Hemispheric Asymmetries in Striatal Reward Responses Relate to Approach-Avoidance Learning and Encoding of Positive-Negative Prediction Errors in Dopaminergic Midbrain Regions.

PubMed

Aberg, Kristoffer Carl; Doell, Kimberly C; Schwartz, Sophie

2015-10-28

Some individuals are better at learning about rewarding situations, whereas others are inclined to avoid punishments (i.e., enhanced approach or avoidance learning, respectively). In reinforcement learning, action values are increased when outcomes are better than predicted (positive prediction errors [PEs]) and decreased for worse than predicted outcomes (negative PEs). Because actions with high and low values are approached and avoided, respectively, individual differences in the neural encoding of PEs may influence the balance between approach-avoidance learning. Recent correlational approaches also indicate that biases in approach-avoidance learning involve hemispheric asymmetries in dopamine function. However, the computational and neural mechanisms underpinning such learning biases remain unknown. Here we assessed hemispheric reward asymmetry in striatal activity in 34 human participants who performed a task involving rewards and punishments. We show that the relative difference in reward response between hemispheres relates to individual biases in approach-avoidance learning. Moreover, using a computational modeling approach, we demonstrate that better encoding of positive (vs negative) PEs in dopaminergic midbrain regions is associated with better approach (vs avoidance) learning, specifically in participants with larger reward responses in the left (vs right) ventral striatum. Thus, individual dispositions or traits may be determined by neural processes acting to constrain learning about specific aspects of the world. Copyright © 2015 the authors.
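One common way to express an approach or avoidance bias computationally is to let positive and negative prediction errors update action values at different rates. The sketch below shows that idea only and is not the model fitted in the study. The learning rates, the outcome sequence, and the function names are assumptions.

```python
def update_value(value, outcome, alpha_pos, alpha_neg):
    """Update an action value with separate rates for positive and negative PEs."""
    pe = outcome - value  # prediction error: better (+) or worse (-) than expected
    alpha = alpha_pos if pe > 0 else alpha_neg
    return value + alpha * pe, pe


def run_learner(outcomes, alpha_pos, alpha_neg):
    """Return the final action value after a sequence of outcomes."""
    value = 0.0
    for outcome in outcomes:
        value, _ = update_value(value, outcome, alpha_pos, alpha_neg)
    return value


# Hypothetical alternating rewards (+1) and punishments (-1).
outcomes = [1.0, -1.0, 1.0, -1.0]

# Assumed learning rates for two illustrative learners.
approach_value = run_learner(outcomes, alpha_pos=0.4, alpha_neg=0.1)
avoidance_value = run_learner(outcomes, alpha_pos=0.1, alpha_neg=0.4)
```

Given identical outcomes, the learner that weights positive PEs more heavily ends with the higher action value, so it would be more inclined to approach that action.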
Interactive machine learning for health informatics: when do we need the human-in-the-loop?

PubMed

Holzinger, Andreas

2016-06-01

Machine learning (ML) is the fastest growing field in computer science, and health informatics is among the greatest challenges. The goal of ML is to develop algorithms which can learn and improve over time and can be used for predictions. Most ML researchers concentrate on automatic machine learning (aML), where great advances have been made, for example, in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches greatly benefit from big data with many training sets. However, in the health domain, sometimes we are confronted with a small number of data sets or rare events, where aML approaches suffer from insufficient training samples. Here interactive machine learning (iML) may be of help, having its roots in reinforcement learning, preference learning, and active learning. The term iML is not yet well used, so we define it as "algorithms that can interact with agents and can optimize their learning behavior through these interactions, where the agents can also be human." This "human-in-the-loop" can be beneficial in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization of health data, where human expertise can help to reduce an exponential search space through heuristic selection of samples. Therefore, what would otherwise be an NP-hard problem reduces greatly in complexity through the input and the assistance of a human agent involved in the learning phase.
A statistical learning strategy for closed-loop control of fluid flows

NASA Astrophysics Data System (ADS)

Guéniat, Florimond; Mathelin, Lionel; Hussaini, M. Yousuff

2016-12-01

This work discusses a closed-loop control strategy for complex systems utilizing scarce and streaming data. A discrete embedding space is first built using hash functions applied to the sensor measurements from which a Markov process model is derived, approximating the complex system's dynamics. A control strategy is then learned using reinforcement learning once rewards relevant with respect to the control objective are identified. This method is designed for experimental configurations, requiring no computations nor prior knowledge of the system, and enjoys intrinsic robustness. It is illustrated on two systems: the control of the transitions of a Lorenz'63 dynamical system, and the control of the drag of a cylinder flow. The method is shown to perform well.
Feasibility and effectiveness of computer-based therapy in community treatment.

PubMed

Brooks, Adam C; Ryder, Deanna; Carise, Deni; Kirby, Kimberly C

2010-10-01

Computerized therapy approaches may expand the reach of evidence-based treatment; however, it is unclear how to integrate these therapies into community-based treatment. We conducted a two-phase pilot study to explore (a) whether clients' use of the Therapeutic Education System (TES), a Web-based community reinforcement approach (CRA) learning program, would benefit them in the absence of counselor support and (b) whether counselors and clients would use the TES in the absence of tangible research-based reinforcement. In Phase 1, clients in the TES condition (n = 14) demonstrated large improvements in knowledge, F(1, 20) = 8.90, p = .007, d = 1.05, and were significantly more likely to select CRA style coping responses, F (1, 20) = 11.95, p = .002, d = 1.16, relative to the treatment-as-usual group (n = 14). We also detected small, nonsignificant, between-group effects indicating TES decreased cocaine use during treatment. In Phase 2, counselors referred only around 10% of their caseload to the TES, and the modal number of completed modules in the absence of tangible reinforcement was three. Computer-based therapy approaches are viable in community-based treatment but must be integrated with incentive systems to ensure engagement. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Toward a Neurobiology of Delusions

PubMed Central

Corlett, P.R.; Taylor, J.R.; Wang, X.-J.; Fletcher, P.C.; Krystal, J.H.

2013-01-01

Delusions are the false and often incorrigible beliefs that can cause severe suffering in mental illness. We cannot yet explain them in terms of underlying neurobiological abnormalities. However, by drawing on recent advances in the biological, computational and psychological processes of reinforcement learning, memory, and perception it may be feasible to account for delusions in terms of cognition and brain function. The account focuses on a particular parameter, prediction error – the mismatch between expectation and experience – that provides a computational mechanism common to cortical hierarchies, frontostriatal circuits and the amygdala as well as parietal cortices. We suggest that delusions result from aberrations in how brain circuits specify hierarchical predictions, and how they compute and respond to prediction errors. Defects in these fundamental brain mechanisms can vitiate perception, memory, bodily agency and social learning such that individuals with delusions experience an internal and external world that healthy individuals would find difficult to comprehend. The present model attempts to provide a framework through which we can build a mechanistic and translational understanding of these puzzling symptoms. PMID:20558235

Computations Underlying Social Hierarchy Learning: Distinct Neural Mechanisms for Updating and Representing Self-Relevant Information.

PubMed

Kumaran, Dharshan; Banino, Andrea; Blundell, Charles; Hassabis, Demis; Dayan, Peter

2016-12-07

Knowledge about social hierarchies organizes human behavior, yet we understand little about the underlying computations. Here we show that a Bayesian inference scheme, which tracks the power of individuals, better captures behavioral and neural data compared with a reinforcement learning model inspired by rating systems used in games such as chess. We provide evidence that the medial prefrontal cortex (MPFC) selectively mediates the updating of knowledge about one's own hierarchy, as opposed to that of another individual, a process that underpinned successful performance and involved functional interactions with the amygdala and hippocampus. In contrast, we observed domain-general coding of rank in the amygdala and hippocampus, even when the task did not require it. Our findings reveal the computations underlying a core aspect of social cognition and provide new evidence that self-relevant information may indeed be afforded a unique representational status in the brain. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
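The abstract does not specify its chess-inspired comparison model. As background, the standard Elo rating update, a well-known rating system of that kind, can be sketched as below. The K-factor, the starting ratings, and the function names are assumptions, and nothing here should be read as the study's implementation.

```python
def elo_expected(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(winner, loser, k=32.0):
    """Return updated (winner, loser) ratings after one observed contest."""
    surprise = 1.0 - elo_expected(winner, loser)  # plays the role of a prediction error
    delta = k * surprise
    return winner + delta, loser - delta


# Two hypothetical individuals starting at equal rank.
new_winner, new_loser = elo_update(1500.0, 1500.0)

# An upset (lower-rated individual wins) moves the ratings further.
upset_winner, upset_loser = elo_update(1400.0, 1600.0)
```

The update is proportional to how unexpected the outcome was, which is what gives rating schemes of this kind their resemblance to prediction-error-driven reinforcement learning.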
Evolution with Reinforcement Learning in Negotiation

PubMed Central

Zou, Yi; Zhan, Wenjie; Shao, Yuan

2014-01-01

Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stableness than their counterparts using classic evolutionary algorithm. PMID:25048108
Overcoming Learned Helplessness in Community College Students.

ERIC Educational Resources Information Center

Roueche, John E.; Mink, Oscar G.

1982-01-01

Reviews research on the effects of repeated experiences of helplessness and on locus of control. Identifies conditions necessary for overcoming learned helplessness; i.e., the potential for learning to occur; consistent reinforcement; relevant, valued reinforcers; and favorable psychological situation. Recommends eight ways for teachers to…
Dopamine D2 Receptor Signaling in the Nucleus Accumbens Comprises a Metabolic-Cognitive Brain Interface Regulating Metabolic Components of Glucose Reinforcement.

PubMed

Michaelides, Michael; Miller, Michael L; DiNieri, Jennifer A; Gomez, Juan L; Schwartz, Elizabeth; Egervari, Gabor; Wang, Gene Jack; Mobbs, Charles V; Volkow, Nora D; Hurd, Yasmin L

2017-11-01

Appetitive drive is influenced by coordinated interactions between brain circuits that regulate reinforcement and homeostatic signals that control metabolism. Glucose modulates striatal dopamine (DA) and regulates appetitive drive and reinforcement learning. Striatal DA D2 receptors (D2Rs) also regulate reinforcement learning and are implicated in glucose-related metabolic disorders. Nevertheless, interactions between striatal D2R and peripheral glucose have not been previously described. Here we show that manipulations involving striatal D2R signaling coincide with perseverative and impulsive-like responding for sucrose, a disaccharide consisting of fructose and glucose. Fructose conveys orosensory (i.e., taste) reinforcement but does not convey metabolic (i.e., nutrient-derived) reinforcement. Glucose also conveys orosensory reinforcement but, unlike fructose, is a major metabolic energy source, underlies sustained reinforcement, and activates striatal circuitry. We found that mice with deletion of dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) exclusively in D2R-expressing cells exhibited preferential D2R changes in the nucleus accumbens (NAc), a striatal region that critically regulates sucrose reinforcement. These changes coincided with perseverative and impulsive-like responding for sucrose pellets and sustained reinforcement learning of glucose-paired flavors. These mice were also characterized by significant glucose intolerance (i.e., impaired glucose utilization). Systemic glucose administration significantly attenuated sucrose operant responding, and D2R activation or blockade in the NAc bidirectionally modulated blood glucose levels and glucose tolerance. Collectively, these results implicate NAc D2R in regulating both peripheral glucose levels and glucose-dependent reinforcement learning behaviors and highlight the notion that glucose metabolic impairments arising from disrupted NAc D2R signaling are involved in compulsive and perseverative feeding behaviors.
Altruistic learning.

PubMed

Seymour, Ben; Yoshida, Wako; Dolan, Ray

2009-01-01

The origin of altruism remains one of the most enduring puzzles of human behaviour. Indeed, true altruism is often thought either not to exist, or to arise merely as a miscalculation of otherwise selfish behaviour. In this paper, we argue that altruism emerges directly from the way in which distinct human decision-making systems learn about rewards. Using insights provided by neurobiological accounts of human decision-making, we suggest that reinforcement learning in game-theoretic social interactions (habitisation over either individuals or games) and observational learning (either imitative or inference based) lead to altruistic behaviour. This arises not only as a result of computational efficiency in the face of processing complexity, but as a direct consequence of optimal inference in the face of uncertainty. Critically, we argue that the fact that evolutionary pressure acts not over the object of learning ('what' is learned), but over the learning systems themselves ('how' things are learned), enables the evolution of altruism despite the direct threat posed by free-riders.
Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments.

PubMed

Leong, Yuan Chang; Radulescu, Angela; Daniel, Reka; DeWoskin, Vivian; Niv, Yael

2017-01-18

Little is known about the relationship between attention and learning during decision making. Using eye tracking and multivariate pattern analysis of fMRI data, we measured participants' dimensional attention as they performed a trial-and-error learning task in which only one of three stimulus dimensions was relevant for reward at any given time. Analysis of participants' choices revealed that attention biased both value computation during choice and value update during learning. Value signals in the ventromedial prefrontal cortex and prediction errors in the striatum were similarly biased by attention. In turn, participants' focus of attention was dynamically modulated by ongoing learning. Attentional switches across dimensions correlated with activity in a frontoparietal attention network, which showed enhanced connectivity with the ventromedial prefrontal cortex between switches. Our results suggest a bidirectional interaction between attention and learning: attention constrains learning to relevant dimensions of the environment, while we learn what to attend to via trial and error. Copyright © 2017 Elsevier Inc. All rights reserved.
A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

DOE PAGES

Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

2015-01-31

Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.
Impaired Expected Value Computations Coupled With Overreliance on Stimulus-Response Learning in Schizophrenia.

PubMed

Hernaus, Dennis; Gold, James M; Waltz, James A; Frank, Michael J

2018-04-03

While many have emphasized impaired reward prediction error signaling in schizophrenia, multiple studies suggest that some decision-making deficits may arise from overreliance on stimulus-response systems together with a compromised ability to represent expected value. Guided by computational frameworks, we formulated and tested two scenarios in which maladaptive representations of expected value should be most evident, thereby delineating conditions that may evoke decision-making impairments in schizophrenia. In a modified reinforcement learning paradigm, 42 medicated people with schizophrenia and 36 healthy volunteers learned to select the most frequently rewarded option in a 75-25 pair: once when presented with a more deterministic (90-10) pair and once when presented with a more probabilistic (60-40) pair. Novel and old combinations of choice options were presented in a subsequent transfer phase. Computational modeling was employed to elucidate contributions from stimulus-response systems (actor-critic) and expected value (Q-learning). People with schizophrenia showed robust performance impairments with increasing value difference between two competing options, which strongly correlated with decreased contributions from expected value-based learning (Q-learning). Moreover, a subtle yet consistent contextual choice bias for the probabilistic 75 option was present in people with schizophrenia, which could be accounted for by a context-dependent reward prediction error in the actor-critic. We provide evidence that decision-making impairments in schizophrenia increase monotonically with demands placed on expected value computations. A contextual choice bias is consistent with overreliance on stimulus-response learning, which may signify a deficit secondary to the maladaptive representation of expected value. These results shed new light on conditions under which decision-making impairments may arise. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
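
To make the distinction between the two model families named in the abstract above concrete, here is a minimal single-context sketch. Q-learning stores an expected value per option, whereas the actor-critic updates option preferences using a prediction error computed against one shared context value. The function names, learning rates, and trial sequence are illustrative assumptions and do not reproduce the authors' fitted models.

```python
def q_learning(trials, alpha=0.1, n_options=2):
    """Track an expected value for each option separately."""
    q = [0.0] * n_options
    for action, reward in trials:
        # Prediction error is relative to the chosen option's own value.
        q[action] += alpha * (reward - q[action])
    return q

def actor_critic(trials, alpha_critic=0.1, alpha_actor=0.1, n_options=2):
    """Track one context value (critic) and a preference per option (actor)."""
    value = 0.0
    weights = [0.0] * n_options
    for action, reward in trials:
        # Prediction error is relative to the value of the whole context.
        delta = reward - value
        value += alpha_critic * delta
        weights[action] += alpha_actor * delta
    return weights, value

# Hypothetical (action, reward) trials: option 0 pays off on 3 of 4 choices,
# option 1 on 1 of 4 choices.
trials = [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1), (1, 1), (0, 0), (1, 0)] * 25
q_values = q_learning(trials)
weights, context_value = actor_critic(trials)
```

Because the critic's prediction error depends on the context rather than on the chosen option, the same option can acquire different actor weights in richer or poorer contexts, which is one way a contextual choice bias could arise in such a model.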
Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning

PubMed Central

Viejo, Guillaume; Khamassi, Mehdi; Brovelli, Andrea; Girard, Benoît

2015-01-01

Current learning theory provides a comprehensive description of how humans and other animals learn, and places behavioral flexibility and automaticity at the heart of adaptive behaviors. However, the computations supporting the interactions between goal-directed and habitual decision-making systems are still poorly understood. Previous functional magnetic resonance imaging (fMRI) results suggest that the brain hosts complementary computations that may differentially support goal-directed and habitual processes in the form of a dynamical interplay rather than a serial recruitment of strategies. To better elucidate the computations underlying flexible behavior, we develop a dual-system computational model that can predict both performance (i.e., participants' choices) and modulations in reaction times during learning of a stimulus–response association task. The habitual system is modeled with a simple Q-Learning algorithm (QL). For the goal-directed system, we propose a new Bayesian Working Memory (BWM) model that searches for information in the history of previous trials in order to minimize Shannon entropy. We propose a model for QL and BWM coordination such that the expensive memory manipulation is under control of, among others, the level of convergence of the habitual learning. We test the ability of QL or BWM alone to explain human behavior, and compare them with the performance of model combinations, to highlight the need for such combinations to explain behavior. Two of the tested combination models are derived from the literature, and another is our new proposal. In conclusion, all subjects were better explained by model combinations, and the majority of them are explained by our new coordination proposal. PMID:26379518
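
The quantity that the working-memory model above is said to minimize, Shannon entropy, can be sketched in a few lines. The example below only computes the entropy of a probability distribution over candidate actions; the distributions are made up for illustration, and nothing here reproduces the BWM model itself.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical action distributions before and after retrieving past trials.
uniform = [0.25, 0.25, 0.25, 0.25]   # no information about the correct action
peaked = [0.97, 0.01, 0.01, 0.01]    # memory strongly favors one action

entropy_before = shannon_entropy(uniform)
entropy_after = shannon_entropy(peaked)
```

A lower entropy after retrieval indicates that the retrieved information has reduced uncertainty about which action to take.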
The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.

ERIC Educational Resources Information Center

Dockstader, Steven L.

The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…
Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning

PubMed Central

Ramayya, Ashwin G.; Misra, Amrit

2014-01-01

Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643
An Investigation of Ways to Reduce the Failure Rate of Student Pilots during Flying Training in the Royal Australian Air Force.

DTIC Science & Technology

1987-09-01

Luthans (28) expanded the concept of learning as follows: 1. Learning involves a change, though not necessarily an improvement, in behaviour. Learning...that results in an unpleasant outcome is not likely to be repeated (36:244). Luthans and Kreitner (27) described the various forms of reinforcement as...four alternatives (defined previously on page 24 and taken from Luthans) of positive reinforcement, negative reinforcement, extinction and punishment
Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations

PubMed Central

Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni

2011-01-01

High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and moreover architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a “non-democratic” mechanism), achieve mediocre learning results at best. In absence of recurrent connections, where all neurons “vote” independently (“democratic”) for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5x up to 42x is provided versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated. PMID:21572529
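
The "democratic" decision mechanism described above, a population vector readout, can be sketched as follows. Each neuron votes for its preferred direction with a weight equal to its firing rate, and the decision is the direction of the summed vector. The tuning curve, the number of neurons, and the function name are illustrative assumptions and are not taken from the paper's GPU routines.

```python
import math

def population_vector(rates, preferred_angles):
    """Decode a direction as the angle of the rate-weighted vector sum."""
    x = sum(r * math.cos(a) for r, a in zip(rates, preferred_angles))
    y = sum(r * math.sin(a) for r, a in zip(rates, preferred_angles))
    return math.atan2(y, x)

# Hypothetical population: 8 neurons with evenly spaced preferred directions.
n_neurons = 8
preferred = [2 * math.pi * i / n_neurons for i in range(n_neurons)]

# Hypothetical rectified-cosine tuning around a target direction.
target = math.pi / 2
rates = [max(0.0, math.cos(a - target)) for a in preferred]

decoded = population_vector(rates, preferred)
```

With symmetric tuning around the target, the decoded angle coincides with the target direction.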
The many worlds hypothesis of dopamine prediction error: implications of a parallel circuit architecture in the basal ganglia.

PubMed

Lau, Brian; Monteiro, Tiago; Paton, Joseph J

2017-10-01

Computational models of reinforcement learning (RL) strive to produce behavior that maximises reward, and thus allow software or robots to behave adaptively [1]. At the core of RL models is a learned mapping between 'states' (situations or contexts that an agent might encounter in the world) and actions. A wealth of physiological and anatomical data suggests that the basal ganglia (BG) is important for learning these mappings [2,3]. However, the computations performed by specific circuits are unclear. In this brief review, we highlight recent work concerning the anatomy and physiology of BG circuits that suggest refinements in our understanding of computations performed by the basal ganglia. We focus on one important component of basal ganglia circuitry, midbrain dopamine neurons, drawing attention to data that has been cast as supporting or departing from the RL framework that has inspired experiments in basal ganglia research over the past two decades. We suggest that the parallel circuit architecture of the BG might be expected to produce variability in the response properties of different dopamine neurons, and that variability in response profile may not reflect variable functions, but rather different arguments that serve as inputs to a common function: the computation of prediction error. Copyright © 2017 Elsevier Ltd. All rights reserved.
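
The "common function" mentioned above, the computation of prediction error, is commonly formalized in RL as the temporal-difference error, delta = r + gamma * V(s') - V(s). The sketch below applies it to a two-step cue-then-reward episode. The state names, learning rate, and discount factor are illustrative assumptions, and the code is not a model of any specific dopamine circuit.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """Temporal-difference prediction error for one transition."""
    return reward + gamma * value_next - value_current

def run_td(n_episodes=200, alpha=0.1, gamma=0.9):
    """Learn state values for a fixed cue -> delay -> end episode."""
    values = {"cue": 0.0, "delay": 0.0, "end": 0.0}
    # Each transition is (state, next_state, reward); reward arrives at the end.
    transitions = [("cue", "delay", 0.0), ("delay", "end", 1.0)]
    errors = []
    for _ in range(n_episodes):
        episode_errors = []
        for state, next_state, reward in transitions:
            delta = td_error(reward, values[next_state], values[state], gamma)
            values[state] += alpha * delta
            episode_errors.append(delta)
        errors.append(episode_errors)
    return values, errors

values, errors = run_td()
```

Early in learning the error at reward delivery is large; once the reward is fully predicted by the preceding states, that error shrinks toward zero. Feeding different arguments (rewards or state values) into the same `td_error` function yields different error signals, which is the sense of "different arguments to a common function" in the abstract.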
Rethinking Extinction

PubMed Central

Dunsmoor, Joseph E.; Niv, Yael; Daw, Nathaniel; Phelps, Elizabeth A.

2015-01-01

Extinction serves as the leading theoretical framework and experimental model to describe how learned behaviors diminish through absence of anticipated reinforcement. In the past decade, extinction has moved beyond the realm of associative learning theory and behavioral experimentation in animals and has become a topic of considerable interest in the neuroscience of learning, memory, and emotion. Here, we review research and theories of extinction, both as a learning process and as a behavioral technique, and consider whether traditional understandings warrant a re-examination. We discuss the neurobiology, cognitive factors, and major computational theories, and revisit the predominant view that extinction results in new learning that interferes with expression of the original memory. Additionally, we reconsider the limitations of extinction as a technique to prevent the relapse of maladaptive behavior, and discuss novel approaches, informed by contemporary theoretical advances, that augment traditional extinction methods to target and potentially alter maladaptive memories. PMID:26447572
The habenula encodes negative motivational value associated with primary punishment in humans.

PubMed

Lawson, Rebecca P; Seymour, Ben; Loh, Eleanor; Lutti, Antoine; Dolan, Raymond J; Dayan, Peter; Weiskopf, Nikolaus; Roiser, Jonathan P

2014-08-12

Learning what to approach, and what to avoid, involves assigning value to environmental cues that predict positive and negative events. Studies in animals indicate that the lateral habenula encodes the previously learned negative motivational value of stimuli. However, involvement of the habenula in dynamic trial-by-trial aversive learning has not been assessed, and the functional role of this structure in humans remains poorly characterized, in part, due to its small size. Using high-resolution functional neuroimaging and computational modeling of reinforcement learning, we demonstrate positive habenula responses to the dynamically changing values of cues signaling painful electric shocks, which predict behavioral suppression of responses to those cues across individuals. By contrast, negative habenula responses to monetary reward cue values predict behavioral invigoration. Our findings show that the habenula plays a key role in an online aversive learning system and in generating associated motivated behavior in humans.
Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

PubMed Central

Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia

2018-01-01

Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822
How partial reinforcement of food cues affects the extinction and reacquisition of appetitive responses. A new model for dieting success?

PubMed

van den Akker, Karolien; Havermans, Remco C; Bouton, Mark E; Jansen, Anita

2014-10-01

Animals and humans can easily learn to associate an initially neutral cue with food intake through classical conditioning, but extinction of learned appetitive responses can be more difficult. Intermittent or partial reinforcement of food cues causes especially persistent behaviour in animals: after exposure to such learning schedules, the decline in responding that occurs during extinction is slow. After extinction, increases in responding with renewed reinforcement of food cues (reacquisition) might be less rapid after acquisition with partial reinforcement. In humans, it may be that the eating behaviour of some individuals resembles partial reinforcement schedules to a greater extent, possibly affecting dieting success by interacting with extinction and reacquisition. Furthermore, impulsivity has been associated with less successful dieting, and this association might be explained by impulsivity affecting the learning and extinction of appetitive responses. In the present two studies, the effects of different reinforcement schedules and impulsivity on the acquisition, extinction, and reacquisition of appetitive responses were investigated in a conditioning paradigm involving food rewards in healthy humans. Overall, the results indicate both partial reinforcement schedules and, possibly, impulsivity to be associated with worse extinction performance. A new model of dieting success is proposed: learning histories and, perhaps, certain personality traits (impulsivity) can interfere with the extinction and reacquisition of appetitive responses to food cues and they may be causally related to unsuccessful dieting. Copyright © 2014 Elsevier Ltd. All rights reserved.
Regulating recognition decisions through incremental reinforcement learning.

PubMed

Han, Sanghoon; Dobbins, Ian G

2009-06-01

Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback had been altogether removed. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.

Infant Contingency Learning in Different Cultural Contexts

ERIC Educational Resources Information Center

Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika

2012-01-01

Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…
Adaptive Educational Software by Applying Reinforcement Learning

ERIC Educational Resources Information Center

Bennane, Abdellah

2013-01-01

The introduction of intelligence into teaching software is the subject of this paper. In the software development process, one uses learning techniques in order to adapt the teaching software to the characteristics of the student. Generally, one uses artificial intelligence techniques such as reinforcement learning and Bayesian networks in order to adapt…
A self-taught artificial agent for multi-physics computational model personalization.

PubMed

Neumann, Dominik; Mansi, Tommaso; Itu, Lucian; Georgescu, Bogdan; Kayvanpour, Elham; Sedaghat-Hamedani, Farbod; Amr, Ali; Haas, Jan; Katus, Hugo; Meder, Benjamin; Steidl, Stefan; Hornegger, Joachim; Comaniciu, Dorin

2016-12-01

Personalization is the process of fitting a model to patient data, a critical step towards application of multi-physics computational models in clinical practice. Designing robust personalization algorithms is often a tedious, time-consuming, model- and data-specific process. We propose to use artificial intelligence concepts to learn this task, inspired by how human experts manually perform it. The problem is reformulated in terms of reinforcement learning. In an off-line phase, Vito, our self-taught artificial agent, learns a representative decision process model through exploration of the computational model: it learns how the model behaves under change of parameters. The agent then automatically learns an optimal strategy for on-line personalization. The algorithm is model-independent; applying it to a new model requires only adjusting a few hyper-parameters of the agent and defining the observations to match. Full knowledge of the model itself is not required. Vito was tested in a synthetic scenario, showing that it could learn how to optimize cost functions generically. Then Vito was applied to the inverse problem of cardiac electrophysiology and the personalization of a whole-body circulation model. The obtained results suggested that Vito could achieve equivalent, if not better, goodness of fit than standard methods, while being more robust (up to 11% higher success rates) and converging faster (up to seven times). Our artificial intelligence approach could thus make personalization algorithms generalizable and self-adaptable to any patient and any model. Copyright © 2016. Published by Elsevier B.V.
Dissociating hippocampal and striatal contributions to sequential prediction learning

PubMed Central

Bornstein, Aaron M.; Daw, Nathaniel D.

2011-01-01

Behavior may be generated on the basis of many different kinds of learned contingencies. For instance, responses could be guided by the direct association between a stimulus and response, or by sequential stimulus-stimulus relationships (as in model-based reinforcement learning or goal-directed actions). However, the neural architecture underlying sequential predictive learning is not well-understood, in part because it is difficult to isolate its effect on choice behavior. To track such learning more directly, we examined reaction times (RTs) in a probabilistic sequential picture identification task. We used computational learning models to isolate trial-by-trial effects of two distinct learning processes in behavior, and used these as signatures to analyze the separate neural substrates of each process. RTs were best explained via the combination of two delta rule learning processes with different learning rates. To examine neural manifestations of these learning processes, we used functional magnetic resonance imaging to seek correlates of timeseries related to expectancy or surprise. We observed such correlates in two regions, hippocampus and striatum. By estimating the learning rates best explaining each signal, we verified that they were uniquely associated with one of the two distinct processes identified behaviorally. These differential correlates suggest that complementary anticipatory functions drive each region's effect on behavior. Our results provide novel insights as to the quantitative computational distinctions between medial temporal and basal ganglia learning networks and enable experiments that exploit trial-by-trial measurement of the unique contributions of both hippocampus and striatum to response behavior. PMID:22487032
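
The abstract above reports that reaction times were best explained by two delta rule learning processes with different learning rates. A minimal sketch of what that means is given below: the same outcome sequence is tracked by a fast and a slow learner, each nudging its estimate toward the outcome by a fraction of the prediction error. The learning rates, the initial estimate, and the outcome sequence are illustrative assumptions, not the fitted values from the study.

```python
def delta_rule(outcomes, learning_rate, initial=0.5):
    """Return the expectation held before each outcome under a delta rule."""
    estimate = initial
    trace = []
    for outcome in outcomes:
        trace.append(estimate)  # expectation before the outcome is seen
        # Move the estimate toward the outcome by a fraction of the error.
        estimate += learning_rate * (outcome - estimate)
    return trace

# Hypothetical sequence: an event occurs on 20 trials, then stops occurring.
outcomes = [1] * 20 + [0] * 20

fast = delta_rule(outcomes, learning_rate=0.5)    # tracks recent trials
slow = delta_rule(outcomes, learning_rate=0.05)   # integrates a long history
```

After the contingency changes at trial 20, the fast learner's expectation drops within a few trials while the slow learner's expectation lags behind, so the two traces make distinguishable trial-by-trial predictions.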
Design of a Neurally Plausible Model of Fear Learning

PubMed Central

Krasne, Franklin B.; Fanselow, Michael S.; Zelikowsky, Moriel

2011-01-01

A neurally oriented conceptual and computational model of fear conditioning manifested by freezing behavior (FRAT), which accounts for many aspects of delay and context conditioning, has been constructed. Conditioning and extinction are the result of neuromodulation-controlled LTP at synapses of thalamic, cortical, and hippocampal afferents on principal cells and inhibitory interneurons of lateral and basal amygdala. The phenomena accounted for by the model (and simulated by the computational version) include conditioning, secondary reinforcement, blocking, the immediate shock deficit, extinction, renewal, and a range of empirically valid effects of pre- and post-training ablation or inactivation of hippocampus or amygdala nuclei. PMID:21845175
Army Training Study: Training Effectiveness Analysis (TEA) Summary. Volume 4. Ordnance, Signal, CAMMS (Computer Assisted Map Maneuver System).

DTIC Science & Technology

1978-08-08

learning can be reinforced on the job because individuals at all aptitude and experience levels investigated have the ability to be successful ...both game board and brigade controllers provided realistic feedback and guidance to the command group players. A second major function of the brigade...and experimental measures. The brigade level data collectors and game board players were controllers provided by the participating units’ parent
The GI Project: a prototype electronic textbook for high school biology.

PubMed

Calhoun, P S; Fishman, E K

1997-01-01

A prototype electronic science textbook for secondary education was developed to help bridge the gap between state-of-the-art medical technology and the basic science classroom. The prototype combines the latest in radiologic imaging techniques with a user-friendly multimedia computer program to teach the anatomy, physiology, and diseases of the gastrointestinal (GI) tract. The program includes original text, illustrations, photographs, animations, images from upper GI studies, plain radiographs, computed tomographic images, and three-dimensional reconstructions. These features are intended to create a stimulus-rich environment in which the high school science student can enjoy a variety of interactive experiences that will facilitate the learning process. The computer-based book is a new educational tool that promises to play a prominent role in the coming years. Current research suggests that computer-based books are valuable as an alternative educational medium. Although it is not yet clear what form textbooks will take in the future, computer-based books are already proving valuable as an alternative educational medium. For beginning students, they reinforce the material found in traditional textbooks and class presentations; for advanced students, they provide motivation to learn outside the traditional classroom.
Optogenetic mimicry of the transient activation of dopamine neurons by natural reward is sufficient for operant reinforcement.

PubMed

Kim, Kyung Man; Baratta, Michael V; Yang, Aimei; Lee, Doheon; Boyden, Edward S; Fiorillo, Christopher D

2012-01-01

Activation of dopamine receptors in forebrain regions, for minutes or longer, is known to be sufficient for positive reinforcement of stimuli and actions. However, the firing rate of dopamine neurons is increased for only about 200 milliseconds following natural reward events that are better than expected, a response which has been described as a "reward prediction error" (RPE). Although RPE drives reinforcement learning (RL) in computational models, it has not been possible to directly test whether the transient dopamine signal actually drives RL. Here we have performed optical stimulation of genetically targeted ventral tegmental area (VTA) dopamine neurons expressing Channelrhodopsin-2 (ChR2) in mice. We mimicked the transient activation of dopamine neurons that occurs in response to natural reward by applying a light pulse of 200 ms in VTA. When a single light pulse followed each self-initiated nose poke, it was sufficient in itself to cause operant reinforcement. Furthermore, when optical stimulation was delivered in separate sessions according to a predetermined pattern, it increased locomotion and contralateral rotations, behaviors that are known to result from activation of dopamine neurons. All three of the optically induced operant and locomotor behaviors were tightly correlated with the number of VTA dopamine neurons that expressed ChR2, providing additional evidence that the behavioral responses were caused by activation of dopamine neurons. These results provide strong evidence that the transient activation of dopamine neurons provides a functional reward signal that drives learning, in support of RL theories of dopamine function.
Punishment insensitivity and impaired reinforcement learning in preschoolers.

PubMed

Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S

2014-01-01

Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.
Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

PubMed

Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

2016-08-04

Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia.

PubMed

Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E

2014-03-01

Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.
Toward a dual-learning systems model of speech category learning

PubMed Central

Chandrasekaran, Bharath; Koslov, Seth R.; Maddox, W. T.

2014-01-01

More than two decades of work in vision posits the existence of dual-learning systems of category learning. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion, while the reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-learning systems models hypothesize that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in auditory category learning and more specifically in speech category learning has not been systematically examined. In this article, we describe a neurobiologically constrained dual-learning systems theoretical framework that is currently being developed in speech category learning and review recent applications of this framework. Using behavioral and computational modeling approaches, we provide evidence that speech category learning is predominantly mediated by the reflexive learning system. In one application, we explore the effects of normal aging on non-speech and speech category learning. Prominently, we find a large age-related deficit in speech learning. The computational modeling suggests that older adults are less likely to transition from simple, reflective, unidimensional rules to more complex, reflexive, multi-dimensional rules. In a second application, we summarize a recent study examining auditory category learning in individuals with elevated depressive symptoms. We find a deficit in reflective-optimal and an enhancement in reflexive-optimal auditory category learning. Interestingly, individuals with elevated depressive symptoms also show an advantage in learning speech categories. We end with a brief summary and description of a number of future directions. PMID:25132827
Microstimulation of the human substantia nigra alters reinforcement learning.

PubMed

Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J

2014-05-14

Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors 0270-6474/14/346887-09$15.00/0.
Using Cross-Sectional Imaging to Convey Organ Relationships: An Integrated Learning Environment for Students of Gross Anatomy

PubMed Central

Forman, Bruce H.; Eccles, Randy; Piggins, Judith; Raila, Wayne; Estey, Greg; Barnett, G. Octo

1990-01-01

We have developed a visually oriented, computer-controlled learning environment designed for use by students of gross anatomy. The goals of this module are to reinforce the concepts of organ relationships and topography by using computed axial tomographic (CAT) images accessed from a videodisc integrated with color graphics and to introduce students to cross-sectional radiographic anatomy. We chose to build the program around CAT scan images because they not only provide excellent structural detail but also offer an anatomic orientation (transverse) that complements that used in the dissection laboratory (basically a layer-by-layer, anterior-to-posterior, or coronal approach). Our system, built using a Microsoft Windows-386 based authoring environment which we designed and implemented, integrates text, video images, and graphics into a single screen display. The program allows both user browsing of information, facilitated by hypertext links, and didactic sessions including mini-quizzes for self-assessment.
Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories

PubMed Central

Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien

2013-01-01

In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244
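The abstract above does not say how artificial trajectories are built from the sample. One simple possibility, assumed here purely for illustration, is to chain one-step transitions from the batch by nearest-neighbour matching on (state, action), using each transition at most once. The function name, the scalar state and action spaces, and the distance measure are all hypothetical choices.

```python
def synthesize_trajectory(transitions, policy, x0, horizon):
    """Chain sampled one-step transitions into one artificial trajectory.

    transitions: list of (state, action, reward, next_state) with scalar
                 states and actions (an assumption made for this sketch).
    policy:      function mapping a state to the action to evaluate.
    Returns the chosen transitions and the sum of their rewards.
    """
    unused = list(transitions)
    state, total = x0, 0.0
    path = []
    for _ in range(horizon):
        if not unused:
            break
        action = policy(state)
        # Pick the unused sample closest to the (state, action) we need.
        best = min(unused, key=lambda t: abs(t[0] - state) + abs(t[1] - action))
        unused.remove(best)
        path.append(best)
        total += best[2]
        state = best[3]  # continue from the sampled next state
    return path, total


if __name__ == "__main__":
    batch = [(0.0, 1.0, 1.0, 1.0), (1.0, 1.0, 2.0, 2.0), (0.1, -1.0, 0.0, -0.9)]
    path, ret = synthesize_trajectory(batch, lambda s: 1.0, x0=0.0, horizon=2)
    print(path, ret)
```

No function approximator is fitted here: the return estimate comes directly from rewards recorded in the batch, which is the property the abstract highlights.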
Human reinforcement learning subdivides structured action spaces by learning effector-specific values

PubMed Central

Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

2009-01-01

Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and BOLD activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces. PMID:19864565
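The contrast described above, between a unitary value for each bimanual action and separate values per effector, can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' fitted model: the function names, the additive combination of hand values, and the learning rate are hypothetical.

```python
def update_unitary(q_joint, left, right, reward_left, reward_right, alpha=0.5):
    """Unitary model: one value per (left, right) pair, learned from total reward."""
    key = (left, right)
    old = q_joint.get(key, 0.0)
    q_joint[key] = old + alpha * ((reward_left + reward_right) - old)


def update_decomposed(q_left, q_right, left, right,
                      reward_left, reward_right, alpha=0.5):
    """Decomposed model: each hand learns from its own reward feedback."""
    old_l = q_left.get(left, 0.0)
    old_r = q_right.get(right, 0.0)
    q_left[left] = old_l + alpha * (reward_left - old_l)
    q_right[right] = old_r + alpha * (reward_right - old_r)


def decomposed_value(q_left, q_right, left, right):
    """Value of a bimanual action as the sum of effector-specific values."""
    return q_left.get(left, 0.0) + q_right.get(right, 0.0)


if __name__ == "__main__":
    q_joint, q_left, q_right = {}, {}, {}
    update_unitary(q_joint, "L1", "R1", 1.0, 0.0)
    update_decomposed(q_left, q_right, "L1", "R1", 1.0, 0.0)
    # The decomposed model already assigns value to the untried pair (L1, R2).
    print(q_joint.get(("L1", "R2"), 0.0), decomposed_value(q_left, q_right, "L1", "R2"))
```

With n options per hand, the unitary table needs n × n entries while the decomposed version needs 2n, which is one way to read the "divide and conquer" remark in the abstract.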
Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

PubMed

Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

2009-10-28

Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
Dual learning processes underlying human decision-making in reversal learning tasks: functional significance and evidence from the model fit to human behavior

PubMed Central

Bai, Yu; Katahira, Kentaro; Ohira, Hideki

2014-01-01

Humans are capable of correcting their actions based on actions performed in the past, and this ability enables them to adapt to a changing environment. The computational field of reinforcement learning (RL) has provided a powerful explanation for understanding such processes. Recently, the dual learning system, modeled as a hybrid model that incorporates value update based on reward-prediction error and learning rate modulation based on the surprise signal, has gained attention as a model for explaining various neural signals. However, the functional significance of the hybrid model has not been established. In the present study, we used computer simulation of a probabilistic reversal learning task to address its functional significance. The hybrid model was found to perform better than the standard RL model over a wide range of parameter settings. These results suggest that the hybrid model is more robust against the mistuning of parameters than the standard RL model when decision-makers continue to learn stimulus-reward contingencies that can change abruptly. The parameter fitting results also indicated that the hybrid model fit better than the standard RL model for more than 50% of the participants, which suggests that the hybrid model has more explanatory power for the behavioral data than the standard RL model. PMID:25161635
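The abstract describes the hybrid model only in words: a value update driven by reward-prediction error, with a learning rate modulated by surprise. One common way to write such a model, assumed here and not confirmed by the abstract, uses a Pearce-Hall style associability term that tracks the recent magnitude of prediction errors. The function name and the parameter names `kappa` and `eta` are hypothetical.

```python
def hybrid_update(value, assoc, reward, kappa=0.5, eta=0.3):
    """One trial of a hybrid prediction-error / surprise-modulated update.

    value: current reward expectation for the chosen option
    assoc: current associability, acting as a dynamic learning rate
    kappa: fixed scaling of the update (assumed parameter)
    eta:   how quickly associability tracks surprise (assumed parameter)
    """
    delta = reward - value                      # reward-prediction error
    new_value = value + kappa * assoc * delta   # learning rate = kappa * assoc
    new_assoc = eta * abs(delta) + (1.0 - eta) * assoc  # surprise raises it
    return new_value, new_assoc


if __name__ == "__main__":
    v, a = 0.0, 1.0
    for r in [1.0, 1.0, 1.0, 0.0]:  # a reversal on the last trial
        v, a = hybrid_update(v, a, r)
        print(round(v, 3), round(a, 3))
```

Setting `eta` to 0 and `assoc` to a constant recovers a standard delta-rule learner with a fixed learning rate, which corresponds to the comparison model in the abstract.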
The role of first impression in operant learning.

PubMed

Shteingart, Hanan; Neiman, Tal; Loewenstein, Yonatan

2013-05-01

We quantified the effect of first experience on behavior in operant learning and studied its underlying computational principles. To that goal, we analyzed more than 200,000 choices in a repeated-choice experiment. We found that the outcome of the first experience has a substantial and lasting effect on participants' subsequent behavior, which we term outcome primacy. We found that this outcome primacy can account for much of the underweighting of rare events, where participants apparently underestimate small probabilities. We modeled behavior in this task using a standard, model-free reinforcement learning algorithm. In this model, the values of the different actions are learned over time and are used to determine the next action according to a predefined action-selection rule. We used a novel nonparametric method to characterize this action-selection rule and showed that the substantial effect of first experience on behavior is consistent with the reinforcement learning model if we assume that the outcome of first experience resets the values of the experienced actions, but not if we assume arbitrary initial conditions. Moreover, the predictive power of our resetting model outperforms previously published models regarding the aggregate choice behavior. These findings suggest that first experience has a disproportionately large effect on subsequent actions, similar to primacy effects in other fields of cognitive psychology. The mechanism of resetting of the initial conditions that underlies outcome primacy may thus also account for other forms of primacy. PsycINFO Database Record (c) 2013 APA, all rights reserved.
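The resetting assumption described above can be sketched in a few lines: the first outcome experienced for an action replaces its value outright, and later outcomes adjust it incrementally. This is a minimal illustration under assumptions. The function name, the dictionary-based bookkeeping, and the learning rate are hypothetical, and the paper's nonparametric action-selection rule is not modeled.

```python
def update_with_reset(values, counts, action, reward, alpha=0.1):
    """Update an action value, resetting it to the first experienced outcome.

    values: dict mapping action -> current value estimate
    counts: dict mapping action -> number of times the action was experienced
    """
    if counts.get(action, 0) == 0:
        # First experience: the outcome itself becomes the value,
        # regardless of any arbitrary initial value.
        values[action] = reward
    else:
        # Later experiences: standard incremental (delta-rule) update.
        values[action] += alpha * (reward - values[action])
    counts[action] = counts.get(action, 0) + 1
    return values[action]


if __name__ == "__main__":
    values, counts = {}, {}
    print(update_with_reset(values, counts, "A", 10.0, alpha=0.5))
    print(update_with_reset(values, counts, "A", 0.0, alpha=0.5))
```

Because the first outcome sets the starting point for all later updates, its influence decays only as fast as the learning rate allows, which is one way to picture the lasting effect the abstract calls outcome primacy.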
A Dynamic Connectome Supports the Emergence of Stable Computational Function of Neural Circuits through Reward-Based Learning.

PubMed

Kappel, David; Legenstein, Robert; Habenschuss, Stefan; Hsieh, Michael; Maass, Wolfgang

2018-01-01

Synaptic connections between neurons in the brain are dynamic because of continuously ongoing spine dynamics, axonal sprouting, and other processes. In fact, it was recently shown that the spontaneous synapse-autonomous component of spine dynamics is at least as large as the component that depends on the history of pre- and postsynaptic neural activity. These data are inconsistent with common models for network plasticity and raise the following questions: how can neural circuits maintain a stable computational function in spite of these continuously ongoing processes, and what could be functional uses of these ongoing processes? Here, we present a rigorous theoretical framework for these seemingly stochastic spine dynamics and rewiring processes in the context of reward-based learning tasks. We show that spontaneous synapse-autonomous processes, in combination with reward signals such as dopamine, can explain the capability of networks of neurons in the brain to configure themselves for specific computational tasks, and to compensate automatically for later changes in the network or task. Furthermore, we show theoretically and through computer simulations that stable computational performance is compatible with continuously ongoing synapse-autonomous changes. After reaching good computational performance it causes primarily a slow drift of network architecture and dynamics in task-irrelevant dimensions, as observed for neural activity in motor cortex and other areas. On the more abstract level of reinforcement learning the resulting model gives rise to an understanding of reward-driven network plasticity as continuous sampling of network configurations.

A Dynamic Connectome Supports the Emergence of Stable Computational Function of Neural Circuits through Reward-Based Learning

PubMed Central

Kappel, David; Legenstein, Robert; Habenschuss, Stefan; Hsieh, Michael; Maass, Wolfgang

2018-01-01

Synaptic connections between neurons in the brain are dynamic because of continuously ongoing spine dynamics, axonal sprouting, and other processes. In fact, it was recently shown that the spontaneous synapse-autonomous component of spine dynamics is at least as large as the component that depends on the history of pre- and postsynaptic neural activity. These data are inconsistent with common models for network plasticity and raise the following questions: how can neural circuits maintain a stable computational function in spite of these continuously ongoing processes, and what could be functional uses of these ongoing processes? Here, we present a rigorous theoretical framework for these seemingly stochastic spine dynamics and rewiring processes in the context of reward-based learning tasks. We show that spontaneous synapse-autonomous processes, in combination with reward signals such as dopamine, can explain the capability of networks of neurons in the brain to configure themselves for specific computational tasks, and to compensate automatically for later changes in the network or task. Furthermore, we show theoretically and through computer simulations that stable computational performance is compatible with continuously ongoing synapse-autonomous changes. After reaching good computational performance it causes primarily a slow drift of network architecture and dynamics in task-irrelevant dimensions, as observed for neural activity in motor cortex and other areas. On the more abstract level of reinforcement learning the resulting model gives rise to an understanding of reward-driven network plasticity as continuous sampling of network configurations. PMID:29696150
Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.

PubMed

Bouton, Mark E; Woods, Amanda M; Todd, Travis P

2014-01-01

Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.
Autonomous reinforcement learning with experience replay.

PubMed

Wawrzyński, Paweł; Tanwani, Ajay Kumar

2013-05-01

This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
Electrophysiological correlates of reinforcement learning in young people with Tourette syndrome with and without co-occurring ADHD symptoms.

PubMed

Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J

2016-06-01

Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (n=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.
Utilising reinforcement learning to develop strategies for driving auditory neural implants.

PubMed

Lee, Geoffrey W; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G

2016-08-01

In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provides a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model's function. We show the ability to effectively learn stimulation patterns which mimic the cochlea's ability to convert acoustic frequencies to neural activity. Learning an effective replication using neural stimulation takes less than 20 min under continuous testing. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient's current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
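The abstract mentions a modified n-Armed Bandit solution without describing the modification. For orientation only, the sketch below shows a plain epsilon-greedy bandit with sample-average value estimates, which is a textbook baseline and not the authors' method. The function names and the toy reward function, in which each arm stands in for one candidate stimulation pattern, are hypothetical.

```python
import random


def select_arm(estimates, epsilon, rng):
    """Explore a random arm with probability epsilon, else pick the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def update_estimate(estimates, counts, arm, reward):
    """Incrementally update the sample-average reward estimate of one arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]


def run_bandit(reward_fn, n_arms, steps, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit; reward_fn(arm) returns the observed reward."""
    rng = random.Random(seed)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        arm = select_arm(estimates, epsilon, rng)
        update_estimate(estimates, counts, arm, reward_fn(arm))
    return estimates, counts


if __name__ == "__main__":
    # Toy stand-in: only stimulation pattern 2 reproduces the target response.
    est, cnt = run_bandit(lambda arm: 1.0 if arm == 2 else 0.0, n_arms=3, steps=2000)
    print(est, cnt)
```

In a closed-loop setting of the kind the abstract describes, the reward would instead measure how closely the recorded neural response matches the response to the acoustic stimulus.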
Embedded Incremental Feature Selection for Reinforcement Learning

DTIC Science & Technology

2012-05-01

Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al...In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature
Social Learning, Reinforcement and Crime: Evidence from Three European Cities

ERIC Educational Resources Information Center

Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

2012-01-01

This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…
Trading Rules on Stock Markets Using Genetic Network Programming with Reinforcement Learning and Importance Index

NASA Astrophysics Data System (ADS)

Mabu, Shingo; Hirasawa, Kotaro; Furuzuki, Takayuki

Genetic Network Programming (GNP) is an evolutionary computation method which represents its solutions using graph structures. Since GNP can create quite compact programs and has an implicit memory function, it has been shown to work well, especially in dynamic environments. In addition, a study on creating trading rules on stock markets using GNP with Importance Index (GNP-IMX) has been done. IMX is a new element which is a criterion for decision making. In this paper, we combine GNP-IMX with Actor-Critic (GNP-IMX&AC) to create trading rules on stock markets. Evolution-based methods evolve their programs only after a sufficient period of time because they must calculate fitness values; reinforcement learning, however, can change programs during that period, so the trading rules can be created efficiently. In the simulation, the proposed method is trained using the stock prices of 10 brands in 2002 and 2003. Then the generalization ability is tested using the stock prices in 2004. The simulation results show that the proposed method can obtain larger profits than GNP-IMX without AC and Buy&Hold.
Novelty and Inductive Generalization in Human Reinforcement Learning

PubMed Central

Gershman, Samuel J.; Niv, Yael

2015-01-01

In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: what is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: how can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. PMID:25808176
Learning with incomplete information and the mathematical structure behind it.

PubMed

Kühn, Reimer; Stamatescu, Ion-Olimpiu

2007-07-01

We investigate the problem of learning with incomplete information as exemplified by learning with delayed reinforcement. We study a two phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase depending on a graded reinforcement signal. The reinforcement signal quantifies the success-rate globally for a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to associations learnt in that phase. Learning according to this model is studied via simulations and analytically within a student-teacher scenario for both single layer networks and a committee machine. Success and speed of learning depend on the ratio λ of the learning rates used for the associative Hebbian learning phase and for the unlearning-correction in response to the reinforcement signal, respectively. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value λ_c, in which case the generalization error exhibits a power law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter λ. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications (a robot learning to navigate a field containing obstacles, and the problem of identifying a specific component in a collection of stimuli) are also provided.
Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance

DTIC Science & Technology

2014-09-29

Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance W. Bradley Knox...positive a trainer’s reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task...learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent’s performance on the task the trainer
What is Intrinsic Motivation? A Typology of Computational Approaches

PubMed Central

Oudeyer, Pierre-Yves; Kaplan, Frederic

2007-01-01

Intrinsic motivation, centrally involved in spontaneous exploration and curiosity, is a crucial concept in developmental psychology. It has been argued to be a crucial mechanism for open-ended cognitive development in humans, and as such has attracted growing interest from developmental roboticists in recent years. The goal of this paper is threefold. First, it provides a synthesis of the different approaches to intrinsic motivation in psychology. Second, by interpreting these approaches in a computational reinforcement learning framework, we argue that they are not operational and even sometimes inconsistent. Third, we set the ground for a systematic operational study of intrinsic motivation by presenting a formal typology of possible computational approaches. This typology is partly based on existing computational models, but also presents new ways of conceptualizing intrinsic motivation. We argue that this kind of computational typology might be useful for opening new avenues for research both in psychology and developmental robotics. PMID:18958277
With you or against you: social orientation dependent learning signals guide actions made for others.

PubMed

Christopoulos, George I; King-Casas, Brooks

2015-01-01

In social environments, it is crucial that decision-makers take account of the impact of their actions not only for oneself, but also on other social agents. Previous work has identified neural signals in the striatum encoding value-based prediction errors for outcomes to oneself; also, recent work suggests that neural activity in prefrontal cortex may similarly encode value-based prediction errors related to outcomes to others. However, prior work also indicates that social valuations are not isomorphic, with social value orientations of decision-makers ranging on a cooperative to competitive continuum; this variation has not been examined within social learning environments. Here, we combine a computational model of learning with functional neuroimaging to examine how individual differences in orientation impact neural mechanisms underlying 'other-value' learning. Across four experimental conditions, reinforcement learning signals for other-value were identified in medial prefrontal cortex, and were distinct from self-value learning signals identified in striatum. Critically, the magnitude and direction of the other-value learning signal depended strongly on an individual's cooperative or competitive orientation toward others. These data indicate that social decisions are guided by a social orientation-dependent learning system that is computationally similar but anatomically distinct from self-value learning. The sensitivity of the medial prefrontal learning signal to social preferences suggests a mechanism linking such preferences to biases in social actions and highlights the importance of incorporating heterogeneous social predispositions in neurocomputational models of social behavior. Published by Elsevier Inc.
With you or against you: Social orientation dependent learning signals guide actions made for others

PubMed Central

Christopoulos, George I.; King-Casas, Brooks

2014-01-01

In social environments, it is crucial that decision-makers take account of the impact of their actions not only for oneself, but also on other social agents. Previous work has identified neural signals in the striatum encoding value-based prediction errors for outcomes to oneself; also, recent work suggests neural activity in prefrontal cortex may similarly encode value-based prediction errors related to outcomes to others. However, prior work also indicates that social valuations are not isomorphic, with social value orientations of decision-makers ranging on a cooperative to competitive continuum; this variation has not been examined within social learning environments. Here, we combine a computational model of learning with functional neuroimaging to examine how individual differences in orientation impact neural mechanisms underlying ‘other-value’ learning. Across four experimental conditions, reinforcement learning signals for other-value were identified in medial prefrontal cortex, and were distinct from self-value learning signals identified in striatum. Critically, the magnitude and direction of the other-value learning signal depended strongly on an individual’s cooperative or competitive orientation towards others. These data indicate that social decisions are guided by a social orientation-dependent learning system that is computationally similar but anatomically distinct from self-value learning. The sensitivity of the medial prefrontal learning signal to social preferences suggests a mechanism linking such preferences to biases in social actions and highlights the importance of incorporating heterogeneous social predispositions in neurocomputational models of social behavior. PMID:25224998
Fuzzy Q-Learning for Generalization of Reinforcement Learning

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.

1996-01-01

Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents that are controlled at the top level by Fuzzy Q-Learning, while at the local level each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of the input space by using fuzzy rules, and bridges the gap between Q-Learning and rule based intelligent systems.
An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction, and punishment learning

PubMed Central

Balasubramani, Pragathi P.; Chakravarthy, V. Srinivasa; Ravindran, Balaraman; Moustafa, Ahmed A.

2014-01-01

Although empirical and neural studies show that serotonin (5HT) plays many functional roles in the brain, prior computational models mostly focus on its role in behavioral inhibition. In this study, we present a model of risk based decision making in a modified Reinforcement Learning (RL)-framework. The model depicts the roles of dopamine (DA) and serotonin (5HT) in Basal Ganglia (BG). In this model, the DA signal is represented by the temporal difference error (δ), while the 5HT signal is represented by a parameter (α) that controls risk prediction error. This formulation that accommodates both 5HT and DA reconciles some of the diverse roles of 5HT particularly in connection with the BG system. We apply the model to different experimental paradigms used to study the role of 5HT: (1) Risk-sensitive decision making, where 5HT controls risk assessment, (2) Temporal reward prediction, where 5HT controls time-scale of reward prediction, and (3) Reward/Punishment sensitivity, in which the punishment prediction error depends on 5HT levels. Thus the proposed integrated RL model reconciles several existing theories of 5HT and DA in the BG. PMID:24795614
Framework for robot skill learning using reinforcement learning

NASA Astrophysics Data System (ADS)

Wei, Yingzi; Zhao, Mingyang

2003-09-01

Skill acquisition by a robot is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skill. The reinforcement function is the critical component because it evaluates the action and guides the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. Also, the difference form of the augmented reward function is considered carefully. The additional reward beyond the conventional reward provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning. The automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and also illustrate with an example the utility of the method for learning skilled robot control on line.
Proactivity and Reinforcement: The Contingency of Social Behavior

ERIC Educational Resources Information Center

Williams, J. Sherwood; And Others

1976-01-01

This paper analyzes development of group structure in terms of the stimulus-sampling perspective. Learning is the continual sampling of possibilities, with those reinforced possibilities increasing in probability of occurrence. This contingency learning approach is tested experimentally. (NG)
Mobile robots exploration through cnn-based reinforcement learning.

PubMed

Tai, Lei; Liu, Ming

2016-01-01

Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outline a reinforcement learning method for solving the exploration problem in a corridor environment. The learning model took the depth image from an RGB-D sensor as the only input. The feature representation of the depth image was extracted through a pre-trained convolutional neural network model. Based on the recent success of the deep Q-network in artificial intelligence, the robot controller achieved the exploration and obstacle avoidance abilities in several different simulated environments. It is the first time that reinforcement learning has been used to build an exploration strategy for mobile robots through raw sensor information.
Automated Inattention and Fatigue Detection System in Distance Education for Elementary School Students

ERIC Educational Resources Information Center

Hwang, Kuo-An; Yang, Chia-Hao

2009-01-01

Most courses based on distance learning focus on the cognitive domain of learning. Because students are sometimes inattentive or tired, they may neglect the attention goal of learning. This study proposes an auto-detection and reinforcement mechanism for the distance-education system based on the reinforcement teaching strategy. If a student is…

When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

ERIC Educational Resources Information Center

Janssen, Christian P.; Gray, Wayne D.

2012-01-01

Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…
Fuzzy Sarsa with Focussed Replacing Eligibility Traces for Robust and Accurate Control

NASA Astrophysics Data System (ADS)

Kamdem, Sylvain; Ohki, Hidehiro; Sueda, Naomichi

Several methods of reinforcement learning in continuous state and action spaces that utilize fuzzy logic have been proposed in recent years. This paper introduces Fuzzy Sarsa(λ), an on-policy algorithm for fuzzy learning that relies on a novel way of computing replacing eligibility traces to accelerate the policy evaluation. It is tested against several temporal difference learning algorithms: Sarsa(λ), Fuzzy Q(λ), an earlier fuzzy version of Sarsa and an actor-critic algorithm. We perform detailed evaluations on two benchmark problems: a maze domain and the cart pole. Results of various tests highlight the strengths and weaknesses of these algorithms and show that Fuzzy Sarsa(λ) outperforms all other algorithms tested for a larger granularity of design and under noisy conditions. It is a highly competitive method of learning in realistic noisy domains where a denser fuzzy design over the state space is needed for a more precise control.
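For readers unfamiliar with the baseline that the abstract above builds on, here is a minimal sketch of plain tabular Sarsa(λ) with replacing eligibility traces. It is not the Fuzzy Sarsa(λ) algorithm or its focussed trace computation, which the abstract does not specify. The corridor task, the parameter values and the name `sarsa_lambda` are assumptions chosen only to keep the example small.

```python
import random


def sarsa_lambda(n_states=6, episodes=200, alpha=0.1, gamma=0.95,
                 lam=0.8, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with replacing traces on a toy corridor.

    States 0..n_states-1 lie on a line; the rightmost state is terminal and
    entering it yields reward 1. Actions: index 0 = left, index 1 = right.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    def choose(state):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon or q[state][0] == q[state][1]:
            return rng.randrange(2)
        return 0 if q[state][0] > q[state][1] else 1

    for _ in range(episodes):
        traces = [[0.0, 0.0] for _ in range(n_states)]
        s, a = 0, choose(0)
        while s != goal:
            s2 = min(max(s + moves[a], 0), goal)
            r = 1.0 if s2 == goal else 0.0
            a2 = choose(s2)
            # On-policy target: uses the action actually selected next.
            target = r if s2 == goal else r + gamma * q[s2][a2]
            delta = target - q[s][a]
            traces[s][a] = 1.0  # replacing trace: reset to 1, do not accumulate
            for i in range(n_states):
                for j in range(2):
                    q[i][j] += alpha * delta * traces[i][j]
                    traces[i][j] *= gamma * lam  # decay every trace
            s, a = s2, a2
    return q


if __name__ == "__main__":
    for state, values in enumerate(sarsa_lambda()):
        print(state, [round(v, 3) for v in values])
```

The only difference from accumulating traces is the line that sets the visited trace to 1 rather than incrementing it, which keeps frequently revisited state-action pairs from receiving oversized updates.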
RM-SORN: a reward-modulated self-organizing recurrent neural network.

PubMed

Aswolinskiy, Witali; Pipa, Gordon

2015-01-01

Neural plasticity plays an important role in learning and memory. Reward-modulation of plasticity offers an explanation for the ability of the brain to adapt its neural activity to achieve a rewarded goal. Here, we define a neural network model that learns through the interaction of Intrinsic Plasticity (IP) and reward-modulated Spike-Timing-Dependent Plasticity (STDP). IP enables the network to explore possible output sequences and STDP, modulated by reward, reinforces the creation of the rewarded output sequences. The model is tested on tasks for prediction, recall, non-linear computation, pattern recognition, and sequence generation. It achieves performance comparable to networks trained with supervised learning, while using simple, biologically motivated plasticity rules, and rewarding strategies. The results confirm the importance of investigating the interaction of several plasticity rules in the context of reward-modulated learning and whether reward-modulated self-organization can explain the amazing capabilities of the brain.
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning

PubMed Central

McGregor, Heather R.; Mohatarem, Ayman

2017-01-01

It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.

PubMed

Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L

2017-07-01

It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
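The dissociation described in the abstract above rests on a general property of loss functions: expected squared error is minimized at the mean of a distribution, whereas a hit-or-miss reward with a narrow target is maximized at its mode. The sketch below checks this numerically on a small skewed discrete distribution. The shift values, probabilities, target half-width and function names are invented for illustration and are not the study's parameters.

```python
def squared_error(err):
    """Error-based loss: cost grows with the square of the miss distance."""
    return err ** 2


def miss(err, half_width=0.05):
    """Reinforcement-style loss: 0 for a hit inside the target, 1 otherwise."""
    return 0.0 if abs(err) <= half_width else 1.0


def best_aim(shifts, probs, loss):
    """Grid-search the aim point that minimizes the expected loss."""
    candidates = [c / 10 for c in range(-20, 51)]  # -2.0 to 5.0 in steps of 0.1

    def expected_loss(aim):
        return sum(p * loss(aim - s) for s, p in zip(shifts, probs))

    return min(candidates, key=expected_loss)


if __name__ == "__main__":
    # Hypothetical skewed distribution of lateral shifts (mode and mean differ).
    shifts = [-1.0, 0.0, 1.0, 2.0, 4.0]
    probs = [0.5, 0.2, 0.1, 0.1, 0.1]
    print("mean of shifts:", sum(p * s for s, p in zip(shifts, probs)))
    print("best aim under squared error:", best_aim(shifts, probs, squared_error))
    print("best aim under hit-or-miss:", best_aim(shifts, probs, miss))
```

Under these assumptions the squared-error optimum sits at the distribution's mean and the hit-or-miss optimum at its most probable value, which is the contrast the experiment exploits.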
Altered Risk-Based Decision Making following Adolescent Alcohol Use Results from an Imbalance in Reinforcement Learning in Rats

PubMed Central

Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.

2012-01-01

Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989
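The learning imbalance described above can be illustrated with a value update that uses separate learning rates for positive and negative prediction errors. This is a minimal sketch under assumed parameters, not the authors' fitted model: the learning rates, the 50% gamble paying 2 units and the name `learned_value` are all hypothetical.

```python
import random


def learned_value(alpha_pos, alpha_neg, p_win=0.5, win=2.0,
                  trials=5000, burn_in=1000, seed=0):
    """Average learned value of a risky option under asymmetric learning rates."""
    rng = random.Random(seed)
    value = 0.0
    total = 0.0
    for t in range(trials):
        reward = win if rng.random() < p_win else 0.0
        delta = reward - value  # reward prediction error
        # Faster learning from better-than-expected outcomes when alpha_pos is larger.
        alpha = alpha_pos if delta > 0 else alpha_neg
        value += alpha * delta
        if t >= burn_in:
            total += value  # average only after the estimate has settled
    return total / (trials - burn_in)


if __name__ == "__main__":
    print("balanced learner:  ", round(learned_value(0.05, 0.05), 2))
    print("imbalanced learner:", round(learned_value(0.15, 0.05), 2))
```

In this toy setting the gamble's expected payoff is 1, so a balanced learner settles near 1 while the learner with the larger positive-error rate settles well above it. Such a learner would prefer the gamble over a certain payoff of equal expected value, with no change to how rewards themselves are valued.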
An experiment on the use of disposable plastics as a reinforcement in concrete beams

NASA Technical Reports Server (NTRS)

Chowdhury, Mostafiz R.

1992-01-01

Illustrated here is the concept of reinforced concrete structures by the use of computer simulation and an inexpensive hands-on design experiment. The students in our construction management program use disposable plastic as a reinforcement to demonstrate their understanding of reinforced concrete and prestressed concrete beams. The plastics used for such an experiment vary from plastic bottles to steel reinforced auto tires. This experiment will show the extent to which plastic reinforcement increases the strength of a concrete beam. The procedure of using such throw-away plastics in an experiment to explain the interaction between the reinforcement material and concrete, and a comparison of the test results for using different types of waste plastics are discussed. A computer analysis to simulate the structural response is used to compare the test results and to understand the analytical background of reinforced concrete design. This interaction of using computers to analyze structures and to relate the output results with real experimentation is found to be a very useful method for teaching a math-based analytical subject to our non-engineering students.
Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.

PubMed

Khamassi, Mehdi; Enel, Pierre; Dominey, Peter Ford; Procyk, Emmanuel

2013-01-01

Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that would affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activities encoding error likelihood, uncertainty, reward volatility, as well as neural responses categorizing different types of feedback, for instance, distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of MPFC in tracking the volatility of the task could contribute to the regulation of one of the RL parameters called the learning rate. We extend this hypothesis by proposing that MPFC could contribute to the regulation of other RL parameters such as the exploration rate and default action values in case of task shifts. Here, we analyze the sensitivity to RL parameters of behavioral performance in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that there exist optimal parameter values specific to each of these tasks, that need to be found for optimal performance and that are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using some heuristics can help produce a good, although non-optimal, behavioral performance in each task. We finally describe our computational model of MPFC-LPFC interaction used for online regulation of the exploration rate and its application to a human-robot interaction scenario. There, unexpected uncertainties are produced by the human introducing cued task changes or by cheating. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events. The combined results provide concrete evidence specifying how prefrontal cortical subregions may cooperate to regulate RL parameters. They also show how such neurophysiologically inspired mechanisms can control advanced robots in the real world. Finally, the model's learning mechanisms that were challenged in the last robotic scenario provide testable predictions on the way monkeys may learn the structure of the task during the pretraining phase of the previous laboratory experiments. Copyright © 2013 Elsevier B.V. All rights reserved.
How to build better memory training games

PubMed Central

Deveau, Jenni; Jaeggi, Susanne M.; Zordan, Victor; Phung, Calvin; Seitz, Aaron R.

2015-01-01

Can we create engaging training programs that improve working memory (WM) skills? While there are numerous procedures that attempt to do so, there is a great deal of controversy regarding their efficacy. Nonetheless, recent meta-analytic evidence shows consistent improvements across studies on lab-based tasks generalizing beyond the specific training effects (Au et al., 2014; Karbach and Verhaeghen, 2014); however, there is little research into how WM training aids participants in their daily life. Here we propose that incorporating design principles from the fields of Perceptual Learning (PL) and Computer Science might augment the efficacy of WM training, and ultimately lead to greater learning and transfer. In particular, the field of PL has identified numerous mechanisms (including attention, reinforcement, multisensory facilitation and multi-stimulus training) that promote brain plasticity. Also, computer science has made great progress in the scientific approach to game design that can be used to create engaging environments for learning. We suggest that approaches integrating knowledge across these fields may lead to more effective WM interventions and better reflect real world conditions. PMID:25620916
Comparative learning theory and its application in the training of horses.

PubMed

Cooper, J J

1998-11-01

Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species, but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by removal of, or decrease in intensity of, an unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.
The nature of sexual reinforcement.

PubMed Central

Crawford, L L; Holloway, K S; Domjan, M

1993-01-01

Sexual reinforcers are not part of a regulatory system involved in the maintenance of critical metabolic processes, they differ for males and females, they differ as a function of species and mating system, and they show ontogenetic and seasonal changes related to endocrine conditions. Exposure to a member of the opposite sex without copulation can be sufficient for sexual reinforcement. However, copulatory access is a stronger reinforcer, and copulatory opportunity can serve to enhance the reinforcing efficacy of stimulus features of a sexual partner. Conversely, under certain conditions, noncopulatory exposure serves to decrease reinforcer efficacy. Many common learning phenomena such as acquisition, extinction, discrimination learning, second-order conditioning, and latent inhibition have been demonstrated in sexual conditioning. These observations extend the generality of findings obtained with more conventional reinforcers, but the mechanisms of these effects and their gender and species specificity remain to be explored. PMID:8354970
Laboratory Sequence in Computational Methods for Introductory Chemistry

NASA Astrophysics Data System (ADS)

Cody, Jason A.; Wiser, Dawn C.

2003-07-01

A four-exercise laboratory sequence for introductory chemistry integrating hands-on, student-centered experience with computer modeling has been designed and implemented. The progression builds from exploration of molecular shapes to intermolecular forces and the impact of those forces on chemical separations made with gas chromatography and distillation. The sequence ends with an exploration of molecular orbitals. The students use the computers as a tool; they build the molecules, submit the calculations, and interpret the results. Because of the construction of the sequence and its placement spanning the semester break, good laboratory notebook practices are reinforced and the continuity of course content and methods between semesters is emphasized. The inclusion of these techniques in the first year of chemistry has had a positive impact on student perceptions and student learning.
Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond.

PubMed

Morita, Kenji; Jitsev, Jenia; Morrison, Abigail

2016-09-15

Value-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize "Winner-Take-All (WTA)" selection of the maximum-valued action (i.e., 'max' operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic "soft-max" selection. The striatal "max" circuit and the cortical "soft-max" circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate reward-prediction-error. Regarding the suggested more complex dynamics of striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action. Copyright © 2016. Published by Elsevier B.V.
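The distinction the abstract draws between a "max" operation (Q-learning) and "soft-max" selection with on-policy bootstrapping (SARSA) can be illustrated with the standard tabular update rules. This is a minimal sketch only: the function names, the table layout `Q[state][action]`, and the parameters `alpha`, `gamma`, and `temperature` are illustrative assumptions, not taken from the reviewed article, and the code says nothing about how a neural circuit would realize these computations.

```python
import math
import random


def softmax_choice(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for action, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return action
    return len(q_values) - 1  # guard against floating-point round-off


def q_learning_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the maximum-valued next action ("max")."""
    target = reward + gamma * max(Q[s_next])
    delta = target - Q[s][a]  # reward-prediction error
    Q[s][a] += alpha * delta
    return delta


def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually selected next."""
    target = reward + gamma * Q[s_next][a_next]
    delta = target - Q[s][a]  # reward-prediction error
    Q[s][a] += alpha * delta
    return delta
```

The two updates differ only in the bootstrap term: Q-learning uses the best next action regardless of what is chosen, whereas SARSA uses the next action that the (for example soft-max) policy actually selects.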
Impairments in action-outcome learning in schizophrenia.

PubMed

Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W

2018-03-03

Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.
The algorithmic anatomy of model-based evaluation

PubMed Central

Daw, Nathaniel D.; Dayan, Peter

2014-01-01

Despite many debates in the first half of the twentieth century, it is now largely a truism that humans and other animals build models of their environments and use them for prediction and control. However, model-based (MB) reasoning presents severe computational challenges. Alternative, computationally simpler, model-free (MF) schemes have been suggested in the reinforcement learning literature, and have afforded influential accounts of behavioural and neural data. Here, we study the realization of MB calculations, and the ways that this might be woven together with MF values and evaluation methods. There are as yet mostly only hints in the literature as to the resulting tapestry, so we offer more preview than review. PMID:25267820
The Effects of a Token Reinforcement System on the Reading and Arithmetic Skills Learnings of Migrant Primary School Pupils.

ERIC Educational Resources Information Center

Heitzman, Andrew J.

The New York State Center for Migrant Studies conducted this 1968 study which investigated effects of token reinforcers on reading and arithmetic skills learnings of migrant primary school students during a 6-week summer school session. Students (Negro and Caucasian) received plastic tokens to reward skills learning responses. Tokens were traded…
The Effects of Observation of Learn Units during Reinforcement and Correction Conditions on the Rate of Learning Math Algorithms by Fifth Grade Students

ERIC Educational Resources Information Center

Neu, Jessica Adele

2013-01-01

I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…
An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System: A Reinforcement Learning Approach

ERIC Educational Resources Information Center

Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela

2011-01-01

Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…
The Identification and Establishment of Reinforcement for Collaboration in Elementary Students

ERIC Educational Resources Information Center

Darcy, Laura

2017-01-01

In Experiment 1, I conducted a functional analysis of student rate of learning with and without a peer-yoked contingency for 12 students in Kindergarten through 2nd grade in order to determine if they had conditioned reinforcement for collaboration. Using an ABAB reversal design, I compared rate of learning as measured by learn units to criterion…
Stress enhances model-free reinforcement learning only after negative outcome

PubMed Central

Lee, Daeyeol

2017-01-01

Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one’s ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts. PMID:28723943

Stress enhances model-free reinforcement learning only after negative outcome.

PubMed

Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung

2017-01-01

Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one's ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts.
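One common way to quantify the "relative contribution" of model-based and model-free learning in a two-stage decision task is a weighted mixture of the two value estimates. The sketch below assumes that formulation; the abstract does not state which model the authors fitted, so the weight `w`, the function names, and the example numbers are hypothetical.

```python
def model_based_values(transition_probs, second_stage_values):
    """First-stage value of each action: sum over s' of P(s' | a) * max Q(s', a')."""
    return [
        sum(p * max(q) for p, q in zip(probs, second_stage_values))
        for probs in transition_probs
    ]


def hybrid_values(q_mb, q_mf, w):
    """Mix model-based and model-free values; w = 1 is purely model-based."""
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


# Hypothetical task: two first-stage actions, two second-stage states.
transitions = [[0.7, 0.3], [0.3, 0.7]]   # P(second-stage state | first-stage action)
stage2_q = [[0.2, 0.8], [0.4, 0.1]]      # learned second-stage action values
q_mb = model_based_values(transitions, stage2_q)
q_net = hybrid_values(q_mb, q_mf=[0.0, 1.0], w=0.5)
```

In such a model, a shift toward model-free control would appear as a lower fitted `w`.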
Implicit chaining in cotton-top tamarins (Saguinus oedipus) with elements equated for probability of reinforcement

PubMed Central

Dillon, Laura; Collins, Meaghan; Conway, Maura; Cunningham, Kate

2013-01-01

Three experiments examined the implicit learning of sequences under conditions in which the elements comprising a sequence were equated in terms of reinforcement probability. In Experiment 1 cotton-top tamarins (Saguinus oedipus) experienced a five-element sequence displayed serially on a touch screen in which reinforcement probability was equated across elements at .16 per element. Tamarins demonstrated learning of this sequence with higher latencies during a random test as compared to baseline sequence training. In Experiments 2 and 3, manipulations of the procedure used in the first experiment were undertaken to rule out a confound owing to the fact that the elements in Experiment 1 bore different temporal relations to the intertrial interval (ITI), an inhibitory period. The results of Experiments 2 and 3 indicated that the implicit learning observed in Experiment 1 was not due to temporal proximity between some elements and the inhibitory ITI. The results taken together support two conclusions: first, that tamarins engaged in sequence learning whether or not there was contingent reinforcement for learning the sequence, and second, that this learning was not due to subtle differences in associative strength between the elements of the sequence. PMID:23344718
Improving the Science Excursion: An Educational Technologist's View

ERIC Educational Resources Information Center

Balson, M.

1973-01-01

Analyzes the nature of the learning process and attempts to show how the three components of a reinforcement contingency, the stimulus, the response and the reinforcement can be utilized to increase the efficiency of a typical science learning experience, the excursion. (JR)
Vector-based navigation using grid-like representations in artificial agents.

PubMed

Banino, Andrea; Barry, Caswell; Uria, Benigno; Blundell, Charles; Lillicrap, Timothy; Mirowski, Piotr; Pritzel, Alexander; Chadwick, Martin J; Degris, Thomas; Modayil, Joseph; Wayne, Greg; Soyer, Hubert; Viola, Fabio; Zhang, Brian; Goroshin, Ross; Rabinowitz, Neil; Pascanu, Razvan; Beattie, Charlie; Petersen, Stig; Sadik, Amir; Gaffney, Stephen; King, Helen; Kavukcuoglu, Koray; Hassabis, Demis; Hadsell, Raia; Kumaran, Dharshan

2018-05-01

Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go [1,2]. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning [3-5] failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex [6]. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space [7,8] and is critical for integrating self-motion (path integration) [6,7,9] and planning direct trajectories to goals (vector-based navigation) [7,10,11]. Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types [12]. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation [7,10,11], demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.
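The two spatial computations named in this abstract, path integration and vector-based navigation, reduce to simple vector arithmetic when position is represented explicitly. The sketch below shows only that arithmetic; it is not the recurrent network or the grid-like code described by the authors, and the function names and the (heading, distance) step format are assumptions made for illustration.

```python
import math


def integrate_path(start, steps):
    """Path integration: accumulate self-motion steps into a position estimate.

    Each step is a (heading_in_radians, distance) pair.
    """
    x, y = start
    for heading, distance in steps:
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return (x, y)


def goal_vector(position, goal):
    """Vector-based navigation: distance and bearing of the direct route to the goal."""
    dx = goal[0] - position[0]
    dy = goal[1] - position[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


# Move one unit east, then one unit north, then ask for the shortcut home.
position = integrate_path((0.0, 0.0), [(0.0, 1.0), (math.pi / 2, 1.0)])
distance_home, bearing_home = goal_vector(position, (0.0, 0.0))
```

The direct route home is shorter than retracing the two outbound steps, which is the kind of shortcut behaviour the abstract attributes to agents with a Euclidean spatial metric.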
Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults

PubMed Central

Smith, Tim J.; Senju, Atsushi

2017-01-01

While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue–reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue–reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. PMID:28250186
Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

PubMed

Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

2017-03-15

While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.
Learning and altering behaviours by reinforcement: neurocognitive differences between children and adults.

PubMed

Shephard, E; Jackson, G M; Groom, M J

2014-01-01

This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults', and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.
Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation

NASA Astrophysics Data System (ADS)

Satoh, Hideki

An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with multi-dimensional continuous state space. First, a basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method for adapting an orthonormal basis can modify a basis while holding the orthonormality in accordance with changes in the environment to improve the performance of reinforcement learning and to eliminate the adverse effects of redundant noisy states.
Construction of multi-agent mobile robots control system in the problem of persecution with using a modified reinforcement learning method based on neural networks

NASA Astrophysics Data System (ADS)

Patkin, M. L.; Rogachev, G. N.

2018-02-01

A method for constructing a multi-agent control system for mobile robots based on reinforcement learning with deep neural networks is considered. The control system is synthesized using reinforcement learning and a modified Actor-Critic method in which the Actor module is divided into an Action Actor and a Communication Actor, so that an agent can simultaneously control its mobile robot and communicate with partners. Communication is carried out by sending partners, at each step, a vector of real numbers that is added to the observation vector and affects behaviour. The Actor and Critic functions are approximated by deep neural networks. The Critic's value function is trained using the TD error, and the Actor's function using DDPG. The Communication Actor's neural network is trained through gradients received from partner agents. An environment with cooperative multi-agent interaction was developed, and a computer simulation of the method was carried out for a control problem in which two robots pursue two goals.
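The Critic's TD-error training mentioned above can be illustrated with a one-step temporal-difference update. The sketch below substitutes a linear value function for the deep network used in the paper so that it stays self-contained; the feature vectors, learning rate `lr`, and discount `gamma` are hypothetical, and the Actor and Communication Actor updates are not shown.

```python
def linear_value(weights, features):
    """Value estimate V(s) = w . phi(s) for a linear critic."""
    return sum(w * x for w, x in zip(weights, features))


def td_error(reward, value, next_value, gamma=0.99, done=False):
    """One-step TD error: r + gamma * V(s') - V(s), with no bootstrap at episode end."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def critic_step(weights, features, reward, next_features, lr=0.1, gamma=0.99, done=False):
    """Semi-gradient TD(0) update of the critic weights; returns (new_weights, delta)."""
    value = linear_value(weights, features)
    next_value = linear_value(weights, next_features)
    delta = td_error(reward, value, next_value, gamma, done)
    new_weights = [w + lr * delta * x for w, x in zip(weights, features)]
    return new_weights, delta
```

With a neural-network critic, the same `delta` would instead drive a gradient step on the network parameters, typically by minimizing the squared TD error.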
Flow Navigation by Smart Microswimmers via Reinforcement Learning

NASA Astrophysics Data System (ADS)

Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

2017-11-01

We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
Neural correlates of reinforcement learning and social preferences in competitive bidding.

PubMed

van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

2013-01-30

In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
Confidence and psychosis: a neuro-computational account of contingency learning disruption by NMDA blockade

PubMed Central

Vinckier, F; Gaillard, R; Palminteri, S; Rigoux, L; Salvador, A; Fornito, A; Adapa, R; Krebs, M O; Pessiglione, M; Fletcher, P C

2016-01-01

A state of pathological uncertainty about environmental regularities might represent a key step in the pathway to psychotic illness. Early psychosis can be investigated in healthy volunteers under ketamine, an NMDA receptor antagonist. Here, we explored the effects of ketamine on contingency learning using a placebo-controlled, double-blind, crossover design. During functional magnetic resonance imaging, participants performed an instrumental learning task, in which cue-outcome contingencies were probabilistic and reversed between blocks. Bayesian model comparison indicated that in such an unstable environment, reinforcement learning parameters are downregulated depending on confidence level, an adaptive mechanism that was specifically disrupted by ketamine administration. Drug effects were underpinned by altered neural activity in a fronto-parietal network, which reflected the confidence-based shift to exploitation of learned contingencies. Our findings suggest that an early characteristic of psychosis lies in a persistent doubt that undermines the stabilization of behavioral policy resulting in a failure to exploit regularities in the environment. PMID:26055423
Confidence and psychosis: a neuro-computational account of contingency learning disruption by NMDA blockade.

PubMed

Vinckier, F; Gaillard, R; Palminteri, S; Rigoux, L; Salvador, A; Fornito, A; Adapa, R; Krebs, M O; Pessiglione, M; Fletcher, P C

2016-07-01

A state of pathological uncertainty about environmental regularities might represent a key step in the pathway to psychotic illness. Early psychosis can be investigated in healthy volunteers under ketamine, an NMDA receptor antagonist. Here, we explored the effects of ketamine on contingency learning using a placebo-controlled, double-blind, crossover design. During functional magnetic resonance imaging, participants performed an instrumental learning task, in which cue-outcome contingencies were probabilistic and reversed between blocks. Bayesian model comparison indicated that in such an unstable environment, reinforcement learning parameters are downregulated depending on confidence level, an adaptive mechanism that was specifically disrupted by ketamine administration. Drug effects were underpinned by altered neural activity in a fronto-parietal network, which reflected the confidence-based shift to exploitation of learned contingencies. Our findings suggest that an early characteristic of psychosis lies in a persistent doubt that undermines the stabilization of behavioral policy resulting in a failure to exploit regularities in the environment.
Reward-based training of recurrent neural networks for cognitive and value-based tasks

PubMed Central

Song, H Francis; Yang, Guangyu R; Wang, Xiao-Jing

2017-01-01

Trained neural network models, which exhibit features of neural activity recorded from behaving animals, may provide insights into the circuit mechanisms of cognitive functions through systematic analysis of network activity and connectivity. However, in contrast to the graded error signals commonly used to train networks through supervised learning, animals learn from reward feedback on definite actions through reinforcement learning. Reward maximization is particularly relevant when optimal behavior depends on an animal’s internal judgment of confidence or subjective preferences. Here, we implement reward-based training of recurrent neural networks in which a value network guides learning by using the activity of the decision network to predict future reward. We show that such models capture behavioral and electrophysiological findings from well-known experimental paradigms. Our work provides a unified framework for investigating diverse cognitive and value-based computations, and predicts a role for value representation that is essential for learning, but not executing, a task. DOI: http://dx.doi.org/10.7554/eLife.21492.001 PMID:28084991
Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

PubMed

Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

2016-02-01

Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
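The three components listed in this abstract (feature-value learning, stochastic selection, and value-independent stickiness) can be combined in a few lines. This is a hedged sketch of one plausible formulation, a soft-max over feature values with an additive bonus for the previously selected feature; the parameter names `beta`, `stickiness`, and `alpha` and their values are assumptions, not the authors' fitted model.

```python
import math


def choice_probabilities(values, previous_choice, beta=3.0, stickiness=0.5):
    """Soft-max over feature values plus a value-independent bonus for the last choice."""
    logits = [beta * v for v in values]
    if previous_choice is not None:
        logits[previous_choice] += stickiness  # perseveration term, independent of value
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_value(values, chosen, reward, alpha=0.2):
    """Delta-rule update of the chosen feature's expected value (modifies values in place)."""
    values[chosen] += alpha * (reward - values[chosen])


# With equal values, stickiness alone biases selection toward the previous choice.
probs = choice_probabilities([0.5, 0.5], previous_choice=0)
```

Because the stickiness term does not depend on value, it can produce repeated selections of a feature even after its learned value has fallen, which is how such a term can account for selection errors at asymptotic performance.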
An adaptive brain actuated system for augmenting rehabilitation

PubMed Central

Roset, Scott A.; Gant, Katie; Prasad, Abhishek; Sanchez, Justin C.

2014-01-01

For people living with paralysis, restoration of hand function remains the top priority because it leads to independence and improvement in quality of life. In approaches to restore hand and arm function, a goal is to better engage voluntary control and counteract maladaptive brain reorganization that results from non-use. Standard rehabilitation augmented with developments from the study of brain-computer interfaces could provide a combined therapy approach for motor cortex rehabilitation and to alleviate motor impairments. In this paper, an adaptive brain-computer interface system intended for application to control a functional electrical stimulation (FES) device is developed as an experimental test bed for augmenting rehabilitation with a brain-computer interface. The system's performance is improved throughout rehabilitation by passive user feedback and reinforcement learning. By continuously adapting to the user's brain activity, similar adaptive systems could be used to support clinical brain-computer interface neurorehabilitation over multiple days. PMID:25565945
Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.

PubMed

Chan, C K J; Harris, Justin A

2017-08-01

Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction. Copyright © 2017 Elsevier B.V. All rights reserved.
Can Service Learning Reinforce Social and Cultural Bias? Exploring a Popular Model of Family Involvement for Early Childhood Teacher Candidates

ERIC Educational Resources Information Center

Dunn-Kenney, Maylan

2010-01-01

Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…
Deep Gate Recurrent Neural Network

DTIC Science & Technology

2016-11-22

Schmidhuber. A system for robotic heart surgery that learns to tie knots using recurrent neural networks. In IEEE International Conference on...tasks, such as Machine Translation (Bahdanau et al. (2015)) or Robot Reinforcement Learning (Bakker (2001)). The main idea behind these networks is to...and J. Peters. Reinforcement learning in robotics : A survey. The International Journal of Robotics Research, 32:1238–1274, 2013. ISSN 0278-3649. doi
Memristive device based learning for navigation in robots.

PubMed

Sarim, Mohammad; Kumar, Manish; Jha, Rashmi; Minai, Ali A

2017-11-08

Biomimetic robots have gained attention recently for various applications ranging from resource hunting to search and rescue operations during disasters. Biological species are known to intuitively learn from the environment, gather and process data, and make appropriate decisions. Such sophisticated computing capabilities in robots are difficult to achieve, especially if done in real-time with ultra-low energy consumption. Here, we present a novel memristive device based learning architecture for robots. Two terminal memristive devices with resistive switching of oxide layer are modeled in a crossbar array to develop a neuromorphic platform that can impart active real-time learning capabilities in a robot. This approach is validated by navigating a robot vehicle in an unknown environment with randomly placed obstacles. Further, the proposed scheme is compared with reinforcement learning based algorithms using local and global knowledge of the environment. The simulation as well as experimental results corroborate the validity and potential of the proposed learning scheme for robots. The results also show that our learning scheme approaches an optimal solution for some environment layouts in robot navigation.

Conceptualizing withdrawal-induced escalation of alcohol self-administration as a learned, plasticity-dependent process

PubMed Central

Walker, Brendan M.

2013-01-01

This article represents one of five contributions focusing on the topic “Plasticity and neuroadaptive responses within the extended amygdala in response to chronic or excessive alcohol exposure” that were developed by awardees participating in the Young Investigator Award Symposium at the “Alcoholism and Stress: A Framework for Future Treatment Strategies” conference in Volterra, Italy on May 3–6, 2011 that was organized/chaired by Drs. Antonio Noronha and Fulton Crews and sponsored by the National Institute on Alcohol Abuse and Alcoholism. This review discusses the dependence-induced neuroadaptations in affective systems that provide a basis for negative reinforcement learning and presents evidence demonstrating that escalated alcohol consumption during withdrawal is a learned, plasticity-dependent process. The review concludes by identifying changes within extended amygdala dynorphin/kappa-opioid receptor systems that could serve as the foundation for the occurrence of negative reinforcement processes. While some evidence contained herein may be specific to alcohol dependence-related learning and plasticity, much of the information will be of relevance to any addictive disorder involving negative reinforcement mechanisms. Collectively, the information presented within this review provides a framework to assess the negative reinforcing effects of alcohol in a manner that distinguishes neuroadaptations produced by chronic alcohol exposure from the actual plasticity that is associated with negative reinforcement learning in dependent organisms. PMID:22459874
Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer

PubMed Central

Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R.

2010-01-01

Summary Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a “clinical reinforcement trial”) of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients. PMID:21385164
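The two-stage Q-learning idea described above can be sketched as backward induction: fit a model of the final outcome given the second-line state and treatment, then fit a first-line model to the best achievable second-stage value. The sketch below is illustrative only. It swaps in ordinary least squares for the censored-data support vector regression used in the abstract, ignores censoring entirely, and all function and variable names (`two_stage_q_learning`, `features`, `s1`, `a1`, and so on) are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept (a stand-in for the SVR in the abstract)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

def features(state, action):
    # Include a state-by-action interaction so the best action can depend on the state.
    return np.column_stack([state, action, state * action])

def q_values(beta, state, actions):
    # One column of predicted Q-values per candidate action.
    return np.column_stack(
        [predict(beta, features(state, np.full_like(state, a))) for a in actions]
    )

def two_stage_q_learning(s1, a1, s2, a2, y, actions=(0, 1)):
    # Stage 2: regress the observed outcome on second-line state and treatment.
    beta2 = fit_linear(features(s2, a2), y)
    # Value of acting optimally at stage 2, used as the stage-1 regression target.
    v2 = q_values(beta2, s2, actions).max(axis=1)
    # Stage 1: regress that value on first-line state and treatment.
    beta1 = fit_linear(features(s1, a1), v2)
    return beta1, beta2

def best_action(beta, state, actions=(0, 1)):
    return np.asarray(actions)[q_values(beta, state, actions).argmax(axis=1)]
```

Calling `best_action(beta2, state)` on new patient states then returns the estimated optimal second-line choice under these simplifying assumptions.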
German dental faculty attitudes towards computer-assisted learning and their correlation with personal and professional profiles.

PubMed

Welk, A; Rosin, M; Seyer, D; Splieth, C; Siemer, M; Meyer, G

2005-08-01

Compared with its potential, computer technology use is still lacking in medical/dental education. The aim was to investigate the primary advantages of computer-assisted learning (CAL) systems in German dental education, as well as the reasons for their relatively low degree of use, correlated with personal and professional profiles of respondents. A questionnaire was mailed to heads in the departments of conservative dentistry and prosthetic dentistry in all dental schools in Germany. Besides investigating the advantages and barriers to the use of computer technology, the questionnaire also contained questions regarding each respondent's gender, age, academic rank, experience in academia and computer skills. The response rate to the questionnaire was 90% (112 of 125). The results indicated a distinct discrepancy between the desire for and actual occurrence of lectures, seminars, etc. to instruct students in ways to search for and acquire knowledge, especially using computer technology. The highest-ranked advantages of CAL systems in order, as seen by respondents, were the possibilities for individual learning, increased motivation, and both objective theoretical tests and practical tests. The highest-ranked reasons for the low degree of usage of CAL systems in order were the inability to finance, followed equally by a lack of studies of CAL and poor cost-advantage ratio, and too much effort required to integrate CAL into the curriculum. Moreover, the higher the computer skills of the respondents, the more they noted insufficient quality of CAL systems (r = 0.200, P = 0.035) and content differences from their own dental faculty's expert opinions (r = 0.228, P = 0.016) as reasons for low use. The correlations of the attitudes towards CAL with the personal and professional profiles showed not only statistically significant reinforcements of, but also interesting deviations from, the average responses.
Depression, Activity, and Evaluation of Reinforcement

ERIC Educational Resources Information Center

Hammen, Constance L.; Glass, David R., Jr.

1975-01-01

This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)
What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

NASA Astrophysics Data System (ADS)

Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.
Kinesthetic Reinforcement-Is It a Boon to Learning?

ERIC Educational Resources Information Center

Bohrer, Roxilu K.

1970-01-01

Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…
Reinforcing Basic Skills Through Social Studies. Grades 4-7.

ERIC Educational Resources Information Center

Lewis, Teresa Marie

Arranged into seven parts, this document provides a variety of games and activities, bulletin board ideas, overhead transparencies, student handouts, and learning station ideas to help reinforce basic social studies skills in the intermediate grades. In part 1, students learn about timelines, first constructing their own life timeline, then a…
Effects of Reinforcement on Peer Imitation in a Small Group Play Context

ERIC Educational Resources Information Center

Barton, Erin E.; Ledford, Jennifer R.

2018-01-01

Children with disabilities often have deficits in imitation skills, particularly in imitating peers. Imitation is considered a behavioral cusp--which, once learned, allows a child to access additional and previously unavailable learning opportunities. In the current study, researchers examined the efficacy of contingent reinforcement delivered…
Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

PubMed

Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

2016-03-01

Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.
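The reinforcement rule described above (reward when the theta/alpha ratio drops) can be illustrated with a small spectral computation. This is a minimal sketch, not the study's procedure: the band limits (theta 4 to 8 Hz, alpha 8 to 13 Hz), the threshold, and the function names are assumptions chosen for illustration, and a real neurofeedback system would use artifact rejection and per-child thresholds.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Sum of spectral power in the half-open band [low, high) Hz."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= low) & (freqs < high)
    return power[mask].sum()

def theta_alpha_ratio(signal, fs, theta=(4.0, 8.0), alpha=(8.0, 13.0)):
    # Band limits are conventional values assumed here, not taken from the abstract.
    return band_power(signal, fs, *theta) / band_power(signal, fs, *alpha)

def should_reinforce(ratio, threshold):
    # Deliver the tone or visual stimulus only when the ratio falls below the threshold.
    return ratio < threshold
```

For a synthetic signal with a 6 Hz component of amplitude 1 and a 10 Hz component of amplitude 0.5, the ratio of band powers is the ratio of squared amplitudes.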
Solving the quantum many-body problem with artificial neural networks

NASA Astrophysics Data System (ADS)

Carleo, Giuseppe; Troyer, Matthias

2017-02-01

The challenge posed by the many-body problem in quantum physics originates from the difficulty of describing the nontrivial correlations encoded in the exponential complexity of the many-body wave function. Here we demonstrate that systematic machine learning of the wave function can reduce this complexity to a tractable computational form for some notable cases of physical interest. We introduce a variational representation of quantum states based on artificial neural networks with a variable number of hidden neurons. A reinforcement-learning scheme we demonstrate is capable of both finding the ground state and describing the unitary time evolution of complex interacting quantum systems. Our approach achieves high accuracy in describing prototypical interacting spins models in one and two dimensions.
Reinforcement learning agents providing advice in complex video games

NASA Astrophysics Data System (ADS)

Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

2014-01-01

This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
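One way a teacher could spend a limited advice budget is to advise only in states where the choice of action appears to matter most. The sketch below illustrates that idea with a hypothetical heuristic: state importance is taken to be the spread between the teacher's best and worst action values. The class name, the threshold parameter, and the Q-table layout are assumptions for illustration and are not claimed to reproduce the article's algorithms.

```python
def importance(q_values):
    """Spread between the best and worst action value in a state."""
    return max(q_values) - min(q_values)

class BudgetedTeacher:
    def __init__(self, q_table, budget, threshold):
        self.q_table = q_table      # hypothetical mapping: state -> list of action values
        self.budget = budget        # number of pieces of advice still available
        self.threshold = threshold  # minimum importance that justifies spending advice

    def advise(self, state):
        """Return a suggested action index, or None if no advice is given."""
        if self.budget <= 0:
            return None
        q = self.q_table[state]
        if importance(q) < self.threshold:
            return None
        self.budget -= 1
        return max(range(len(q)), key=q.__getitem__)
```

A student would follow the returned action when it is not `None` and otherwise act on its own policy, so the same budget can be spent early, late, or selectively depending on the threshold.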
The probability of reinforcement per trial affects posttrial responding and subsequent extinction but not within-trial responding.

PubMed

Harris, Justin A; Kwok, Dorothy W S

2018-01-01

During magazine approach conditioning, rats do not discriminate between a conditional stimulus (CS) that is consistently reinforced with food and a CS that is occasionally (partially) reinforced, as long as the CSs have the same overall reinforcement rate per second. This implies that rats are indifferent to the probability of reinforcement per trial. However, in the same rats, the per-trial reinforcement rate will affect subsequent extinction: responding extinguishes more rapidly for a CS that was consistently reinforced than for a partially reinforced CS. Here, we trained rats with consistently and partially reinforced CSs that were matched for overall reinforcement rate per second. We measured conditioned responding both during and immediately after the CSs. Differences in the per-trial probability of reinforcement did not affect the acquisition of responding during the CS but did affect subsequent extinction of that responding, and also affected the post-CS response rates during conditioning. Indeed, CSs with the same probability of reinforcement per trial evoked the same amount of post-CS responding even when they differed in overall reinforcement rate and thus evoked different amounts of responding during the CS. We conclude that reinforcement rate per second controls rats' acquisition of responding during the CS, but at the same time, rats also learn specifically about the probability of reinforcement per trial. The latter learning affects the rats' expectation of reinforcement as an outcome of the trial, which influences their ability to detect retrospectively that an opportunity for reinforcement was missed, and, in turn, drives extinction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Establishment and Maintenance of Socially Learned Conditioned Reinforcement in Young Children: Elimination of the Role of Adults and View of Peers' Faces

ERIC Educational Resources Information Center

Zrinzo, Michelle; Greer, R. Douglas

2013-01-01

Prior research has demonstrated the establishment of reinforcers for learning and maintenance with young children as a function of social learning where a peer and an adult experimenter were present. The presence of an adult experimenter was eliminated in the present study to test if the effect produced in the prior studies would occur with only…
Structure identification in fuzzy inference using reinforcement learning

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.; Khedkar, Pratap

1993-01-01

In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.
The cognitive architecture of anxiety-like behavioral inhibition.

PubMed

Bach, Dominik R

2017-01-01

The combination of reward and potential threat is termed approach/avoidance conflict and elicits specific behaviors, including passive avoidance and behavioral inhibition (BI). Anxiety-relieving drugs reduce these behaviors, and a rich psychological literature has addressed how personality traits dominated by BI predispose for anxiety disorders. Yet, a formal understanding of the cognitive inference and planning processes underlying anxiety-like BI is lacking. Here, we present and empirically test such formalization in the terminology of reinforcement learning. We capitalize on a human computer game in which participants collect sequentially appearing monetary tokens while under threat of virtual "predation." First, we demonstrate that humans modulate BI according to experienced consequences. This suggests an instrumental implementation of BI generation rather than a Pavlovian mechanism that is agnostic about action outcomes. Second, an internal model that would make BI adaptive is expressed in an independent task that involves no threat. The existence of such internal model is a necessary condition to conclude that BI is under model-based control. These findings relate a plethora of human and nonhuman observations on BI to reinforcement learning theory, and crucially constrain the quest for its neural implementation. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Distribution majorization of corner points by reinforcement learning for moving object detection

NASA Astrophysics Data System (ADS)

Wu, Hao; Yu, Hao; Zhou, Dongxiang; Cheng, Yongqiang

2018-04-01

Corner points play an important role in moving object detection, especially in the case of a free-moving camera. Corner points provide more accurate information than other pixels and reduce unnecessary computation. Previous works use only intensity information to locate the corner points; however, the information that the former and the last frames provide can also be used. We utilize this information to focus on more valuable areas and ignore less valuable ones. The proposed algorithm is based on reinforcement learning, which regards the detection of corner points as a Markov process. In the Markov model, the video to be detected is regarded as the environment, the selections of blocks for one corner point are regarded as actions, and the performance of detection is regarded as the state. Corner points are assigned to the blocks which are separated from the original whole image. Experimentally, we select a conventional method which uses marching and the Random Sample Consensus algorithm to obtain objects as the main framework and utilize our algorithm to improve the result. The comparison between the conventional method and the same method with our algorithm shows that our algorithm reduces false detections by 70%.
Relationship between Reinforcement and Eye Movements during Ocular Motor Training with Learning Disabled Children.

ERIC Educational Resources Information Center

Punnett, Audrey F.; Steinhauer, Gene D.

1984-01-01

Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…
Learning the specific quality of taste reinforcement in larval Drosophila.

PubMed

Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

2015-01-27

The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing, in any brain.
The partial-reinforcement extinction effect and the contingent-sampling hypothesis.

PubMed

Hochman, Guy; Erev, Ido

2013-12-01

The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
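The contingent-sampling idea described above (choose the option that paid best in past situations resembling the current one, with similarity defined by the last four outcomes) can be sketched as follows. This is a simplified illustration under stated assumptions: an "outcome" is taken to be the obtained payoff, ties and unseen contexts fall back to a random choice, and the function name and history format are hypothetical rather than the authors' specification.

```python
import random
from collections import defaultdict

def contingent_sampling_choice(history, options, k=4, rng=random):
    """Pick an option given a history of (choice, payoff) tuples, oldest first."""
    if len(history) < k:
        return rng.choice(options)
    # The current context is the sequence of the last k payoffs.
    context = tuple(payoff for _, payoff in history[-k:])
    payoffs_by_choice = defaultdict(list)
    for t in range(k, len(history)):
        past_context = tuple(payoff for _, payoff in history[t - k:t])
        if past_context == context:
            choice, payoff = history[t]
            payoffs_by_choice[choice].append(payoff)
    if not payoffs_by_choice:
        return rng.choice(options)
    # Choose the option with the best mean payoff in matching past contexts.
    return max(
        payoffs_by_choice,
        key=lambda c: sum(payoffs_by_choice[c]) / len(payoffs_by_choice[c]),
    )
```

With `k=4` the rule matches the best-fitting similarity definition reported in the abstract; other values of `k` are included only to make the assumption explicit.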
Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.

PubMed

Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne

2016-05-01

We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to stimuli foreground over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.

A neurocomputational account of reward and novelty processing and effects of psychostimulants in attention deficit hyperactivity disorder.

PubMed

Sethi, Arjun; Voon, Valerie; Critchley, Hugo D; Cercignani, Mara; Harrison, Neil A

2018-05-01

Computational models of reinforcement learning have helped dissect discrete components of reward-related function and characterize neurocognitive deficits in psychiatric illnesses. Stimulus novelty biases decision-making, even when unrelated to choice outcome, acting as if possessing intrinsic reward value to guide decisions toward uncertain options. Heightened novelty seeking is characteristic of attention deficit hyperactivity disorder, yet how this influence on reward-related decision-making is computationally encoded, or is altered by stimulant medication, is currently uncertain. Here we used an established reinforcement-learning task to model effects of novelty on reward-related behaviour during functional MRI in 30 adults with attention deficit hyperactivity disorder and 30 age-, sex- and IQ-matched control subjects. Each participant was tested on two separate occasions, once ON and once OFF stimulant medication. OFF medication, patients with attention deficit hyperactivity disorder showed significantly impaired task performance (P = 0.027), and greater selection of novel options (P = 0.004). Moreover, persistence in selecting novel options predicted impaired task performance (P = 0.025). These behavioural deficits were accompanied by a significantly lower learning rate (P = 0.011) and heightened novelty signalling within the substantia nigra/ventral tegmental area (family-wise error corrected P < 0.05). Compared to effects in controls, stimulant medication improved attention deficit hyperactivity disorder participants' overall task performance (P = 0.011), increased reward-learning rates (P = 0.046) and enhanced their ability to differentiate optimal from non-optimal novel choices (P = 0.032). It also reduced substantia nigra/ventral tegmental area responses to novelty. Preliminary cross-sectional evidence additionally suggested an association between long-term stimulant treatment and a reduction in the rewarding value of novelty.
These data suggest that aberrant substantia nigra/ventral tegmental area novelty processing plays an important role in the suboptimal reward-related decision-making characteristic of attention deficit hyperactivity disorder. Compared to effects in controls, abnormalities in novelty processing and reward-related learning were improved by stimulant medication, suggesting that they may be disorder-specific targets for the pharmacological management of attention deficit hyperactivity disorder symptoms.
Absence of “Warm-Up” during Active Avoidance Learning in a Rat Model of Anxiety Vulnerability: Insights from Computational Modeling

PubMed Central

Myers, Catherine E.; Smith, Ian M.; Servatius, Richard J.; Beck, Kevin D.

2014-01-01

Avoidance behaviors, in which a learned response causes omission of an upcoming punisher, are a core feature of many psychiatric disorders. While reinforcement learning (RL) models have been widely used to study the development of appetitive behaviors, less attention has been paid to avoidance. Here, we present a RL model of lever-press avoidance learning in Sprague-Dawley (SD) rats and in the inbred Wistar Kyoto (WKY) rat, which has been proposed as a model of anxiety vulnerability. We focus on “warm-up,” transiently decreased avoidance responding at the start of a testing session, which is shown by SD but not WKY rats. We first show that a RL model can correctly simulate key aspects of acquisition, extinction, and warm-up in SD rats; we then show that WKY behavior can be simulated by altering three model parameters, which respectively govern the tendency to explore new behaviors vs. exploit previously reinforced ones, the tendency to repeat previous behaviors regardless of reinforcement, and the learning rate for predicting future outcomes. This suggests that several, dissociable mechanisms may contribute independently to strain differences in behavior. The model predicts that, if the “standard” inter-session interval is shortened from 48 to 24 h, SD rats (but not WKY) will continue to show warm-up; we confirm this prediction in an empirical study with SD and WKY rats. The model further predicts that SD rats will continue to show warm-up with inter-session intervals as short as a few minutes, while WKY rats will not show warm-up, even with inter-session intervals as long as a month. Together, the modeling and empirical data indicate that strain differences in warm-up are qualitative rather than just the result of differential sensitivity to task variables. Understanding the mechanisms that govern expression of warm-up behavior in avoidance may lead to better understanding of pathological avoidance, and potential pathways to modify these processes. PMID:25183956
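The three parameters named above (explore/exploit balance, a tendency to repeat the previous action regardless of reinforcement, and a learning rate for outcome predictions) appear in many reinforcement learning models of choice. The sketch below shows one generic way they can enter such a model, using a softmax choice rule with a perseveration bonus and a delta-rule value update. It is an assumption-laden illustration, not the authors' model; the parameter names `inverse_temperature`, `perseveration`, and `learning_rate` are hypothetical.

```python
import math

def action_probabilities(values, last_action, inverse_temperature, perseveration):
    """Softmax over action values, with a bonus for repeating the previous action."""
    scores = [
        inverse_temperature * v + (perseveration if a == last_action else 0.0)
        for a, v in enumerate(values)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def update_value(value, outcome, learning_rate):
    """Delta rule: move the prediction toward the observed outcome."""
    return value + learning_rate * (outcome - value)
```

A higher `inverse_temperature` makes choices more exploitative, a higher `perseveration` makes the last action more likely irrespective of its value, and `learning_rate` sets how quickly predictions track outcomes.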
Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.

PubMed

Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla

2014-12-01

This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new models of coordination for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and the robot soccer simulation. The results obtained demonstrated that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.
Refining Linear Fuzzy Rules by Reinforcement Learning

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil

1996-01-01

Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used in refining these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning which can be applied in domains where supervised input-output data is not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step in closing the gap between the application of reinforcement learning methods in the domains where only some limited input-output data is available.
Toward an autonomous brain machine interface: integrating sensorimotor reward modulation and reinforcement learning.

PubMed

Marsh, Brandi T; Tarigoppula, Venkata S Aditya; Chen, Chen; Francis, Joseph T

2015-05-13

For decades, neurophysiologists have worked on elucidating the function of the cortical sensorimotor control system from the standpoint of kinematics or dynamics. Recently, computational neuroscientists have developed models that can emulate changes seen in the primary motor cortex during learning. However, these simulations rely on the existence of a reward-like signal in the primary sensorimotor cortex. Reward modulation of the primary sensorimotor cortex has yet to be characterized at the level of neural units. Here we demonstrate that single units/multiunits and local field potentials in the primary motor (M1) cortex of nonhuman primates (Macaca radiata) are modulated by reward expectation during reaching movements and that this modulation is present even while subjects passively view cursor motions that are predictive of either reward or nonreward. After establishing this reward modulation, we set out to determine whether we could correctly classify rewarding versus nonrewarding trials, on a moment-to-moment basis. This reward information could then be used in collaboration with reinforcement learning principles toward an autonomous brain-machine interface. The autonomous brain-machine interface would use M1 for both decoding movement intention and extraction of reward expectation information as evaluative feedback, which would then update the decoding algorithm as necessary. In the work presented here, we show that this, in theory, is possible.
Instructed knowledge shapes feedback-driven aversive learning in striatum and orbitofrontal cortex, but not the amygdala

PubMed Central

Atlas, Lauren Y; Doll, Bradley B; Li, Jian; Daw, Nathaniel D; Phelps, Elizabeth A

2016-01-01

Socially-conveyed rules and instructions strongly shape expectations and emotions. Yet most neuroscientific studies of learning consider reinforcement history alone, irrespective of knowledge acquired through other means. We examined fear conditioning and reversal in humans to test whether instructed knowledge modulates the neural mechanisms of feedback-driven learning. One group was informed about contingencies and reversals. A second group learned only from reinforcement. We combined quantitative models with functional magnetic resonance imaging and found that instructions induced dissociations in the neural systems of aversive learning. Responses in striatum and orbitofrontal cortex updated with instructions and correlated with prefrontal responses to instructions. Amygdala responses were influenced by reinforcement similarly in both groups and did not update with instructions. Results extend work on instructed reward learning and reveal novel dissociations that have not been observed with punishments or rewards. Findings support theories of specialized threat-detection and may have implications for fear maintenance in anxiety. DOI: http://dx.doi.org/10.7554/eLife.15192.001 PMID:27171199
Flow Navigation by Smart Microswimmers via Reinforcement Learning

NASA Astrophysics Data System (ADS)

Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca

2017-04-01

Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.
Neural correlates of forward planning in a spatial decision task in humans

PubMed Central

Simon, Dylan Alexander; Daw, Nathaniel D.

2011-01-01

Although reinforcement learning (RL) theories have been influential in characterizing the brain’s mechanisms for reward-guided choice, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model-based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used fMRI in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms employed. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and BOLD signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, though most often associated with TD learning, were better explained by the model-based theory. Further, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain. PMID:21471389
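Illustrative sketch (not the task or the fitted models from the record above): the contrast between model-free TD learning and model-based planning can be shown on a toy corridor. The function names, the three-state map, and all parameter values are hypothetical. The model-free learner changes one cached value per experienced transition, while the planner recomputes values from a known map, so it adapts immediately when the map changes.

```python
def td_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Model-free: nudge Q(state, action) toward reward + discounted best next value."""
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


def plan_values(transitions, rewards, gamma=0.9, sweeps=50):
    """Model-based: value iteration over a known deterministic map.

    transitions[state][action] -> next state; rewards[state][action] -> float.
    States with no actions are terminal and keep value 0.
    """
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        for state, actions in transitions.items():
            if actions:
                values[state] = max(
                    rewards[state][action] + gamma * values[next_state]
                    for action, next_state in actions.items()
                )
    return values


# Toy corridor: start -> middle -> goal, with reward 1 for the step into the goal.
transitions = {"start": {"go": "middle"}, "middle": {"go": "goal"}, "goal": {}}
rewards = {"start": {"go": 0.0}, "middle": {"go": 1.0}, "goal": {}}

q = {state: {action: 0.0 for action in actions} for state, actions in transitions.items()}
first_error = td_update(q, "middle", "go", 1.0, "goal")  # one experienced step
planned = plan_values(transitions, rewards)              # full plan from the map
```

After a single rewarded step the TD learner has only begun to value the final move, and the start state still has no value, whereas the planner already assigns a discounted value to the start state.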
Active Learning to Understand Infectious Disease Models and Improve Policy Making

PubMed Central

Willem, Lander; Stijven, Sean; Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel

2014-01-01

Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings. PMID:24743387
Active learning to understand infectious disease models and improve policy making.

PubMed

Willem, Lander; Stijven, Sean; Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel

2014-04-01

Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings.
Decision theory, reinforcement learning, and the brain.

PubMed

Dayan, Peter; Daw, Nathaniel D

2008-12-01

Decision making is a core competence for animals and humans acting and surviving in environments they only partially comprehend, gaining rewards and punishments for their troubles. Decision-theoretic concepts permeate experiments and computational models in ethology, psychology, and neuroscience. Here, we review a well-known, coherent Bayesian approach to decision making, showing how it unifies issues in Markovian decision problems, signal detection psychophysics, sequential sampling, and optimal exploration and discuss paradigmatic psychological and neural examples of each problem. We discuss computational issues concerning what subjects know about their task and how ambitious they are in seeking optimal solutions; we address algorithmic topics concerning model-based and model-free methods for making choices; and we highlight key aspects of the neural implementation of decision making.
Targeted intervention: Computational approaches to elucidate and predict relapse in alcoholism.

PubMed

Heinz, Andreas; Deserno, Lorenz; Zimmermann, Ulrich S; Smolka, Michael N; Beck, Anne; Schlagenhauf, Florian

2017-05-01

Alcohol use disorder (AUD) and addiction in general are characterized by failures of choice resulting in repeated drug intake despite severe negative consequences. Behavioral change is hard to accomplish and relapse after detoxification is common and can be promoted by consumption of small amounts of alcohol as well as exposure to alcohol-associated cues or stress. While those environmental factors contributing to relapse have long been identified, the underlying psychological and neurobiological mechanisms on which those factors act are to date incompletely understood. Based on the reinforcing effects of drugs of abuse, animal experiments showed that drug, cue and stress exposure affect Pavlovian and instrumental learning processes, which can increase salience of drug cues and promote habitual drug intake. In humans, computational approaches can help to quantify changes in key learning mechanisms during the development and maintenance of alcohol dependence, e.g. by using sequential decision making in combination with computational modeling to elucidate individual differences in model-free versus more complex, model-based learning strategies and their neurobiological correlates such as prediction error signaling in fronto-striatal circuits. Computational models can also help to explain how alcohol-associated cues trigger relapse: mechanisms such as Pavlovian-to-Instrumental Transfer can quantify to which degree Pavlovian conditioned stimuli can facilitate approach behavior including alcohol seeking and intake. By using generative models of behavioral and neural data, computational approaches can help to quantify individual differences in psychophysiological mechanisms that underlie the development and maintenance of AUD and thus promote targeted intervention.
Life Span Differences in Electrophysiological Correlates of Monitoring Gains and Losses during Probabilistic Reinforcement Learning

ERIC Educational Resources Information Center

Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman

2011-01-01

By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…
Reinforcement Learning in Young Adults with Developmental Language Impairment

ERIC Educational Resources Information Center

Lee, Joanna C.; Tomblin, J. Bruce

2012-01-01

The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…
Effective Reinforcement Techniques in Elementary Physical Education: The Key to Behavior Management

ERIC Educational Resources Information Center

Downing, John; Keating, Tedd; Bennett, Carl

2005-01-01

The ability to shape appropriate behavior while extinguishing misbehavior is critical to teaching and learning in physical education. The scientific principles that affect student learning in the gymnasium also apply to the methods teachers use to influence social behaviors. Research indicates that reinforcement strategies are more effective than…
Reinforcement learning state estimator.

PubMed

Morimoto, Jun; Doya, Kenji

2007-03-01

In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single pendulum dynamics and show that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.
Reciprocity Family Counseling: A Multi-Ethnic Model.

ERIC Educational Resources Information Center

Penrose, David M.

The Reciprocity Family Counseling Method involves learning principles of behavior modification including selective reinforcement, behavioral contracting, self-correction, and over-correction. Selective reinforcement refers to the recognition and modification of parent/child responses and reinforcers. Parents and children are asked to identify…
Reinforcement learning: Solving two case studies

NASA Astrophysics Data System (ADS)

Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

2012-09-01

Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms are described and applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.
OC48 - Hurtology: an online course.

PubMed

Frechette, Casey; Frechette, Barbara

2016-05-09

Theme: School health. The prevalence of anxiety and depression suggests a need to improve on the mental health education of young people. The school setting can provide a venue for offering such knowledge to adolescents. This study explored whether a video game enhances lessons designed to help adolescents become more receptive to learning about mental health concepts. This study used a quantitative between-subjects design. The first group experienced a set of computer-based lessons. The second group received the same content, but also played a video game designed to reinforce topics explored in the other materials. The findings showed that game players demonstrated deeper learning on at least one measure. Helping adolescents develop better ways to understand the relevance of emotional health is a worthwhile endeavour. New technologies can be used to improve learning and help young people become more receptive to addressing mental health concerns.
Reinforcement active learning in the vibrissae system: optimal object localization.

PubMed

Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

2013-01-01

Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment.
An intelligent agent for optimal river-reservoir system management

NASA Astrophysics Data System (ADS)

Rieker, Jeffrey D.; Labadie, John W.

2012-09-01

A generalized software package is presented for developing an intelligent agent for stochastic optimization of complex river-reservoir system management and operations. Reinforcement learning is an approach to artificial intelligence for developing a decision-making agent that learns the best operational policies without the need for explicit probabilistic models of hydrologic system behavior. The agent learns these strategies experientially in a Markov decision process through observational interaction with the environment and simulation of the river-reservoir system using well-calibrated models. The graphical user interface for the reinforcement learning process controller includes numerous learning method options and dynamic displays for visualizing the adaptive behavior of the agent. As a case study, the generalized reinforcement learning software is applied to developing an intelligent agent for optimal management of water stored in the Truckee river-reservoir system of California and Nevada for the purpose of streamflow augmentation for water quality enhancement. The intelligent agent successfully learns long-term reservoir operational policies that specifically focus on mitigating water temperature extremes during persistent drought periods that jeopardize the survival of threatened and endangered fish species.
Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning

PubMed Central

Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

2015-01-01

Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, which generates new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018
A comparison of differential reinforcement procedures with children with autism.

PubMed

Boudreau, Brittany A; Vladescu, Jason C; Kodak, Tiffany M; Argott, Paul J; Kisamore, April N

2015-12-01

The current evaluation compared the effects of 2 differential reinforcement arrangements and a nondifferential reinforcement arrangement on the acquisition of tacts for 3 children with autism. Participants learned in all reinforcement-based conditions, and we discuss areas for future research in light of these findings and potential limitations.
Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis

PubMed Central

2013-01-01

Background: Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods: Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results: MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion: Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
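Illustrative sketch (not the meta-analysis model itself): one simple way to separate reward sensitivity from learning rate is a delta rule in which the reward is scaled before the prediction error is computed. The update form, the function names, and all parameter values below are assumptions for illustration. Under this form, reward sensitivity sets the level that the learned value approaches, while the learning rate sets how quickly it gets there.

```python
def update_value(value, reward, learning_rate, reward_sensitivity):
    """One delta-rule step: the prediction error uses the sensitivity-scaled reward."""
    prediction_error = reward_sensitivity * reward - value
    return value + learning_rate * prediction_error


def simulate(n_trials, learning_rate, reward_sensitivity, reward=1.0):
    """Track the learned value over repeated, identical rewarded trials."""
    value = 0.0
    history = []
    for _ in range(n_trials):
        value = update_value(value, reward, learning_rate, reward_sensitivity)
        history.append(value)
    return history


baseline = simulate(50, learning_rate=0.3, reward_sensitivity=1.0)
low_sensitivity = simulate(50, learning_rate=0.3, reward_sensitivity=0.5)
low_learning_rate = simulate(50, learning_rate=0.1, reward_sensitivity=1.0)
```

In this toy setting the low-sensitivity learner levels off near half the baseline value, while the low-learning-rate learner approaches the same level as baseline but more slowly.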
Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

PubMed

Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki

2016-07-01

Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Unlike in the previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated in every discrete time step, explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This differs from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
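Illustrative sketch (an assumed formulation, not necessarily the equations used in the record above): aspiration learning is commonly written as a Bush-Mosteller style update of the probability of cooperating. The function name, the tanh stimulus, and the sensitivity parameter are assumptions. A payoff above the aspiration level reinforces whichever action was just taken, and a payoff below it pushes the probability away from that action.

```python
import math


def aspiration_update(p_cooperate, cooperated, payoff, aspiration, sensitivity=1.0):
    """Return the updated probability of cooperating after one round.

    The stimulus lies in (-1, 1) and is positive only if payoff exceeds the aspiration level.
    """
    stimulus = math.tanh(sensitivity * (payoff - aspiration))
    if cooperated:
        if stimulus >= 0:
            # Satisfied after cooperating: move toward 1.
            return p_cooperate + (1.0 - p_cooperate) * stimulus
        # Dissatisfied after cooperating: shrink toward 0.
        return p_cooperate + p_cooperate * stimulus
    if stimulus >= 0:
        # Satisfied after defecting: cooperation becomes less likely.
        return p_cooperate - p_cooperate * stimulus
    # Dissatisfied after defecting: cooperation becomes more likely.
    return p_cooperate - (1.0 - p_cooperate) * stimulus


satisfied_cooperator = aspiration_update(0.5, True, payoff=2.0, aspiration=1.0)
dissatisfied_cooperator = aspiration_update(0.5, True, payoff=0.0, aspiration=1.0)
satisfied_defector = aspiration_update(0.5, False, payoff=2.0, aspiration=1.0)
dissatisfied_defector = aspiration_update(0.5, False, payoff=0.0, aspiration=1.0)
```

The update uses only the learner's own action and payoff, which matches the abstract's point that no information about other players' actions is needed.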
Motor Learning Enhances Use-Dependent Plasticity

PubMed Central

2017-01-01

Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961
Specific effect of a dopamine partial agonist on counterfactual learning: evidence from Gilles de la Tourette syndrome.

PubMed

Salvador, Alexandre; Worbe, Yulia; Delorme, Cécile; Coricelli, Giorgio; Gaillard, Raphaël; Robbins, Trevor W; Hartmann, Andreas; Palminteri, Stefano

2017-07-24

The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
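Illustrative sketch (not the task or model from the record above): counterfactual learning can be written as a second delta-rule update applied to the options that were not chosen, using their forgone outcomes. The function name, the separate learning rates, and the example values are hypothetical.

```python
def learn_trial(values, chosen, obtained, forgone=None,
                alpha_chosen=0.3, alpha_unchosen=0.3):
    """Return updated option values after one trial.

    values:   dict mapping option -> current value estimate
    obtained: outcome of the chosen option (direct learning)
    forgone:  optional dict mapping unchosen option -> outcome it would have
              yielded (counterfactual learning); None means partial feedback
    """
    updated = dict(values)  # leave the caller's dict untouched
    updated[chosen] += alpha_chosen * (obtained - updated[chosen])
    if forgone is not None:
        for option, outcome in forgone.items():
            updated[option] += alpha_unchosen * (outcome - updated[option])
    return updated


start = {"A": 0.0, "B": 0.0}
partial_feedback = learn_trial(start, "A", obtained=1.0)
complete_feedback = learn_trial(start, "A", obtained=1.0, forgone={"B": -1.0})
```

With complete feedback the value gap between the two options grows faster than with partial feedback, which is one way such a task can show a benefit of counterfactual information.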
Development and evaluation of an interactive electronic laboratory manual for cooperative learning of medical histology.

PubMed

Khalil, Mohammed K; Kirkley, Debbie L; Kibble, Jonathan D

2013-01-01

This article describes the development of an interactive computer-based laboratory manual, created to facilitate the teaching and learning of medical histology. The overarching goal of developing the manual is to facilitate self-directed group interactivities that actively engage students during laboratory sessions. The design of the manual includes guided instruction for students to navigate virtual slides, exercises for students to monitor learning, and cases to provide clinical relevance. At the end of the laboratory activities, student groups can generate a laboratory report that may be used to provide formative feedback. The instructional value of the manual was evaluated by a questionnaire containing both closed-ended and open-ended items. Closed-ended items using a five-point Likert scale assessed the format and navigation, instructional contents, group process, and learning process. Open-ended items assessed students' perceptions of the effectiveness of the manual in facilitating their learning. After implementation for two consecutive years, student evaluation of the manual was highly positive and indicated that it facilitated their learning by reinforcing and clarifying classroom sessions, improved their understanding, facilitated active and cooperative learning, and supported self-monitoring of their learning. Copyright © 2013 American Association of Anatomists.
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.

PubMed

Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J

2016-11-16

As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. 
In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. Copyright © 2016 the authors.
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning

PubMed Central

Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F.

2016-01-01

As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. SIGNIFICANCE STATEMENT In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. 
In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. PMID:27852776
Exploring the limits of learning: Segregation of information integration and response selection is required for learning a serial reversal task

PubMed Central

Zanutto, B. Silvano

2017-01-01

Animals are proposed to learn the latent rules governing their environment in order to maximize their chances of survival. However, rules may change without notice, forcing animals to keep a memory of which one is currently at work. Rule switching can lead to situations in which the same stimulus/response pairing is positively and negatively rewarded in the long run, depending on variables that are not accessible to the animal. This fact raises questions about how neural systems are capable of reinforcement learning in environments where the reinforcement is inconsistent. Here we address this issue by asking which aspects of connectivity, neural excitability and synaptic plasticity are key for a very general, stochastic spiking neural network model to solve a task in which rules change without being cued, taking the serial reversal task (SRT) as a paradigm. Contrary to what could be expected, we found strong limitations for biologically plausible networks to solve the SRT. In particular, we proved that no network of neurons can learn an SRT if a single neural population both integrates stimulus information and is responsible for choosing the behavioural response. This limitation is independent of the number of neurons, neuronal dynamics or plasticity rules, and arises from the fact that plasticity is locally computed at each synapse, and that synaptic changes and neuronal activity are mutually dependent processes. We propose and characterize a spiking neural network model that solves the SRT, which relies on separating the functions of stimulus integration and response selection. The model suggests that experimental efforts to understand neural function should focus on the characterization of neural circuits according to their connectivity, neural dynamics, and the degree of modulation of synaptic plasticity with reward. PMID:29077735
Speed/accuracy trade-off between the habitual and the goal-directed processes.

PubMed

Keramati, Mehdi; Dezfouli, Amir; Piray, Payam

2011-05-01

Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of the two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages that they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible, but slow in choice selection. The habitual system, in contrast, is fast in responding, but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that makes an approximately optimal balance between search-time and accuracy in decision making. Behaviourally, the model can explain experimental evidence on behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains that when two choices with equal incentive values are available concurrently, the behaviour remains outcome-sensitive, even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that as the number of choices increases, the reaction time also increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour through reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour through manipulating the competition between the habitual and the goal-directed systems and thus, affect reaction time.
What is the optimal task difficulty for reinforcement learning of brain self-regulation?

PubMed

Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza

2016-09-01

The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent to β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Autonomous Performance Monitoring System: Monitoring and Self-Tuning (MAST)

NASA Technical Reports Server (NTRS)

Peterson, Chariya; Ziyad, Nigel A.

2000-01-01

Maintaining the long-term performance of software onboard a spacecraft can be a major factor in the cost of operations. In particular, the task of controlling and maintaining a future mission of distributed spacecraft will undoubtedly pose a great challenge, since the complexity of multiple spacecraft flying in formation grows rapidly as the number of spacecraft in the formation increases. Eventually, new approaches will be required in developing viable control systems that can handle the complexity of the data and that are flexible, reliable and efficient. In this paper we propose a methodology that aims to maintain the accuracy of flight software, while reducing the computational complexity of software tuning tasks. The proposed Monitoring and Self-Tuning (MAST) method consists of two parts: a flight software monitoring algorithm and a tuning algorithm. The dependency on the software being monitored is mostly contained in the monitoring process, while the tuning process is a generic algorithm independent of detailed knowledge of the software. This architecture will enable MAST to be applicable to different onboard software controlling various dynamics of the spacecraft, such as attitude self-calibration and formation control. An advantage of MAST over conventional techniques such as filtering or batch least squares is that the tuning algorithm uses a machine learning approach to handle uncertainty in the problem domain, which reduces overall computational complexity. The underlying concept of this technique is a reinforcement learning scheme based on cumulative probability generated by the historical performance of the system. The success of MAST will depend heavily on the reinforcement scheme used in the tuning algorithm, which guarantees that tuning solutions exist.
Composition of web services using Markov decision processes and dynamic programming.

PubMed

Uc-Cetina, Víctor; Moo-Mena, Francisco; Hernandez-Ucan, Rafael

2015-01-01

We propose a Markov decision process model for solving the Web service composition (WSC) problem. Iterative policy evaluation, value iteration, and policy iteration algorithms are used to experimentally validate our approach, with artificial and real data. The experimental results show the reliability of the model and the methods employed, with policy iteration being the best one in terms of the minimum number of iterations needed to estimate an optimal policy, with the highest Quality of Service attributes. Our experimental work shows that the solution of a WSC problem involving a set of 100,000 individual Web services, in which a valid composition requires the selection of 1,000 services from the available set, can be computed in the worst case in less than 200 seconds, using an Intel Core i5 computer with 6 GB RAM. Moreover, a real WSC problem involving only 7 individual Web services requires less than 0.08 seconds, using the same computational power. Finally, a comparison with two popular reinforcement learning algorithms, SARSA and Q-learning, shows that these algorithms require one or two orders of magnitude more time than policy iteration, iterative policy evaluation, and value iteration to handle WSC problems of the same complexity.
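As an illustration of the dynamic-programming methods named in the record above, here is a minimal value-iteration sketch in Python. It is not the authors' WSC model: the two-stage MDP, its states, actions, probabilities and rewards (standing in for Quality of Service scores) are all hypothetical, chosen only to show the Bellman backup and greedy policy extraction.

```python
# Toy value iteration on a hypothetical two-stage service-selection MDP.
# Every state, action, probability and reward below is invented for illustration.

# TOY_MDP[state][action] -> list of (probability, next_state, reward)
TOY_MDP = {
    "s0": {
        "A": [(1.0, "s1", 0.5)],
        "B": [(1.0, "s1", 0.8)],
    },
    "s1": {
        "C": [(0.9, "done", 1.0), (0.1, "s1", 0.0)],  # higher reward, may need a retry
        "D": [(1.0, "done", 0.7)],
    },
    "done": {},  # terminal state: no actions available
}


def q_value(outcomes, values, gamma):
    """Expected discounted return of one action, given current state values."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(mdp, gamma=0.9, tol=1e-9):
    """Repeat Bellman optimality backups until the largest change is below tol."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:  # terminal states keep value 0
                continue
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in mdp.items()
        if actions
    }
    return values, policy


values, policy = value_iteration(TOY_MDP)
print(policy)
```

Policy iteration would alternate full policy evaluation with greedy improvement instead of applying the max in every sweep; the sketch uses value iteration only because it is the shortest of the three methods to write down.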
Robust reinforcement learning.

PubMed

Morimoto, Jun; Doya, Kenji

2005-02-01

This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H(infinity) control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H(infinity) control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
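The min-max formulation described in the record above can be illustrated with a small discrete sketch. This is not the authors' continuous-time H(infinity) algorithm: it is a toy tabular analogue in Python, assuming a hypothetical five-state chain, a reward of minus the squared distance from the origin, and a penalty weight `ETA` on the squared disturbance. The control agent maximizes and the disturbing agent minimizes a value that combines the reward and the disturbance norm.

```python
# Discrete toy analogue of a min-max (robust) Bellman backup.
# The chain world, reward, and penalty weight ETA are assumptions for illustration.

STATES = [-2, -1, 0, 1, 2]
CONTROLS = [-1, 0, 1]
DISTURBANCES = [-1, 0, 1]
GAMMA = 0.9
ETA = 2.0  # cost the disturbing agent pays per unit of squared disturbance


def step(s, u, w):
    """Next state: control and disturbance both push the agent, clipped to the chain."""
    return max(-2, min(2, s + u + w))


def backup(s, u, w, values):
    """Reward plus disturbance penalty plus discounted value of the next state."""
    s2 = step(s, u, w)
    reward = -float(s2 ** 2)
    return reward + ETA * w ** 2 + GAMMA * values[s2]


def worst_case(s, u, values):
    """Value of control u when the disturbing agent picks the worst disturbance."""
    return min(backup(s, u, w, values) for w in DISTURBANCES)


def robust_value_iteration(tol=1e-9):
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: max(worst_case(s, u, values) for u in CONTROLS) for s in STATES
        }
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    policy = {s: max(CONTROLS, key=lambda u: worst_case(s, u, values)) for s in STATES}
    return values, policy


robust_values, robust_policy = robust_value_iteration()
print(robust_policy)
```

With these particular numbers the disturbance is too costly to use near the origin but worth using at the boundary, so the learned control always steers back toward state 0. Lowering `ETA` makes the disturbing agent more aggressive.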
Coexistence of Reward and Unsupervised Learning During the Operant Conditioning of Neural Firing Rates

PubMed Central

Kerr, Robert R.; Grayden, David B.; Thomas, Doreen A.; Gilson, Matthieu; Burkitt, Anthony N.

2014-01-01

A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments. PMID:24475240
Histidine-decarboxylase knockout mice show deficient nonreinforced episodic object memory, improved negatively reinforced water-maze performance, and increased neo- and ventro-striatal dopamine turnover.

PubMed

Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P

2003-01-01

The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement and reward system, reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning depending on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydroxyphenylacetic acid concentrations and a higher dihydroxyphenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydroxyphenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory promoting and suppressive effects via distinct and independent mechanisms and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.
Dorsal Striatal-Midbrain Connectivity in Humans Predicts How Reinforcements Are Used to Guide Decisions

ERIC Educational Resources Information Center

Kahnt, Thorsten; Park, Soyoung Q.; Cohen, Michael X.; Beck, Anne; Heinz, Andreas; Wrase, Jana

2009-01-01

It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to…

Autonomous Inter-Task Transfer in Reinforcement Learning Domains

DTIC Science & Technology

2008-08-01

Twentieth International Joint Conference on Artificial Intelligence, 2007. 304 Fumihide Tanaka and Masayuki Yamamura. Multitask reinforcement learning...Functions...2.2.3 Artificial Neural Networks...2.2.4 Instance-based...tures [Laird et al., 1986, Choi et al., 2007]. However, TL for RL tasks has only recently been gaining attention in the artificial intelligence
A look at Behaviourism and Perceptual Control Theory in Interface Design

DTIC Science & Technology

1998-02-01

behaviours such as response variability, instinctive drift, autoshaping, etc. Perceptual Control Theory (PCT) postulates that behaviours result from the...internal variables. Behaviourism, on the other hand, cannot account for variability in responses, instinctive drift, autoshaping, etc. Researchers...Autoshaping. Animals appear to learn without reinforcement. However, conditioning theory speculates that learning results only when reinforcement
BEHAVIORAL MECHANISMS UNDERLYING NICOTINE REINFORCEMENT

PubMed Central

Rupprecht, Laura E.; Smith, Tracy T.; Schassburger, Rachel L.; Buffalari, Deanne M.; Sved, Alan F.; Donny, Eric C.

2015-01-01

Cigarette smoking is the leading cause of preventable deaths worldwide and nicotine, the primary psychoactive constituent in tobacco, drives sustained use. The behavioral actions of nicotine are complex and extend well beyond the actions of the drug as a primary reinforcer. Stimuli that are consistently paired with nicotine can, through associative learning, take on reinforcing properties as conditioned stimuli. These conditioned stimuli can then impact the rate and probability of behavior and even function as conditioned reinforcers that maintain behavior in the absence of nicotine. Nicotine can also act as a conditioned stimulus, predicting the delivery of other reinforcers, which may allow nicotine to acquire value as a conditioned reinforcer. These associative effects, establishing non-nicotine stimuli as conditioned stimuli with discriminative stimulus and conditioned reinforcing properties as well as establishing nicotine as a conditioned stimulus, are predicted by basic conditioning principles. However, nicotine can also act non-associatively. Nicotine directly enhances the reinforcing efficacy of other reinforcing stimuli in the environment, an effect that does not require a temporal or predictive relationship between nicotine and either the stimulus or the behavior. Hence, the reinforcing actions of nicotine stem both from the primary reinforcing actions of the drug (and the subsequent associative learning effects) and from the reinforcement-enhancing action of nicotine, which is non-associative in nature. Gaining a better understanding of how nicotine impacts behavior will allow for maximally effective tobacco control efforts aimed at reducing the harm associated with tobacco use by reducing and/or treating its addictiveness. PMID:25638333
Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

PubMed

Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

2013-11-01

The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier Ltd. All rights reserved.
Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

PubMed Central

Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

2013-01-01

The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273
Feature Reinforcement Learning: Part I. Unstructured MDPs

NASA Astrophysics Data System (ADS)

Hutter, Marcus

2009-12-01

General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II (Hutter, 2009c). The role of POMDPs is also considered there.
The role of within-compound associations in learning about absent cues.

PubMed

Witnauer, James E; Miller, Ralph R

2011-05-01

When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue-outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue-outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127-151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association.
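The idea of learning about an absent cue can be illustrated with a short simulation. The Python sketch below is not the authors' model or their simulation code. It assumes a Rescorla-Wagner-style error term, a Van Hamme and Wasserman-style negative learning rate for an absent cue, and, as this sketch's own simplification, a multiplicative scaling of that rate by a within-compound association `w_ax`. All parameter values and the choice to compute the error from present cues only are hypothetical.

```python
# Sketch: retrospective revaluation of an absent cue X after AX+ then A- training.
# Parameter values and the multiplicative role of w_ax are illustrative assumptions.

ALPHA_PRESENT = 0.2  # learning rate for cues that are presented on a trial
ALPHA_ABSENT = 0.1   # magnitude of the negative learning rate for absent cues
WITHIN_RATE = 0.2    # learning rate for the A-X within-compound association


def simulate(use_within_compound=True, compound_trials=30, extinction_trials=10):
    """Return the strength of X after phase 1 (AX+) and after phase 2 (A-)."""
    v_a = v_x = 0.0  # cue-outcome associations
    w_ax = 0.0       # within-compound association between A and X

    # Phase 1: A and X are reinforced together (AX+).
    for _ in range(compound_trials):
        error = 1.0 - (v_a + v_x)
        v_a += ALPHA_PRESENT * error
        v_x += ALPHA_PRESENT * error
        w_ax += WITHIN_RATE * (1.0 - w_ax)
    v_x_after_phase1 = v_x

    # Phase 2: A alone is nonreinforced (A-). X is absent but may be retrieved
    # through w_ax, in which case it is updated with a negative learning rate.
    # For simplicity, w_ax is held fixed during this phase.
    retrieval = w_ax if use_within_compound else 0.0
    for _ in range(extinction_trials):
        error = 0.0 - v_a
        v_a += ALPHA_PRESENT * error
        v_x += -ALPHA_ABSENT * retrieval * error
    return v_x_after_phase1, v_x


with_wc = simulate(use_within_compound=True)
without_wc = simulate(use_within_compound=False)
print(with_wc, without_wc)
```

In this sketch, extinguishing A raises the strength of the absent cue X only when the within-compound association is allowed to retrieve X. With retrieval switched off, X is unchanged by phase 2.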
The role of within-compound associations in learning about absent cues

PubMed Central

Witnauer, James E.

2011-01-01

When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue–outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue–outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127–151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association. PMID:21264569
Pleasurable music affects reinforcement learning according to the listener

PubMed Central

Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

2013-01-01

Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875
Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.

PubMed

Blake, David T

2017-06-18

The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, by the strength of response within that activated area, and by the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests other factors also initiate changes. Studies that control learning find support of reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to a strengthening of cortical response strength and enlarging of response area with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning. © 2017 American Physiological Society. Compr Physiol 7:977-1008, 2017. Copyright © 2017 John Wiley & Sons, Inc.
Feedback from the heart: Emotional learning and memory is controlled by cardiac cycle, interoceptive accuracy and personality.

PubMed

Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D

2017-05-01

Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlie poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.
A reinforcement learning-based architecture for fuzzy logic control

NASA Technical Reports Server (NTRS)

Berenji, Hamid R.

1992-01-01

This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.
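As a rough illustration of the temporal difference (TD) idea this abstract refers to, the following is a minimal sketch of tabular TD(0) value prediction on a toy random walk. It is not the ARIC architecture and involves no fuzzy controller or neural network; the environment, the step size `alpha`, and the function name `td0_random_walk` are assumptions made only for this example.

```python
import random


def td0_random_walk(n_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on a hypothetical random-walk chain.

    States are 0..n_states-1. Each episode starts in the middle state and
    moves left or right with equal probability until it leaves the chain.
    Leaving on the right pays reward 1; every other transition pays 0.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial value estimates
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:                  # left exit: terminal, reward 0
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:        # right exit: terminal, reward 1
                reward, v_next, done = 1.0, 0.0, True
            else:                           # ordinary step inside the chain
                reward, v_next, done = 0.0, values[s_next], False
            # TD error: one-step bootstrapped target minus current estimate
            delta = reward + gamma * v_next - values[s]
            values[s] += alpha * delta
            if done:
                break
            s = s_next
    return values


if __name__ == "__main__":
    for state, value in enumerate(td0_random_walk()):
        print(state, round(value, 3))
```

For this chain the true values are (i + 1) / (n_states + 1) for state i, so the learned estimates should rise roughly linearly from left to right. The prediction is updated after every step from the next state's estimate, without waiting for the final outcome, which is the property that distinguishes TD methods from outcome-only training.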
Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

PubMed

Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

2011-01-01

As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.

PubMed

Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

2017-01-01

Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

PubMed Central

Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

2017-01-01

Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004
Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Translational controller results

NASA Technical Reports Server (NTRS)

Jani, Yashvant

1992-01-01

The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is the deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. This report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The shuttle translational controller and its implementation in fuzzy learning architecture are described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and summary for the project.
Learning the specific quality of taste reinforcement in larval Drosophila

PubMed Central

Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

2015-01-01

The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing—in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533
Evidence for a neural law of effect.

PubMed

Athalye, Vivek R; Santos, Fernando J; Carmena, Jose M; Costa, Rui M

2018-03-02

Thorndike's law of effect states that actions that lead to reinforcements tend to be repeated more often. Accordingly, neural activity patterns leading to reinforcement are also reentered more frequently. Reinforcement relies on dopaminergic activity in the ventral tegmental area (VTA), and animals shape their behavior to receive dopaminergic stimulation. Seeking evidence for a neural law of effect, we found that mice learn to reenter more frequently motor cortical activity patterns that trigger optogenetic VTA self-stimulation. Learning was accompanied by gradual shaping of these patterns, with participating neurons progressively increasing and aligning their covariance to that of the target pattern. Motor cortex patterns that lead to phasic dopaminergic VTA activity are progressively reinforced and shaped, suggesting a mechanism by which animals select and shape actions to reliably achieve reinforcement. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Learned helplessness and learned prevalence: exploring the causal relations among perceived controllability, reward prevalence, and exploration.

PubMed

Teodorescu, Kinneret; Erev, Ido

2014-10-01

Exposure to uncontrollable outcomes has been found to trigger learned helplessness, a state in which the agent, because of lack of exploration, fails to take advantage of regained control. Although the implications of this phenomenon have been widely studied, its underlying cause remains undetermined. One can learn not to explore because the environment is uncontrollable, because the average reinforcement for exploring is low, or because rewards for exploring are rare. In the current research, we tested a simple experimental paradigm that contrasts the predictions of these three contributors and offers a unified psychological mechanism that underlies the observed phenomena. Our results demonstrate that learned helplessness is not correlated with either the perceived controllability of one's environment or the average reward, which suggests that reward prevalence is a better predictor of exploratory behavior than the other two factors. A simple computational model in which exploration decisions were based on small samples of past experiences captured the empirical phenomena while also providing a cognitive basis for feelings of uncontrollability. © The Author(s) 2014.
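The small-sample idea mentioned at the end of this abstract can be sketched as follows. This is not the authors' model: the payoff distributions, the sample size, the forced initial trials, and the names `exploration_rate`, `frequent_rewards`, and `rare_rewards` are all assumptions chosen for illustration. The two hypothetical environments have the same average payoff for exploring (0.1) but differ in how often exploring pays off.

```python
import random


def exploration_rate(payoff_fn, trials=2000, sample_size=3, forced=20, seed=0):
    """Fraction of free trials on which a small-sample agent explores.

    The agent remembers past exploration payoffs. On each trial it draws a
    small random sample (with replacement) from that memory and explores
    only if the sample mean beats the safe option, which always pays 0.
    """
    rng = random.Random(seed)
    # A few forced explorations seed the memory (an assumption of this sketch).
    history = [payoff_fn(rng) for _ in range(forced)]
    explored = 0
    for _ in range(trials):
        sample = [rng.choice(history) for _ in range(sample_size)]
        if sum(sample) / sample_size > 0:
            history.append(payoff_fn(rng))  # explore and remember the payoff
            explored += 1
    return explored / trials


def frequent_rewards(rng):
    """Exploring usually pays +1 but occasionally costs 8 (mean 0.1)."""
    return 1.0 if rng.random() < 0.9 else -8.0


def rare_rewards(rng):
    """Exploring usually costs 1 but occasionally pays +10 (mean 0.1)."""
    return 10.0 if rng.random() < 0.1 else -1.0


if __name__ == "__main__":
    print("frequent rewards:", exploration_rate(frequent_rewards))
    print("rare rewards:    ", exploration_rate(rare_rewards))
```

Under these assumptions, a sample of three remembered payoffs usually looks favorable when rewards are frequent and usually looks unfavorable when they are rare, so the agent explores much less in the rare-reward environment even though the average payoff is identical. That is the qualitative pattern the abstract attributes to reward prevalence.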
Technology in the teaching of neuroscience: enhanced student learning.

PubMed

Griffin, John D

2003-12-01

The primary motivation for integrating any form of education technology into a particular course or curriculum should always be to enhance student learning. However, it can be difficult to determine which technologies will be the most appropriate and effective teaching tools. Through the alignment of technology-enhanced learning experiences with a clear set of learning objectives, teaching becomes more efficient and effective and learning is truly enhanced. In this article, I describe how I have made extensive use of technology in two neuroscience courses that differ in structure and content. Course websites function as resource centers and provide a forum for student interaction. PowerPoint presentations enhance formal lectures and provide an organized outline of presented material. Some lectures are also supplemented with interactive CD-ROMs, used in the presentation of difficult physiological concepts. In addition, a computer-based physiological recording system is used in laboratory sessions, improving the hands-on experience of group learning while reinforcing the concepts of the research method. Although technology can provide powerful teaching tools, the enhancement of the learning environment is still dependent on the instructor. It is the skill and enthusiasm of the instructor that determines whether technology will be used effectively.
