Multi-agent Reinforcement Learning Model for Effective Action Selection
NASA Astrophysics Data System (ADS)
Youk, Sang Jo; Lee, Bong Keun
Reinforcement learning is a subarea of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the multi-agent case in particular, the state and action spaces become enormous compared with the single-agent case, so an effective action selection strategy is essential for effective reinforcement learning. This paper proposes a multi-agent reinforcement learning model based on a fuzzy inference system in order to improve learning speed and select effective actions in multi-agent settings. The action selection strategy is verified through evaluation tests based on RoboCup Keepaway, one of the widely used test-beds for multi-agent learning. The proposed model can be used to evaluate the efficiency of various intelligent multi-agent systems and can also be applied to the strategy and tactics of robot soccer systems.
Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.
Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla
2014-12-01
This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new models of coordination for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and the robot soccer simulation. The results obtained demonstrated that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.
Multiagent cooperation and competition with deep reinforcement learning.
Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul
2017-01-01
Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.
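The experimental lever in this work is the rewarding scheme. Below is a minimal Python sketch of such a scheme, assuming a single parameter rho that sets the reward of the scoring player while conceding always costs -1; the exact values used in the paper are not stated in the abstract.

def pong_rewards(scorer: int, rho: float) -> tuple[float, float]:
    """Reward pair (agent 0, agent 1) when `scorer` puts the ball past its opponent.

    rho = +1 reproduces the classical competitive (zero-sum) scheme,
    rho = -1 makes scoring as costly as conceding, encouraging the agents
    to cooperate by keeping the ball in play.
    """
    rewards = [0.0, 0.0]
    rewards[scorer] = rho          # reward for the player who scored
    rewards[1 - scorer] = -1.0     # penalty for the player who conceded
    return rewards[0], rewards[1]

# Example: sweep from competitive to cooperative incentives.
for rho in (1.0, 0.0, -1.0):
    print(rho, pong_rewards(scorer=0, rho=rho))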
Emotional Multiagent Reinforcement Learning in Spatial Social Dilemmas.
Yu, Chao; Zhang, Minjie; Ren, Fenghui; Tan, Guozhen
2015-12-01
Social dilemmas have attracted extensive interest in the research of multiagent systems in order to study the emergence of cooperative behaviors among selfish agents. Understanding how agents can achieve cooperation in social dilemmas through learning from local experience is a critical problem that has motivated researchers for decades. This paper investigates the possibility of exploiting emotions in agent learning in order to facilitate the emergence of cooperation in social dilemmas. In particular, the spatial version of social dilemmas is considered to study the impact of local interactions on the emergence of cooperation in the whole system. A double-layered emotional multiagent reinforcement learning framework is proposed to endow agents with internal cognitive and emotional capabilities that can drive these agents to learn cooperative behaviors. Experimental results reveal that various network topologies and agent heterogeneities have significant impacts on agent learning behaviors in the proposed framework, and under certain circumstances, high levels of cooperation can be achieved among the agents.
FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.
Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie
2017-06-01
In this paper, we propose a multiagent reinforcement learning algorithm dealing with fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, instead of the immediate reward itself, is used as the reinforcement signal. With FMRQ each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action games and one case of a three-player two-action game. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted: one is box-pushing and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the compared methods.
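The abstract describes the reinforcement signal only at a high level, so the following Python sketch is an illustrative guess rather than the published update rule: each agent tracks how often a state-action pair coincided with the highest global immediate reward seen so far and uses that frequency, not the reward, to drive a Q-style update.

from collections import defaultdict
import random

class FMRQStyleAgent:
    """Hedged sketch of the idea described in the abstract; the exact FMRQ
    update rule is not given there and is not reproduced here."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1):
        self.actions = actions
        self.alpha, self.epsilon = alpha, epsilon
        self.q = defaultdict(float)           # (state, action) -> value
        self.counts = defaultdict(int)        # times (state, action) tried
        self.max_hits = defaultdict(int)      # times it yielded the max reward
        self.r_max = float("-inf")            # best global reward seen so far

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, global_reward):
        self.r_max = max(self.r_max, global_reward)
        self.counts[(state, action)] += 1
        if global_reward >= self.r_max:
            self.max_hits[(state, action)] += 1
        freq = self.max_hits[(state, action)] / self.counts[(state, action)]
        # Q-style update driven by the frequency signal instead of the reward.
        self.q[(state, action)] += self.alpha * (freq - self.q[(state, action)])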
Concurrent Learning of Control in Multi-agent Sequential Decision Tasks
2018-04-17
The overall objective of this project was to develop multi-agent reinforcement learning (MARL) approaches for intelligent agents to autonomously learn distributed control policies in decentralized partially observable...
Coupled replicator equations for the dynamics of learning in multiagent systems
NASA Astrophysics Data System (ADS)
Sato, Yuzuru; Crutchfield, James P.
2003-01-01
Starting with a group of reinforcement-learning agents we derive coupled replicator equations that describe the dynamics of collective learning in multiagent systems. We show that, although agents model their environment in a self-interested way without sharing knowledge, a game dynamics emerges naturally through environment-mediated interactions. An application to rock-scissors-paper game interactions shows that the collective learning dynamics exhibits a diversity of competitive and cooperative behaviors. These include quasiperiodicity, stable limit cycles, intermittency, and deterministic chaos—behaviors that should be expected in heterogeneous multiagent systems described by the general replicator equations we derive.
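As a concrete illustration of the kind of dynamics analysed here, the Python sketch below numerically integrates plain two-population replicator equations for rock-paper-scissors; the paper's coupled equations additionally contain learning-rule and memory-loss terms that are omitted in this sketch.

import numpy as np

A = np.array([[0.0, -1.0, 1.0],
              [1.0,  0.0, -1.0],
              [-1.0, 1.0,  0.0]])   # row player's payoffs; column player gets -A

def step(x, y, dt=0.01):
    fx = A @ y                       # fitness of row player's pure strategies
    fy = -A.T @ x                    # fitness of column player's pure strategies
    x = x + dt * x * (fx - x @ fx)   # replicator update for player 1
    y = y + dt * y * (fy - y @ fy)   # replicator update for player 2
    return x / x.sum(), y / y.sum()  # renormalise against numerical drift

x = np.array([0.5, 0.3, 0.2])
y = np.array([0.2, 0.5, 0.3])
for t in range(20000):
    x, y = step(x, y)
print("final strategies:", np.round(x, 3), np.round(y, 3))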
Decentralized learning in Markov games.
Vrancx, Peter; Verbeeck, Katja; Nowé, Ann
2008-08-01
Learning automata (LA) were recently shown to be valuable tools for designing multiagent reinforcement learning algorithms. One of the principal contributions of the LA theory is that a set of decentralized independent LA is able to control a finite Markov chain with unknown transition probabilities and rewards. In this paper, we propose to extend this algorithm to Markov games--a straightforward extension of single-agent Markov decision problems to distributed multiagent decision problems. We show that under the same ergodic assumptions of the original theorem, the extended algorithm will converge to a pure equilibrium point between agent policies.
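The basic building block referred to here is a learning automaton with a linear reward-inaction update; a minimal Python sketch follows. The paper's actual contribution, the decentralized extension to Markov games, is not reproduced.

import random

class LinearRewardInactionLA:
    """Minimal linear reward-inaction learning automaton (sketch)."""

    def __init__(self, n_actions, lam=0.05):
        self.p = [1.0 / n_actions] * n_actions  # action probabilities
        self.lam = lam                           # learning rate

    def choose(self):
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, reward):
        # Reward-inaction: move probability mass toward `action` only when the
        # reward signal (assumed to lie in [0, 1]) is positive; otherwise do nothing.
        for a in range(len(self.p)):
            if a == action:
                self.p[a] += self.lam * reward * (1.0 - self.p[a])
            else:
                self.p[a] -= self.lam * reward * self.p[a]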
Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer.
Zhou, Luowei; Yang, Pei; Chen, Chunlin; Gao, Yang
2017-05-01
Reinforcement learning has significant applications for multiagent systems, especially in unknown dynamic environments. However, most multiagent reinforcement learning (MARL) algorithms suffer from such problems as exponential computation complexity in the joint state-action space, which makes it difficult to scale up to realistic multiagent problems. In this paper, a novel algorithm named negotiation-based MARL with sparse interactions (NegoSI) is presented. In contrast to traditional sparse-interaction-based MARL algorithms, NegoSI adopts the equilibrium concept and makes it possible for agents to select the nonstrict equilibrium-dominating strategy profile (nonstrict EDSP) or meta equilibrium for their joint actions. The presented NegoSI algorithm consists of four parts: 1) the equilibrium-based framework for sparse interactions; 2) the negotiation for the equilibrium set; 3) the minimum variance method for selecting one joint action; and 4) the knowledge transfer of local Q-values. In this integrated algorithm, three techniques, i.e., unshared value functions, equilibrium solutions, and sparse interactions, are adopted to achieve privacy protection, better coordination and lower computational complexity, respectively. To evaluate the performance of the presented NegoSI algorithm, two groups of experiments are carried out regarding three criteria: 1) steps of each episode; 2) rewards of each episode; and 3) average runtime. The first group of experiments is conducted using six grid world games and shows fast convergence and high scalability of the presented algorithm. Then, in the second group of experiments, NegoSI is applied to an intelligent warehouse problem, and simulation results demonstrate the effectiveness of the presented NegoSI algorithm compared with other state-of-the-art MARL algorithms.
Optimal control in microgrid using multi-agent reinforcement learning.
Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin
2012-11-01
This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid operating in grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable, "Average Electricity Price Trend", which expresses the most probable transitions of the system, is introduced to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of the key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method helps to handle the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method for optimal control of a microgrid in grid-connected mode.
NASA Astrophysics Data System (ADS)
Wang, Jing; Yang, Tianyu; Staskevich, Gennady; Abbe, Brian
2017-04-01
This paper studies the cooperative control problem for a class of multiagent dynamical systems with partially unknown nonlinear system dynamics. In particular, the control objective is to solve the state consensus problem for multiagent systems based on the minimisation of certain cost functions for individual agents. Under the assumption that there exist admissible cooperative controls for this class of multiagent systems, the formulated problem is solved by finding the optimal cooperative control using the approximate dynamic programming and reinforcement learning approach. With the aid of neural network parameterisation and online adaptive learning, our method renders a practically implementable approximately adaptive neural cooperative control for multiagent systems. Specifically, based on Bellman's principle of optimality, the Hamilton-Jacobi-Bellman (HJB) equation for multiagent systems is first derived. We then propose an approximately adaptive policy iteration algorithm for multiagent cooperative control based on neural network approximation of the value functions. The convergence of the proposed algorithm is rigorously proved using the contraction mapping method. Simulation results are included to validate the effectiveness of the proposed algorithm.
Unifying Temporal and Structural Credit Assignment Problems
NASA Technical Reports Server (NTRS)
Agogino, Adrian K.; Tumer, Kagan
2004-01-01
Single-agent reinforcement learners in time-extended domains and multi-agent systems share a common dilemma known as the credit assignment problem. Multi-agent systems have the structural credit assignment problem of determining the contributions of a particular agent to a common task. In contrast, time-extended single-agent systems have the temporal credit assignment problem of determining the contribution of a particular action to the quality of the full sequence of actions. Traditionally these two problems are considered different and are handled in separate ways. In this article we show how these two forms of the credit assignment problem are equivalent. In this unified framework, a single-agent Markov decision process can be broken down into a single-time-step multi-agent process. Furthermore, we show that Monte-Carlo estimation or Q-learning (depending on whether the values of resulting actions in the episode are known at the time of learning) are equivalent to different agent utility functions in a multi-agent system. This equivalence shows how an often neglected issue in multi-agent systems is equivalent to a well-known deficiency in multi-time-step learning and lays the basis for solving time-extended multi-agent problems, where both credit assignment problems are present.
Liu, Chunming; Xu, Xin; Hu, Dewen
2013-04-29
Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.
Time-Extended Policies in Multi-Agent Reinforcement Learning
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Agogino, Adrian K.
2004-01-01
Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works towards maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall of having rewards aligned with the global reward leading to utilities not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result they outperform both global (e.g., team games) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
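The "aligned and learnable" rewards discussed here are typically built from a difference between the global utility and a counterfactual in which one agent's action is removed or replaced. The Python sketch below illustrates that construction with a toy global utility; the paper's exact utility definitions are not reproduced here.

def difference_reward(global_utility, joint_action, i, default_action=None):
    """Agent i's reward: global utility minus the utility obtained if agent i's
    action were replaced by a fixed default (here, None = 'absent')."""
    counterfactual = list(joint_action)
    counterfactual[i] = default_action
    return global_utility(joint_action) - global_utility(counterfactual)

# Toy usage: the global utility counts how many distinct slots are covered.
G = lambda z: len({a for a in z if a is not None})
print(difference_reward(G, joint_action=[1, 1, 2], i=1))  # 0: agent 1 adds nothing
print(difference_reward(G, joint_action=[1, 3, 2], i=1))  # 1: agent 1 covers a new slot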
Effect of reinforcement learning on coordination of multiagent systems
NASA Astrophysics Data System (ADS)
Bukkapatnam, Satish T. S.; Gao, Greg
2000-12-01
For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.
Learning Sequences of Actions in Collectives of Autonomous Agents
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Agogino, Adrian K.; Wolpert, David H.; Clancy, Daniel (Technical Monitor)
2001-01-01
In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. We are particularly interested in instances of this problem where centralized control is either impossible or impractical. For single-agent systems in similar domains, machine learning methods (e.g., reinforcement learners) have been successfully used. However, applying such solutions directly to multi-agent systems often proves problematic, as agents may work at cross-purposes, or have difficulty in evaluating their contribution to achievement of the global objective, or both. Accordingly, the crucial design step in multiagent systems centers on determining the private objectives of each agent so that as the agents strive for those objectives, the system reaches a good global solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence to design goals for the agents that are 'aligned' with the global goal, and are 'learnable' in that agents can readily see how their behavior affects their utility. We show that reinforcement learning agents using those goals outperform both 'natural' extensions of single-agent algorithms and global reinforcement learning solutions based on 'team games'.
NASA Astrophysics Data System (ADS)
Patkin, M. L.; Rogachev, G. N.
2018-02-01
A method for constructing a multi-agent control system for mobile robots based on reinforcement learning with deep neural networks is considered. The control system is synthesized using reinforcement learning and a modified Actor-Critic method, in which the Actor module is divided into an Action Actor and a Communication Actor in order to simultaneously control the mobile robots and communicate with partners. Communication is carried out by sending partners, at each step, a vector of real numbers that is added to the observation vector and affects behaviour. The Actor and Critic functions are approximated by deep neural networks. The Critic's value function is trained using the TD-error method and the Actor's function using DDPG. The Communication Actor's neural network is trained through gradients received from partner agents. An environment featuring cooperative multi-agent interaction was developed, and a computer simulation of the method was carried out on the control problem of two robots pursuing two goals.
2010-02-01
multi-agent reputation management. State abstraction is a technique used to allow machine learning technologies to cope with problems that have large...state abstraction process to enable reinforcement learning in domains with large state spaces. State abstraction is vital to machine learning ...across a collective of independent platforms. These individual elements, often referred to as agents in the machine learning community, should exhibit both
Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...
2015-01-31
Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plans in terms of average delay, number of stops, and vehicular emissions at the network level.
Optimal and Autonomous Control Using Reinforcement Learning: A Survey.
Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L
2018-06-01
This paper reviews the current state of the art in reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single-agent and multiagent systems. Existing RL solutions to optimal regulation and tracking control problems, as well as graphical games, are reviewed. RL methods learn the solution to optimal control and game problems online, using data measured along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively, and a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
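For readers unfamiliar with the discrete-time core algorithm the survey refers to, here is a minimal tabular Q-learning loop in Python; the environment interface (reset/step and an action list) is an assumption made for illustration.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-learning. `env` is assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done) and a list `env.actions`."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice(env.actions)           # explore
            else:
                a = max(env.actions, key=lambda b: Q[(s, b)])  # exploit
            s2, r, done = env.step(a)
            best_next = 0.0 if done else max(Q[(s2, b)] for b in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # TD update
            s = s2
    return Q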
Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.
Fernandez-Gauna, Borja; Etxeberria-Agiriano, Ismael; Graña, Manuel
2015-01-01
Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out a round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
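The central scheduling idea can be sketched very simply: only one agent selects and executes an action per decision step, so each learner faces a stationary environment while the others hold still. The following Python sketch assumes a generic agent/environment interface and omits the paper's modular state-action vetoes and message-passing coordination.

class RoundRobinScheduler:
    """Round-robin action selection for a set of independent Q-learners (sketch)."""

    def __init__(self, agents):
        self.agents = agents
        self.turn = 0

    def step(self, env, state):
        agent = self.agents[self.turn]
        action = agent.act(state)
        # Assumed environment interface: one agent moves per step.
        next_state, reward, done = env.step(agent_id=self.turn, action=action)
        agent.update(state, action, reward, next_state)
        self.turn = (self.turn + 1) % len(self.agents)  # pass the turn on
        return next_state, done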
Reinforcement learning in supply chains.
Valluri, Annapurna; North, Michael J; Macal, Charles M
2009-10-01
Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.
Model-free learning on robot kinematic chains using a nested multi-agent topology
NASA Astrophysics Data System (ADS)
Karigiannis, John N.; Tzafestas, Costas S.
2016-11-01
This paper proposes a model-free learning scheme for the developmental acquisition of robot kinematic control and dexterous manipulation skills. The approach is based on a nested-hierarchical multi-agent architecture that intuitively encapsulates the topology of robot kinematic chains, where the activity of each independent degree-of-freedom (DOF) is finally mapped onto a distinct agent. Each one of those agents progressively evolves a local kinematic control strategy in a game-theoretic sense, that is, based on a partial (local) view of the whole system topology, which is incrementally updated through a recursive communication process according to the nested-hierarchical topology. Learning is thus approached not through demonstration and training but through an autonomous self-exploration process. A fuzzy reinforcement learning scheme is employed within each agent to enable efficient exploration in a continuous state-action domain. This paper constitutes in fact a proof of concept, demonstrating that global dexterous manipulation skills can indeed evolve through such a distributed iterative learning of local agent sensorimotor mappings. The main motivation behind the development of such an incremental multi-agent topology is to enhance system modularity, to facilitate extensibility to more complex problem domains and to improve robustness with respect to structural variations including unpredictable internal failures. These attributes of the proposed system are assessed in this paper through numerical experiments in different robot manipulation task scenarios, involving both single and multi-robot kinematic chains. The generalisation capacity of the learning scheme is experimentally assessed and robustness properties of the multi-agent system are also evaluated with respect to unpredictable variations in the kinematic topology. Furthermore, these numerical experiments demonstrate the scalability properties of the proposed nested-hierarchical architecture, where new agents can be recursively added in the hierarchy to encapsulate individual active DOFs. The results presented in this paper demonstrate the feasibility of such a distributed multi-agent control framework, showing that the solutions which emerge are plausible and near-optimal. Numerical efficiency and computational cost issues are also discussed.
Reinforcement learning agents providing advice in complex video games
NASA Astrophysics Data System (ADS)
Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa
2014-01-01
This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
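One simple way to budget advice, consistent with the framework described here though not necessarily identical to the algorithms the article evaluates, is for the teacher to spend advice only in states it deems important (a large spread among its own Q-values) and only while the budget lasts. A Python sketch:

class BudgetedTeacher:
    """Teacher that advises its greedy action in 'important' states until
    the advice budget is exhausted (illustrative sketch)."""

    def __init__(self, teacher_q, actions, budget=100, threshold=0.5):
        self.q = teacher_q            # dict: (state, action) -> value
        self.actions = actions
        self.budget = budget
        self.threshold = threshold

    def advise(self, state):
        if self.budget <= 0:
            return None
        values = [self.q.get((state, a), 0.0) for a in self.actions]
        if max(values) - min(values) < self.threshold:
            return None               # state not important enough to spend advice
        self.budget -= 1
        return self.actions[values.index(max(values))]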
Market Model for Resource Allocation in Emerging Sensor Networks with Reinforcement Learning
Zhang, Yue; Song, Bin; Zhang, Ying; Du, Xiaojiang; Guizani, Mohsen
2016-01-01
Emerging sensor networks (ESNs) are an inevitable trend with the development of the Internet of Things (IoT) and are intended to connect almost every intelligent device. It is therefore critical to study resource allocation in such an environment, for reasons of efficiency, especially when resources are limited. By viewing ESNs as multi-agent environments, we model them with an agent-based modelling (ABM) method and address resource allocation problems with market models after describing users' patterns. Reinforcement learning methods are introduced to estimate users' patterns and verify the outcomes in our market models. Experimental results show the efficiency of our methods, which are also capable of guiding topology management. PMID:27916841
Analysis of the Pricing Process in Electricity Market using Multi-Agent Model
NASA Astrophysics Data System (ADS)
Shimomura, Takahiro; Saisho, Yuichi; Fujii, Yasumasa; Yamaji, Kenji
Many electric utilities worldwide have been forced to change the way they do business, from vertically integrated mechanisms to open market systems, and the design of power market structures has become an urgent issue. To address it, many studies have used market models with various characteristics and regulations. The goal of modeling analysis is to enrich our understanding of the fundamental processes that may appear. There are, however, many kinds of modeling methods, each with drawbacks and advantages in terms of validity and versatility. This paper presents two methods for constructing multi-agent market models, one based on game theory and the other on reinforcement learning. Comparing the results of the two methods improves their validity and helps identify potential problems in electricity markets with oligopolistic generators, demand fluctuation and inelastic demand. Moreover, the reinforcement learning model makes it possible to consider characteristics peculiar to electricity markets, such as plant unit characteristics, seasonal and hourly demand fluctuation, a real-time regulation market and an operating reserve market. The model highlights the importance of the share of peak-load plants and of the design of the operating reserve market.
Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.
Hu, Yujing; Gao, Yang; An, Bo
2015-07-01
An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
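The transfer condition can be illustrated with a small sketch: a previously computed joint strategy is reused for the current one-shot game whenever no agent can gain more than a tolerance by deviating unilaterally. The payoff representation and tolerance below are illustrative assumptions, not the paper's exact formulation.

def can_transfer(payoffs, equilibrium, tolerance=0.05):
    """Return True if the stored joint action `equilibrium` is still an
    approximate equilibrium of the current game. `payoffs[i][joint]` gives
    agent i's payoff for a joint action tuple."""
    n = len(payoffs)
    for i in range(n):
        current = payoffs[i][equilibrium]
        actions_i = {joint[i] for joint in payoffs[i]}
        for a in actions_i:
            deviation = tuple(a if k == i else equilibrium[k] for k in range(n))
            if payoffs[i][deviation] - current > tolerance:
                return False          # agent i has too large an incentive to deviate
    return True

# Toy 2x2 coordination game: (0, 0) remains an (approximate) equilibrium.
p0 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
p1 = dict(p0)
print(can_transfer([p0, p1], equilibrium=(0, 0)))   # True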
Distributed reinforcement learning for adaptive and robust network intrusion response
NASA Astrophysics Data System (ADS)
Malialis, Kleanthis; Devlin, Sam; Kudenko, Daniel
2015-07-01
Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
Multi-agents and learning: Implications for Webusage mining.
Lotfy, Hewayda M S; Khamis, Soheir M S; Aboghazalah, Maie M
2016-03-01
Characterization of user activities is an important issue in the design and maintenance of websites. Server weblog files contain abundant information about users' current interests. This information can be mined and analyzed so that administrators can guide users in their browsing activity, helping them obtain relevant information in a shorter span of time and improving user satisfaction. Web-based technology facilitates the creation of personally meaningful and socially useful knowledge through supportive interactions, communication and collaboration among educators, learners and information. This paper suggests a new methodology based on learning techniques for a Web-based multiagent application to discover the hidden patterns in the links users visit. It presents a new approach that involves unsupervised learning, reinforcement learning, and cooperation between agents, and uses it to discover patterns that represent user profiles in a sample website, grouped into specific categories of materials using significance percentages. These profiles are used to recommend interesting links and categories to the user. The experimental results showed successful user pattern recognition and cooperative learning among agents to obtain user profiles, indicating that combining different learning algorithms can improve user satisfaction as measured by precision, recall, the progressive category weight and the F1-measure.
Agent Reward Shaping for Alleviating Traffic Congestion
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Agogino, Adrian
2006-01-01
Traffic congestion problems provide a unique environment in which to study how multi-agent systems promote desired system-level behavior. What is particularly interesting in this class of problems is that no individual action is intrinsically "bad" for the system; rather, combinations of actions among agents lead to undesirable outcomes. As a consequence, agents need to learn how to coordinate their actions with those of other agents, rather than learn a particular set of "good" actions. This problem is ubiquitous in various traffic problems, including selecting departure times for commuters, routes for airlines, and paths for data routers. In this paper we present a multi-agent approach to two traffic problems, where for each driver an agent selects the most suitable action using reinforcement learning. The agent rewards are based on concepts from collectives and aim to provide the agents with rewards that are both easy to learn and that, if learned, lead to good system-level behavior. In the first problem, we study how agents learn the best departure times of drivers in a daily commuting environment and how following those departure times alleviates congestion. In the second problem, we study how agents learn to select desirable routes to improve traffic flow and minimize delays for all drivers. In both sets of experiments, agents using collective-based rewards produced near-optimal performance (93-96% of optimal), whereas agents using system rewards (63-68%) barely outperformed random action selection (62-64%) and agents using local rewards (48-72%) performed worse than random in some instances.
Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats
Lloyd, Kevin; Becker, Nadine; Jones, Matthew W.; Bogacz, Rafal
2012-01-01
Learning to form appropriate, task-relevant working memory representations is a complex process central to cognition. Gating models frame working memory as a collection of past observations and use reinforcement learning (RL) to solve the problem of when to update these observations. Investigation of how gating models relate to brain and behavior remains, however, at an early stage. The current study sought to explore the ability of simple RL gating models to replicate rule learning behavior in rats. Rats were trained in a maze-based spatial learning task that required animals to make trial-by-trial choices contingent upon their previous experience. Using an abstract version of this task, we tested the ability of two gating algorithms, one based on the Actor-Critic and the other on the State-Action-Reward-State-Action (SARSA) algorithm, to generate behavior consistent with the rats'. Both models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal. We also found that both gating models learned multiple strategies in solving the initial task, a property which highlights the multi-agent nature of such models and which is of importance in considering the neural basis of individual differences in behavior. PMID:23115551
Design and Control of Large Collections of Learning Agents
NASA Technical Reports Server (NTRS)
Agogino, Adrian
2001-01-01
The intelligent control of multiple autonomous agents is an important yet difficult task. Previous methods used to address this problem have proved to be either too brittle, too hard to use, or not scalable to large systems. The 'Collective Intelligence' project at NASA/Ames provides an elegant, machine-learning approach to address these problems. This approach mathematically defines some essential properties that a reward system should have to promote coordinated behavior among reinforcement learners. This work has focused on creating additional key properties and algorithms within the mathematics of the Collective Intelligence framework. One of the additions will allow agents to learn more quickly, in a more coordinated manner. The other will let agents learn with less knowledge of their environment. These additions will allow the framework to be applied more easily, to a much larger domain of multi-agent problems.
Optimal Reward Functions in Distributed Reinforcement Learning
NASA Technical Reports Server (NTRS)
Wolpert, David H.; Tumer, Kagan
2000-01-01
We consider the design of multi-agent systems so as to optimize an overall world utility function when (1) those systems lack centralized communication and control, and (2) each agent runs a distinct Reinforcement Learning (RL) algorithm. A crucial issue in such design problems is to initialize/update each agent's private utility function, so as to induce best possible world utility. Traditional 'team game' solutions to this problem sidestep this issue and simply assign to each agent the world utility as its private utility function. In previous work we used the 'Collective Intelligence' framework to derive a better choice of private utility functions, one that results in world utility performance up to orders of magnitude superior to that ensuing from use of the team game utility. In this paper we extend these results. We derive the general class of private utility functions that both are easy for the individual agents to learn and that, if learned well, result in high world utility. We demonstrate experimentally that using these new utility functions can result in significantly improved performance over that of our previously proposed utility, over and above that previous utility's superiority to the conventional team game utility.
Zhang, Chengwei; Li, Xiaohong; Li, Shuxin; Feng, Zhiyong
2017-09-20
The biological environment is uncertain and its dynamics resemble those of a multiagent environment; thus, research results from the multiagent systems area can provide valuable insights into the understanding of biology and are of great significance for its study. Learning in a multiagent environment is highly dynamic since the environment is no longer stationary and each agent's behavior changes adaptively in response to other coexisting learners, and vice versa. The dynamics become even more unpredictable when we move from fixed-agent interaction environments to a multiagent social learning framework. Analytical understanding of the underlying dynamics is important and challenging. In this work, we present a social learning framework with homogeneous learners (e.g., Policy Hill Climbing (PHC) learners), and model the behavior of players in the social learning framework as a hybrid dynamical system. By analyzing the dynamical system, we obtain conditions for convergence or non-convergence. We experimentally verify the predictive power of our model using a number of representative games, and the experimental results confirm the theoretical analysis. Under the multiagent social learning framework, we model the behavior of agents in a biological environment and theoretically analyze the dynamics of the model. We present sufficient conditions for convergence or non-convergence and prove them theoretically; these can be used to predict the convergence of the system.
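For reference, a minimal Policy Hill Climbing (PHC) learner of the kind assumed for the homogeneous agents looks like the following Python sketch: ordinary Q-learning plus a mixed policy nudged toward the greedy action by a small step delta. Hyperparameter values here are illustrative.

import random
from collections import defaultdict

class PHCAgent:
    """Minimal Policy Hill Climbing learner (sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, delta=0.01):
        self.actions = actions
        self.alpha, self.gamma, self.delta = alpha, gamma, delta
        self.Q = defaultdict(float)
        self.pi = defaultdict(lambda: 1.0 / len(actions))  # (state, action) -> prob

    def act(self, state):
        probs = [self.pi[(state, a)] for a in self.actions]
        return random.choices(self.actions, weights=probs)[0]

    def update(self, s, a, r, s2):
        best_next = max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
        greedy = max(self.actions, key=lambda b: self.Q[(s, b)])
        for b in self.actions:            # hill-climb the policy toward the greedy action
            step = self.delta if b == greedy else -self.delta / (len(self.actions) - 1)
            self.pi[(s, b)] = min(1.0, max(0.0, self.pi[(s, b)] + step))
        total = sum(self.pi[(s, b)] for b in self.actions)
        for b in self.actions:            # renormalise to keep a valid distribution
            self.pi[(s, b)] /= total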
Multi-Agent Methods for the Configuration of Random Nanocomputers
NASA Technical Reports Server (NTRS)
Lawson, John W.
2004-01-01
As computational devices continue to shrink, the cost of manufacturing such devices is expected to grow exponentially. One alternative to the costly, detailed design and assembly of conventional computers is to place the nano-electronic components randomly on a chip. The price for such a trivial assembly process is that the resulting chip would not be programmable by conventional means. In this work, we show that such random nanocomputers can be adaptively programmed using multi-agent methods. This is accomplished through the optimization of an associated high-dimensional error function. By representing each of the independent variables as a reinforcement learning agent, we are able to achieve convergence much faster than with other methods, including simulated annealing. Standard combinational logic circuits such as adders and multipliers are implemented in a straightforward manner. In addition, we show that the intrinsic flexibility of these adaptive methods allows the random computers to be reconfigured easily, making them reusable. Recovery from faults is also demonstrated.
Multi-Agent Framework for Virtual Learning Spaces.
ERIC Educational Resources Information Center
Sheremetov, Leonid; Nunez, Gustavo
1999-01-01
Discussion of computer-supported collaborative learning, distributed artificial intelligence, and intelligent tutoring systems focuses on the concept of agents, and describes a virtual learning environment that has a multi-agent system. Describes a model of interactions in collaborative learning and discusses agents for Web-based virtual…
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Vengerov, David
1999-01-01
Successful operation of future multi-agent intelligent systems requires efficient cooperation schemes between agents sharing learning experiences. We consider a pseudo-realistic world in which one or more opportunities appear and disappear in random locations. Agents use fuzzy reinforcement learning to learn which opportunities are most worthy of pursuing based on their promised rewards, expected lifetimes, path lengths and expected path costs. We show that this world is partially observable because the history of an agent influences the distribution of its future states. We consider a cooperation mechanism in which agents share experience by using and updating one joint behavior policy. We also implement a coordination mechanism for allocating opportunities to different agents in the same world. Our results demonstrate that K cooperative agents each learning in a separate world over N time steps outperform K independent agents each learning in a separate world over K*N time steps, with this result becoming more pronounced as the degree of partial observability in the environment increases. We also show that cooperation between agents learning in the same world decreases performance with respect to independent agents. Since cooperation reduces diversity between agents, we conclude that diversity is a key parameter in the trade-off between maximizing utility from cooperation when diversity is low and maximizing utility from competitive coordination when diversity is high.
Implementing Multiage Education.
ERIC Educational Resources Information Center
Gaustad, Joan
1996-01-01
Multiage education is the placement of children of varying ages, grades, and ability levels in the same classroom with the aim of improving learning for all of them. Teaching a multiage class requires very different knowledge and skills than teaching traditional single-grade classes. Interest in multiage education has grown in recent years, and…
NASA Astrophysics Data System (ADS)
Lubashevsky, I.; Kanemoto, S.
2010-07-01
A continuous time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness about the action value the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of the rewards in the past to the agent current perception of action value is described by an integral operator with a power-law kernel. Finally a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly via the reward of one agent depending on the choice of the other agents. The pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, a two agent and three agent systems of the rock-paper-scissors type are analyzed in detail, including the stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics of the systems at hand. In particular, it is shown that there can be simultaneously two modes of the system instability undergoing subcritical and supercritical bifurcation, with the latter one exhibiting anomalous oscillations with the amplitude and period growing with time. Besides, the instability onset via this supercritical mode may be regarded as “altruism self-organization”. For the three agent system the instability dynamics is found to be rather irregular and can be composed of alternate fragments of oscillations different in their properties.
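The scale-free memory can be sketched in discrete form as a power-law-weighted average of past rewards; the kernel regularisation and normalisation in the Python sketch below are illustrative assumptions, and the paper's fractional-differential formulation is not reproduced.

import numpy as np

def power_law_value(rewards, times, t_now, alpha=0.5, t0=1.0):
    """Perceived value of an action as a power-law-weighted sum of its past
    rewards: older rewards decay as a power of elapsed time rather than
    exponentially (scale-free memory, sketch)."""
    rewards = np.asarray(rewards, dtype=float)
    ages = t_now - np.asarray(times, dtype=float)
    weights = (ages + t0) ** (-alpha)        # t0 regularises the kernel at age 0
    return float(np.sum(weights * rewards) / np.sum(weights))

print(power_law_value(rewards=[1.0, 0.0, 1.0], times=[0.0, 5.0, 9.0], t_now=10.0))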
Collective Machine Learning: Team Learning and Classification in Multi-Agent Systems
ERIC Educational Resources Information Center
Gifford, Christopher M.
2009-01-01
This dissertation focuses on multiple heterogeneous, intelligent agents (hardware or software) that collaborate to learn a task and are capable of sharing knowledge. The concept of collaborative learning in multi-agent and multi-robot systems is largely understudied, and represents an area where further research is needed to…
Scalable Planning and Learning for Multiagent POMDPs
2015-01-01
...state of a special POMDP, called a BA-POMDP. The BA-POMDP can be extended to the multiagent setting (Amato and Oliehoek 2013), yielding the Bayes...2012; Amato et al. 2013) in the form of factored Dec-POMDPs (Oliehoek, Whiteson, and Spaan 2013; Pajarinen and Peltonen 2011) and networked
A Distributed Intelligent E-Learning System
ERIC Educational Resources Information Center
Kristensen, Terje
2016-01-01
An E-learning system based on a multi-agent (MAS) architecture combined with the Dynamic Content Manager (DCM) model of E-learning is presented. We discuss the benefits of using such a multi-agent architecture. Finally, the MAS architecture is compared with a pure service-oriented architecture (SOA). This MAS architecture may also be used within…
An incremental approach to genetic-algorithms-based classification.
Guan, Sheng-Uei; Zhu, Fangming
2005-04-01
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multiagent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an "integration" operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.
NASA Technical Reports Server (NTRS)
HolmesParker, Chris; Taylor, Mathew E.; Tumer, Kagan; Agogino, Adrian
2014-01-01
Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards and empirically demonstrate their benefits.
ERIC Educational Resources Information Center
McIntyre, Ellen; Kyle, Diane
2002-01-01
In nongraded, multi-age classrooms, children have the opportunity to learn a great deal from their more proficient classmates. Children in multi-age, nongraded programs often learn that children differ, and they learn to assist each other in productive ways. The organizational scheme has the potential to remove much of the competition of…
Quicker Q-Learning in Multi-Agent Systems
NASA Technical Reports Server (NTRS)
Agogino, Adrian K.; Tumer, Kagan
2005-01-01
Multi-agent learning in Markov Decision Problems is challenging because of the presence of two credit assignment problems: 1) how to credit an action taken at time step t for rewards received at t' greater than t; and 2) how to credit an action taken by agent i, considering the system reward is a function of the actions of all the agents. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning or TD(lambda). The second credit assignment problem is typically addressed either by hand-crafting reward functions that assign proper credit to an agent, or by making certain independence assumptions about an agent's state space and reward function. To address both credit assignment problems simultaneously, we propose Q Updates with Immediate Counterfactual Rewards learning (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. Instead of assuming that an agent's value function can be made independent of other agents, this method suppresses the impact of other agents using counterfactual rewards. Results on multi-agent grid-world problems over multiple topologies show that QUICR-learning can achieve up to thirtyfold improvements in performance over both conventional and local Q-learning in the largest tested systems.
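A minimal sketch of the counterfactual-reward idea summarized above is given below; clamping the agent's action to a fixed default and the dictionary-based joint action are assumptions for illustration, not the paper's exact formulation.

```python
def quicr_reward(G, joint_action, agent, default_action):
    """Immediate counterfactual reward: the system reward minus the reward
    that would have resulted had this agent's action been replaced by a
    default (null) action at this time step."""
    counterfactual = dict(joint_action)
    counterfactual[agent] = default_action
    return G(joint_action) - G(counterfactual)

def q_update(Q, s, a, s_next, r, actions, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning update, driven here by the
    counterfactual reward r instead of the raw system reward."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

Because the counterfactual term removes much of the influence of other agents on the reward, each agent's learning signal becomes far less noisy than the raw global reward.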
Multiage Grouping and Student Collaboration
ERIC Educational Resources Information Center
Cowan, Matthew
2014-01-01
The aim of this action research project was to investigate students' social preferences and pro-social interactions in a multiage, high school classroom in order to better understand how to group students to maximize learning and collaboration. According to many educational experts and previous inquiries, mixed-age learning groups introduce…
ERIC Educational Resources Information Center
Edwards, Susan; Blaise, Mindy; Hammer, Marie
2009-01-01
Postdevelopmental perspectives in early childhood education and care increasingly reference alternative ways of understanding learning, growth and development in early learning. Drawing on these ideas, this paper examines research findings which focused on early childhood teachers' understandings of multiage grouping. The findings suggested that…
A Center Moves toward Multiage Grouping: What Have We Learned?
ERIC Educational Resources Information Center
Schrier, Deborah; Mercado, Betsy
1994-01-01
Notes that, despite concerns from parents and caregivers, recent research suggests that major benefits result from multiage grouping. Examines the concept of multiage grouping and explores practical issues raised by parents, teachers, and administrators in the Early Childhood Research Center at the State University of New York at Buffalo as it…
Agents Control in Intelligent Learning Systems: The Case of Reactive Characteristics
ERIC Educational Resources Information Center
Laureano-Cruces, Ana Lilia; Ramirez-Rodriguez, Javier; de Arriaga, Fernando; Escarela-Perez, Rafael
2006-01-01
Intelligent learning systems (ILSs) have evolved in the last few years basically because of influences received from multi-agent architectures (MAs). Conflict resolution among agents has been a very important problem for multi-agent systems, with specific features in the case of ILSs. The literature shows that ILSs with cognitive or pedagogical…
Optimal Wonderful Life Utility Functions in Multi-Agent Systems
NASA Technical Reports Server (NTRS)
Wolpert, David H.; Tumer, Kagan; Swanson, Keith (Technical Monitor)
2000-01-01
The mathematics of Collective Intelligence (COINs) is concerned with the design of multi-agent systems so as to optimize an overall global utility function when those systems lack centralized communication and control. Typically in COINs each agent runs a distinct Reinforcement Learning (RL) algorithm, so that much of the design problem reduces to how best to initialize/update each agent's private utility function, as far as the ensuing value of the global utility is concerned. Traditional team game solutions to this problem assign to each agent the global utility as its private utility function. In previous work we used the COIN framework to derive the alternative Wonderful Life Utility (WLU), and experimentally established that having the agents use it induces global utility performance up to orders of magnitude superior to that induced by use of the team game utility. The WLU has a free parameter (the clamping parameter) which we simply set to zero in that previous work. Here we derive the optimal value of the clamping parameter, and demonstrate experimentally that using that optimal value can result in significantly improved performance over that of clamping to zero, over and above the improvement beyond traditional approaches.
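The Wonderful Life Utility itself is simple to state; the sketch below shows the clamping operation with an explicit clamping parameter (names and the dictionary-based joint action are illustrative assumptions).

```python
def wonderful_life_utility(G, joint_action, agent, clamp_value=0):
    """WLU for one agent: the global utility of the actual joint action minus
    the global utility with that agent's action clamped to a fixed value.
    Clamping to zero corresponds to the earlier work; the abstract above is
    about choosing this parameter optimally."""
    clamped = dict(joint_action)
    clamped[agent] = clamp_value
    return G(joint_action) - G(clamped)
```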
Product Distribution Theory for Control of Multi-Agent Systems
NASA Technical Reports Server (NTRS)
Lee, Chia Fan; Wolpert, David H.
2004-01-01
Product Distribution (PD) theory is a new framework for controlling Multi-Agent Systems (MASs). First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the probability distribution over the joint state of the agents. Accordingly we can consider a team game in which the shared utility is a performance measure of the behavior of the MAS. For such a scenario the game is at equilibrium - the Lagrangian is optimized - when the joint distribution of the agents optimizes the system's expected performance. One common way to find that equilibrium is to have each agent run a reinforcement learning algorithm. Here we investigate the alternative of exploiting PD theory to run gradient descent on the Lagrangian. We present computer experiments validating some of the predictions of PD theory for how best to do that gradient descent. We also demonstrate how PD theory can improve performance even when we are not allowed to rerun the MAS from different initial conditions, a requirement implicit in some previous work.
The Modern Multi-Age Classroom
ERIC Educational Resources Information Center
Carter, Paula
2005-01-01
The students from first, second and third grade in a high-poverty school system learn from one another and flourish in a caring classroom. Multi-age grouping builds strong relationships among teachers, students, and families.
ERIC Educational Resources Information Center
Chadli, Abdelhafid; Bendella, Fatima; Tranvouez, Erwan
2015-01-01
In this paper we present an Agent-based evaluation approach in a context of Multi-agent simulation learning systems. Our evaluation model is based on a two stage assessment approach: (1) a Distributed skill evaluation combining agents and fuzzy sets theory; and (2) a Negotiation based evaluation of students' performance during a training…
ERIC Educational Resources Information Center
Frost, Joe L.; And Others
Three brochures for parents are presented. The first lists potential playground hazards and suggestions for improving playgrounds. The second describes benefits of the multiage classroom, comparing such a classroom with a traditional, single-grade class. The third brochure describes verbal, logical, visual, musical, and physical learning styles…
Learning and innovative elements of strategy adoption rules expand cooperative network topologies.
Wang, Shijun; Szalay, Máté S; Zhang, Changshui; Csermely, Peter
2008-04-09
Cooperation plays a key role in the evolution of complex systems. However, the level of cooperation extensively varies with the topology of agent networks in the widely used models of repeated games. Here we show that cooperation remains rather stable by applying the reinforcement learning strategy adoption rule, Q-learning, on a variety of random, regular, small-world, scale-free and modular network models in repeated, multi-agent Prisoner's Dilemma and Hawk-Dove games. Furthermore, we found that using the above model systems other long-term learning strategy adoption rules also promote cooperation, while introducing a low level of noise (as a model of innovation) to the strategy adoption rules makes the level of cooperation less dependent on the actual network topology. Our results demonstrate that long-term learning and random elements in the strategy adoption rules, when acting together, extend the range of network topologies enabling the development of cooperation at a wider range of costs and temptations. These results suggest that a balanced duo of learning and innovation may help to preserve cooperation during the re-organization of real-world networks, and may play a prominent role in the evolution of self-organizing, complex systems.
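A toy version of the strategy adoption rule studied above is sketched below: two networked neighbours repeatedly play the Prisoner's Dilemma, each updating a Q-table over {cooperate, defect}. The payoff matrix, parameters, and the use of the previous joint action as the state are placeholder assumptions.

```python
import random

PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']

def choose(Q, state, eps=0.1):
    """Epsilon-greedy action selection; the small random element plays a role
    similar to the innovation noise discussed in the abstract."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def play_round(Q_i, Q_j, last, alpha=0.1, gamma=0.9):
    """One repeated-game round between neighbours i and j; 'last' is the
    previous joint action, serving as each learner's state."""
    a_i, a_j = choose(Q_i, last), choose(Q_j, last)
    nxt = (a_i, a_j)
    for Q, a, r in ((Q_i, a_i, PAYOFF[(a_i, a_j)]), (Q_j, a_j, PAYOFF[(a_j, a_i)])):
        old = Q.get((last, a), 0.0)
        best_next = max(Q.get((nxt, b), 0.0) for b in ACTIONS)
        Q[(last, a)] = old + alpha * (r + gamma * best_next - old)
    return nxt
```

Running such rounds over the edges of a random, scale-free, or modular graph reproduces the kind of networked repeated-game setting the abstract evaluates.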
Hardware-Assisted Large-Scale Neuroevolution for Multiagent Learning
2014-12-30
This DURIP equipment award was used to purchase, install, and bring on-line two Berkeley Emulation Engines (BEEs) and two mini-BEE machines to establish an FPGA-based high-performance multiagent training platform and its associated software. This acquisition of BEE4-W... Subject terms: Platform; Probabilistic Domain Transformation; Hardware-Assisted; FPGA; BEE; Hive Brain; Multiagent.
ERIC Educational Resources Information Center
Linik, Joyce Riha
1998-01-01
Describes techniques used in a multi-age class at Coupeville Elementary School, Washington, to boost reading comprehension and inspire students' love of books: access to an abundance of books, challenges to students, skills reinforcement, combined phonics and whole-language instruction, running-record assessment, paired reading, independent…
Yang, Yongliang; Modares, Hamidreza; Wunsch, Donald C; Yin, Yixin
2018-06-01
This paper develops optimal control protocols for the distributed output synchronization problem of leader-follower multiagent systems with an active leader. Agents are assumed to be heterogeneous with different dynamics and dimensions. The desired trajectory is assumed to be preplanned and is generated by the leader. Other follower agents autonomously synchronize to the leader by interacting with each other using a communication network. The leader is assumed to be active in the sense that it has a nonzero control input so that it can act independently and update its control to keep the followers away from possible danger. A distributed observer is first designed to estimate the leader's state and generate the reference signal for each follower. Then, the output synchronization of leader-follower systems with an active leader is formulated as a distributed optimal tracking problem, and inhomogeneous algebraic Riccati equations (AREs) are derived to solve it. The resulting distributed optimal control protocols not only minimize the steady-state error but also optimize the transient response of the agents. An off-policy reinforcement learning algorithm is developed to solve the inhomogeneous AREs online in real time and without requiring any knowledge of the agents' dynamics. Finally, two simulation examples are conducted to illustrate the effectiveness of the proposed algorithm.
NASA Astrophysics Data System (ADS)
Madani, Kaveh; Hooshyar, Milad
2014-11-01
Reservoir systems with multiple operators can benefit from coordination of operation policies. To maximize the total benefit of these systems the literature has normally used the social planner's approach. Based on this approach operation decisions are optimized using a multi-objective optimization model with a compound system's objective. While the utility of the system can be increased this way, fair allocation of benefits among the operators remains challenging for the social planner who has to assign controversial weights to the system's beneficiaries and their objectives. Cooperative game theory provides an alternative framework for fair and efficient allocation of the incremental benefits of cooperation. To determine the fair and efficient utility shares of the beneficiaries, cooperative game theory solution methods consider the gains of each party in the status quo (non-cooperation) as well as what can be gained through the grand coalition (social planner's solution or full cooperation) and partial coalitions. Nevertheless, estimation of the benefits of different coalitions can be challenging in complex multi-beneficiary systems. Reinforcement learning can be used to address this challenge and determine the gains of the beneficiaries for different levels of cooperation, i.e., non-cooperation, partial cooperation, and full cooperation, providing the essential input for allocation based on cooperative game theory. This paper develops a game theory-reinforcement learning (GT-RL) method for determining the optimal operation policies in multi-operator multi-reservoir systems with respect to fairness and efficiency criteria. As the first step to underline the utility of the GT-RL method in solving complex multi-agent multi-reservoir problems without a need for developing compound objectives and weight assignment, the proposed method is applied to a hypothetical three-agent three-reservoir system.
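As one concrete way to picture the allocation step of a game theory-reinforcement learning scheme like the one above: once RL has estimated the benefit of every coalition of operators, a cooperative game theory solution can distribute the gains. The Shapley value below is used purely as an example solution concept, with hypothetical coalition values standing in for RL-estimated benefits.

```python
from itertools import permutations

def shapley_values(agents, coalition_value):
    """Shapley value of each agent, given a function mapping a frozenset of
    agents (a coalition) to its benefit, e.g. as estimated by RL."""
    shares = {a: 0.0 for a in agents}
    orders = list(permutations(agents))
    for order in orders:
        coalition = frozenset()
        for a in order:
            with_a = coalition | {a}
            shares[a] += coalition_value(with_a) - coalition_value(coalition)
            coalition = with_a
    return {a: v / len(orders) for a, v in shares.items()}

# hypothetical three-agent (three-reservoir) example
values = {frozenset(): 0, frozenset('A'): 4, frozenset('B'): 3, frozenset('C'): 2,
          frozenset('AB'): 9, frozenset('AC'): 8, frozenset('BC'): 6,
          frozenset('ABC'): 14}
print(shapley_values('ABC', lambda c: values[c]))
```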
QUICR-learning for Multi-Agent Coordination
NASA Technical Reports Server (NTRS)
Agogino, Adrian K.; Tumer, Kagan
2006-01-01
Coordinating multiple agents that need to perform a sequence of actions to maximize a system-level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t' > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning. The second credit assignment problem is typically addressed by creating custom reward functions. To address both credit assignment problems simultaneously, we propose "Q Updates with Immediate Counterfactual Rewards learning" (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. QUICR-learning is based on previous work on single-time-step counterfactual rewards described by the collectives framework. Results on a traffic congestion problem show that QUICR-learning is significantly better than a Q-learner using collectives-based (single-time-step counterfactual) rewards. In addition, QUICR-learning provides significant gains over conventional and local Q-learning. Additional results on a multi-agent grid-world problem show that the improvements due to QUICR-learning are not domain specific and can provide up to a tenfold increase in performance over existing methods.
Early childhood numeracy in a multiage setting
NASA Astrophysics Data System (ADS)
Wood, Karen; Frid, Sandra
2005-10-01
This research is a case study examining numeracy teaching and learning practices in an early childhood multiage setting with Pre-Primary to Year 2 children. Data were collected via running records, researcher reflection notes, and video and audio recordings. Video and audio transcripts were analysed using a mathematical discourse and social interactions coding system designed by MacMillan (1998), while the running records and reflection notes contributed to descriptions of the children's interactions with each other and with the teachers. Teachers used an `assisted performance' approach to instruction that supported problem solving and inquiry processes in mathematics activities, and this, combined with a child-centred pedagogy and specific values about community learning, created a learning environment designed to stimulate and foster learning. The mathematics discourse analysis showed a use of explanatory language in mathematics discourse, and this language supported scaffolding among children for new mathematics concepts. These and other interactions related to peer sharing, tutoring and regulation also emerged as key aspects of students' learning practices. However, the findings indicated that multiage grouping alone did not support learning. Rather, effective learning was dependent upon the teacher's capacities to develop productive discussion among children, as well as implement developmentally appropriate curricula that addressed the needs of the different children.
A Multi-Agent System for Intelligent Online Education.
ERIC Educational Resources Information Center
O'Riordan, Colm; Griffith, Josephine
1999-01-01
Describes the system architecture of an intelligent Web-based education system that includes user modeling agents, information filtering agents for automatic information gathering, and the multi-agent interaction. Discusses information management; user interaction; support for collaborative peer-peer learning; implementation; testing; and future…
No-Regret Learning and a Mechanism for Distributed Multiagent Planning
2008-02-01
adversarial agents who influence prices for the resources. The adversarial agents benefit from arbitrage: that is, their incentive is to uncover violations of the resource...
Research and application of multi-agent genetic algorithm in tower defense game
NASA Astrophysics Data System (ADS)
Jin, Shaohua
2018-04-01
In this paper, a new multi-agent genetic algorithm based on orthogonal experiment is proposed, building on multi-agent systems, genetic algorithms and orthogonal experimental design. The algorithm includes the design of a neighborhood competition operator, an orthogonal crossover operator, and a self-learning operator. The new algorithm is applied to a mobile tower defense game: according to the characteristics of the game, mathematical models are established, which ultimately increases the value of the game's monsters.
NASA Astrophysics Data System (ADS)
Riegels, N.; Siegfried, T.; Pereira Cardenal, S. J.; Jensen, R. A.; Bauer-Gottwein, P.
2008-12-01
In most economics-driven approaches to optimizing water use at the river basin scale, the system is modelled deterministically with the goal of maximizing overall benefits. However, actual operation and allocation decisions must be made under hydrologic and economic uncertainty. In addition, river basins often cross political boundaries, and different states may not be motivated to cooperate so as to maximize basin-scale benefits. Even within states, competing agents such as irrigation districts, municipal water agencies, and large industrial users may not have incentives to cooperate to realize efficiency gains identified in basin-level studies. More traditional simulation-optimization approaches assume pre-commitment by individual agents and stakeholders and unconditional compliance on each side. While this can help determine attainable gains and tradeoffs from efficient management, such hardwired policies do not account for dynamic feedback between agents themselves or between agents and their environments (e.g. due to climate change, etc.). In reality, however, we are dealing with an out-of-equilibrium multi-agent system, where there is neither global knowledge nor global control, but rather continuous strategic interaction between decision-making agents. Based on the theory of stochastic games, we present a computational framework that allows for studying the dynamic feedback between decision-making agents themselves and an inherently uncertain environment in a spatially and temporally distributed manner. Agents with decision-making control over water allocation such as countries, irrigation districts, and municipalities are represented by reinforcement learning agents and coupled to a detailed hydrologic-economic model. This approach emphasizes learning by agents from their continuous interaction with other agents and the environment. It provides a convenient framework for the solution of the problem of dynamic decision-making in a mixed cooperative/non-cooperative environment with which different institutional setups and incentive systems can be studied so as to identify reasonable ways to reach desirable, Pareto-optimal allocation outcomes. Preliminary results from an application to the Syr Darya river basin in Central Asia will be presented and discussed. The Syr Darya River is a classic example of a transboundary river basin in which basin-wide efficiency gains identified in optimization studies have not been sufficient to induce cooperative management of the river by the riparian states.
Early Childhood Numeracy in a Multiage Setting
ERIC Educational Resources Information Center
Wood, Karen; Frid, Sandra
2005-01-01
This research is a case study examining numeracy teaching and learning practices in an early childhood multiage setting with Pre-Primary to Year 2 children. Data were collected via running records, researcher reflection notes, and video and audio recordings. Video and audio transcripts were analysed using a mathematical discourse and social…
Sequential Nonlinear Learning for Distributed Multiagent Systems via Extreme Learning Machines.
Vanli, Nuri Denizcan; Sayin, Muhammed O; Delibalta, Ibrahim; Kozat, Suleyman Serdar
2017-03-01
We study online nonlinear learning over distributed multiagent systems, where each agent employs a single hidden layer feedforward neural network (SLFN) structure to sequentially minimize arbitrary loss functions. In particular, each agent trains its own SLFN using only the data that is revealed to itself. On the other hand, the aim of the multiagent system is to train the SLFN at each agent as well as the optimal centralized batch SLFN that has access to all the data, by exchanging information between neighboring agents. We address this problem by introducing a distributed subgradient-based extreme learning machine algorithm. The proposed algorithm provides guaranteed upper bounds on the performance of the SLFN at each agent and shows that each of these individual SLFNs asymptotically achieves the performance of the optimal centralized batch SLFN. Our performance guarantees explicitly distinguish the effects of data- and network-dependent parameters on the convergence rate of the proposed algorithm. The experimental results illustrate that the proposed algorithm achieves the oracle performance significantly faster than the state-of-the-art methods in the machine learning and signal processing literature. Hence, the proposed method is highly appealing for the applications involving big data.
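A highly simplified sketch of the distributed learning loop described above: each agent holds output weights for a fixed random hidden layer, averages them with its neighbours' weights (consensus), and then takes a subgradient step on its own local squared loss. The squared loss, uniform neighbour weighting, and all names are assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def elm_features(X, W_hidden, b_hidden):
    """Fixed random single-hidden-layer feature map used by an ELM/SLFN."""
    return np.tanh(X @ W_hidden + b_hidden)

def distributed_step(betas, neighbors, H, y, step=0.01):
    """One round: every agent averages the output weights of its neighbours
    and itself, then descends the gradient of its own local loss."""
    new = []
    for i, _ in enumerate(betas):
        avg = np.mean([betas[j] for j in neighbors[i] + [i]], axis=0)
        grad = H[i].T @ (H[i] @ avg - y[i]) / len(y[i])  # local squared-loss gradient
        new.append(avg - step * grad)
    return new

# hypothetical usage: H[i] = elm_features(X_i, W_hidden, b_hidden) holds the
# hidden-layer outputs for agent i's local data, and y[i] its local targets.
```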
Adaptive, Distributed Control of Constrained Multi-Agent Systems
NASA Technical Reports Server (NTRS)
Bieniawski, Stefan; Wolpert, David H.
2004-01-01
Product Distribution (PD) theory was recently developed as a broad framework for analyzing and optimizing distributed systems. Here we demonstrate its use for adaptive distributed control of Multi-Agent Systems (MASs), i.e., for distributed stochastic optimization using MASs. First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the probability distribution on the joint state of the agents. When the game in question is a team game with constraints, that equilibrium optimizes the expected value of the team game utility, subject to those constraints. One common way to find that equilibrium is to have each agent run a Reinforcement Learning (RL) algorithm. PD theory reveals this to be a particular type of search algorithm for minimizing the Lagrangian. Typically that algorithm is quite inefficient. A more principled alternative is to use a variant of Newton's method to minimize the Lagrangian. Here we compare this alternative to RL-based search in three sets of computer experiments. These are the N Queens problem and bin-packing problem from the optimization literature, and the Bar problem from the distributed RL literature. Our results confirm that the PD-theory-based approach outperforms the RL-based scheme in all three domains.
Argumentation Based Joint Learning: A Novel Ensemble Learning Approach
Xu, Junyi; Yao, Li; Li, Le
2015-01-01
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high quality knowledge for ensemble classifier and improve the performance of classification. PMID:25966359
NASA Astrophysics Data System (ADS)
Wang, W.; Wang, D.; Peng, Z. H.
2017-09-01
Without assuming that the communication topologies among the neural network (NN) weights are undirected or that the states of each agent are measurable, the cooperative learning NN output feedback control problem is addressed for uncertain nonlinear multi-agent systems with identical structures in strict-feedback form. By establishing directed communication topologies among NN weights to share their learned knowledge, NNs with cooperative learning laws are employed to identify the uncertainties. By designing NN-based κ-filter observers to estimate the unmeasurable states, a new cooperative learning output feedback control scheme is proposed to guarantee that the system outputs can track nonidentical reference signals with bounded tracking errors. A simulation example is given to demonstrate the effectiveness of the theoretical results.
ERIC Educational Resources Information Center
Hradnansky, Terre A.
This practicum was designed to increase parental involvement and parental support in the area of interactive mathematics homework by helping parents to better understand their role and responsibilities towards helping their child with the interactive math homework that reinforces the curriculum. Family math meetings were offered and follow-up…
Self-Learning Power Control in Wireless Sensor Networks.
Chincoli, Michele; Liotta, Antonio
2018-01-27
Current trends in interconnecting myriad smart objects to monetize on Internet of Things applications have led to high-density communications in wireless sensor networks. This aggravates the already over-congested unlicensed radio bands, calling for new mechanisms to improve spectrum management and energy efficiency, such as transmission power control. Existing protocols are based on simplistic heuristics that often approach interference problems (i.e., packet loss, delay and energy waste) by increasing power, leading to detrimental results. The scope of this work is to investigate how machine learning may be used to bring wireless nodes to the lowest possible transmission power level and, in turn, to respect the quality requirements of the overall network. Lowering transmission power has benefits in terms of both energy consumption and interference. We propose a protocol of transmission power control through a reinforcement learning process that we have set in a multi-agent system. The agents are independent learners using the same exploration strategy and reward structure, leading to an overall cooperative network. The simulation results show that the system converges to an equilibrium where each node transmits at the minimum power while respecting high packet reception ratio constraints. Consequently, the system benefits from low energy consumption and packet delay.
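The learning loop described above can be sketched roughly as follows; the discrete power levels, the reward weights, and the packet-reception estimate used as feedback are placeholder assumptions, not the protocol's actual parameters.

```python
import random

POWER_LEVELS = list(range(8))   # hypothetical discrete transmission power settings

def reward(prr, level, prr_target=0.95, penalty=0.05):
    """Favour meeting the packet reception ratio target at the lowest power."""
    return (1.0 if prr >= prr_target else -1.0) - penalty * level

def select_power(Q, state, eps=0.1):
    """Epsilon-greedy choice of transmission power for the current link state."""
    if random.random() < eps:
        return random.choice(POWER_LEVELS)
    return max(POWER_LEVELS, key=lambda p: Q.get((state, p), 0.0))

def update(Q, state, power, next_state, r, alpha=0.1, gamma=0.9):
    """Independent one-step Q-learning update run by every sensor node."""
    old = Q.get((state, power), 0.0)
    best = max(Q.get((next_state, p), 0.0) for p in POWER_LEVELS)
    Q[(state, power)] = old + alpha * (r + gamma * best - old)
```

Because every node runs the same independent learner with the same reward structure, the network as a whole behaves cooperatively, which is the equilibrium behaviour the abstract reports.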
Self-Learning Power Control in Wireless Sensor Networks
Liotta, Antonio
2018-01-01
Current trends in interconnecting myriad smart objects to monetize on Internet of Things applications have led to high-density communications in wireless sensor networks. This aggravates the already over-congested unlicensed radio bands, calling for new mechanisms to improve spectrum management and energy efficiency, such as transmission power control. Existing protocols are based on simplistic heuristics that often approach interference problems (i.e., packet loss, delay and energy waste) by increasing power, leading to detrimental results. The scope of this work is to investigate how machine learning may be used to bring wireless nodes to the lowest possible transmission power level and, in turn, to respect the quality requirements of the overall network. Lowering transmission power has benefits in terms of both energy consumption and interference. We propose a protocol of transmission power control through a reinforcement learning process that we have set in a multi-agent system. The agents are independent learners using the same exploration strategy and reward structure, leading to an overall cooperative network. The simulation results show that the system converges to an equilibrium where each node transmits at the minimum power while respecting high packet reception ratio constraints. Consequently, the system benefits from low energy consumption and packet delay. PMID:29382072
Watson, Richard A; Mills, Rob; Buckley, C L
2011-01-01
In some circumstances complex adaptive systems composed of numerous self-interested agents can self-organize into structures that enhance global adaptation, efficiency, or function. However, the general conditions for such an outcome are poorly understood and present a fundamental open question for domains as varied as ecology, sociology, economics, organismic biology, and technological infrastructure design. In contrast, sufficient conditions for artificial neural networks to form structures that perform collective computational processes such as associative memory/recall, classification, generalization, and optimization are well understood. Such global functions within a single agent or organism are not wholly surprising, since the mechanisms (e.g., Hebbian learning) that create these neural organizations may be selected for this purpose; but agents in a multi-agent system have no obvious reason to adhere to such a structuring protocol or produce such global behaviors when acting from individual self-interest. However, Hebbian learning is actually a very simple and fully distributed habituation or positive feedback principle. Here we show that when self-interested agents can modify how they are affected by other agents (e.g., when they can influence which other agents they interact with), then, in adapting these inter-agent relationships to maximize their own utility, they will necessarily alter them in a manner homologous with Hebbian learning. Multi-agent systems with adaptable relationships will thereby exhibit the same system-level behaviors as neural networks under Hebbian learning. For example, improved global efficiency in multi-agent systems can be explained by the inherent ability of associative memory to generalize by idealizing stored patterns and/or creating new combinations of subpatterns. Thus distributed multi-agent systems can spontaneously exhibit adaptive global behaviors in the same sense, and by the same mechanism, as with the organizational principles familiar in connectionist models of organismic learning.
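The mechanism argued for above, self-interested adjustment of inter-agent relationships ending up homologous with Hebbian learning, can be caricatured in a few lines. Binary states, symmetric coupling weights, and the specific update rule are toy assumptions, not the authors' model.

```python
import numpy as np

def hebbian_relationship_update(states, weights, rate=0.01):
    """Strengthen connections between agents whose states agree and weaken
    those between agents whose states disagree, as Hebbian learning would."""
    return weights + rate * np.outer(states, states)

def utility(i, states, weights):
    """An agent's utility from its current relationships; adapting the weights
    to increase this quantity produces the same update direction."""
    return float(states[i] * (weights[i] @ states))

# toy usage: three agents with +/-1 states and a small random symmetric coupling
rng = np.random.default_rng(0)
s = rng.choice([-1, 1], size=3)
W = rng.normal(scale=0.1, size=(3, 3))
W = (W + W.T) / 2
W = hebbian_relationship_update(s, W)
```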
Collective learning for the emergence of social norms in networked multiagent systems.
Yu, Chao; Zhang, Minjie; Ren, Fenghui
2014-12-01
Social norms such as social rules and conventions play a pivotal role in sustaining system order by regulating and controlling individual behaviors toward a global consensus in large-scale distributed systems. Systematic studies of efficient mechanisms that can facilitate the emergence of social norms enable us to build and design robust distributed systems, such as electronic institutions and norm-governed sensor networks. This paper studies the emergence of social norms via learning from repeated local interactions in networked multiagent systems. A collective learning framework, which imitates the opinion aggregation process in human decision making, is proposed to study the impact of agent local collective behaviors on the emergence of social norms in a number of different situations. In the framework, each agent interacts repeatedly with all of its neighbors. At each step, an agent first takes a best-response action toward each of its neighbors and then combines all of these actions into a final action using ensemble learning methods. Extensive experiments are carried out to evaluate the framework with respect to different network topologies, learning strategies, numbers of actions, influences of nonlearning agents, and so on. Experimental results reveal some significant insights into the manipulation and control of norm emergence in networked multiagent systems achieved through local collective behaviors.
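A minimal sketch of one step of the collective learning framework described above, with majority voting standing in for the ensemble method; the pairwise Q-table layout and action set are assumptions for illustration.

```python
from collections import Counter

def best_response(Q, neighbor, actions):
    """Best-response action toward one particular neighbour, read off the
    agent's Q-values for the pairwise interaction with that neighbour."""
    return max(actions, key=lambda a: Q.get((neighbor, a), 0.0))

def collective_action(Q, neighbors, actions):
    """Combine the per-neighbour best responses into one final action by
    majority vote, one simple ensemble rule among those the paper evaluates."""
    votes = Counter(best_response(Q, n, actions) for n in neighbors)
    return votes.most_common(1)[0][0]
```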
Construction of a Learning Agent Handling Its Rewards According to Environmental Situations
NASA Astrophysics Data System (ADS)
Moriyama, Koichi; Numao, Masayuki
The authors aim at constructing an agent which learns appropriate actions in a Multi-Agent environment with and without social dilemmas. For this aim, the agent must have nonrationality that makes it give up its own profit when it should do that. Since there are many studies on rational learning that brings more and more profit, it is desirable to utilize them for constructing the agent. Therefore, we use a reward-handling manner that makes internal evaluation from the agent's rewards, and then the agent learns actions by a rational learning method with the internal evaluation. If the agent has only a fixed manner, however, it does not act well in the environment with and without dilemmas. Thus, the authors equip the agent with several reward-handling manners and criteria for selecting an effective one for the environmental situation. In the case of humans, what generates the internal evaluation is usually called emotion. Hence, this study also aims at throwing light on emotional activities of humans from a constructive view. In this paper, we divide a Multi-Agent environment into three situations and construct an agent having the reward-handling manners and the criteria. We observe that the agent acts well in all the three Multi-Agent situations composed of homogeneous agents.
Enhanced risk management by an emerging multi-agent architecture
NASA Astrophysics Data System (ADS)
Lin, Sin-Jin; Hsu, Ming-Fu
2014-07-01
Classification in imbalanced datasets has attracted much attention from researchers in the field of machine learning. Most existing techniques tend not to perform well on minority class instances when the dataset is highly skewed because they focus on minimising the forecasting error without considering the relative distribution of each class. This investigation proposes an emerging multi-agent architecture, grounded on cooperative learning, to solve the class-imbalanced classification problem. Additionally, this study deals further with the obscure nature of the multi-agent architecture and expresses comprehensive rules for auditors. The results from this study indicate that the presented model performs satisfactorily in risk management and is able to tackle a highly class-imbalanced dataset comparatively well. Furthermore, the knowledge visualised process, supported by real examples, can assist both internal and external auditors who must allocate limited detecting resources; they can take the rules as roadmaps to modify the auditing programme.
Adaptivity in Agent-Based Routing for Data Networks
NASA Technical Reports Server (NTRS)
Wolpert, David H.; Kirshner, Sergey; Merz, Chris J.; Tumer, Kagan
2000-01-01
Adaptivity, both of the individual agents and of the interaction structure among the agents, seems indispensable for scaling up multi-agent systems (MASs) in noisy environments. One important consideration in designing adaptive agents is choosing their action spaces to be as amenable as possible to machine learning techniques, especially to reinforcement learning (RL) techniques. One important way to have the interaction structure connecting agents itself be adaptive is to have the intentions and/or actions of the agents be in the input spaces of the other agents, much as in Stackelberg games. We consider both kinds of adaptivity in the design of a MAS to control network packet routing. We demonstrate on the OPNET event-driven network simulator the perhaps surprising fact that simply changing the action space of the agents to be better suited to RL can result in very large improvements in their potential performance: at their best settings, our learning-amenable router agents achieve throughputs up to three and one half times better than that of the standard Bellman-Ford routing algorithm, even when the Bellman-Ford protocol traffic is maintained. We then demonstrate that much of that potential improvement can be realized by having the agents learn their settings when the agent interaction structure is itself adaptive.
Observer-based distributed adaptive iterative learning control for linear multi-agent systems
NASA Astrophysics Data System (ADS)
Li, Jinsha; Liu, Sanyang; Li, Junmin
2017-10-01
This paper investigates the consensus problem for linear multi-agent systems from the viewpoint of two-dimensional systems when the state information of each agent is not available. An observer-based, fully distributed adaptive iterative learning protocol is designed in this paper. A local observer is designed for each agent, and it is shown that, without using any global information about the communication graph, all agents achieve consensus perfectly for any undirected connected communication graph when the number of iterations tends to infinity. A Lyapunov-like energy function is employed to facilitate the learning protocol design and property analysis. Finally, a simulation example is given to illustrate the theoretical analysis.
Learning Science in Small Multi-Age Groups: The Role of Age Composition
ERIC Educational Resources Information Center
Kallery, Maria; Loupidou, Thomais
2016-01-01
The present study examines how the overall cognitive achievements in science of the younger children in a class where the students work in small multi-age groups are influenced by the number of older children in the groups. The context of the study was early-years education. The study has two parts: The first part involved classes attended by…
ERIC Educational Resources Information Center
Logue, Mary Ellin
2006-01-01
This article presents an action research conducted by a group of teachers comparing multiage with same-age interactions of children, especially among toddlers. The research involving 31 children ranging in age from two through five-and-a-half was conducted under optimal conditions, with small groups, low teacher-child ratios, and highly trained…
Learning other agents' preferences in multiagent negotiation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bui, H.H.; Kieronska, D.; Venkatesh, S.
In multiagent systems, an agent does not usually have complete information about the preferences and decision making processes of other agents. This might prevent the agents from making coordinated choices, purely due to their ignorance of what others want. This paper describes the integration of a learning module into a communication-intensive negotiating agent architecture. The learning module gives the agents the ability to learn about other agents' preferences via past interactions. Over time, the agents can incrementally update their models of other agents' preferences and use them to make better coordinated decisions. Combining both communication and learning, as two complementary knowledge acquisition methods, helps to reduce the amount of communication needed on average, and is justified in situations where communication is computationally costly or simply not desirable (e.g. to preserve individual privacy).
A Multi-Agent Question-Answering System for E-Learning and Collaborative Learning Environment
ERIC Educational Resources Information Center
Alinaghi, Tannaz; Bahreininejad, Ardeshir
2011-01-01
The increasing advances of new Internet technologies in all application domains have changed life styles and interactions. E-learning and collaborative learning environment systems are originated through such changes and aim at providing facilities for people in different times and geographical locations to cooperate, collaborate, learn and work…
Analysis of Foreign Exchange Interventions by Intervention Agent with an Artificial Market Approach
NASA Astrophysics Data System (ADS)
Matsui, Hiroki; Tojo, Satoshi
We propose a multi-agent system which learns intervention policies and evaluates the effect of interventions in an artificial foreign exchange market. Izumi et al. had presented a system called AGEDASI TOF to simulate an artificial market, together with a support system for the government to decide foreign exchange policies. However, the system needed to fix the amount of governmental intervention prior to the simulation, and was not realistic. In addition, the interventions in the system did not affect supply and demand of currencies; thus we could not discuss the effect of intervention correctly. First, we improve the system so as to attach more importance to the weights of influential factors. Thereafter, we introduce an intervention agent that has the role of the central bank to stabilize the market. We show that the agent learned effective intervention policies through reinforcement learning, and that the exchange rate converged, to a certain extent, within the expected range. We could also estimate the amount of intervention, showing the efficacy of signaling. In this model, in order to investigate the aliasing of the perception of the intervention agent, we introduced a pseudo-agent which was supposed to be able to observe all the behaviors of dealer agents; with this super-agent, we discussed the adequate granularity for a market state description.
Pang, Shaoning; Ban, Tao; Kadobayashi, Youki; Kasabov, Nikola K
2012-04-01
To adapt linear discriminant analysis (LDA) to real-world applications, there is a pressing need to equip it with an incremental learning ability to integrate knowledge presented by one-pass data streams, a functionality to join multiple LDA models to make the knowledge sharing between independent learning agents more efficient, and a forgetting functionality to avoid reconstruction of the overall discriminant eigenspace caused by some irregular changes. To this end, we introduce two adaptive LDA learning methods: LDA merging and LDA splitting. These provide the benefits of ability of online learning with one-pass data streams, retained class separability identical to the batch learning method, high efficiency for knowledge sharing due to condensed knowledge representation by the eigenspace model, and more preferable time and storage costs than traditional approaches under common application conditions. These properties are validated by experiments on a benchmark face image data set. By a case study on the application of the proposed method to multiagent cooperative learning and system alternation of a face recognition system, we further clarified the adaptability of the proposed methods to complex dynamic learning tasks.
Learning from Multiple Collaborating Intelligent Tutors: An Agent-based Approach.
ERIC Educational Resources Information Center
Solomos, Konstantinos; Avouris, Nikolaos
1999-01-01
Describes an open distributed multi-agent tutoring system (MATS) and discusses issues related to learning in such open environments. Topics include modeling a one student-many teachers approach in a computer-based learning context; distributed artificial intelligence; implementation issues; collaboration; and user interaction. (Author/LRW)
ERIC Educational Resources Information Center
Skoning, Stacey
2010-01-01
This article demonstrates how the use of creative movement and dance offers effective instructional strategies to meet the diverse learning needs of students in an inclusive classroom. Every day in one multi-age, fully inclusive classroom, students are meaningfully engaged in learning through movement--they move to learn science, social studies,…
Creating Tesselations with Pavement Chalk: Implementing Best Practices in Mathematics
ERIC Educational Resources Information Center
Furner, Joseph M.; Goodman, Barbara; Meeks, Shirley
2004-01-01
Implementing best practices like cooperative learning, using concrete manipulatives, problem solving, technology, active learning, multi-age grouping, and team teaching have shown benefits for students when learning mathematics concepts within the curriculum (Zemelman, Daniels & Hyde, 1998; NCTM, 2000). What started as a professional development…
A Self-Adaptive Multi-Agent System Approach for Collaborative Mobile Learning
ERIC Educational Resources Information Center
de la Iglesia, Didac Gil; Calderon, Juan Felipe; Weyns, Danny; Milrad, Marcelo; Nussbaum, Miguel
2015-01-01
Mobile technologies have emerged as facilitators in the learning process, extending traditional classroom activities. However, engineering mobile learning applications for outdoor usage poses severe challenges. The requirements of these applications are demanding, as many different aspects need to be catered for, such as resource access and sharing,…
An Analysis on a Negotiation Model Based on Multiagent Systems with Symbiotic Learning and Evolution
NASA Astrophysics Data System (ADS)
Hossain, Md. Tofazzal
This study presents an evolutionary analysis of a negotiation model based on Masbiole (Multiagent Systems with Symbiotic Learning and Evolution), which has been proposed as a new methodology of Multiagent Systems (MAS) based on symbiosis in the ecosystem. In Masbiole, agents evolve in consideration of not only their own benefits and losses, but also the benefits and losses of opponent agents. To aid effective application of Masbiole, we develop a competitive negotiation model where rigorous and advanced intelligent decision-making mechanisms are required for agents to achieve solutions. A Negotiation Protocol is devised, aiming at developing a set of rules for agents' behavior during evolution. Simulations use a newly developed evolutionary computing technique, called Genetic Network Programming (GNP), which has a directed graph-type gene structure that can develop and design the required intelligent mechanisms for agents. In a typical scenario, competitive negotiation solutions are reached by concessions that are usually predetermined in conventional MAS. In this model, however, not only is the concession determined automatically by symbiotic evolution (making the system intelligent, automated, and efficient), but the resulting solution is also automatically Pareto optimal.
Intervening or Ignoring: Learning about Teaching in New Times
ERIC Educational Resources Information Center
Blaise, Mindy; Elsden-Clifton, Jennifer
2007-01-01
In response to the rise of collaborative learning within education, two teacher educators redesigned their courses to explore the complexities of pedagogy within a New Learning framework. Multi-age grouping provided opportunities for pre-service teachers to work with others from different year levels on an interdisciplinary assessment task. As a…
Designing Agent Utilities for Coordinated, Scalable and Robust Multi-Agent Systems
NASA Technical Reports Server (NTRS)
Tumer, Kagan
2005-01-01
Coordinating the behavior of a large number of agents to achieve a system-level goal poses unique design challenges. In particular, problems of scaling (number of agents in the thousands to tens of thousands), observability (agents have limited sensing capabilities), and robustness (the agents are unreliable) make it impossible to simply apply methods developed for small multi-agent systems composed of reliable agents. To address these problems, we present an approach based on deriving agent goals that are aligned with the overall system goal, and can be computed using information readily available to the agents. Then, each agent uses a simple reinforcement learning algorithm to pursue its own goals. Because of the way in which those goals are derived, there is no need to use difficult-to-scale external mechanisms to force collaboration or coordination among the agents, or to ensure that agents actively attempt to appropriate the tasks of agents that suffered failures. To present these results in a concrete setting, we focus on the problem of finding the subset of a set of imperfect devices that results in the best aggregate device. This is a large distributed agent coordination problem where each agent (e.g., device) needs to determine whether to be part of the aggregate device. Our results show that the approach proposed in this work provides improvements of over an order of magnitude over both traditional search methods and traditional multi-agent methods. Furthermore, the results show that even in extreme cases of agent failures (i.e., half the agents failed midway through the simulation) the system's performance degrades gracefully and still outperforms a failure-free and centralized search algorithm. The results also show that the gains increase as the size of the system (e.g., number of agents) increases. This latter result is particularly encouraging and suggests that this method is ideally suited for domains where the number of agents is currently in the thousands and will reach tens or hundreds of thousands in the near future.
NASA Astrophysics Data System (ADS)
Narayan Ray, Dip; Majumder, Somajyoti
2014-07-01
Several attempts have been made by researchers around the world to develop autonomous exploration techniques for robots, but developing algorithms for unstructured and unknown environments has always been an important issue. Human-like gradual Multi-agent Q-learning (HuMAQ) is a technique developed for autonomous robotic exploration in unknown (and even unimaginable) environments. It has been successfully implemented in a multi-agent, single-robot system. HuMAQ uses the concept of Subsumption architecture, a well-known behaviour-based architecture, for prioritizing the agents of the multi-agent system, and executes only the most common action out of all the different actions recommended by different agents. Instead of using a new state-action table (Q-table) each time, HuMAQ uses the immediate past table for efficient and faster exploration. The proof of learning has been established both theoretically and practically. HuMAQ has the potential to be used in different and difficult situations as well as applications. The same architecture has been modified for multi-robot exploration of an environment. Apart from all other existing agents used in the single-robot system, agents for inter-robot communication and coordination/co-operation with other similar robots have been introduced in the present research. The current work uses a series of indigenously developed identical autonomous robotic systems, communicating with each other through the ZigBee protocol.
ERIC Educational Resources Information Center
Trevors, Gregory; Duffy, Melissa; Azevedo, Roger
2014-01-01
Hypermedia learning environments (HLE) unevenly present new challenges and opportunities to learning processes and outcomes depending on learner characteristics and instructional supports. In this experimental study, we examined how one such HLE--MetaTutor, an intelligent, multi-agent tutoring system designed to scaffold cognitive and…
TSI-Enhanced Pedagogical Agents to Engage Learners in Virtual Worlds
ERIC Educational Resources Information Center
Leung, Steve; Virwaney, Sandeep; Lin, Fuhua; Armstrong, AJ; Dubbelboer, Adien
2013-01-01
Building pedagogical applications in virtual worlds is a multi-disciplinary endeavor that involves learning theories, application development framework, and mediated communication theories. This paper presents a project that integrates game-based learning, multi-agent system architecture (MAS), and the theory of Transformed Social Interaction…
A study on expertise of agents and its effects on cooperative Q-learning.
Araabi, Babak Nadjar; Mastoureshgh, Sahar; Ahmadabadi, Majid Nili
2007-04-01
Cooperation in learning (CL) can be realized in a multiagent system if agents are capable of learning from both their own experiments and other agents' knowledge and expertise. In CL, these extra resources are exploited to achieve higher efficiency and faster learning than individual learning (IL). In the real world, however, implementation of CL is not a straightforward task, in part due to possible differences in area of expertise (AOE). In this paper, reinforcement-learning homogeneous agents are considered in an environment with multiple goals or tasks. As a result, they become expert in different domains with different amounts of expertness. Each agent uses a one-step Q-learning algorithm and is capable of exchanging its Q-table with those of its teammates. Two crucial questions are addressed in this paper: "How can the AOE of an agent be extracted?" and "How can agents improve their performance in CL by knowing their AOEs?" An algorithm is developed to extract the AOE based on state transitions as a gold standard from a behavioral point of view. Moreover, it is discussed that the AOE can be implicitly obtained through agents' expertness at the state level. Three new methods for CL through the combination of Q-tables are developed and examined for overall performance after CL. The performances of the developed methods are compared with those of IL, strategy sharing (SS), and weighted SS (WSS). The obtained results show the superior performance of AOE-based methods compared to existing CL methods, which do not use the notion of AOE. These results are very encouraging in support of the idea that cooperation based on the AOE performs better than general CL methods.
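The Q-table combination step discussed above can be pictured as an expertness-weighted average of teammates' tables; whether the weights are global or resolved per state through the AOE is the paper's key distinction. Both variants below are hedged sketches with assumed data layouts, not the paper's exact methods.

```python
import numpy as np

def combine_q_tables(q_tables, expertness):
    """Weighted strategy sharing: a new Q-table as an expertness-weighted
    average of all teammates' Q-tables (one global weight per agent)."""
    w = np.asarray(expertness, dtype=float)
    w = w / w.sum()
    return sum(wi * q for wi, q in zip(w, q_tables))

def combine_q_tables_by_area(q_tables, area_of_state, expertness_by_area):
    """AOE-style variant: weight each state's row by each agent's expertness
    in the area of expertise that the state belongs to."""
    n_states, n_actions = q_tables[0].shape
    out = np.zeros((n_states, n_actions))
    for s in range(n_states):
        w = np.array([exp[area_of_state[s]] for exp in expertness_by_area], dtype=float)
        w = w / w.sum()
        out[s] = sum(wi * q[s] for wi, q in zip(w, q_tables))
    return out
```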
A Multi-Agent System Approach for Distance Learning Architecture
ERIC Educational Resources Information Center
Turgay, Safiye
2005-01-01
The goal of this study is to propose agent systems with intelligence and adaptability properties for a distance learning environment. The suggested system has flexible, agile, intelligent and cooperative features. System components are teachers, students (learners), and resources. Inter-component relations are modeled and reviewed by using the…
NASA Astrophysics Data System (ADS)
Chen, Jiaxi; Li, Junmin
2018-02-01
In this paper, we investigate the perfect consensus problem for second-order linearly parameterised multi-agent systems (MAS) with imprecise communication topology structure. Takagi-Sugeno (T-S) fuzzy models are presented to describe the imprecise communication topology structure of leader-following MAS, and a distributed adaptive iterative learning control protocol is proposed with the dynamics of the leader unknown to any of the agents. The proposed protocol guarantees that the follower agents can track the leader perfectly on [0,T], solving the consensus problem. Under the alignment condition, a sufficient condition for the consensus of the closed-loop MAS is given based on Lyapunov stability theory. Finally, a numerical example and a multiple pendulum system are given to illustrate the effectiveness of the proposed algorithm.
Hybrid Multiagent System for Automatic Object Learning Classification
NASA Astrophysics Data System (ADS)
Gil, Ana; de La Prieta, Fernando; López, Vivian F.
The rapid evolution within the context of e-learning is closely linked to international efforts on the standardization of learning object metadata, which provides learners in a web-based educational system with ubiquitous access to multiple distributed repositories. This article presents a hybrid agent-based architecture that enables the recovery of learning objects tagged in Learning Object Metadata (LOM) and provides individualized help with selecting learning materials to make the most suitable choice among many alternatives.
ERIC Educational Resources Information Center
Hoppe, H. Ulrich
2016-01-01
The 1998 paper by Martin Mühlenbrock, Frank Tewissen, and myself introduced a multi-agent architecture and a component engineering approach for building open distributed learning environments to support group learning in different types of classroom settings. It took up prior work on "multiple student modeling" as a method to configure…
Hierarchical Reinforcement in Continuous State and Multi-Agent Environments
2005-09-01
[Extraction fragment: dissertation committee listing (Mahadevan; Andrew G. Barto; Victor R. Lesser; Weibo Gong; W. Bruce Croft, Department Chair, Computer Science) and acknowledgments, followed by the beginning of a definition stating that an MDP model M consists of five elements ⟨S, A, P, R, I⟩, where S is the set of states.]
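The fragment breaks off mid-definition; a hedged reconstruction of the standard five-tuple is given below. The element roles follow common usage, and reading I as an initial-state distribution is our assumption, not recoverable from the fragment.

```latex
% Standard MDP five-tuple (reconstruction; element roles follow common usage):
M = \langle S, A, P, R, I \rangle, \qquad
P : S \times A \times S \to [0,1], \quad
R : S \times A \to \mathbb{R}, \quad
I : S \to [0,1]
% S: set of states, A: set of actions, P: transition probabilities,
% R: reward function, I: (presumably) initial-state distribution.
```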
Coordinating Decentralized Learning and Conflict Resolution across Agent Boundaries
ERIC Educational Resources Information Center
Cheng, Shanjun
2012-01-01
It is crucial for embedded systems to adapt to the dynamics of open environments. This adaptation process becomes especially challenging in the context of multiagent systems because of scalability, partial information accessibility and complex interaction of agents. It is a challenge for agents to learn good policies, when they need to plan and…
An Intentional Laboratory: The San Carlos Charter Learning Center.
ERIC Educational Resources Information Center
Darwish, Elise
2000-01-01
Describes the San Carlos Charter Learning Center, a K-8 school chartered by the San Carlos, California, school district to be a research and development site. It has successfully shared practices in multi-age groupings, interdisciplinary instruction, parents as teachers, and staff evaluation. The article expands on the school's challenges and…
Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J
2016-01-01
Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
Apprenticeship Learning: Learning to Schedule from Human Experts
2016-06-09
[Extraction fragment: text noting that approaches to learning such models are based on Markov models, such as reinforcement learning or inverse reinforcement learning, interleaved with scattered reference-list entries (e.g., Barto and Mahadevan, 2003, on recent advances in hierarchical reinforcement learning; Odom and Natarajan, 2015, on active advice seeking for inverse reinforcement learning).]
Literacy: Learning and Loving It!
ERIC Educational Resources Information Center
Hurd, Molly
2017-01-01
Halifax Independent School is a small K-9 school where children learn in multi-age groups and in a setting where all the core subjects are integrated into the study of interesting themes such as Oceans, Nova Scotia, and Living Things. These themes last a whole year, involve the whole elementary school, and are designed to cover traditional subject…
ICPL: Intelligent Cooperative Planning and Learning for Multi-agent Systems
2012-02-29
The objective was to develop a new planning approach for teams of multiple UAVs that tightly integrates learning and cooperative control algorithms at multiple levels of the planning architecture. The research results enabled a team of mobile agents to learn to adapt and react to uncertainty in… expressive representation that incorporates feature conjunctions. Our algorithm is simple to implement, fast to execute, and can be combined with any…
Future applications of artificial intelligence to Mission Control Centers
NASA Technical Reports Server (NTRS)
Friedland, Peter
1991-01-01
Future applications of artificial intelligence to Mission Control Centers are presented in the form of the viewgraphs. The following subject areas are covered: basic objectives of the NASA-wide AI program; inhouse research program; constraint-based scheduling; learning and performance improvement for scheduling; GEMPLAN multi-agent planner; planning, scheduling, and control; Bayesian learning; efficient learning algorithms; ICARUS (an integrated architecture for learning); design knowledge acquisition and retention; computer-integrated documentation; and some speculation on future applications.
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.
Fan, Jianqing; Tong, Xin; Zeng, Yao
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether, with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.
ERIC Educational Resources Information Center
Booker, Angela; Montgomery-Block, Kindra; Scott, Zenae; Reyes, bel; Onyewuenyi, Adaurennaya
2011-01-01
This article reports on a collaborative partnership, based in principles of public scholarship and designed to serve local, at-risk or high-risk youth. The program is a six-week summer service-learning initiative in the Sacramento, California, area developed for transitioning 9th grade students through a multi-agency partnership. The project…
ERIC Educational Resources Information Center
Anu, Liljeström; Jorma, Enkenberg; Sinikka, Pöllänen
2014-01-01
This paper presents a case study in which multi-age students (aged 6-12, N = 32) in small groups made autonomous inquiries about the phenomenon of winter fishing within the framework of design-oriented pedagogy. The research analyzed storytelling videos that the students produced as learning objects. These videos revealed a picture of the…
ERIC Educational Resources Information Center
Leonard, Jacqueline; Chamberlin, Scott A.; Johnson, Joy B.; Verma, Geeta
2016-01-01
In this paper, results from a 2-year informal science education study are presented. Children (aged 8-12) in this study participated in multi-aged groups to learn science within the context of paleontology and climate change. The goals of the project were to increase science content knowledge among underrepresented minority students and to enhance…
Control Theoretic Modeling for Uncertain Cultural Attitudes and Unknown Adversarial Intent
2009-02-01
[Extraction fragment from a report documentation page. Subject terms: social learning, social networks, multiagent systems, game theory. Recoverable abstract fragments mention constructive computational tools, analysis of rational social learning in networks (belief propagation in social networks in various settings), and a general methodology as a predictive device for social network formation and for communication network formation with constraints on the lengths of…]
ERIC Educational Resources Information Center
Fyson, Rachel
2007-01-01
This paper outlines the key findings from a recent study of statutory service responses to young people with learning disabilities who show sexually inappropriate or abusive behaviours, with a particular focus on the involvement of criminal justice agencies. The study found that although inappropriate sexual behaviours were commonplace in special…
Layered Learning in Multi-Agent Systems
1998-12-15
[Extraction fragment: acknowledgments text and residue from a figure of the team member agent architecture, showing player roles (e.g., midfielder, goalie) with home coordinates, home range, and max range.]
A computational neural model of goal-directed utterance selection.
Klein, Michael; Kamp, Hans; Palm, Guenther; Doya, Kenji
2010-06-01
It is generally agreed that much of human communication is motivated by extra-linguistic goals: we often make utterances in order to get others to do something, or to make them support our cause, or adopt our point of view, etc. However, thus far a computational foundation for this view on language use has been lacking. In this paper we propose such a foundation using Markov Decision Processes. We borrow computational components from the field of action selection and motor control, where a neurobiological basis of these components has been established. In particular, we make use of internal models (i.e., next-state transition functions defined on current state action pairs). The internal model is coupled with reinforcement learning of a value function that is used to assess the desirability of any state that utterances (as well as certain non-verbal actions) can bring about. This cognitive architecture is tested in a number of multi-agent game simulations. In these computational experiments an agent learns to predict the context-dependent effects of utterances by interacting with other agents that are already competent speakers. We show that the cognitive architecture can account for acquiring the capability of deciding when to speak in order to achieve a certain goal (instead of performing a non-verbal action or simply doing nothing), whom to address and what to say. Copyright 2010 Elsevier Ltd. All rights reserved.
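A minimal sketch of the action-selection idea described above (an internal next-state model plus a learned value function over states) follows; the function and variable names are ours, and the toy state representation is only illustrative.

```python
# Hedged sketch: choose the utterance/action whose predicted outcome state has
# the highest learned value, and update state values with a one-step TD rule.
def select_action(state, actions, transition_model, value):
    # transition_model(state, action) -> predicted next state (internal model)
    # value: dict mapping state -> learned desirability estimate
    return max(actions, key=lambda a: value[transition_model(state, a)])

def td0_update(value, s, s_next, reward, alpha=0.1, gamma=0.95):
    # one-step temporal-difference update of the state-value estimate
    value[s] += alpha * (reward + gamma * value[s_next] - value[s])
```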
Rational and Mechanistic Perspectives on Reinforcement Learning
ERIC Educational Resources Information Center
Chater, Nick
2009-01-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…
Quinn, Emma; Johnstone, Travers; Najjar, Zeina; Cains, Toni; Tan, Geoff; Huhtinen, Essi; Nilsson, Sven; Burgess, Stuart; Dunn, Matthew; Gupta, Leena
2017-09-05
The incident command system (ICS) provides a common structure to control and coordinate an emergency response, regardless of scale or predicted impact. The lessons learned from the application of an ICS for large infectious disease outbreaks are documented. However, there is scant evidence on the application of an ICS to manage a local multiagency response to a disease cluster with environmental health risks. The Sydney Local Health District Public Health Unit (PHU) in New South Wales, Australia, was notified of 5 cases of Legionnaires' disease during 2 weeks in May 2016. This unusual incident triggered a multiagency investigation involving an ICS with staff from the PHU, 3 local councils, and the state health department to help prevent any further public health risk. The early and judicious use of ICS enabled a timely and effective response by supporting clear communication lines between the incident controller and field staff. The field team was key in preventing any ongoing public health risk through inspection, sampling, testing, and management of water systems identified to be at-risk for transmission of legionella. Good working relationships between partner agencies and trust in the technical proficiency of environmental health staff aided in the effective management of the response. (Disaster Med Public Health Preparedness. 2017;page 1 of 4).
Negative reinforcement learning is affected in substance dependence.
Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody
2012-06-01
Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna
2016-10-05
Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.
The Crabapple Experience: Insights from Program Evaluations.
ERIC Educational Resources Information Center
Elmore, Randy; Wisenbaker, Joe
2000-01-01
An evaluation of a Georgia middle school's multi-age grouping program revealed significant progress regarding student self-esteem, achievement, community building, and teacher collaboration. The Crabapple experience illustrates how one model of student-centered, developmentally appropriate, and integrated learning can benefit middle-level…
Team Formation in Partially Observable Multi-Agent Systems
NASA Technical Reports Server (NTRS)
Agogino, Adrian K.; Tumer, Kagan
2004-01-01
Sets of multi-agent teams often need to maximize a global utility that rates the performance of the entire system, in settings where a team cannot fully observe other teams' agents. Such limited observability makes it hard for team members pursuing their team utilities to take actions that also help maximize the global utility. In this article, we show how team utilities can be used in partially observable systems. Furthermore, we show how team sizes can be manipulated to provide the best compromise between having easy-to-learn team utilities and having them aligned with the global utility. The results show that optimally sized teams in a partially observable environment outperform one team in a fully observable environment by up to 30%.
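A toy sketch of the team-utility idea (each agent is reinforced with its own team's utility rather than the unobservable global utility) is given below; the functions and the additive form of the utilities are our illustrative assumptions, not the article's definitions.

```python
def global_utility(contributions):
    # contributions: dict mapping agent -> its measured contribution
    return sum(contributions.values())

def team_learning_signal(agent, teams, contributions):
    # Each agent learns only from the utility of its own team.
    for team in teams:
        if agent in team:
            return sum(contributions[a] for a in team)
    raise ValueError(f"{agent} is not assigned to any team")
```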
NASA Astrophysics Data System (ADS)
Liljeström, Anu; Enkenberg, Jorma; Pöllänen, Sinikka
2013-03-01
This design experiment aimed to answer the question of how to mediate the practices of authentic science inquiries in primary education. An instructional approach based on activity theory was designed and carried out with multi-age students in a small village school. An open-ended learning task was offered to the older students. Their task was to design and implement instruction about the Ice Age to their younger fellows. The objective was collaborative learning among students, the teacher, and outside domain experts. Mobile phones and GPS technologies were applied as the main technological mediators in the learning process. Technology provided an opportunity to expand the learning environment outside the classroom, including the natural environment. Empirically, the goal was to answer the following questions: What kind of learning project emerged? How did the students' knowledge develop? What kinds of science learning processes, activities, and practices were represented? Multiple and parallel data were collected to achieve this aim. The data analysis revealed that the learning project both challenged the students to develop explanations for the phenomena and generated high quality conceptual and physical models in question. During the learning project, the roles of the community members were shaped, mixed, and integrated. The teacher also repeatedly evaluated and adjusted her behavior. The confidence of the learners in their abilities raised the quality of their learning outcomes. The findings showed that this instructional approach can not only mediate the kind of authentic practices that scientists apply but also make learning more holistic than it has been. Thus, it can be concluded that nature of the task, the tool-integrated collaborative inquiries in the natural environment, and the multiage setting can make learning whole.
Awata, Hiroko; Wakuda, Ryo; Ishimaru, Yoshiyasu; Matsuoka, Yuji; Terao, Kanta; Katata, Satomi; Matsumoto, Yukihisa; Hamanaka, Yoshitaka; Noji, Sumihare; Mito, Taro; Mizunami, Makoto
2016-01-01
Revealing reinforcing mechanisms in associative learning is important for elucidation of brain mechanisms of behavior. In mammals, dopamine neurons are thought to mediate both appetitive and aversive reinforcement signals. Studies using transgenic fruit-flies suggested that dopamine neurons mediate both appetitive and aversive reinforcements, through the Dop1 dopamine receptor, but our studies using octopamine and dopamine receptor antagonists and using Dop1 knockout crickets suggested that octopamine neurons mediate appetitive reinforcement and dopamine neurons mediate aversive reinforcement in associative learning in crickets. To fully resolve this issue, we examined the effects of silencing of expression of genes that code the OA1 octopamine receptor and Dop1 and Dop2 dopamine receptors by RNAi in crickets. OA1-silenced crickets exhibited impairment in appetitive learning with water but not in aversive learning with sodium chloride solution, while Dop1-silenced crickets exhibited impairment in aversive learning but not in appetitive learning. Dop2-silenced crickets showed normal scores in both appetitive learning and aversive learning. The results indicate that octopamine neurons mediate appetitive reinforcement via OA1 and that dopamine neurons mediate aversive reinforcement via Dop1 in crickets, providing decisive evidence that neurotransmitters and receptors that mediate appetitive reinforcement indeed differ among different species of insects. PMID:27412401
Oliveira, Emileane C; Hunziker, Maria Helena
2014-07-01
In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U), or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (the learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss theoretical implications related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.
Rational and mechanistic perspectives on reinforcement learning.
Chater, Nick
2009-12-01
This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: mechanistic and rational. Reinforcement learning is often viewed in mechanistic terms--as describing the operation of aspects of an agent's cognitive and neural machinery. Yet it can also be viewed as a rational level of description, specifically, as describing a class of methods for learning from experience, using minimal background knowledge. This paper considers how rational and mechanistic perspectives differ, and what types of evidence distinguish between them. Reinforcement learning research in the cognitive and brain sciences is often implicitly committed to the mechanistic interpretation. Here the opposite view is put forward: that accounts of reinforcement learning should apply at the rational level, unless there is strong evidence for a mechanistic interpretation. Implications of this viewpoint for reinforcement-based theories in the cognitive and brain sciences are discussed.
Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa
2017-01-01
The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities. PMID:28475581
NASA Astrophysics Data System (ADS)
Talbot, C. A.; Ralph, M.; Jasperse, J.; Forbis, J.
2017-12-01
Lessons learned from the multi-agency Forecast-Informed Reservoir Operations (FIRO) effort demonstrate how research and observations can inform operations and policy decisions at Federal, State and Local water management agencies with the collaborative engagement and support of researchers, engineers, operators and stakeholders. The FIRO steering committee consists of scientists, engineers and operators from research and operational elements of the National Oceanographic and Atmospheric Administration and the US Army Corps of Engineers, researchers from the US Geological Survey and the US Bureau of Reclamation, the state climatologist from the California Department of Water Resources, the chief engineer from the Sonoma County Water Agency, and the director of the Scripps Institution of Oceanography's Center for Western Weather and Water Extremes at the University of California-San Diego. The FIRO framework also provides a means of testing and demonstrating the benefits of next-generation water cycle observations, understanding and models in water resources operations.
Racial bias shapes social reinforcement learning.
Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas
2014-03-01
Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial in-group and out-group individuals.
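To make the computational-modeling analysis concrete, here is a minimal, hypothetical sketch of the kind of model such analyses fit: a simple prediction-error update whose learning rate for threatening out-group faces is scaled by an individual bias parameter. The parameter names and functional form are ours, not the authors'.

```python
def biased_rw_update(value, outcome, alpha=0.3, bias_scale=1.0, outgroup_threat=False):
    # Rescorla-Wagner-style update; bias_scale > 1 speeds learning from
    # threatening out-group faces, mimicking a racial-bias modulation.
    lr = alpha * bias_scale if outgroup_threat else alpha
    return value + lr * (outcome - value)
```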
Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective
Botvinick, Matthew M.; Niv, Yael; Barto, Andrew C.
2009-01-01
Research on human and animal behavior has long emphasized its hierarchical structure — the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior. PMID:18926527
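As a concrete companion to the "reusable subroutines or skills" mentioned above, here is a minimal sketch of the option construct commonly used in hierarchical reinforcement learning; the field names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """A temporally extended action: where it may start, how it acts, when it stops."""
    initiation_set: Set[int]              # states in which the option can be invoked
    policy: Callable[[int], int]          # maps a state to a primitive action
    termination: Callable[[int], float]   # probability of terminating in a state

def can_start(option: Option, state: int) -> bool:
    return state in option.initiation_set
```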
The Celebration School: A Model Learning Community.
ERIC Educational Resources Information Center
Ishler, Richard E.; Vogel, Bobbi
1996-01-01
A model professional development school (PDS) serves Celebration, Florida, a planned community built by the Disney Corporation. The K-12 Celebration School resulted from cooperation among the Osceola County School District, Stetson University, and Disney. In this PDS, featuring multiage groupings and individualized instruction, students, staff,…
Interprofessional E-Learning and Collaborative Work: Practices and Technologies
ERIC Educational Resources Information Center
Bromage, Adrian, Ed.; Clouder, Lynn, Ed.; Thistlethwaite, Jill, Ed.; Gordon, Frances, Ed.
2010-01-01
Interprofessionalism, an emerging model and philosophy of multi-disciplinary and multi-agency working, has increasingly become an important means of cultivating joint endeavors across varied and diverse disciplinary and institutional settings. This book is therefore an important source for understanding how interprofessionalism can be promoted…
Navigating complex decision spaces: Problems and paradigms in sequential choice
Walsh, Matthew M.; Anderson, John R.
2015-01-01
To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides two general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior, but they also provide a useful framework for understanding neural reward valuation and action selection. PMID:23834192
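As a quick illustration of the two families the review contrasts, the sketch below pairs a model-free one-step Q-learning update with a model-based one-step lookahead over a learned transition model; the array shapes and names are our own assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Model-free: adjust the cached action value toward the sampled target.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def model_based_action_values(P, R, V, s, gamma=0.95):
    # Model-based: evaluate actions by planning one step ahead with a learned
    # transition model P (P[s] has shape (n_actions, n_states)) and reward
    # model R (R[s] has shape (n_actions,)), given state values V.
    return R[s] + gamma * P[s] @ V
```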
Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents
ERIC Educational Resources Information Center
Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan
2009-01-01
Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…
GA-based fuzzy reinforcement learning for control of a magnetic bearing system.
Lin, C T; Jou, C P
2000-01-01
This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
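A rough sketch of the central TDGAR idea, as we read the abstract, is below: the critic's predicted (internal) reinforcement serves as the GA's fitness, so the action network can keep evolving between arrivals of the external reinforcement signal. Representing a chromosome as a flat weight vector and the specific selection and mutation operators are our simplifying assumptions.

```python
import numpy as np

def evolve_action_network(population, critic_fitness, n_generations=50,
                          mutation_std=0.1, seed=0):
    """population: list of 1-D weight vectors (candidate action networks).
    critic_fitness: callable mapping a weight vector to the critic's predicted
    internal reinforcement, used here as the GA fitness."""
    rng = np.random.default_rng(seed)
    for _ in range(n_generations):
        fitness = np.array([critic_fitness(ind) for ind in population])
        probs = fitness - fitness.min() + 1e-8      # shift to positive values
        probs /= probs.sum()
        parents = rng.choice(len(population), size=len(population), p=probs)
        population = [population[i] + rng.normal(0.0, mutation_std,
                                                 population[i].shape)
                      for i in parents]
    return population
```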
11.2 YIP Human In the Loop Statistical Relational Learners
2017-10-23
[Extraction fragment: the report covers learning formalisms including inverse reinforcement learning and statistical relational learning, a figure captioned "Active Advice Seeking for Inverse Reinforcement Learning," and a section on sequential decision-making noting that prior work on advice for inverse reinforcement learning (IRL) defined advice in terms of actions.]
DOT National Transportation Integrated Search
2011-07-01
The Interagency Transportation, Land Use, and Climate Change Pilot Project utilized a scenario planning process to develop a multi-agency transportation- and land use-focused development strategy for Cape Cod, Massachusetts, with the intention of ach...
RIDEing Vocabulary: Using Etienne Wenger's Community of Practice Theory to Master Word Use
ERIC Educational Resources Information Center
Schiera, Rachel
2016-01-01
Students' success in vocabulary learning is best gauged by authentic use of the targeted vocabulary in conversation and writing tasks. A vocabulary teaching approach that emphasizes meaningful repetition, relationship building, and concrete experiences encourages language development. This article explores a multi-age, multi-grade learning…
A Profile of the California Partnership Academies, 2004-2005
ERIC Educational Resources Information Center
ConnectEd: The California Center for College and Career, 2007
2007-01-01
State legislation launched the California Partnership Academies (CPAs) in 1984. Now operating in more than 200 comprehensive high schools, CPAs have been used as a model for high school reform in California and elsewhere. Academies typically feature multi-age learning groups, team teaching and career-based instruction. Teachers help students…
Druthers! A Collection of Viable Ideas from Rural Schools.
ERIC Educational Resources Information Center
Elliott, Richard D., Comp.
An individualized junior high school, a youth resources program that interweaves high school with supervised work experiences, multi-aged elementary family groupings that mainstream EMR (educable mentally retarded) children, and a single library room transformed into seven optional learning stations using a multi-channel audio system are real…
Multiagent Learning in the Presence of Agents with Limitations
2003-05-14
[Extraction fragment: text on adapting to a specific opponent in simulated robotic soccer, interleaved with reference-list entries ("Equilibrium points in n-person games," PNAS 36, 48-49, reprinted in Kuhn 1997; Noda, Matsubara, Hiraki, & Frank, 1998, on the Soccer Server).]
Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease.
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-07-10
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.
Enhanced Experience Replay for Deep Reinforcement Learning
2015-11-01
[Report cover fragment: ARL-TR-7538, November 2015, US Army Research Laboratory, "Enhanced Experience Replay for Deep Reinforcement Learning," by David Doria, Bryan Dawson, and Manuel Vindiola, Computational and Information Sciences Directorate.]
Prespeech motor learning in a neural network using reinforcement.
Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough
2013-02-01
Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.
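A hypothetical, stripped-down sketch of the reinforcement-gated update described above follows: a spontaneously active map unit drives muscle activations, and its outgoing weights move toward the produced activation only when the vocalization meets the reinforcement criterion. The noise model, learning rate, and function names are our assumptions, not the authors' equations.

```python
import numpy as np

def som_reinforcement_step(weights, reinforced, rng, lr=0.05, noise=0.1):
    """weights: (n_units, n_muscles) map-to-muscle weights.
    reinforced: callable judging whether a muscle activation pattern produced
    a vocalization meeting the acoustic criterion (e.g., phonation)."""
    unit = rng.integers(weights.shape[0])                   # spontaneous activity
    activation = weights[unit] + rng.normal(0.0, noise, weights.shape[1])
    if reinforced(activation):
        weights[unit] += lr * (activation - weights[unit])  # strengthen mapping
    return weights
```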
ERIC Educational Resources Information Center
Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb
2007-01-01
Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…
Behavioral and neural properties of social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ
2011-01-01
Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis - social preferences, response latencies and modeling neural responses – are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787
Ilango, A; Wetzel, W; Scheich, H; Ohl, F W
2010-03-31
Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation, (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.
Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers
ERIC Educational Resources Information Center
Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.
2014-01-01
Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…
The cerebellum: a neural system for the study of reinforcement learning.
Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F
2011-01-01
In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Given that Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated, and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed), it follows that any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.
"I Disagree!" Said a Second-Grader: Butterflies, Conflict, and Literate Thinking.
ERIC Educational Resources Information Center
Salyer, David M.
2000-01-01
Describes how an inquiry-based science project on the life cycle of butterflies provided a developmentally appropriate learning experience in a first and second grade multiage classroom. Maintains that the critical exchange of ideas among students made students' thinking available for inspection, and allowed students to use their talk as a tool…
Understanding Culture and Diversity: Australian Aboriginal Art
ERIC Educational Resources Information Center
Vize, Anne
2009-01-01
Australian Aboriginal culture is rich, complex and fascinating. The art of Aboriginal Australians shows a great understanding of the earth and its creatures. This article presents an activity which has been designed as a multi-age project. The learning outcomes have been written to suit both younger and older students. Aspects of the project could…
Children's Spontaneous Play in Writer's Workshop
ERIC Educational Resources Information Center
Lysaker, Judith T.; Wheat, Jennifer; Benson, Emily
2010-01-01
Research on the relationship between literacy and play has a rich history. Yet few studies have examined children's use of spontaneous play during literacy events as children are learning to read and write. This case study examines the use of play and the quality of playfulness in a kindergarten/first grade multiage classroom during Writer's…
Learning through Story: A Collaborative, Multimodal Arts Approach
ERIC Educational Resources Information Center
Barton, Georgina; Baguley, Margaret
2014-01-01
Literate practice in the arts encompasses both aesthetics and creativity. It is also multimodal in nature and often collaborative. This article presents data collected from a small multi-age school, with children from Prep to Year 7, during their preparation for an end-of-year show. The children had studied the topics of conservation and…
Student Query Trend Assessment with Semantical Annotation and Artificial Intelligent Multi-Agents
ERIC Educational Resources Information Center
Malik, Kaleem Razzaq; Mir, Rizwan Riaz; Farhan, Muhammad; Rafiq, Tariq; Aslam, Muhammad
2017-01-01
Research in the era of data representation contributes to and improves key data policy involving the assessment of learning, training, and English language competency. Students are required to communicate in English with high-level impact, using language and influence. The electronic technology works to assess students' questions, positively enabling…
The role of GABAB receptors in human reinforcement learning.
Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X
2014-10-01
Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and furthermore, test this effect in patients with addiction spectrum disorder. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.
Chronic Heart Failure Follow-up Management Based on Agent Technology.
Mohammadzadeh, Niloofar; Safdari, Reza
2015-10-01
Monitoring heart failure patients through continuous assessment of signs and symptoms by information technology tools leads to a large reduction in re-hospitalization. Agent technology is one of the strongest artificial intelligence areas; therefore, it can be expected to facilitate, accelerate, and improve health services, especially in home care and telemedicine. The aim of this article is to provide an agent-based model for chronic heart failure (CHF) follow-up management. This research was performed in 2013-2014 to determine appropriate scenarios and the data required to monitor and follow up CHF patients, and then an agent-based model was designed. Agents in the proposed model perform the following tasks: medical data access, communication with other agents of the framework, and intelligent data analysis, including medical data processing, reasoning, negotiation for decision-making, and learning capabilities. The proposed multi-agent system has the ability to learn and thus improve itself. Implementing this model at a broader level, with more and varied follow-up intervals, could achieve better results. The proposed multi-agent system is no substitute for cardiologists, but it could assist them in decision-making.
Effects of dopamine on reinforcement learning and consolidation in Parkinson’s disease
Grogan, John P; Tsivos, Demitra; Smith, Laura; Knight, Brogan E; Bogacz, Rafal; Whone, Alan; Coulthard, Elizabeth J
2017-01-01
Emerging evidence suggests that dopamine may modulate learning and memory, with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson’s disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning showed decreased memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning. DOI: http://dx.doi.org/10.7554/eLife.26801.001 PMID:28691905
Fear of losing money? Aversive conditioning with secondary reinforcers.
Delgado, M R; Labouliere, C D; Phelps, E A
2006-12-01
Money is a secondary reinforcer that acquires its value through social communication and interaction. In everyday human behavior and laboratory studies, money has been shown to influence appetitive or reward learning. It is unclear, however, if money has a similar impact on aversive learning. The goal of this study was to investigate the efficacy of money in aversive learning, comparing it with primary reinforcers that are traditionally used in fear conditioning paradigms. A series of experiments were conducted in which participants initially played a gambling game that led to a monetary gain. They were then presented with an aversive conditioning paradigm, with either shock (primary reinforcer) or loss of money (secondary reinforcer) as the unconditioned stimulus. Skin conductance responses and subjective ratings indicated that potential monetary loss modulated the conditioned response. Depending on the presentation context, the secondary reinforcer was as effective as the primary reinforcer during aversive conditioning. These results suggest that stimuli that acquire reinforcing properties through social communication and interaction, such as money, can effectively influence aversive learning.
Reinforcement learning and Tourette syndrome.
Palminteri, Stefano; Pessiglione, Mathias
2013-01-01
In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report is preceded by an introduction aimed at providing the reader with the state of the art of knowledge concerning the neural bases of reinforcement learning at the time of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evidently maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.
On the integration of reinforcement learning and approximate reasoning for control
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1991-01-01
The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.
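To make the learning rules mentioned above concrete, here is a minimal tabular sketch of a Q-learning update with epsilon-greedy action selection in Python. It is not the ARIC architecture itself (whose fuzzy-rule fine-tuning is more involved); the learning rate, discount factor, and epsilon values are illustrative assumptions.

```python
import random
from collections import defaultdict

def q_learning_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    td_target = reward + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)  # unseen (state, action) pairs default to a value of 0.0
```

In ARIC-like hybrids, comparable numeric updates would adjust the parameters of fuzzy control rules rather than the entries of a lookup table.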
Intelligence moderates reinforcement learning: a mini-review of the neural evidence
2014-01-01
Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. PMID:25185818
Intelligence moderates reinforcement learning: a mini-review of the neural evidence.
Chen, Chong
2015-06-01
Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.
Prespeech motor learning in a neural network using reinforcement☆
Warlaumont, Anne S.; Westermann, Gert; Buder, Eugene H.; Oller, D. Kimbrough
2012-01-01
Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one’s language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network’s post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network’s post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model’s post-learning productions were more likely to resemble the English vowels and vice versa. PMID:23275137
Reinforcement learning in complementarity game and population dynamics
NASA Astrophysics Data System (ADS)
Jost, Jürgen; Li, Wei
2014-02-01
We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
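As an illustration of the kind of reinforcement scheme compared in this study, the following Python sketch implements a Roth-Erev-style rule with a tunable power exponent applied to the propensities at choice time. The exact parameterization used in the cited paper (including how the exponent 1.5 enters) may differ, and the forgetting parameter is an assumption.

```python
import random

def roth_erev_choose(propensities, power=1.5):
    """Choose an action index with probability proportional to propensity**power."""
    weights = [p ** power for p in propensities]
    total = sum(weights)
    r = random.random() * total
    cum = 0.0
    for action, w in enumerate(weights):
        cum += w
        if r <= cum:
            return action
    return len(weights) - 1  # guard against floating-point rounding

def roth_erev_update(propensities, action, reward, forgetting=0.0):
    """Reinforce the chosen action's propensity; optionally decay all propensities."""
    for a in range(len(propensities)):
        propensities[a] *= (1.0 - forgetting)
    propensities[action] += reward
```

Setting power=1.0 recovers the standard rule, while power=1.5 corresponds to the modified variant that the abstract describes as performing best.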
The prefrontal cortex and hybrid learning during iterative competitive games.
Abe, Hiroshi; Seo, Hyojung; Lee, Daeyeol
2011-12-01
Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning. © 2011 New York Academy of Sciences.
Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian
2013-01-01
Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia. PMID:24391603
Generalization of value in reinforcement learning by humans.
Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna
2012-04-01
Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
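The contrast between the basic and the augmented model in this study can be sketched as a small difference in the value update. The following Python fragment is a simplified illustration, not the authors' exact model: the pairing map, learning rate, and generalization weight kappa are assumptions, and kappa = 0 recovers a basic delta-rule learner.

```python
def update_values(values, pair_of, chosen, reward, alpha=0.2, kappa=0.5):
    """Delta-rule update for the chosen option plus a scaled, 'generalized'
    update for its correlated partner option.

    values  : list of current option values
    pair_of : dict mapping each option index to its correlated partner
    kappa   : how strongly feedback generalizes to the partner (0 disables it)
    """
    values[chosen] += alpha * (reward - values[chosen])
    partner = pair_of[chosen]
    values[partner] += kappa * alpha * (reward - values[partner])
```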
Mitchell, D G V; Fine, C; Richell, R A; Newman, C; Lumsden, J; Blair, K S; Blair, R J R
2006-05-01
Previous work has shown that individuals with psychopathy are impaired on some forms of associative learning, particularly stimulus-reinforcement learning (Blair et al., 2004; Newman & Kosson, 1986). Animal work suggests that the acquisition of stimulus-reinforcement associations requires the amygdala (Baxter & Murray, 2002). Individuals with psychopathy also show impoverished reversal learning (Mitchell, Colledge, Leonard, & Blair, 2002). Reversal learning is supported by the ventrolateral and orbitofrontal cortex (Rolls, 2004). In this paper we present experiments investigating stimulus-reinforcement learning and relearning in patients with lesions of the orbitofrontal cortex or amygdala, and individuals with developmental psychopathy without known trauma. The results are interpreted with reference to current neurocognitive models of stimulus-reinforcement learning, relearning, and developmental psychopathy. Copyright (c) 2006 APA, all rights reserved.
Model-based reinforcement learning with dimension reduction.
Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi
2016-12-01
The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.
Reinforcement of Science Learning through Local Culture: A Delphi Study
ERIC Educational Resources Information Center
Nuangchalerm, Prasart
2008-01-01
This study aims to explore the ways to reinforce science learning through local culture by using Delphi technique. Twenty four participants in various fields of study were selected. The result of study provides a framework for reinforcement of science learning through local culture on the theme life and environment. (Contains 1 table.)
Learning in Collaboration: A Case Study of a Community Based Partnership Program
ERIC Educational Resources Information Center
Syam, Devarati S.
2010-01-01
This ethnographic case study investigated a multi-agency partnership project in a Midwestern city, the goal of which was to holistically address the health, safety and wellness issues of teen girls in an alternative school. The researcher was one of the eleven partners representing five different organizations that came together to create a…
ERIC Educational Resources Information Center
Christopher, Rose; Horsley, Sarah
2016-01-01
The Dudley Behavioural Support Team (BST) was set up based on Positive Behavioural Support (PBS) principles to support individuals with behaviours that challenge. The Winterbourne Review emphasises the importance of developing high-quality specialist community services and the Ensuring Quality Services (Local Government Association & NHS…
Murakoshi, Kazushi; Mizuno, Junya
2004-11-01
In order to rapidly follow unexpected environmental changes, we propose a parameter control method for reinforcement learning that changes each learning parameter in an appropriate direction. We determine each appropriate direction on the basis of relationships between behaviors and neuromodulators, taking emergency as a key concept. Computer experiments show that agents using the proposed method could respond rapidly to unexpected environmental changes, independently of the reinforcement learning algorithm (Q-learning or the actor-critic (AC) architecture) and the learning problem (discontinuous or continuous state-action problems).
Partial Planning Reinforcement Learning
Tadepalli, Prasad; Fern, Alan
2012-08-31
Subject terms: reinforcement learning, Bayesian optimization, active learning, action model learning, decision-theoretic assistance (Oregon State University).
Reinforcement learning in scheduling
NASA Technical Reports Server (NTRS)
Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad
1994-01-01
The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.
Neural Basis of Reinforcement Learning and Decision Making
Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan
2012-01-01
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543
Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D
2013-05-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior-and under what circumstances-are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
Otto, A. Ross; Gershman, Samuel J.; Markman, Arthur B.; Daw, Nathaniel D.
2013-01-01
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources. PMID:23558545
Planning for Multiagent Using ASP-Prolog
NASA Astrophysics Data System (ADS)
Son, Tran Cao; Pontelli, Enrico; Nguyen, Ngoc-Hieu
This paper presents an Answer Set Programming based approach to multiagent planning. The proposed methodology extends the action language B of [12] to represent and reason about plans with cooperative actions of an individual agent operating in a multiagent environment. This language is used to formalize multiagent planning problems and the notion of a joint plan for multiple agents in the presence of cooperative actions. Finally, the paper presents a system for computing joint plans based on the ASP-Prolog system.
Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.
Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin
2018-06-01
In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes the most advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. With comparison to deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
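As a rough illustration of how the two DCRL criteria could be combined when sampling from replay memory, the sketch below scores transitions by a self-paced term (preferring TD errors near the current curriculum difficulty) plus a coverage penalty (down-weighting frequently replayed transitions). The functional forms, the weighting lam, and the curriculum variable are assumptions; the cited paper's exact formulation differs in detail.

```python
import numpy as np

def dcrl_priority(td_errors, replay_counts, curriculum_level, lam=0.5, eps=1e-6):
    """Toy scoring rule in the spirit of DCRL: prefer transitions whose TD error
    is close to the current curriculum difficulty (self-paced term) and penalize
    transitions that have already been replayed often (coverage penalty)."""
    td = np.abs(np.asarray(td_errors, dtype=float))
    counts = np.asarray(replay_counts, dtype=float)
    self_paced = np.exp(-(td - curriculum_level) ** 2)   # peaked at the curriculum level
    coverage = 1.0 / (1.0 + counts)                      # down-weight over-replayed samples
    scores = self_paced + lam * coverage
    return scores / (scores.sum() + eps)                 # sampling probabilities over memory
```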
B-tree search reinforcement learning for model based intelligent agent
NASA Astrophysics Data System (ADS)
Bhuvaneswari, S.; Vignashwaran, R.
2013-03-01
Agents trained by learning techniques provide a powerful approximation of active solutions for naive approaches. In this study, B-tree search with reinforcement learning is used to moderate the data search for information retrieval, achieving accuracy with minimum search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques perform a satisfactory baseline and act as finite agents, based on the predetermined model, against competitors from the course.
Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap S.
1992-01-01
Current reinforcement learning algorithms require long training periods which generally limit their applicability to small size problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to control of dynamic systems and it is demonstrated that it is possible to start with an approximate prior knowledge and learn to refine it through experiments using reinforcement learning.
Reinforcement learning in multidimensional environments relies on attention mechanisms.
Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C
2015-05-27
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors.
Changes in corticostriatal connectivity during reinforcement learning in humans.
Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S
2015-02-01
Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.
Hisey, Erin; Kearney, Matthew Gene; Mooney, Richard
2018-04-01
The complex skills underlying verbal and musical expression can be learned without external punishment or reward, indicating their learning is internally guided. The neural mechanisms that mediate internally guided learning are poorly understood, but a circuit comprising dopamine-releasing neurons in the midbrain ventral tegmental area (VTA) and their targets in the basal ganglia are important to externally reinforced learning. Juvenile zebra finches copy a tutor song in a process that is internally guided and, in adulthood, can learn to modify the fundamental frequency (pitch) of a target syllable in response to external reinforcement with white noise. Here we combined intersectional genetic ablation of VTA neurons, reversible blockade of dopamine receptors in the basal ganglia, and singing-triggered optogenetic stimulation of VTA terminals to establish that a common VTA-basal ganglia circuit enables internally guided song copying and externally reinforced syllable pitch learning.
Awata, Hiroko; Watanabe, Takahito; Hamanaka, Yoshitaka; Mito, Taro; Noji, Sumihare; Mizunami, Makoto
2015-11-02
Elucidation of reinforcement mechanisms in associative learning is an important subject in neuroscience. In mammals, dopamine neurons are thought to play critical roles in mediating both appetitive and aversive reinforcement. Our pharmacological studies suggested that octopamine and dopamine neurons mediate reward and punishment, respectively, in crickets, but recent studies in fruit-flies concluded that dopamine neurons mediate both reward and punishment, via the type 1 dopamine receptor Dop1. To resolve the discrepancy between studies in different insect species, we produced Dop1 knockout crickets using the CRISPR/Cas9 system and found that they are defective in aversive learning with sodium chloride punishment but not in appetitive learning with water or sucrose reward. The results suggest that dopamine and octopamine neurons mediate aversive and appetitive reinforcement, respectively, in crickets. We suggest unexpected diversity in the neurotransmitters mediating appetitive reinforcement between crickets and fruit-flies, although the neurotransmitter mediating aversive reinforcement is conserved. This study demonstrates the usefulness of the CRISPR/Cas9 system for producing knockout animals for the study of learning and memory.
Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.
Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna
2016-09-01
Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals-estimated using two reinforcement learning models-tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.
Human-level control through deep reinforcement learning.
Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis
2015-02-26
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
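For readers unfamiliar with the mechanics, the core of DQN-style training is a temporal-difference target computed with a periodically frozen copy of the network. The Python sketch below shows only that target computation, with the network definitions, replay buffer, and exact hyperparameters omitted; the field names in the batch dictionary are illustrative assumptions.

```python
import numpy as np

def dqn_targets(q_target, batch, gamma=0.99):
    """Compute Q-learning targets for a minibatch, DQN-style.

    q_target : callable mapping a batch of states (array) to an array of
               per-action Q-values with shape (batch_size, num_actions);
               it stands for the periodically frozen copy of the online network.
    batch    : dict of numpy arrays with keys "next_states", "rewards", "dones"
               (field names are illustrative assumptions).
    """
    next_q = q_target(batch["next_states"])
    max_next_q = next_q.max(axis=1)                  # greedy bootstrap value
    # No bootstrapping past terminal transitions (dones are 0/1 flags).
    return batch["rewards"] + gamma * (1.0 - batch["dones"]) * max_next_q
```

The training loss is then the squared (or Huber) error between these targets and the online network's Q-values for the actions actually taken.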
Human-level control through deep reinforcement learning
NASA Astrophysics Data System (ADS)
Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis
2015-02-01
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Probabilistic Reinforcement Learning in Adults with Autism Spectrum Disorders
Solomon, Marjorie; Smith, Anne C.; Frank, Michael J.; Ly, Stanford; Carter, Cameron S.
2017-01-01
Background Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning; however, there have been few experimental studies taking this perspective. Methods We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state–space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results Both groups learned the task after training. However, there were group differences in early learning in the first task block, where individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state–space learning curves; and outperformed typically developing individuals on the near-chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging. PMID:21425243
ERIC Educational Resources Information Center
Marshak, David
This paper on multiage classrooms provides first steps toward a systemic understanding of the defining qualities of multiage classrooms and, from teachers' perspectives, the benefits of such classrooms for students, teachers, and parents. The multiage classroom movement in elementary schools is viewed as not just restructuring, but also as the…
ERIC Educational Resources Information Center
Kansas State Univ., Manhattan. Center for Rural Education and Small Schools.
This proceedings contains abstracts of 21 presentations. Titles and presenters are: "Teaching and Learning in Multiage Classrooms" (Laura Blevins and others); "Leadership, School Reform and the Rural School Superintendent" (Mike Boone); "Teaching English as a Second Language from Theory to Practice" (Mingsheng Dai); "A Guide for Central Office…
Multi-Agent Design and Implementation for an Online Peer Help System
ERIC Educational Resources Information Center
Meng, Anbo
2014-01-01
With the rapid advance of e-learning, the online peer help is playing increasingly important role. This paper explores the application of MAS to an online peer help system (MAPS). In the design phase, the architecture of MAPS is proposed, which consists of a set of agents including the personal agent, the course agent, the diagnosis agent, the DF…
Facilitating Democracy in a Testing Culture: Challenges and Opportunities for School Leaders
ERIC Educational Resources Information Center
Bergmark, Ulrika; Salopek, Michelle; Kawai, Roi; Lane-Myler, Jennifer
2014-01-01
In 2010, Principal Kirk introduced Small Group Meeting (SGM) at Hillcrest Elementary. SGMs are multiage student groupings who meet with school faculty once a month to work on community building, service-learning projects, and advising. Many teachers liked the SGMs, some felt they needed more time to prepare, and others felt it was a waste of time.…
Dramatherapy and Family Therapy in Education: Essential Pieces of the Multi-Agency Jigsaw
ERIC Educational Resources Information Center
McFarlane, Penny; Harvey, Jenny
2012-01-01
A collaborative therapeutic approach often proves the best way to assess and meet the needs of children experiencing barriers to learning. This book gives a concise overview of drama and family therapy and describes how both therapies can work together to provide essential pieces of the jigsaw of emotional support for troubled children within an…
Chronic Heart Failure Follow-up Management Based on Agent Technology
Mohammadzadeh, Niloofar; Safdari, Reza
2015-01-01
Objectives Monitoring heart failure patients through continuous assessment of signs and symptoms by information technology tools leads to a large reduction in re-hospitalization. Agent technology is one of the strongest artificial intelligence areas; therefore, it can be expected to facilitate, accelerate, and improve health services, especially in home care and telemedicine. The aim of this article is to provide an agent-based model for chronic heart failure (CHF) follow-up management. Methods This research was performed in 2013-2014 to determine appropriate scenarios and the data required to monitor and follow up CHF patients, and then an agent-based model was designed. Results Agents in the proposed model perform the following tasks: medical data access, communication with other agents of the framework, and intelligent data analysis, including medical data processing, reasoning, negotiation for decision-making, and learning capabilities. Conclusions The proposed multi-agent system has the ability to learn and thus improve itself. Implementing this model at a broader level, with more and varied follow-up intervals, could achieve better results. The proposed multi-agent system is no substitute for cardiologists, but it could assist them in decision-making. PMID:26618038
Short-memory traders and their impact on group learning in financial markets
LeBaron, Blake
2002-01-01
This article highlights several issues from simulating agent-based financial markets. These all center on the issue of learning in a multiagent setting, and specifically the question of whether the trading behavior of short-memory agents could interfere with the learning process of the market as a whole. It is shown in a simple example that short-memory traders persist in generating excess volatility and other features common to actual markets. Problems related to short-memory trader behavior can be eliminated using several different methods. These are discussed along with their relevance to agent-based models in general. PMID:11997443
2006-03-01
A state is represented by the set of tiles that it lies in; a variation on tile coding is Berenji and Vengerov's [4, 5] use of fuzzy state aggregation (FSA) as a means of function approximation. Function approximation with Q-learning is not a new or unusual concept [1, 3]; Berenji and Vengerov [4, 5] advanced this work in their application of Q-learning with FSA. The simplified Tileworld used here, following Berenji and Vengerov [4, 5], consists of agents, reward spikes, and deformations, and the agent must select which reward to pursue.
Role of dopamine D2 receptors in human reinforcement learning.
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-09-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.
Role of Dopamine D2 Receptors in Human Reinforcement Learning
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-01-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well. PMID:24713613
Reinforcement learning in computer vision
NASA Astrophysics Data System (ADS)
Bernstein, A. V.; Burnaev, E. V.
2018-04-01
Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of the modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes the reinforcement learning technology and its use for solving computer vision problems.
Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms
Daniel, Reka; Geana, Andra; Gershman, Samuel J.; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C.
2015-01-01
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this “representation learning” process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the “curse of dimensionality” in reinforcement learning. PMID:26019331
Network congestion control algorithm based on Actor-Critic reinforcement learning model
NASA Astrophysics Data System (ADS)
Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2018-04-01
To address the network congestion control problem, a congestion control algorithm based on the Actor-Critic reinforcement learning model is designed. By incorporating a genetic algorithm into the congestion control strategy, network congestion can be detected and prevented more effectively. A simulation experiment of the network congestion control algorithm is designed according to Actor-Critic reinforcement learning. The simulation experiments verify that the AQM controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is adopted to optimize network performance, and the packet-dropping probability is adaptively adjusted so as to improve network performance and avoid congestion. Based on the above findings, it is concluded that the network congestion control algorithm based on the Actor-Critic reinforcement learning model can effectively avoid the occurrence of TCP network congestion.
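A toy actor-critic active queue management (AQM) loop in the spirit described above might look like the Python sketch below: a critic learns state values over discretized queue lengths while an actor adjusts the packet-drop probability from the TD error. The discretization, drop levels, learning rates, and reward definition are assumptions rather than the cited algorithm's specification, and the genetic-algorithm component is not shown.

```python
import math
import random

class ActorCriticAQM:
    """Toy actor-critic controller that adapts a packet-drop probability from
    the observed queue length (illustrative assumptions, not the cited design)."""

    def __init__(self, n_bins=10, n_actions=5, alpha=0.1, beta=0.05, gamma=0.9):
        self.values = [0.0] * n_bins                               # critic: state values
        self.prefs = [[0.0] * n_actions for _ in range(n_bins)]    # actor: action preferences
        self.drop_levels = [0.2 * i / (n_actions - 1) for i in range(n_actions)]
        self.alpha, self.beta, self.gamma, self.n_bins = alpha, beta, gamma, n_bins

    def _bin(self, queue_len, max_queue):
        return min(self.n_bins - 1, int(queue_len / max_queue * self.n_bins))

    def act(self, queue_len, max_queue, tau=1.0):
        """Softmax over actor preferences; returns (state bin, action, drop probability)."""
        s = self._bin(queue_len, max_queue)
        exps = [math.exp(p / tau) for p in self.prefs[s]]
        r = random.random() * sum(exps)
        cum = 0.0
        for a, e in enumerate(exps):
            cum += e
            if r <= cum:
                return s, a, self.drop_levels[a]
        return s, len(exps) - 1, self.drop_levels[-1]

    def update(self, s, a, reward, next_queue_len, max_queue):
        """TD(0) critic update and preference (actor) update driven by the TD error."""
        s_next = self._bin(next_queue_len, max_queue)
        td_error = reward + self.gamma * self.values[s_next] - self.values[s]
        self.values[s] += self.alpha * td_error
        self.prefs[s][a] += self.beta * td_error
```

A reward could, for example, penalize the squared deviation of the queue length from a target value, which is one common way to express the AQM objective.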
From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model
ERIC Educational Resources Information Center
Fu, Wai-Tat; Anderson, John R.
2006-01-01
The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…
A Distributed Ambient Intelligence Based Multi-Agent System for Alzheimer Health Care
NASA Astrophysics Data System (ADS)
Tapia, Dante I.; RodríGuez, Sara; Corchado, Juan M.
This chapter presents ALZ-MAS (Alzheimer multi-agent system), an ambient intelligence (AmI)-based multi-agent system aimed at enhancing the assistance and health care for Alzheimer patients. The system makes use of several context-aware technologies that allow it to automatically obtain information from users and the environment in an evenly distributed way, focusing on the characteristics of ubiquity, awareness, intelligence, mobility, etc., all of which are concepts defined by AmI. ALZ-MAS makes use of a services oriented multi-agent architecture, called flexible user and services oriented multi-agent architecture, to distribute resources and enhance its performance. It is demonstrated that an SOA approach is adequate for building distributed and highly dynamic AmI-based multi-agent systems.
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning
Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ
2014-01-01
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.
Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J
2014-06-01
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
Stochastic Reinforcement Benefits Skill Acquisition
ERIC Educational Resources Information Center
Dayan, Eran; Averbeck, Bruno B.; Richmond, Barry J.; Cohen, Leonardo G.
2014-01-01
Learning complex skills is driven by reinforcement, which facilitates both online within-session gains and retention of the acquired skills. Yet, in ecologically relevant situations, skills are often acquired when mapping between actions and rewarding outcomes is unknown to the learning agent, resulting in reinforcement schedules of a stochastic…
A reward optimization method based on action subrewards in hierarchical reinforcement learning.
Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming
2014-01-01
Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features and convergence is slow. The method can greatly reduce the state space and choose actions purposefully and efficiently, so as to optimize the reward function and speed up convergence. Applied to online learning in the Tetris game, the experimental results show that the convergence speed of the algorithm is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.
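A flat (non-hierarchical) sketch of the action-subreward idea is given below: a tabular Q-learner receives a sparse global reward plus a dense, action-specific subreward. The chain environment and the subreward definition are assumptions for illustration, not the authors' Tetris setup.

```python
import random

# Illustrative sketch of Q-learning with per-action subrewards layered on a
# sparse global reward, in the spirit of the action-subreward idea described
# above. The chain environment and the subreward definition are assumptions.

N, GOAL = 8, 7                      # states 0..7, goal at the right end
ACTIONS = [-1, +1]                  # move left / move right
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def subreward(s, a):
    """Dense, action-specific hint: small bonus for moving toward the goal."""
    return 0.05 if a == +1 else -0.05

for episode in range(300):
    s = 0
    while s != GOAL:
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(ACTIONS, key=lambda x: q[(s, x)]))
        s_next = min(max(s + a, 0), N - 1)
        global_r = 1.0 if s_next == GOAL else 0.0     # sparse task reward
        r = global_r + subreward(s, a)                # shaped total reward
        best_next = max(q[(s_next, x)] for x in ACTIONS)
        q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
        s = s_next

print("greedy action per state:",
      [max(ACTIONS, key=lambda x: q[(s, x)]) for s in range(N)])
```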
Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models
Najnin, Shamima; Banerjee, Bonny
2018-01-01
Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027
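As a rough illustration of the table-based variant, the sketch below learns word-object pairings with tabular Q-learning from a toy lexicon; the miniature vocabulary, scene construction, and reward rule are assumptions, not the CHILDES-based setup used in the paper.

```python
import random

# Toy sketch of cross-situational word-object learning with tabular Q-learning,
# loosely following the setup described above. The miniature lexicon and the
# reward rule (1 for the correct pairing, 0 otherwise) are assumptions.

lexicon = {"ball": "BALL", "dog": "DOG", "cup": "CUP"}   # true word -> object
objects = list(lexicon.values())
ALPHA, EPS = 0.3, 0.2

q = {(w, o): 0.0 for w in lexicon for o in objects}

for trial in range(2000):
    word = random.choice(list(lexicon))          # word heard in the utterance
    # Visual scene contains the referent plus one distractor object.
    scene = [lexicon[word], random.choice(objects)]
    if random.random() < EPS:
        choice = random.choice(scene)
    else:
        choice = max(scene, key=lambda o: q[(word, o)])
    reward = 1.0 if choice == lexicon[word] else 0.0
    q[(word, choice)] += ALPHA * (reward - q[(word, choice)])

for w in lexicon:
    print(w, "->", max(objects, key=lambda o: q[(w, o)]))
```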
Reinforcement learning improves behaviour from evaluative feedback
NASA Astrophysics Data System (ADS)
Littman, Michael L.
2015-05-01
Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
Quantum-Enhanced Machine Learning
NASA Astrophysics Data System (ADS)
Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.
2016-09-01
The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.
Examining Play among Young Children in Single-Age and Multi-Age Preschool Classroom Settings
ERIC Educational Resources Information Center
Youhne, Mia Song
2009-01-01
Advocates for multi-age classrooms claim multi-age groupings benefit children (Brynes, Shuster, & Jones, 1994). Currently, there is a lack of research examining play among students in multi-age classrooms. If indeed there is a positive benefit of play among children, research is needed to examine these behaviors among and between young children in…
The drift diffusion model as the choice rule in reinforcement learning.
Pedersen, Mads Lund; Frank, Michael J; Biele, Guido
2017-08-01
Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
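A minimal sketch of the combined model class is shown below: option values are updated by a delta rule while a drift diffusion process, with drift set by the value difference, generates both the choice and a response time. All parameter values are illustrative assumptions.

```python
import random

# Minimal sketch of a reinforcement-learning model that uses a drift diffusion
# process as its choice rule: values are updated with a delta rule, and the
# drift rate is set by the value difference so the model produces both a
# choice and a response time. Parameters are illustrative assumptions.

ALPHA = 0.1            # learning rate
SCALING = 2.0          # maps value difference onto drift rate
THRESHOLD = 1.0        # decision boundary (+/-)
DT, NOISE = 0.01, 1.0  # time step and diffusion noise

values = [0.5, 0.5]                 # option A, option B
reward_probs = [0.8, 0.2]           # true (unknown) reward probabilities

def ddm_choice(v):
    """Random-walk (Euler-Maruyama) simulation of the drift diffusion process."""
    drift = SCALING * (v[0] - v[1])
    evidence, t = 0.0, 0.0
    while abs(evidence) < THRESHOLD:
        evidence += drift * DT + NOISE * random.gauss(0.0, DT ** 0.5)
        t += DT
    return (0 if evidence > 0 else 1), t

for trial in range(500):
    choice, rt = ddm_choice(values)
    reward = 1.0 if random.random() < reward_probs[choice] else 0.0
    values[choice] += ALPHA * (reward - values[choice])   # delta-rule update

print("learned values:", [round(v, 2) for v in values])
```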
Can model-free reinforcement learning explain deontological moral judgments?
Ayars, Alisabeth
2016-05-01
Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account-e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.
General functioning predicts reward and punishment learning in schizophrenia.
Somlai, Zsuzsanna; Moustafa, Ahmed A; Kéri, Szabolcs; Myers, Catherine E; Gluck, Mark A
2011-04-01
Previous studies investigating feedback-driven reinforcement learning in patients with schizophrenia have provided mixed results. In this study, we explored the clinical predictors of reward and punishment learning using a probabilistic classification learning task. Patients with schizophrenia (n=40) performed similarly to healthy controls (n=30) on the classification learning task. However, more severe negative and general symptoms were associated with lower reward-learning performance, whereas poorer general psychosocial functioning was correlated with both lower reward- and punishment-learning performances. Multiple linear regression analyses indicated that general psychosocial functioning was the only significant predictor of reinforcement learning performance when education, antipsychotic dose, and positive, negative and general symptoms were included in the analysis. These results suggest a close relationship between reinforcement learning and general psychosocial functioning in schizophrenia. Published by Elsevier B.V.
Agent-based traffic management and reinforcement learning in congested intersection network.
DOT National Transportation Integrated Search
2012-08-01
This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programming...
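The Q-learning variant might look roughly like the sketch below, where the state is a pair of discretized queue lengths, the action selects which approach receives the green phase, and the reward is the negative total queue; the arrival and discharge rates are assumptions for illustration.

```python
import random

# Rough sketch of a Q-learning traffic-signal agent of the kind evaluated above.
# The two-approach intersection, arrival rates, and discharge rate are
# illustrative assumptions, not the study's simulation settings.

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
MAX_Q = 10                                   # queue lengths capped for the table
q_table = {}

def get_q(state, action):
    return q_table.get((state, action), 0.0)

queues = [0, 0]                              # vehicles waiting on approach 0 and 1
for step in range(20000):
    state = (min(queues[0], MAX_Q), min(queues[1], MAX_Q))
    if random.random() < EPS:
        action = random.choice([0, 1])
    else:
        action = max([0, 1], key=lambda a: get_q(state, a))

    # Environment step: random arrivals, the green approach discharges vehicles.
    queues[0] += 1 if random.random() < 0.4 else 0
    queues[1] += 1 if random.random() < 0.3 else 0
    queues[action] = max(queues[action] - 2, 0)

    next_state = (min(queues[0], MAX_Q), min(queues[1], MAX_Q))
    reward = -(queues[0] + queues[1])
    best_next = max(get_q(next_state, a) for a in [0, 1])
    q_table[(state, action)] = get_q(state, action) + ALPHA * (
        reward + GAMMA * best_next - get_q(state, action))

print("preferred green at queues (2, 0):", max([0, 1], key=lambda a: get_q((2, 0), a)))
```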
Operant conditioning of enhanced pain sensitivity by heat-pain titration.
Becker, Susanne; Kleinböhl, Dieter; Klossika, Iris; Hölzl, Rupert
2008-11-15
Operant conditioning mechanisms have been demonstrated to be important in the development of chronic pain. Most experimental studies have investigated the operant modulation of verbal pain reports with extrinsic reinforcement, such as verbal reinforcement. Whether this reflects actual changes in the subjective experience of the nociceptive stimulus remained unclear. This study replicates and extends our previous demonstration that enhanced pain sensitivity to prolonged heat-pain stimulation could be learned in healthy participants through intrinsic reinforcement (contingent changes in nociceptive input) independent of verbal pain reports. In addition, we examine whether different magnitudes of reinforcement differentially enhance pain sensitivity using an operant heat-pain titration paradigm. It is based on the previously developed non-verbal behavioral discrimination task for the assessment of sensitization, which uses discriminative down- or up-regulation of stimulus temperatures in response to changes in subjective intensity. In operant heat-pain titration, this discriminative behavior and not verbal pain report was contingently reinforced or punished by acute decreases or increases in heat-pain intensity. The magnitude of reinforcement was varied between three groups: low (N1=13), medium (N2=11) and high reinforcement (N3=12). Continuous reinforcement was applied to acquire and train the operant behavior, followed by partial reinforcement to analyze the underlying learning mechanisms. Results demonstrated that sensitization to prolonged heat-pain stimulation was enhanced by operant learning within 1h. The extent of sensitization was directly dependent on the received magnitude of reinforcement. Thus, operant learning mechanisms based on intrinsic reinforcement may provide an explanation for the gradual development of sustained hypersensitivity during pain that is becoming chronic.
Applications of Multi-Agent Technology to Power Systems
NASA Astrophysics Data System (ADS)
Nagata, Takeshi
Currently, agents are the focus of intense interest in many sub-fields of computer science and artificial intelligence. Agents are being used in an increasingly wide variety of applications. Many important computing applications such as planning, process control, communication networks and concurrent systems will benefit from using a multi-agent system approach. A multi-agent system is a structure given by an environment together with a set of artificial agents capable of acting on this environment. Multi-agent models are oriented towards interactions, collaborative phenomena, and autonomy. This article presents applications of multi-agent technology to power systems.
Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals
Navarro-Guerrero, Nicolás; Lowe, Robert J.; Wermter, Stefan
2017-01-01
Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance—in terms of task error, the amount of perceived nociception, and length of learned action sequences—of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning—making the algorithm more robust against network initializations—as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics. PMID:28420976
Organization-based Model-driven Development of High-assurance Multiagent Systems
2009-02-27
"Organization-based Model-driven Development of High-assurance Multiagent Systems" performed by Dr. Scott A. DeLoach and Dr. Robby at Kansas State University... A Capabilities Based Model for Artificial Organizations. Journal of Autonomous Agents and Multiagent Systems, Volume 16, no. 1, February 2008, pp. ... Matson, E. T. (2007). A capabilities based theory of artificial organizations. Journal of Autonomous Agents and Multiagent Systems.
Exploration of Force Transition in Stability Operations Using Multi-Agent Simulation
2006-09-01
risk, mission failure risk, and time in the context of the operational threat environment. The Pythagoras Multi-Agent Simulation and Data Farming techniques are used to investigate force-level... Subject terms: Stability Operations, Peace Operations, Data Farming, Pythagoras, Agent-Based Model, Multi-Agent Simulation.
NASA Astrophysics Data System (ADS)
Zhang, Zhong
In this work, motivated by the need to coordinate transmission maintenance scheduling among a multiplicity of self-interested entities in restructured power industry, a distributed decision support framework based on multiagent negotiation systems (MANS) is developed. An innovative risk-based transmission maintenance optimization procedure is introduced. Several models for linking condition monitoring information to the equipment's instantaneous failure probability are presented, which enable quantitative evaluation of the effectiveness of maintenance activities in terms of system cumulative risk reduction. Methodologies of statistical processing, equipment deterioration evaluation and time-dependent failure probability calculation are also described. A novel framework capable of facilitating distributed decision-making through multiagent negotiation is developed. A multiagent negotiation model is developed and illustrated that accounts for uncertainty and enables social rationality. Some issues of multiagent negotiation convergence and scalability are discussed. The relationships between agent-based negotiation and auction systems are also identified. A four-step MAS design methodology for constructing multiagent systems for power system applications is presented. A generic multiagent negotiation system, capable of inter-agent communication and distributed decision support through inter-agent negotiations, is implemented. A multiagent system framework for facilitating the automated integration of condition monitoring information and maintenance scheduling for power transformers is developed. Simulations of multiagent negotiation-based maintenance scheduling among several independent utilities are provided. It is shown to be a viable alternative solution paradigm to the traditional centralized optimization approach in today's deregulated environment. This multiagent system framework not only facilitates the decision-making among competing power system entities, but also provides a tool to use in studying competitive industry relative to monopolistic industry.
Place preference and vocal learning rely on distinct reinforcers in songbirds.
Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H
2018-04-30
In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.
ERIC Educational Resources Information Center
Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias
2011-01-01
Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…
Machine Learning Control For Highly Reconfigurable High-Order Systems
2015-01-02
develop and flight test a Reinforcement Learning-based approach for autonomous tracking of ground targets using a fixed wing Unmanned... Reinforcement Learning-based algorithms are developed for learning agents' time-dependent dynamics while also learning to control them. Three algorithms... to a wide range of engineering-based problems. Implementation of these solutions, however, is often complicated by the hysteretic, non-linear,
Reinforcement and inference in cross-situational word learning.
Tilles, Paulo F C; Fontanari, José F
2013-01-01
Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
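A schematic version of such a model is sketched below, with one knob for reinforcement strength and one for an inference-style mutual-exclusivity bias; the specific update rules and the toy lexicon are assumptions and do not reproduce the authors' equations.

```python
import random

# Schematic model of cross-situational word learning with separate knobs for
# reinforcement strength (CHI) and inference (BETA, a soft mutual-exclusivity
# bias), loosely inspired by the trade-off described above. The update rules
# and the toy lexicon are assumptions, not the authors' equations.

CHI, BETA = 0.3, 0.5
lexicon = {"wug": "W", "dax": "D", "fep": "F"}       # true word -> referent
referents = list(lexicon.values())
assoc = {(w, r): 1.0 / len(referents) for w in lexicon for r in referents}

def normalize(word):
    total = sum(assoc[(word, r)] for r in referents)
    for r in referents:
        assoc[(word, r)] /= total

for trial in range(1000):
    word = random.choice(list(lexicon))
    scene = {lexicon[word], random.choice(referents)}   # referent + distractor
    for r in scene:
        # Reinforcement: strengthen associations with co-present referents.
        assoc[(word, r)] += CHI
        # Inference: discount referents already strongly claimed by other words.
        claimed = max(assoc[(w2, r)] for w2 in lexicon if w2 != word)
        assoc[(word, r)] -= BETA * CHI * claimed * (1.0 / len(referents))
        assoc[(word, r)] = max(assoc[(word, r)], 1e-6)
    normalize(word)

for w in lexicon:
    print(w, "->", max(referents, key=lambda r: assoc[(w, r)]))
```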
Liu, Xiaoyang; Ho, Daniel W C; Cao, Jinde; Xu, Wenying
This brief investigates the problem of finite-time robust consensus (FTRC) for second-order nonlinear multiagent systems with external disturbances. Based on the global finite-time stability theory of discontinuous homogeneous systems, a novel finite-time convergent discontinuous disturbed observer (DDO) is proposed for the leader-following multiagent systems. The states of the designed DDO are then used to design the control inputs to achieve the FTRC of nonlinear multiagent systems in the presence of bounded disturbances. The simulation results are provided to validate the effectiveness of these theoretical results.
Evolution with Reinforcement Learning in Negotiation
Zou, Yi; Zhan, Wenjie; Shao, Yuan
2014-01-01
Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stableness than their counterparts using classic evolutionary algorithm. PMID:25048108
Overcoming Learned Helplessness in Community College Students.
ERIC Educational Resources Information Center
Roueche, John E.; Mink, Oscar G.
1982-01-01
Reviews research on the effects of repeated experiences of helplessness and on locus of control. Identifies conditions necessary for overcoming learned helplessness; i.e., the potential for learning to occur; consistent reinforcement; relevant, valued reinforcers; and favorable psychological situation. Recommends eight ways for teachers to…
Michaelides, Michael; Miller, Michael L; DiNieri, Jennifer A; Gomez, Juan L; Schwartz, Elizabeth; Egervari, Gabor; Wang, Gene Jack; Mobbs, Charles V; Volkow, Nora D; Hurd, Yasmin L
2017-11-01
Appetitive drive is influenced by coordinated interactions between brain circuits that regulate reinforcement and homeostatic signals that control metabolism. Glucose modulates striatal dopamine (DA) and regulates appetitive drive and reinforcement learning. Striatal DA D2 receptors (D2Rs) also regulate reinforcement learning and are implicated in glucose-related metabolic disorders. Nevertheless, interactions between striatal D2R and peripheral glucose have not been previously described. Here we show that manipulations involving striatal D2R signaling coincide with perseverative and impulsive-like responding for sucrose, a disaccharide consisting of fructose and glucose. Fructose conveys orosensory (ie, taste) reinforcement but does not convey metabolic (ie, nutrient-derived) reinforcement. Glucose however conveys orosensory reinforcement but unlike fructose, it is a major metabolic energy source, underlies sustained reinforcement, and activates striatal circuitry. We found that mice with deletion of dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32) exclusively in D2R-expressing cells exhibited preferential D2R changes in the nucleus accumbens (NAc), a striatal region that critically regulates sucrose reinforcement. These changes coincided with perseverative and impulsive-like responding for sucrose pellets and sustained reinforcement learning of glucose-paired flavors. These mice were also characterized by significant glucose intolerance (ie, impaired glucose utilization). Systemic glucose administration significantly attenuated sucrose operant responding and D2R activation or blockade in the NAc bidirectionally modulated blood glucose levels and glucose tolerance. Collectively, these results implicate NAc D2R in regulating both peripheral glucose levels and glucose-dependent reinforcement learning behaviors and highlight the notion that glucose metabolic impairments arising from disrupted NAc D2R signaling are involved in compulsive and perseverative feeding behaviors.
Model of interaction in Smart Grid on the basis of multi-agent system
NASA Astrophysics Data System (ADS)
Engel, E. A.; Kovalev, I. V.; Engel, N. E.
2016-11-01
This paper presents a model of interaction in the Smart Grid on the basis of a multi-agent system. The use of travelling waves in the multi-agent system describes the behavior of the Smart Grid from a local point of view, complementing the conventional approach. The simulation results show that wave absorption in the distributed multi-agent system effectively simulates the interaction in the Smart Grid.
The Effects of Partial Reinforcement in the Acquisition and Extinction of Recurrent Serial Patterns.
ERIC Educational Resources Information Center
Dockstader, Steven L.
The purpose of these 2 experiments was to determine whether sequential response pattern behavior is affected by partial reinforcement in the same way as other behavior systems. The first experiment investigated the partial reinforcement extinction effects (PREE) in a sequential concept learning task where subjects were required to learn a…
Microstimulation of the Human Substantia Nigra Alters Reinforcement Learning
Ramayya, Ashwin G.; Misra, Amrit
2014-01-01
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action–reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action–reward associations rather than stimulus–reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action–reward associations during reinforcement learning. PMID:24828643
1987-09-01
Luthans (28) expanded the concept of learning as follows: 1. Learning involves a change, though not necessarily an improvement, in behaviour. Learning... that results in an unpleasant outcome is not likely to be repeated (36:244). Luthans and Kreitner (27) described the various forms of reinforcement as... four alternatives (defined previously on page 24 and taken from Luthans) of positive reinforcement, negative reinforcement, extinction and punishment
Mastery Learning through Individualized Instruction: A Reinforcement Strategy
ERIC Educational Resources Information Center
Sagy, John; Ravi, R.; Ananthasayanam, R.
2009-01-01
The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…
Segers, Elien; Beckers, Tom; Geurts, Hilde; Claes, Laurence; Danckaerts, Marina; van der Oord, Saskia
2018-01-01
Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement. PMID:29643822
2016-09-07
been demonstrated on maximum power point tracking for photovoltaic arrays and for wind turbines. 3. ES has recently been implemented on the Mars... high-dimensional optimization problems. Extensions and applications of these techniques were developed during the realization of the project. ... studied problems of dynamic average consensus and a class of unconstrained continuous-time optimization algorithms for the coordination of multiple
Learning science in small multi-age groups: the role of age composition
NASA Astrophysics Data System (ADS)
Kallery, Maria; Loupidou, Thomais
2016-06-01
The present study examines how the overall cognitive achievements in science of the younger children in a class where the students work in small multi-age groups are influenced by the number of older children in the groups. The context of the study was early-years education. The study has two parts: The first part involved classes attended by pre-primary children aged 4-6. The second part included one primary class attended by students aged 6-8 in addition to the pre-primary classes. Students were involved in inquiry-based science activities. Two sources of data were used: Lesson recordings and children's assessments. The data from both sources were separately analyzed and the findings plotted. The resulting graphs indicate a linear relationship between the overall performance of the younger children in a class and the number of older ones participating in the groups in each class. It seems that the age composition of the groups can significantly affect the overall cognitive achievements of the younger children and preferentially determines the time within which this factor reaches its maximum value. The findings can be utilized in deciding the age composition of small groups in a class with the aim of facilitating the younger children's learning in science.
Instructional control of reinforcement learning: A behavioral and neurocomputational investigation
Doll, Bradley B.; Jacobs, W. Jake; Sanfey, Alan G.; Frank, Michael J.
2011-01-01
Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is “overridden” at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract “Q-learning” and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a “confirmation bias” in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes. PMID:19595993
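The confirmation-bias account can be caricatured as in the sketch below: a delta-rule learner amplifies prediction errors that agree with a false instruction and diminishes those that do not, so the instructed option keeps an inflated value despite contrary experience. The bias factors and task probabilities are illustrative assumptions.

```python
import random

# Sketch of the "confirmation bias" account described above: a simple Q-learner
# whose prediction errors are amplified when the outcome agrees with a (false)
# instruction about a stimulus and diminished when it disagrees. The bias
# factors and task parameters are illustrative assumptions.

ALPHA, AMPLIFY, DIMINISH = 0.1, 1.5, 0.5
true_probs = {"A": 0.4, "B": 0.6}        # stimulus A is actually worse than B
instructed_best = "A"                    # learner is (incorrectly) told A is best
q = {"A": 0.5, "B": 0.5}

def biased_update(stim, reward):
    consistent = (stim == instructed_best and reward == 1.0) or \
                 (stim != instructed_best and reward == 0.0)
    gain = AMPLIFY if consistent else DIMINISH
    q[stim] += ALPHA * gain * (reward - q[stim])

for trial in range(2000):
    stim = max(q, key=q.get) if random.random() > 0.1 else random.choice(list(q))
    reward = 1.0 if random.random() < true_probs[stim] else 0.0
    biased_update(stim, reward)

# A's value stays inflated above B's despite experience to the contrary.
print({k: round(v, 2) for k, v in q.items()})
```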
van den Akker, Karolien; Havermans, Remco C; Bouton, Mark E; Jansen, Anita
2014-10-01
Animals and humans can easily learn to associate an initially neutral cue with food intake through classical conditioning, but extinction of learned appetitive responses can be more difficult. Intermittent or partial reinforcement of food cues causes especially persistent behaviour in animals: after exposure to such learning schedules, the decline in responding that occurs during extinction is slow. After extinction, increases in responding with renewed reinforcement of food cues (reacquisition) might be less rapid after acquisition with partial reinforcement. In humans, it may be that the eating behaviour of some individuals resembles partial reinforcement schedules to a greater extent, possibly affecting dieting success by interacting with extinction and reacquisition. Furthermore, impulsivity has been associated with less successful dieting, and this association might be explained by impulsivity affecting the learning and extinction of appetitive responses. In the present two studies, the effects of different reinforcement schedules and impulsivity on the acquisition, extinction, and reacquisition of appetitive responses were investigated in a conditioning paradigm involving food rewards in healthy humans. Overall, the results indicate both partial reinforcement schedules and, possibly, impulsivity to be associated with worse extinction performance. A new model of dieting success is proposed: learning histories and, perhaps, certain personality traits (impulsivity) can interfere with the extinction and reacquisition of appetitive responses to food cues and they may be causally related to unsuccessful dieting. Copyright © 2014 Elsevier Ltd. All rights reserved.
Game Engineering a Multiagent Systems Perspective
2016-07-21
AFRL-AFOSR-VA-TR-2016-0260. Game Engineering: A Multiagent Systems Perspective. Jason R. Marden, Regents of the University of Colorado. Final Report, dates covered 07/01/2012 - 06/30/2015. Approved for public release.
Regulating recognition decisions through incremental reinforcement learning.
Han, Sanghoon; Dobbins, Ian G
2009-06-01
Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback had been altogether removed. Overall, these data demonstrate that incremental reinforcement-learning mechanisms influence the degree of caution subjects exercise when evaluating explicit memories.
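One way to express this incremental account is sketched below: evidence is binned, each bin learns values for "old" and "new" responses from trial feedback, and feedback that sometimes (incorrectly) rewards false alarms shifts the effective criterion in the liberal direction. The bins, rates, and feedback scheme are assumptions rather than the authors' design.

```python
import random

# Toy model of recognition criterion shift through incremental reinforcement:
# evidence is binned, and each bin learns values for responding "old" vs "new"
# from trial feedback. Biased feedback that sometimes marks false alarms as
# correct inflates the value of "old" responses and shifts the effective
# criterion liberally. Bins, rates, and the feedback scheme are assumptions.

ALPHA, D_PRIME = 0.05, 1.0
BINS = [-2, -1, 0, 1, 2, 3]                      # evidence bin edges (lower bounds)
q = {b: {"old": 0.0, "new": 0.0} for b in range(len(BINS))}

def bin_of(evidence):
    qualifying = [i for i, edge in enumerate(BINS) if evidence >= edge]
    return qualifying[-1] if qualifying else 0

for trial in range(20000):
    is_old = random.random() < 0.5
    evidence = random.gauss(D_PRIME if is_old else 0.0, 1.0)
    b = bin_of(evidence)
    # Mostly greedy response based on learned values, with some exploration.
    resp = "old" if q[b]["old"] >= q[b]["new"] else "new"
    if random.random() < 0.1:
        resp = random.choice(["old", "new"])
    correct = (resp == "old") == is_old
    # Biased feedback: half of all false alarms are wrongly marked correct.
    lucky_fa = resp == "old" and not is_old and random.random() < 0.5
    feedback = 1.0 if correct or lucky_fa else 0.0
    q[b][resp] += ALPHA * (feedback - q[b][resp])

print("bins where 'old' is preferred:",
      [i for i in range(len(BINS)) if q[i]["old"] > q[i]["new"]])
```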
Infant Contingency Learning in Different Cultural Contexts
ERIC Educational Resources Information Center
Graf, Frauke; Lamm, Bettina; Goertz, Claudia; Kolling, Thorsten; Freitag, Claudia; Spangler, Sibylle; Fassbender, Ina; Teubert, Manuel; Vierhaus, Marc; Keller, Heidi; Lohaus, Arnold; Schwarzer, Gudrun; Knopf, Monika
2012-01-01
Three-month-old Cameroonian Nso farmer and German middle-class infants were compared regarding learning and retention in a computerized mobile task. Infants achieving a preset learning criterion during reinforcement were tested for immediate and long-term retention measured in terms of an increased response rate after reinforcement and after a…
Adaptive Educational Software by Applying Reinforcement Learning
ERIC Educational Resources Information Center
Bennane, Abdellah
2013-01-01
The introduction of the intelligence in teaching software is the object of this paper. In software elaboration process, one uses some learning techniques in order to adapt the teaching software to characteristics of student. Generally, one uses the artificial intelligence techniques like reinforcement learning, Bayesian network in order to adapt…
A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control
NASA Astrophysics Data System (ADS)
Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu
This study proposes a robust cooperated control method combining reinforcement learning with robust control. A notable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. On the other hand, a robust control system guarantees stability and robustness but requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, namely H∞ control. The proposed method was compared with the conventional control method, that is, using the actor-critic alone, through computer simulation of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.
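The reinforcement-learning half of such a scheme, an actor-critic with a Gaussian policy over a continuous control action, might be sketched as below; the H∞ component is omitted, and the scalar plant, the quadratic value feature, and all gains are illustrative assumptions rather than the authors' crane model.

```python
import random

# Bare-bones actor-critic with a Gaussian policy over a continuous action,
# i.e., only the reinforcement-learning half of the combination described
# above. The scalar plant and all constants are illustrative assumptions.

ALPHA_A, ALPHA_C, GAMMA, SIGMA = 0.01, 0.1, 0.95, 0.3
w_mu, w_v = 0.0, 0.0          # actor mean weight and critic value weight

x = 1.0                       # plant state to be driven to zero
for t in range(20000):
    mu = w_mu * x
    u = random.gauss(mu, SIGMA)             # continuous control action
    x_next = 0.9 * x + 0.1 * u              # simple linear plant
    reward = -(x_next ** 2)                 # penalize deviation from zero

    td_error = reward + GAMMA * (w_v * x_next ** 2) - (w_v * x ** 2)
    w_v += ALPHA_C * td_error * (x ** 2)                      # critic update
    w_mu += ALPHA_A * td_error * ((u - mu) / SIGMA ** 2) * x  # policy gradient
    w_mu = min(max(w_mu, -15.0), 0.0)       # crude clamp keeping the toy loop stable

    x = x_next
    if abs(x) < 1e-3:                       # restart so learning keeps getting data
        x = random.uniform(-1.0, 1.0)

print("learned feedback gain (action = gain * state):", round(w_mu, 2))
```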
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.
Punishment insensitivity and impaired reinforcement learning in preschoolers.
Briggs-Gowan, Margaret J; Nichols, Sara R; Voss, Joel; Zobel, Elvira; Carter, Alice S; McCarthy, Kimberly J; Pine, Daniel S; Blair, James; Wakschlag, Lauren S
2014-01-01
Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. One hundred and fifty-seven preschoolers (mean age 4.7 ± 0.8 years) participated in a substudy that was embedded within a larger project. Children completed the 'Stars-in-Jars' task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials ('passive avoidance'). Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.
An Approach to Model Based Testing of Multiagent Systems
Nadeem, Aamer
2015-01-01
Autonomous agents perform on behalf of the user to achieve defined goals or objectives. They are situated in a dynamic environment and are able to operate autonomously to achieve their goals. In a multiagent system, agents cooperate with each other to achieve a common goal. Testing of multiagent systems is a challenging task due to the autonomous and proactive behavior of agents. However, testing is required to build confidence into the working of a multiagent system. The Prometheus methodology is a commonly used approach to design multiagent systems. Systematic and thorough testing of each interaction is necessary. This paper proposes a novel approach to testing of multiagent systems based on Prometheus design artifacts. In the proposed approach, different interactions between the agent and actors are considered to test the multiagent system. These interactions include percepts and actions along with messages between the agents which can be modeled in a protocol diagram. The protocol diagram is converted into a protocol graph, on which different coverage criteria are applied to generate test paths that cover interactions between the agents. A prototype tool has been developed to generate test paths from the protocol graph according to the specified coverage criterion. PMID:25874263
Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P
2016-08-04
Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E
2014-03-01
Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.
Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C
2014-03-01
Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
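As a rough illustration of the reward prediction error computation referred to above (a generic sketch, not the authors' actual model), a minimal trial-by-trial value update for a two-option gambling task could look like the following Python fragment; the learning rate, reward probabilities and reward coding are assumptions made for illustration.

# Minimal sketch of reward prediction error (RPE) updating in a
# two-option gambling task. Values and parameters are illustrative
# assumptions, not the model reported in the abstract above.
import random

alpha = 0.1                      # assumed learning rate
values = [0.0, 0.0]              # learned value of each choice option
reward_prob = [0.8, 0.2]         # assumed reward probabilities

for trial in range(200):
    choice = random.randrange(2)                     # random exploration
    reward = 1.0 if random.random() < reward_prob[choice] else 0.0
    rpe = reward - values[choice]                    # prediction error at feedback
    values[choice] += alpha * rpe                    # value update
    # Early in learning |rpe| is large at feedback; as the values converge
    # it shrinks, mirroring the decreasing feedback signal described above.
print(values)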
Microstimulation of the human substantia nigra alters reinforcement learning.
Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J
2014-05-14
Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors.
Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories
Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien
2013-01-01
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning. PMID:24049244
Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D
2009-10-28
Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.
Bouton, Mark E; Woods, Amanda M; Todd, Travis P
2014-01-01
Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.
A Review of Norms and Normative Multiagent Systems
Mahmoud, Moamin A.; Ahmad, Mohd Sharifuddin; Mustapha, Aida
2014-01-01
Norms and normative multiagent systems have become subjects of interest for many researchers. Such interest stems from the need for agents to exploit norms in enhancing their performance in a community. The term norm is used to characterize the behaviours of community members. The concept of normative multiagent systems is used to facilitate collaboration and coordination among social groups of agents. Much research has been conducted on norms, investigating the fundamental concepts, definitions, classification, and types of norms and normative multiagent systems, including normative architectures and normative processes. However, very few studies comprehensively review and analyze the literature to advance the current state of norms and normative multiagent systems. Consequently, this paper presents the current state of research on norms and normative multiagent systems and proposes a norm life-cycle model based on the review of the literature. Subsequently, this paper highlights significant areas for future work. PMID:25110739
Autonomous reinforcement learning with experience replay.
Wawrzyński, Paweł; Tanwani, Ajay Kumar
2013-05-01
This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy using previously collected samples and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on an actor-critic architecture with experience replay, whose step-sizes are determined on-line by an enhanced fixed-point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm for solving difficult learning control problems autonomously within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
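For readers unfamiliar with experience replay, the following is a minimal Python sketch of the general idea, re-using previously collected samples for repeated value updates; the toy environment, the tabular critic and the fixed step-sizes are assumptions, and the paper's on-line step-size estimation is not reproduced.

# Minimal sketch of experience replay: transitions are stored once and
# re-used many times for TD updates. Environment, critic and step-sizes
# are illustrative assumptions, not the algorithm from the paper above.
import random
from collections import deque

buffer = deque(maxlen=10000)          # replay memory of (s, a, r, s') tuples
value = {}                            # tabular critic: state -> value estimate
alpha, gamma = 0.05, 0.95             # assumed fixed step-size and discount

def td_update(s, r, s_next):
    v, v_next = value.get(s, 0.0), value.get(s_next, 0.0)
    value[s] = v + alpha * (r + gamma * v_next - v)

# collect experience from a toy random-walk environment (states 0..10)
state = 0
for t in range(1000):
    action = random.choice([-1, 1])
    next_state = max(0, min(10, state + action))
    reward = 1.0 if next_state == 10 else 0.0
    buffer.append((state, action, reward, next_state))
    state = 0 if next_state == 10 else next_state   # reset episode at the goal

# repeatedly adjust the critic using previously collected samples
for epoch in range(20):
    for s, a, r, s_next in random.sample(list(buffer), k=min(256, len(buffer))):
        td_update(s, r, s_next)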
Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J
2016-06-01
Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (n=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.
Utilising reinforcement learning to develop strategies for driving auditory neural implants.
Lee, Geoffrey W; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G
2016-08-01
In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed-loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provides a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model's function. We show the ability to effectively learn stimulation patterns which mimic the cochlea's ability to convert acoustic frequencies to neural activity. Learning effective replication using neural stimulation takes less than 20 min under continuous testing. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient's current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
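A minimal sketch of an n-armed bandit learner of the kind mentioned above is given below, using a plain epsilon-greedy rule rather than the authors' modified solution; the number of arms, the stand-in reward function and all parameters are illustrative assumptions.

# Minimal epsilon-greedy n-armed bandit sketch. The "arms" stand in for
# candidate stimulation patterns; reward function and parameters are
# illustrative assumptions, not the authors' modified bandit solution.
import random

n_arms = 8
q = [0.0] * n_arms          # estimated value of each stimulation pattern
counts = [0] * n_arms
epsilon = 0.1               # exploration rate (assumed)

def reward(arm):
    # stand-in for how closely the evoked neural response matches the
    # response to the target acoustic frequency
    return random.gauss(1.0 - abs(arm - 5) / 8.0, 0.1)

for trial in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(n_arms)
    else:
        arm = max(range(n_arms), key=lambda a: q[a])
    r = reward(arm)
    counts[arm] += 1
    q[arm] += (r - q[arm]) / counts[arm]   # incremental sample-average update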
Embedded Incremental Feature Selection for Reinforcement Learning
2012-05-01
Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al. ...).
Social Learning, Reinforcement and Crime: Evidence from Three European Cities
ERIC Educational Resources Information Center
Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina
2012-01-01
This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…
Novelty and Inductive Generalization in Human Reinforcement Learning
Gershman, Samuel J.; Niv, Yael
2015-01-01
In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: what is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: how can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. PMID:25808176
Learning with incomplete information and the mathematical structure behind it.
Kühn, Reimer; Stamatescu, Ion-Olimpiu
2007-07-01
We investigate the problem of learning with incomplete information as exemplified by learning with delayed reinforcement. We study a two-phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase depending on a graded reinforcement signal. The reinforcement signal quantifies the success-rate globally for a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to associations learnt in that phase. Learning according to this model is studied via simulations and analytically within a student-teacher scenario, for both single-layer networks and a committee machine. Success and speed of learning depend on the ratio λ of the learning rates used for the associative Hebbian learning phase and for the unlearning-correction in response to the reinforcement signal, respectively. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value λ_c, in which case the generalization error exhibits a power-law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter λ. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications are also provided: a robot learning to navigate a field containing obstacles, and the problem of identifying a specific component in a collection of stimuli.
NASA Astrophysics Data System (ADS)
Cui, Bing; Zhao, Chunhui; Ma, Tiedong; Feng, Chi
2017-02-01
In this paper, the cooperative adaptive consensus tracking problem for heterogeneous nonlinear multi-agent systems on directed graphs is addressed. Each follower is modelled as a general nonlinear system with unknown and nonidentical nonlinear dynamics, disturbances and actuator failures. Cooperative fault-tolerant neural network tracking controllers with online adaptive learning features are proposed to guarantee that all agents synchronise to the trajectory of one leader with bounded adjustable synchronisation errors. With the help of a linear quadratic regulator-based optimal design, a graph-dependent Lyapunov proof provides error bounds that depend on the graph topology, one virtual matrix and some design parameters. Of particular interest is that, if the control gain is selected appropriately, the proposed control scheme can be implemented in a unified framework whether or not faults occur. Furthermore, fault detection and isolation are not required for implementation. Finally, a simulation is given to verify the effectiveness of the proposed method.
A multi-agent intelligent environment for medical knowledge.
Vicari, Rosa M; Flores, Cecilia D; Silvestre, André M; Seixas, Louise J; Ladeira, Marcelo; Coelho, Helder
2003-03-01
AMPLIA is a multi-agent intelligent learning environment designed to support training of diagnostic reasoning and modelling of domains with complex and uncertain knowledge. AMPLIA focuses on the medical area. It is a system that deals with uncertainty under the Bayesian network approach, where learner-modelling tasks consist of creating a Bayesian network for a problem the system presents. The construction of a network involves qualitative and quantitative aspects. The qualitative part concerns the network topology, that is, the causal relations among the domain variables. After the qualitative part is ready, the quantitative part is specified; it consists of the conditional probability distributions of the represented variables. A negotiation process (managed by an intelligent MediatorAgent) resolves the differences in topology and probability distribution between the model built by the learner and the one built into the system. This negotiation process occurs between the agent that represents the expert domain knowledge (DomainAgent) and the agent that represents the learner's knowledge (LearnerAgent).
Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance
Knox, W. Bradley
2014-09-29
...positive a trainer's reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent's performance on the task the trainer...
Fuzzy Q-Learning for Generalization of Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1996-01-01
Fuzzy Q-Learning, introduced earlier by the author, is an extension of Q-Learning into fuzzy environments. GARIC is a methodology for fuzzy reinforcement learning. In this paper, we introduce GARIC-Q, a new method for doing incremental Dynamic Programming using a society of intelligent agents that are controlled at the top level by Fuzzy Q-Learning, while at the local level each agent learns and operates based on GARIC. GARIC-Q improves the speed and applicability of Fuzzy Q-Learning through generalization of the input space using fuzzy rules and bridges the gap between Q-Learning and rule-based intelligent systems.
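The core idea of fuzzy Q-learning, storing Q-values per (fuzzy rule, action) pair and weighting them by rule firing strengths, can be sketched as follows; the membership functions, toy task and parameters are illustrative assumptions and this is not GARIC-Q itself.

# Minimal fuzzy Q-learning sketch: the value of an action in a state is the
# firing-strength-weighted sum over rules, and the TD error is credited to
# each rule in proportion to its firing strength. All details are assumed.
import random

centers = [0.0, 0.25, 0.5, 0.75, 1.0]          # triangular fuzzy sets over a 1-D state
actions = [-0.1, 0.0, 0.1]
q = [[0.0] * len(actions) for _ in centers]     # q[rule][action]
alpha, gamma, eps = 0.1, 0.9, 0.1

def firing(x):
    # normalised triangular memberships of width 0.25
    w = [max(0.0, 1.0 - abs(x - c) / 0.25) for c in centers]
    s = sum(w) or 1.0
    return [v / s for v in w]

def action_values(x):
    w = firing(x)
    return [sum(w[i] * q[i][a] for i in range(len(centers))) for a in range(len(actions))]

state = 0.2
for t in range(5000):
    vals = action_values(state)
    a = random.randrange(len(actions)) if random.random() < eps else vals.index(max(vals))
    next_state = min(1.0, max(0.0, state + actions[a] + random.gauss(0, 0.01)))
    r = -abs(next_state - 0.8)                  # assumed goal: drive the state toward 0.8
    target = r + gamma * max(action_values(next_state))
    for i, w in enumerate(firing(state)):       # credit each rule by its firing strength
        q[i][a] += alpha * w * (target - vals[a])
    state = next_state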
Developing Multi-Agency Leadership in Education
ERIC Educational Resources Information Center
Close, Paul
2012-01-01
This article contributes to the growing debate around how we understand and develop multi-agency leadership in children and young people's services. Bringing together a range of inter-disciplinary research, it presents a framework for multi-agency leadership development, which, it argues, is well theorised, multi-level and versed in key field…
Distributed consensus for discrete-time heterogeneous multi-agent systems
NASA Astrophysics Data System (ADS)
Zhao, Huanyu; Fei, Shumin
2018-06-01
This paper studies the consensus problem for a class of discrete-time heterogeneous multi-agent systems. Two kinds of consensus algorithms will be considered. The heterogeneous multi-agent systems considered are converted into equivalent error systems by a model transformation. Then we analyse the consensus problem of the original systems by analysing the stability problem of the error systems. Some sufficient conditions for consensus of heterogeneous multi-agent systems are obtained by applying algebraic graph theory and matrix theory. Simulation examples are presented to show the usefulness of the results.
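For orientation, a standard discrete-time consensus iteration on a directed graph (not the specific heterogeneous algorithms of the paper) can be sketched as follows; the adjacency matrix and step size are illustrative assumptions.

# Minimal discrete-time consensus sketch:
#   x_i(k+1) = x_i(k) + eps * sum_j a_ij * (x_j(k) - x_i(k)).
# Graph and parameters are illustrative assumptions.
import numpy as np

A = np.array([[0, 1, 0, 0],      # a_ij = 1 if agent i receives information from agent j
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
eps = 0.3                        # chosen smaller than 1 / (max in-degree) for stability
x = np.array([1.0, -2.0, 0.5, 4.0])   # initial agent states

for k in range(100):
    x = x + eps * (A @ x - A.sum(axis=1) * x)   # neighbour-averaging update
print(x)   # entries converge toward a common value on this strongly connected graph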
NASA Astrophysics Data System (ADS)
Bai, Jing; Wen, Guoguang; Rahmani, Ahmed
2018-04-01
Leaderless consensus for fractional-order nonlinear multi-agent systems is investigated in this paper. In the first part, a control protocol is proposed to achieve leaderless consensus for nonlinear single-integrator multi-agent systems. In the second part, based on a sliding-mode estimator, a control protocol is given to solve leaderless consensus for the nonlinear single-integrator multi-agent systems; it is shown that this control protocol can improve the systems' convergence speed. In the third part, a control protocol is designed to accomplish leaderless consensus for nonlinear double-integrator multi-agent systems. To establish the systems' stability, two classic continuous Lyapunov candidate functions are chosen. Finally, several worked-out examples under directed interaction topologies are given to verify the above results.
Framework for robot skill learning using reinforcement learning
NASA Astrophysics Data System (ADS)
Wei, Yingzi; Zhao, Mingyang
2003-09-01
Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skills. The reinforcement function is the critical component, as it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way to incorporate prior knowledge and experience into the RL controller. The difference form of the augmented reward function is also considered carefully. The additional reward beyond the conventional reward provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning. The automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth and swift motion switching. We present a formal, but practical, framework for robot skill learning and also illustrate with an example the utility of the method for learning skilled robot control on-line.
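The idea of providing additional reward beyond the conventional reward is closely related to reward shaping; a common, well-understood form is potential-based shaping, sketched below under assumed definitions (an illustration of the general idea, not the authors' specific augmented reward function).

# Minimal sketch of augmenting a sparse task reward with extra heuristic
# reward in the potential-based shaping form
#   r'(s, s') = r(s, s') + gamma * phi(s') - phi(s),
# which preserves the optimal policy. Grid task and potential are assumed.
gamma = 0.95
goal = 10

def base_reward(s_next):
    return 1.0 if s_next == goal else 0.0        # sparse conventional reward

def phi(s):
    return -abs(goal - s) / goal                 # heuristic: closer to the goal is better

def shaped_reward(s, s_next):
    return base_reward(s_next) + gamma * phi(s_next) - phi(s)

# e.g. moving from state 4 to 5 earns a small positive shaping bonus,
# giving the learner gradient information long before the goal is reached
print(shaped_reward(4, 5), shaped_reward(5, 4))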
Proactivity and Reinforcement: The Contingency of Social Behavior
ERIC Educational Resources Information Center
Williams, J. Sherwood; And Others
1976-01-01
This paper analyzes the development of group structure in terms of the stimulus-sampling perspective. Learning is the continual sampling of possibilities, with reinforced possibilities increasing in probability of occurrence. This contingency learning approach is tested experimentally. (NG)
Mobile robots exploration through cnn-based reinforcement learning.
Tai, Lei; Liu, Ming
2016-01-01
Exploration of an unknown environment is an elemental application for mobile robots. In this paper, we outline a reinforcement learning method aimed at solving the exploration problem in a corridor environment. The learning model took the depth image from an RGB-D sensor as its only input. The feature representation of the depth image was extracted through a pre-trained convolutional neural network model. Building on the recent success of deep Q-networks in artificial intelligence, the robot controller achieved exploration and obstacle avoidance in several different simulated environments. This is the first time that reinforcement learning has been used to build an exploration strategy for mobile robots from raw sensor information.
Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder.
Ross, Marisa C; Lenow, Jennifer K; Kilts, Clinton D; Cisler, Josh M
2018-05-12
Posttraumatic stress disorder (PTSD) is widely associated with deficits in extinguishing learned fear responses, which relies on mechanisms of reinforcement learning (e.g., updating expectations based on prediction errors). However, the degree to which PTSD is associated with impairments in general reinforcement learning (i.e., outside of the context of fear stimuli) remains poorly understood. Here, we investigate brain and behavioral differences in general reinforcement learning between adult women with and without a current diagnosis of PTSD. 29 adult females (15 PTSD with exposure to assaultive violence, 14 controls) underwent a neutral reinforcement-learning task (i.e., a two-armed bandit task) during fMRI. We modeled participant behavior using different adaptations of the Rescorla-Wagner (RW) model and used Independent Component Analysis to identify timecourses for large-scale a priori brain networks. We found that an anticorrelated and risk-sensitive RW model best fit participant behavior, with no differences in computational parameters between groups. Women in the PTSD group demonstrated significantly less neural encoding of prediction errors in both a ventral striatum/mPFC and anterior insula network compared to healthy controls. Weakened encoding of prediction errors in the ventral striatum/mPFC and anterior insula during a general reinforcement learning task, outside of the context of fear stimuli, suggests the possibility of a broader conceptualization of learning differences in PTSD than proposed in current neurocircuitry models of PTSD. Copyright © 2018 Elsevier Ltd. All rights reserved.
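For reference, a standard Rescorla-Wagner learner with softmax choice for a two-armed bandit, the textbook baseline rather than the anticorrelated, risk-sensitive variant reported above, can be sketched as follows; alpha and beta are illustrative assumptions.

# Minimal Rescorla-Wagner + softmax sketch for a two-armed bandit.
# Parameters and reward probabilities are illustrative assumptions.
import math
import random

alpha, beta = 0.2, 3.0          # learning rate and softmax inverse temperature
q = [0.0, 0.0]                  # expected value of each arm
p_reward = [0.7, 0.3]           # assumed true reward probabilities

def softmax_choice(values):
    exps = [math.exp(beta * v) for v in values]
    z = sum(exps)
    r = random.random() * z
    return 0 if r < exps[0] else 1

for trial in range(200):
    choice = softmax_choice(q)
    reward = 1.0 if random.random() < p_reward[choice] else 0.0
    q[choice] += alpha * (reward - q[choice])    # RW delta rule on the chosen arm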
ERIC Educational Resources Information Center
Hwang, Kuo-An; Yang, Chia-Hao
2009-01-01
Most courses based on distance learning focus on the cognitive domain of learning. Because students are sometimes inattentive or tired, they may neglect the attention goal of learning. This study proposes an auto-detection and reinforcement mechanism for the distance-education system based on the reinforcement teaching strategy. If a student is…
When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition
ERIC Educational Resources Information Center
Janssen, Christian P.; Gray, Wayne D.
2012-01-01
Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…
Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.
Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L
2017-07-01
It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
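The prediction being dissociated here can be made concrete with a short numerical sketch: for a skewed distribution of lateral shifts, a squared-error (error-based) loss is minimised by offsetting the mean, whereas a hit/miss (reinforcement-based) loss is minimised by offsetting the mode. The particular skewed distribution below is an illustrative assumption.

# Minimal sketch of the mean-vs-mode dissociation for a skewed
# distribution of lateral cursor shifts. The distribution is assumed.
import numpy as np

rng = np.random.default_rng(0)
shifts = rng.gamma(shape=2.0, scale=1.0, size=100000)   # skewed lateral shifts (cm)

mean_shift = shifts.mean()                               # optimum under error-based loss
hist, edges = np.histogram(shifts, bins=200)
mode_shift = edges[hist.argmax()] + (edges[1] - edges[0]) / 2   # optimum under hit/miss loss

print(f"aim offset under error feedback  ~ {mean_shift:.2f} cm (mean)")
print(f"aim offset under reward feedback ~ {mode_shift:.2f} cm (mode)")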
Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.
2012-01-01
Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989
Intelligent systems for strategic power infrastructure defense
NASA Astrophysics Data System (ADS)
Jung, Ju-Hwan
A fault or disturbance in a power system can be severe due to sources of vulnerability such as human errors, protection and control system failures, a failure of communication networks to deliver critical control signals, and market and load uncertainties. Although power systems are designed to withstand disturbances or faults, there have been several catastrophic failures resulting from disturbances involving these sources of vulnerability. To avoid catastrophic failures or minimize the impact of a disturbance(s), the state of the power system has to be analyzed correctly and preventive or corrective self-healing control actions have to be deployed. This dissertation addresses two aspects of power systems: defense systems and diagnosis, both concerned with power system analysis and operation during events involving faults or disturbances. This study is intended to develop a defense system that is able to assess power system vulnerability and to perform self-healing control actions based on system-wide analysis. In order to meet the requirements of the system-wide analysis, the defense system is designed with multi-agent system technologies. Since power systems are dynamic and uncertain, the self-healing control actions need to be adaptive. This study applies the reinforcement learning technique to provide a theoretical basis for adaptation. One of the important issues in adaptation is the convergence of the learning algorithm. An appropriate convergence criterion is derived and an application with a load-shedding scheme is demonstrated in this study. This dissertation also demonstrates the feasibility of the defense system and self-healing control actions through multi-agent system technologies. The other subject of this research is to investigate the methodology for on-line fault diagnosis using the information from Sequence-of-Events Recorders (SER). The proposed multiple-hypothesis analysis generates one or more hypothetical fault scenarios to interpret the SER information. In order to avoid ambiguity of the hypotheses, this study proposes a new method to determine the credibility of each hypothesis. Even if there is not enough SER information, the proposed method is able to perform an accurate fault and malfunction analysis. To avoid exhaustive testing, a minimal set of test scenarios is derived, which is able to handle missing information and SERs. During extreme contingencies or cascading events, fault diagnosis is the first step in the operation of the power system. On-line fault diagnosis provides necessary and correct information for the defense system to make correct and efficient decisions on self-healing control actions. It has been shown in previous studies that incorrect fault diagnosis can lead to catastrophic failures in power systems. Fault diagnosis is an important issue for strategic power infrastructure defense.
A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI
NASA Astrophysics Data System (ADS)
Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro
Recently, brain-computer interfaces (BCI), which provide a direct pathway between the human brain and an external device such as a computer or a robot, have received a lot of attention. Since a BCI can control machines such as robots using brain activity without voluntary muscle movement, it may become a useful communication tool for handicapped persons, for instance amyotrophic lateral sclerosis patients. However, in order to realize a BCI system that can perform precise tasks in various environments, it is necessary to design control rules that adapt to dynamic environments. Reinforcement learning is one approach to designing such control rules. If reinforcement learning can be driven by brain activity, it could lead to a BCI with general versatility. In this research, we focused on the P300 event-related potential as an alternative reward signal for reinforcement learning. We discriminated between success and failure trials from single-trial EEG P300 responses using a proposed discrimination algorithm based on a support vector machine. The possibility of reinforcement learning was examined from the viewpoint of the number of correctly discriminated trials. It was shown that learning was possible for most subjects.
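A minimal sketch of single-trial classification of this kind with a support vector machine is shown below; the synthetic data, feature representation and parameters are illustrative assumptions rather than the authors' pipeline, and scikit-learn is assumed to be available.

# Minimal sketch of classifying single-trial EEG epochs (e.g. P300 success
# vs. failure trials) with a support vector machine. Data are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials, n_features = 200, 64          # e.g. 64 time samples around the P300 window
X_fail = rng.normal(0.0, 1.0, (n_trials // 2, n_features))
X_succ = rng.normal(0.5, 1.0, (n_trials // 2, n_features))   # assumed P300 amplitude shift
X = np.vstack([X_fail, X_succ])
y = np.array([0] * (n_trials // 2) + [1] * (n_trials // 2))  # 0 = failure, 1 = success

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
print("single-trial discrimination accuracy:", clf.score(X_te, y_te))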
Comparative learning theory and its application in the training of horses.
Cooper, J J
1998-11-01
Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species, but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by removal of, or decrease in, intensity of an unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.
Developing Multi-Agency Teams: Implications of a National Programme Evaluation
ERIC Educational Resources Information Center
Simkins, Tim; Garrick, Ros
2012-01-01
This paper explores the factors which influence the effectiveness of formal development programmes targeted at multi-agency teams in children's services. It draws on two studies of the National College for School Leadership's Multi-Agency Teams Development programme, reporting key characteristics of the programme, short-term outcomes in terms of…
Multi-Age Classrooms. NEA Teacher-to-Teacher Books.
ERIC Educational Resources Information Center
Gutloff, Karen, Ed.
This guide is designed for elementary school teachers to assist them in developing multi-age classrooms as part of their school restructuring efforts. Each of six sections presents a story from teachers who describe the challenges and joys of multi-age teaching, from parent backlash to school district support and praise. Section 1, "Step by…
Quality Care through Multi-Age Grouping of Children.
ERIC Educational Resources Information Center
Prendergast, Leo
2002-01-01
Asserts that multi-age grouping in early childhood settings can and does work. Addresses four main hurdles to successful implementation: (1) laws and regulations that act as barriers; (2) health concerns; (3) overcoming educational values that conflict with those of the age-grouped classroom; and (4) staff misunderstanding of multi-age grouping…
Multiage Instruction and Inclusion: A Collaborative Approach
ERIC Educational Resources Information Center
Stuart, Shannon K.; Connor, Mary; Cady, Karin; Zweifel, Alicia
2007-01-01
This article describes a multiage classroom led by three co-teachers who facilitate the education of 42 students ages six through nine years. The classroom is located in a public school district that practices inclusion and subscribes to the principles of whole schooling. A literature review defines the concepts of co-teaching, multiage education,…
Implementing Multiage Education: A Practical Guide.
ERIC Educational Resources Information Center
Kasten, Wendy C.; Lolli, Elizabeth Monce
Noting that multiage education continues to receive a great deal of interest as educators, legislators, and parents seek to find ways to improve educational experiences for all children, this book takes readers by the hand and guides them as they move from exploring the concept of multiage to the actual stages of implementation. As is consistent…
The nature of sexual reinforcement.
Crawford, L L; Holloway, K S; Domjan, M
1993-01-01
Sexual reinforcers are not part of a regulatory system involved in the maintenance of critical metabolic processes, they differ for males and females, they differ as a function of species and mating system, and they show ontogenetic and seasonal changes related to endocrine conditions. Exposure to a member of the opposite sex without copulation can be sufficient for sexual reinforcement. However, copulatory access is a stronger reinforcer, and copulatory opportunity can serve to enhance the reinforcing efficacy of stimulus features of a sexual partner. Conversely, under certain conditions, noncopulatory exposure serves to decrease reinforcer efficacy. Many common learning phenomena such as acquisition, extinction, discrimination learning, second-order conditioning, and latent inhibition have been demonstrated in sexual conditioning. These observations extend the generality of findings obtained with more conventional reinforcers, but the mechanisms of these effects and their gender and species specificity remain to be explored. PMID:8354970
Mesolimbic confidence signals guide perceptual learning in the absence of external feedback
Guggenmos, Matthias; Wilbertz, Gregor; Hebart, Martin N; Sterzer, Philipp
2016-01-01
It is well established that learning can occur without external feedback, yet normative reinforcement learning theories have difficulties explaining such instances of learning. Here, we propose that human observers are capable of generating their own feedback signals by monitoring internal decision variables. We investigated this hypothesis in a visual perceptual learning task using fMRI and confidence reports as a measure for this monitoring process. Employing a novel computational model in which learning is guided by confidence-based reinforcement signals, we found that mesolimbic brain areas encoded both anticipation and prediction error of confidence—in remarkable similarity to previous findings for external reward-based feedback. We demonstrate that the model accounts for choice and confidence reports and show that the mesolimbic confidence prediction error modulation derived through the model predicts individual learning success. These results provide a mechanistic neurobiological explanation for learning without external feedback by augmenting reinforcement models with confidence-based feedback. DOI: http://dx.doi.org/10.7554/eLife.13388.001 PMID:27021283
Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca
2013-03-01
An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
Collins, Anne G E; Frank, Michael J
2018-03-06
Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.
Learning and tuning fuzzy logic controllers through reinforcements.
Berenji, H R; Khedkar, P
1992-01-01
A method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. It is shown that the generalized approximate-reasoning-based intelligent control (GARIC) architecture learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; it introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; it introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and it learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Verifying Multi-Agent Systems via Unbounded Model Checking
NASA Technical Reports Server (NTRS)
Kacprzak, M.; Lomuscio, A.; Lasica, T.; Penczek, W.; Szreter, M.
2004-01-01
We present an approach to the problem of verification of epistemic properties in multi-agent systems by means of symbolic model checking. In particular, it is shown how to extend the technique of unbounded model checking from a purely temporal setting to a temporal-epistemic one. In order to achieve this, we base our discussion on interpreted systems semantics, a popular semantics used in multi-agent systems literature. We give details of the technique and show how it can be applied to the well known train, gate and controller problem. Keywords: model checking, unbounded model checking, multi-agent systems
NASA Technical Reports Server (NTRS)
Pena, Joaquin; Hinchey, Michael G.; Ruiz-Cortes, Antonio
2006-01-01
The field of Software Product Lines (SPL) emphasizes building a core architecture for a family of software products from which concrete products can be derived rapidly. This helps to reduce time-to-market, costs, etc., and can result in improved software quality and safety. Current AOSE methodologies are concerned with developing a single Multiagent System. We propose an initial approach to developing the core architecture of a Multiagent Systems Product Line (MAS-PL), exemplifying our approach with reference to a concept NASA mission based on multiagent technology.
Impairments in action-outcome learning in schizophrenia.
Morris, Richard W; Cyrzon, Chad; Green, Melissa J; Le Pelley, Mike E; Balleine, Bernard W
2018-03-03
Learning the causal relation between actions and their outcomes (AO learning) is critical for goal-directed behavior when actions are guided by desire for the outcome. This can be contrasted with habits that are acquired by reinforcement and primed by prevailing stimuli, in which causal learning plays no part. Recently, we demonstrated that goal-directed actions are impaired in schizophrenia; however, whether this deficit exists alongside impairments in habit or reinforcement learning is unknown. The present study distinguished deficits in causal learning from reinforcement learning in schizophrenia. We tested people with schizophrenia (SZ, n = 25) and healthy adults (HA, n = 25) in a vending machine task. Participants learned two action-outcome contingencies (e.g., push left to get a chocolate M&M, push right to get a cracker), and they also learned one contingency was degraded by delivery of noncontingent outcomes (e.g., free M&Ms), as well as changes in value by outcome devaluation. Both groups learned the best action to obtain rewards; however, SZ did not distinguish the more causal action when one AO contingency was degraded. Moreover, action selection in SZ was insensitive to changes in outcome value unless feedback was provided, and this was related to the deficit in AO learning. The failure to encode the causal relation between action and outcome in schizophrenia occurred without any apparent deficit in reinforcement learning. This implies that poor goal-directed behavior in schizophrenia cannot be explained by a more primary deficit in reward learning such as insensitivity to reward value or reward prediction errors.
ERIC Educational Resources Information Center
Heitzman, Andrew J.
The New York State Center for Migrant Studies conducted this 1968 study which investigated effects of token reinforcers on reading and arithmetic skills learnings of migrant primary school students during a 6-week summer school session. Students (Negro and Caucasian) received plastic tokens to reward skills learning responses. Tokens were traded…
ERIC Educational Resources Information Center
Neu, Jessica Adele
2013-01-01
I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…
ERIC Educational Resources Information Center
Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela
2011-01-01
Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…
The Identification and Establishment of Reinforcement for Collaboration in Elementary Students
ERIC Educational Resources Information Center
Darcy, Laura
2017-01-01
In Experiment 1, I conducted a functional analysis of student rate of learning with and without a peer-yoked contingency for 12 students in Kindergarten through 2nd grade in order to determine if they had conditioned reinforcement for collaboration. Using an ABAB reversal design, I compared rate of learning as measured by learn units to criterion…
Stress enhances model-free reinforcement learning only after negative outcome.
Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung
2017-01-01
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one's ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts.
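One common way to formalise the relative contribution of the two systems, as in standard analyses of two-stage Markov tasks and not necessarily the exact model used above, is a weighted mixture of model-based and model-free action values; the weight w, learning rate alpha and softmax temperature beta below are illustrative assumptions.

# Minimal sketch of mixing model-free and model-based values for choice,
# of the kind commonly fit to two-stage Markov decision tasks. Parameters
# are illustrative assumptions, not those estimated in the study above.
import math
import random

alpha, beta, w = 0.3, 2.0, 0.5      # learning rate, inverse temperature, model-based weight
q_mf = [0.0, 0.0]                   # model-free values of the two first-stage actions
q_mb = [0.6, 0.4]                   # model-based values (assumed known transition model)

def choose(v0, v1):
    p0 = 1.0 / (1.0 + math.exp(-beta * (v0 - v1)))
    return 0 if random.random() < p0 else 1

for trial in range(100):
    v = [w * q_mb[a] + (1 - w) * q_mf[a] for a in (0, 1)]   # hybrid valuation
    a = choose(v[0], v[1])
    reward = 1.0 if random.random() < (0.7 if a == 0 else 0.3) else 0.0
    q_mf[a] += alpha * (reward - q_mf[a])    # model-free update; per the abstract above,
                                             # stress raises reliance on this system after
                                             # negative outcomes and lowers the learning rate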
Dillon, Laura; Collins, Meaghan; Conway, Maura; Cunningham, Kate
2013-01-01
Three experiments examined the implicit learning of sequences under conditions in which the elements comprising a sequence were equated in terms of reinforcement probability. In Experiment 1 cotton-top tamarins (Saguinus oedipus) experienced a five-element sequence displayed serially on a touch screen in which reinforcement probability was equated across elements at .16 per element. Tamarins demonstrated learning of this sequence with higher latencies during a random test as compared to baseline sequence training. In Experiments 2 and 3, manipulations of the procedure used in the first experiment were undertaken to rule out a confound owing to the fact that the elements in Experiment 1 bore different temporal relations to the intertrial interval (ITI), an inhibitory period. The results of Experiments 2 and 3 indicated that the implicit learning observed in Experiment 1 was not due to temporal proximity between some elements and the inhibitory ITI. Taken together, the results support two conclusions: first, that tamarins engaged in sequence learning whether or not there was contingent reinforcement for learning the sequence, and second, that this learning was not due to subtle differences in associative strength between the elements of the sequence. PMID:23344718
Improving the Science Excursion: An Educational Technologist's View
ERIC Educational Resources Information Center
Balson, M.
1973-01-01
Analyzes the nature of the learning process and attempts to show how the three components of a reinforcement contingency, the stimulus, the response and the reinforcement can be utilized to increase the efficiency of a typical science learning experience, the excursion. (JR)
Curriculum Development for Transfer Learning in Dynamic Multiagent Settings
2016-06-01
Half field offense (HFO) [19] is a subtask of RoboCup simulated soccer in which a team of m offensive players tries to score a goal against n defensive players while playing on one half of a soccer field. The domain poses many challenges, including a large, continuous state and action space and the need for coordination. (Snippet truncated; the entry also cites RoboCup-2006: Robot Soccer World Cup X, volume 4434 of Lecture Notes in Artificial Intelligence, pages 72–85, Springer Verlag, Berlin.)
Vernetti, Angélina; Smith, Tim J; Senju, Atsushi
2017-03-15
While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.
Shephard, E; Jackson, G M; Groom, M J
2014-01-01
This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.
Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation
NASA Astrophysics Data System (ADS)
Satoh, Hideki
An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with multi-dimensional continuous state space. First, a basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method for adapting an orthonormal basis can modify a basis while holding the orthonormality in accordance with changes in the environment to improve the performance of reinforcement learning and to eliminate the adverse effects of redundant noisy states.
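A minimal sketch of the idea, assuming a one-dimensional state on [0, 1]: value learning with linear function approximation over an orthonormal cosine basis, where the least active basis element is periodically replaced by a new candidate frequency. The dynamics, reward, and activity index below are invented for illustration and are not the paper's chaos-control setup.

    # Minimal sketch: linear TD(0) with an orthonormal cosine (Fourier) basis,
    # periodically swapping low-activity basis elements for new frequencies.
    import numpy as np

    rng = np.random.default_rng(1)
    freqs = np.arange(1, 9)                  # current basis: sqrt(2) * cos(pi * k * s), k = 1..8
    w = np.zeros(len(freqs))
    activity = np.zeros(len(freqs))
    alpha, gamma = 0.05, 0.95

    def phi(s):
        return np.sqrt(2.0) * np.cos(np.pi * freqs * s)   # orthonormal on [0, 1]

    s = rng.random()
    for step in range(5000):
        s_next = np.clip(s + rng.normal(0.0, 0.05), 0.0, 1.0)   # toy random-walk dynamics
        r = 1.0 if s_next > 0.9 else 0.0                        # toy reward
        td = r + gamma * w @ phi(s_next) - w @ phi(s)
        w += alpha * td * phi(s)
        activity += np.abs(w * phi(s))                          # crude running activity index
        s = s_next

        if step % 1000 == 999:                                  # replace the least active element
            i = int(np.argmin(activity))
            new = int(rng.integers(9, 64))
            while new in freqs:                                 # keep frequencies distinct (orthonormal)
                new = int(rng.integers(9, 64))
            freqs[i] = new
            w[i], activity[i] = 0.0, 0.0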
Evaluation of Multi-Age Team (MAT): Implementation at Crabapple Middle School: Report for 1995-1996.
ERIC Educational Resources Information Center
Elmore, Randy; Wisenbaker, Joseph
In fall 1993, administrators and faculty at the Crabapple Middle School in Roswell, Georgia, implemented the Multi-Age Team (MAT) program, creating multiage teams of sixth-, seventh-, and eighth-grade students. The project's main goal was to enhance self-esteem. Additional goals included implementation of interdisciplinary, thematic instruction;…
Evaluation of Multi-Age Team (MAT) Implementation at Crabapple Middle School: Report for 1994-1995.
ERIC Educational Resources Information Center
Elmore, Randy; Wisenbaker, Joseph
In fall 1993, administrators and faculty at the Crabapple Middle School in Roswell, Georgia, implemented the Multi-Age Team (MAT) program, creating multi-age teams of sixth-, seventh-, and eighth-grade students. The project's main goal was to enhance self-esteem. Additional goals included implementation of interdisciplinary, thematic instruction;…
A Descriptive Study of Multi-Age Art Education in Florida
ERIC Educational Resources Information Center
Broome, Jeffrey L.
2009-01-01
Multi-age classrooms feature the purposeful grouping of students from two or more grade levels in order to form communities of learners. During the past 40 years, multi-age education has been examined in literature and research in many different ways and contexts. In the subject area of visual art, however, little literature can be found that…
The Art Teacher and Multi-Age Homeroom Teachers: Qualitative Observations and Comparisons
ERIC Educational Resources Information Center
Broome, Jeffrey L.
2016-01-01
Multi-age classrooms feature the intentional grouping of students from consecutive grade levels for the purpose of fostering a nurturing classroom atmosphere. While an abundance of research on multi-age education has been produced throughout the past 50 years, only recent efforts have seen researchers turn their attention to the experiences of art…
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian
2017-11-01
We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
A neural model of hierarchical reinforcement learning.
Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris
2017-01-01
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.
Neural correlates of reinforcement learning and social preferences in competitive bidding.
van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M
2013-01-30
In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
A theoretical framework for negotiating the path of emergency management multi-agency coordination.
Curnin, Steven; Owen, Christine; Paton, Douglas; Brooks, Benjamin
2015-03-01
Multi-agency coordination represents a significant challenge in emergency management. The need for liaison officers working in strategic level emergency operations centres to play organizational boundary spanning roles within multi-agency coordination arrangements that are enacted in complex and dynamic emergency response scenarios creates significant research and practical challenges. The aim of the paper is to address a gap in the literature regarding the concept of multi-agency coordination from a human-environment interaction perspective. We present a theoretical framework for facilitating multi-agency coordination in emergency management that is grounded in human factors and ergonomics using the methodology of core-task analysis. As a result we believe the framework will enable liaison officers to cope more efficiently within the work domain. In addition, we provide suggestions for extending the theory of core-task analysis to an alternate high reliability environment. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo
2016-02-01
Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
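The selection scheme described here can be sketched as feature-value learning with a value-independent stickiness bonus added at choice time. The feature set, reward probabilities, and parameter values below are hypothetical, not the fitted macaque model.

    # Minimal sketch: feature-value reinforcement learning restricted to one
    # relevant dimension, with a value-independent "stickiness" bonus toward
    # the previously selected feature.
    import numpy as np

    rng = np.random.default_rng(2)
    n_features = 3                     # e.g., three colours in the relevant dimension
    v = np.zeros(n_features)           # learned feature values
    alpha, beta, kappa = 0.2, 4.0, 0.5 # learning rate, inverse temperature, stickiness
    reward_prob = np.array([0.8, 0.5, 0.2])
    prev_choice = None

    for trial in range(300):
        left, right = rng.choice(n_features, size=2, replace=False)  # two stimuli shown
        scores = np.array([v[left], v[right]])
        if prev_choice is not None:                                  # stickiness is value-independent
            scores += kappa * np.array([left == prev_choice, right == prev_choice], dtype=float)
        p_left = 1.0 / (1.0 + np.exp(-beta * (scores[0] - scores[1])))
        choice = left if rng.random() < p_left else right
        r = float(rng.random() < reward_prob[choice])
        v[choice] += alpha * (r - v[choice])                         # update only the chosen feature
        prev_choice = choice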
Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.
Chan, C K J; Harris, Justin A
2017-08-01
Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction. Copyright © 2017 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Dunn-Kenney, Maylan
2010-01-01
Service learning is often used in teacher education as a way to challenge social bias and provide teacher candidates with skills needed to work in partnership with diverse families. Although some literature suggests that service learning could reinforce cultural bias, there is little documentation. In a study of 21 early childhood teacher…
Deep Gate Recurrent Neural Network
2016-11-22
Recurrent networks have been applied to tasks such as machine translation (Bahdanau et al., 2015) and robot reinforcement learning (Bakker, 2001). (Snippet truncated; cited works include "A system for robotic heart surgery that learns to tie knots using recurrent neural networks" and "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, 32:1238–1274, 2013.)
Applying Multiagent Simulation to Planetary Surface Operations
NASA Technical Reports Server (NTRS)
Sierhuis, Maarten; Sims, Michael H.; Clancey, William J.; Lee, Pascal; Swanson, Keith (Technical Monitor)
2000-01-01
This paper describes a multiagent modeling and simulation approach for designing cooperative systems. Issues addressed include the use of multiagent modeling and simulation for the design of human and robotic operations, as a theory for human/robot cooperation on planetary surface missions. We describe a design process for cooperative systems centered around the Brahms modeling and simulation environment being developed at NASA Ames.
The Impact of Multi-Age Instruction on Academic Performance in Mathematics and Reading
ERIC Educational Resources Information Center
Baukol, David
2010-01-01
Teachers and administrators are faced with a basic question when planning for a school year: how should the students be grouped when coming to school? Should students of similar age be together or should students be assigned to multi-age classrooms at the elementary school level? If the multi-age method is chosen, how will academic progress be…
Walker, Brendan M.
2013-01-01
This article represents one of five contributions focusing on the topic “Plasticity and neuroadaptive responses within the extended amygdala in response to chronic or excessive alcohol exposure” that were developed by awardees participating in the Young Investigator Award Symposium at the “Alcoholism and Stress: A Framework for Future Treatment Strategies” conference in Volterra, Italy on May 3–6, 2011 that was organized/chaired by Drs. Antonio Noronha and Fulton Crews and sponsored by the National Institute on Alcohol Abuse and Alcoholism. This review discusses the dependence-induced neuroadaptations in affective systems that provide a basis for negative reinforcement learning and presents evidence demonstrating that escalated alcohol consumption during withdrawal is a learned, plasticity-dependent process. The review concludes by identifying changes within extended amygdala dynorphin/kappa-opioid receptor systems that could serve as the foundation for the occurrence of negative reinforcement processes. While some evidence contained herein may be specific to alcohol dependence-related learning and plasticity, much of the information will be of relevance to any addictive disorder involving negative reinforcement mechanisms. Collectively, the information presented within this review provides a framework to assess the negative reinforcing effects of alcohol in a manner that distinguishes neuroadaptations produced by chronic alcohol exposure from the actual plasticity that is associated with negative reinforcement learning in dependent organisms. PMID:22459874
Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer
Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R.
2010-01-01
Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a “clinical reinforcement trial”) of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients. PMID:21385164
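A toy sketch of the backward-induction (fitted Q-learning) step using an off-the-shelf support vector regressor on simulated data; it omits censoring and the paper's modified SVR, and all covariates, outcomes, and function names are invented.

    # Minimal sketch of backward-induction Q-learning over two treatment lines,
    # using ordinary support vector regression on simulated (toy) data.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(3)
    n = 500
    x1 = rng.normal(size=(n, 2))                     # baseline prognostic factors
    a1 = rng.integers(0, 2, n)                       # first-line treatment (0/1)
    x2 = x1 + rng.normal(scale=0.5, size=(n, 2))     # status at the second decision point
    a2 = rng.integers(0, 2, n)                       # second-line treatment (0/1)
    # toy survival-time outcome depending on covariate/treatment interactions
    y = (6 + x1[:, 0] * (2 * a1 - 1) + x2[:, 1] * (2 * a2 - 1)
         + rng.normal(scale=1.0, size=n))

    # Stage 2: regress the outcome on (state, action); optimal value = max over a2
    q2 = SVR(kernel="rbf").fit(np.column_stack([x2, a2]), y)
    v2 = np.maximum(q2.predict(np.column_stack([x2, np.zeros(n)])),
                    q2.predict(np.column_stack([x2, np.ones(n)])))

    # Stage 1: regress the stage-2 optimal value on (baseline state, first action)
    q1 = SVR(kernel="rbf").fit(np.column_stack([x1, a1]), v2)

    def recommend_first_line(x):
        """Pick the first-line treatment with the larger predicted Q-value."""
        x = np.asarray(x, dtype=float).reshape(1, -1)
        q_a0 = q1.predict(np.column_stack([x, [0.0]]))[0]
        q_a1 = q1.predict(np.column_stack([x, [1.0]]))[0]
        return int(q_a1 > q_a0)

    print(recommend_first_line([0.5, -0.2]))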
Hierarchical extreme learning machine based reinforcement learning for goal localization
NASA Astrophysics Data System (ADS)
AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini
2017-03-01
The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Only a few regions need to be processed by the agent, to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning (RL) method was utilized to find an optimal series of actions to localize the goal region. The visual data, a set of images, are high-dimensional, unstructured data and need to be represented efficiently to obtain a robust detector. Different deep reinforcement models have already been used to localize a goal, but most of them take a long time to learn the model. This long learning time results from the weight fine-tuning stage that is applied iteratively to find an accurate model. The Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that does not fine-tune the weights; in other words, hidden weights are generated randomly and output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for an effective representation. This paper proposes a combination of the Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed using MATLAB.
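A simplified sketch of the combination: a single extreme-learning-machine layer with random, untrained hidden weights serves as the feature map for linear Q-learning on a toy goal-localization grid. A full H-ELM would stack several such layers, and the grid task here is only a stand-in for visual input; all names and parameters are illustrative.

    # Minimal sketch: ELM feature map (random hidden weights, never fine-tuned)
    # combined with linear Q-learning on a toy goal-localization grid.
    import numpy as np

    rng = np.random.default_rng(4)
    grid, n_hidden, n_actions = 8, 64, 4            # 8x8 grid, 4 moves (up/down/left/right)
    goal = np.array([6, 6])
    W_in = rng.normal(size=(2, n_hidden))           # random hidden weights, kept fixed
    b = rng.normal(size=n_hidden)
    theta = np.zeros((n_hidden, n_actions))         # output (Q) weights, learned
    alpha, gamma, eps = 0.05, 0.95, 0.1
    moves = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])

    def features(pos):
        return np.tanh(pos / grid @ W_in + b)       # ELM hidden-layer activation

    for episode in range(300):
        pos = rng.integers(0, grid, size=2)
        for step in range(50):
            h = features(pos)
            q = h @ theta
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q))
            nxt = np.clip(pos + moves[a], 0, grid - 1)
            r = 1.0 if np.array_equal(nxt, goal) else -0.01
            target = r if np.array_equal(nxt, goal) else r + gamma * np.max(features(nxt) @ theta)
            theta[:, a] += alpha * (target - q[a]) * h   # only the output weights are adjusted
            pos = nxt
            if np.array_equal(pos, goal):
                break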
Depression, Activity, and Evaluation of Reinforcement
ERIC Educational Resources Information Center
Hammen, Constance L.; Glass, David R., Jr.
1975-01-01
This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)
What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics
NASA Astrophysics Data System (ADS)
Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj
Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.
Kinesthetic Reinforcement-Is It a Boon to Learning?
ERIC Educational Resources Information Center
Bohrer, Roxilu K.
1970-01-01
Language instruction, particularly in the elementary school, should be reinforced through the use of visual aids and through associated physical activity. Kinesthetic experiences provide an opportunity to make use of non-verbal cues to meaning, enliven classroom activities, and maximize learning for pupils. The author discusses the educational…
Reinforcing Basic Skills Through Social Studies. Grades 4-7.
ERIC Educational Resources Information Center
Lewis, Teresa Marie
Arranged into seven parts, this document provides a variety of games and activities, bulletin board ideas, overhead transparencies, student handouts, and learning station ideas to help reinforce basic social studies skills in the intermediate grades. In part 1, students learn about timelines, first constructing their own life timeline, then a…
Effects of Reinforcement on Peer Imitation in a Small Group Play Context
ERIC Educational Resources Information Center
Barton, Erin E.; Ledford, Jennifer R.
2018-01-01
Children with disabilities often have deficits in imitation skills, particularly in imitating peers. Imitation is considered a behavioral cusp--which, once learned, allows a child to access additional and previously unavailable learning opportunities. In the current study, researchers examined the efficacy of contingent reinforcement delivered…
Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.
Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria
2016-03-01
Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.
Promoting a culture of disaster preparedness.
Medina, Angeli
2016-01-01
Disasters from all hazards, ranging from natural disasters, human-induced disasters, effects of climate change to social conflicts can significantly affect the healthcare system and community. This requires a paradigm shift from a reactive approach to a disaster risk management 'all-hazards' approach. Disaster management is a joint effort of the city, state, regional, national, multi-agencies and international organisations that requires effective communication, collaboration and coordination. This paper offers lessons learned and best practices, which, when taken into consideration, can strengthen the phases of disaster risk management.
Harris, Justin A; Kwok, Dorothy W S
2018-01-01
During magazine approach conditioning, rats do not discriminate between a conditional stimulus (CS) that is consistently reinforced with food and a CS that is occasionally (partially) reinforced, as long as the CSs have the same overall reinforcement rate per second. This implies that rats are indifferent to the probability of reinforcement per trial. However, in the same rats, the per-trial reinforcement rate will affect subsequent extinction: responding extinguishes more rapidly for a CS that was consistently reinforced than for a partially reinforced CS. Here, we trained rats with consistently and partially reinforced CSs that were matched for overall reinforcement rate per second. We measured conditioned responding both during and immediately after the CSs. Differences in the per-trial probability of reinforcement did not affect the acquisition of responding during the CS but did affect subsequent extinction of that responding, and also affected the post-CS response rates during conditioning. Indeed, CSs with the same probability of reinforcement per trial evoked the same amount of post-CS responding even when they differed in overall reinforcement rate and thus evoked different amounts of responding during the CS. We conclude that reinforcement rate per second controls rats' acquisition of responding during the CS, but at the same time, rats also learn specifically about the probability of reinforcement per trial. The latter learning affects the rats' expectation of reinforcement as an outcome of the trial, which influences their ability to detect retrospectively that an opportunity for reinforcement was missed, and, in turn, drives extinction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
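The rate-matching manipulation can be made concrete with hypothetical numbers: a CS reinforced on every trial and a shorter CS reinforced on half of its trials can deliver the same number of reinforcers per second of CS exposure while differing in reinforcement probability per trial. The durations and probabilities below are invented for illustration.

    # Toy arithmetic (hypothetical numbers): two CSs matched on reinforcement
    # rate per second but differing in per-trial reinforcement probability.
    cs_consistent = {"duration_s": 20.0, "p_per_trial": 1.0}   # reinforced on every trial
    cs_partial    = {"duration_s": 10.0, "p_per_trial": 0.5}   # reinforced on half the trials

    for name, cs in [("consistent", cs_consistent), ("partial", cs_partial)]:
        rate_per_s = cs["p_per_trial"] / cs["duration_s"]
        print(f"{name}: p(US|trial) = {cs['p_per_trial']:.2f}, "
              f"rate = {rate_per_s:.3f} US per second of CS")
    # Both print a rate of 0.050 US/s, so responding during the CS is predicted to
    # match, while extinction and post-CS responding track the per-trial probability.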
Explorations in Multi-Age Teaming (MAT): Evaluations of Three Projects in Fulton County, Georgia.
ERIC Educational Resources Information Center
Elmore, Randy; Hopping, Linda; Jenkins-Miller, Minnie; McElroy, Camille; Minafee, Margaret; Wisenbaker, Joseph
Multi-Age Teaming (MAT) programs were implemented at Crabapple and McNair Middle Schools in Fulton County, Georgia, in the fall of 1993, and at Camp Creek Middle School in the fall of 1994. An important goal of these programs was the creation of school families within schools with multi-age teams of sixth-, seventh-, and eighth-grade students. At…
Distributed Market-Based Algorithms for Multi-Agent Planning with Shared Resources
2013-02-01
Snippets from the report: the table of contents lists chapters on "Introduction," "Distributed Market-Based Multi-Agent Planning," and "Problem Formulation"; one excerpt reports an improvement over the deterministic planner on the "test set" of scenarios with changing economies, and another notes that, for the supply chain management problem, a sequence of Bernoulli coin flips was assumed in representing the objective.
Research of negotiation in network trade system based on multi-agent
NASA Astrophysics Data System (ADS)
Cai, Jun; Wang, Guozheng; Wu, Haiyan
2009-07-01
A construction and implementation technology for network trade based on multi-agent systems is described in this paper. First, we researched multi-agent technology; then we discussed consumer behaviors and the negotiation between purchaser and bargainer that emerges in the traditional business mode, and analysed the key technologies for implementing the network trade system. Finally, we implemented the system.
ERIC Educational Resources Information Center
Zrinzo, Michelle; Greer, R. Douglas
2013-01-01
Prior research has demonstrated the establishment of reinforcers for learning and maintenance with young children as a function of social learning where a peer and an adult experimenter were present. The presence of an adult experimenter was eliminated in the present study to test if the effect produced in the prior studies would occur with only…
Structure identification in fuzzy inference using reinforcement learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1993-01-01
In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to backup a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.
Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris
2017-01-01
Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment resistant MDD patients and 44 age matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potentials (ERPs) data. MDD patients showed comparable learning rate compared to HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.
ERIC Educational Resources Information Center
Punnett, Audrey F.; Steinhauer, Gene D.
1984-01-01
Four reading disabled children were given eight sessions of ocular motor training with reinforcement and eight sessions without reinforcement. Two reading disabled control Ss were treated similarly but received no ocular motor training. Results demonstrated that reinforcement can improve ocular motor skills, which in turn elevates reading…
Learning the specific quality of taste reinforcement in larval Drosophila.
Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram
2015-01-27
The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing, in any brain.
The evolution of continuous learning of the structure of the environment
Kolodny, Oren; Edelman, Shimon; Lotem, Arnon
2014-01-01
Continuous, ‘always on’, learning of structure from a stream of data is studied mainly in the fields of machine learning or language acquisition, but its evolutionary roots may go back to the first organisms that were internally motivated to learn and represent their environment. Here, we study under what conditions such continuous learning (CL) may be more adaptive than simple reinforcement learning and examine how it could have evolved from the same basic associative elements. We use agent-based computer simulations to compare three learning strategies: simple reinforcement learning; reinforcement learning with chaining (RL-chain) and CL that applies the same associative mechanisms used by the other strategies, but also seeks statistical regularities in the relations among all items in the environment, regardless of the initial association with food. We show that a sufficiently structured environment favours the evolution of both RL-chain and CL and that CL outperforms the other strategies when food is relatively rare and the time for learning is limited. This advantage of internally motivated CL stems from its ability to capture statistical patterns in the environment even before they are associated with food, at which point they immediately become useful for planning. PMID:24402920
The partial-reinforcement extinction effect and the contingent-sampling hypothesis.
Hochman, Guy; Erev, Ido
2013-12-01
The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
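The contingent-sampling idea can be sketched as follows, with the context defined, as in the best-fitting model, by the sequence of the last four outcomes; the payoff probabilities and exploration rule below are illustrative only.

    # Minimal sketch of a contingent-sampling chooser: the agent recalls past
    # experiences that followed the same sequence of the last four outcomes and
    # picks the option that paid best in those similar situations.
    import random
    from collections import defaultdict

    random.seed(0)
    history = defaultdict(lambda: {0: [], 1: []})   # context -> payoffs observed per option
    recent = (0, 0, 0, 0)                           # last four outcomes (1 = reinforced)
    p_reward = {0: 0.5, 1: 0.0}                     # option 0 partially reinforced, 1 never

    def choose(context):
        samples = history[context]
        means = {}
        for opt, payoffs in samples.items():
            means[opt] = sum(payoffs) / len(payoffs) if payoffs else random.random()  # explore if unseen
        return max(means, key=means.get)

    for trial in range(500):
        choice = choose(recent)
        outcome = 1 if random.random() < p_reward[choice] else 0
        history[recent][choice].append(outcome)
        recent = recent[1:] + (outcome,)            # contexts are the last four outcomes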
The effects of aging on the interaction between reinforcement learning and attention.
Radulescu, Angela; Daniel, Reka; Niv, Yael
2016-11-01
Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
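One way to sketch a reinforcement-learning model in which selective attention constrains learning is to let a softmax weighting over stimulus dimensions both filter the value of a compound stimulus and gate the value update; sharpening that weighting mimics a narrower attention filter. The dimensions, reward rule, and parameter values below are hypothetical and not the paper's fitted model.

    # Minimal sketch: feature-level RL with an attention weighting over stimulus
    # dimensions that filters compound values and gates learning.
    import numpy as np

    rng = np.random.default_rng(5)
    n_dims, n_feats = 3, 3                      # e.g., colour, shape, texture; 3 features each
    v = np.zeros((n_dims, n_feats))             # learned value per feature
    alpha, beta_choice, beta_attn = 0.3, 5.0, 2.0   # larger beta_attn = narrower attention
    target = (0, 1)                             # hypothetical rewarding feature: dim 0, feature 1

    def attention():
        strength = np.abs(v).max(axis=1)        # dimensions with extreme values attract attention
        e = np.exp(beta_attn * (strength - strength.max()))
        return e / e.sum()

    for trial in range(400):
        stims = rng.integers(0, n_feats, size=(2, n_dims))     # two compound stimuli
        w = attention()
        vals = np.array([(w * v[np.arange(n_dims), s]).sum() for s in stims])
        p0 = 1.0 / (1.0 + np.exp(-beta_choice * (vals[0] - vals[1])))
        c = 0 if rng.random() < p0 else 1
        has_target = stims[c, target[0]] == target[1]
        r = 1.0 if (has_target and rng.random() < 0.75) else 0.0
        pe = r - vals[c]
        for d in range(n_dims):                                # attention gates the update
            v[d, stims[c, d]] += alpha * w[d] * pe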
Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.
Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne
2016-05-01
We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to stimuli foreground over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.
Refining Linear Fuzzy Rules by Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil
1996-01-01
Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used in refining these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning which can be applied in domains where supervised input-output data is not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step in closing the gap between the application of reinforcement learning methods in the domains where only some limited input-output data is available.
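A rough sketch of the rule-refinement idea: rules with Gaussian (radial basis) antecedents and linear consequents produce a control output, and the consequent parameters are nudged using only a scalar reinforcement and an exploration perturbation, rather than supervised targets. The plant, reward, and update rule are simplified stand-ins for the method in the paper.

    # Minimal sketch: linear fuzzy rules with RBF antecedents, refined from a
    # scalar reinforcement via a perturbation / reward-baseline update.
    import numpy as np

    rng = np.random.default_rng(6)
    centers = np.linspace(-2.0, 2.0, 5)        # RBF antecedent centres over the state
    width = 0.8
    A = rng.normal(scale=0.1, size=5)          # linear consequent slopes  a_i
    B = rng.normal(scale=0.1, size=5)          # linear consequent offsets b_i
    lr, sigma, baseline = 0.05, 0.3, 0.0

    def rule_strengths(x):
        g = np.exp(-0.5 * ((x - centers) / width) ** 2)
        return g / g.sum()

    def controller(x):
        g = rule_strengths(x)
        return float(g @ (A * x + B))          # normalised weighted sum of linear consequents

    for episode in range(2000):
        x = rng.uniform(-2.0, 2.0)
        g = rule_strengths(x)
        noise = rng.normal(scale=sigma)
        u = controller(x) + noise              # explore around the current rule output
        x_next = x + 0.1 * u                   # toy plant
        reward = -(x_next ** 2)                # reinforcement: keep the state near zero
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        # move the consequents so the output shifts toward helpful explored actions
        A += lr * advantage * noise * g * x
        B += lr * advantage * noise * g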
A model to capture and manage tacit knowledge using a multiagent system
NASA Astrophysics Data System (ADS)
Paolino, Lilyam; Paggi, Horacio; Alonso, Fernando; López, Genoveva
2014-10-01
This article presents a model to capture and register tacit business knowledge from different sources, using an expert multiagent system that enables the entry of incidents and captures the tacit knowledge that could resolve them. This knowledge and its sources are evaluated through trust algorithms, which lead to the registration of the best of each in the database. Through its intelligent software agents, the system interacts with the administrator, the users, the knowledge sources, and any communities of practice that might exist in the business. The sources, as well as the knowledge, are constantly evaluated, both before and after registration, in order to decide whether their original weighting is kept or modified. If better knowledge becomes available, the new knowledge is registered in place of the old. This work is part of an ongoing investigation into knowledge management methodologies for managing tacit business knowledge, with the aim of supporting business competitiveness and innovation learning.
Multiagent Work Practice Simulation: Progress and Challenges
NASA Technical Reports Server (NTRS)
Clancey, William J.; Sierhuis, Maarten; Shaffe, Michael G. (Technical Monitor)
2001-01-01
Modeling and simulating complex human-system interactions requires going beyond formal procedures and information flows to analyze how people interact with each other. Such work practices include conversations, modes of communication, informal assistance, impromptu meetings, workarounds, and so on. To make these social processes visible, we have developed a multiagent simulation tool, called Brahms, for modeling the activities of people belonging to multiple groups, situated in a physical environment (geographic regions, buildings, transport vehicles, etc.) consisting of tools, documents, and a computer system. We are finding many useful applications of Brahms for system requirements analysis, instruction, implementing software agents, and as a workbench for relating cognitive and social theories of human behavior. Many challenges remain for representing work practices, including modeling: memory over multiple days, scheduled activities combining physical objects, groups, and locations on a timeline (such as a Space Shuttle mission), habitat vehicles with trajectories (such as the Shuttle), agent movement in 3D space (e.g., inside the International Space Station), agent posture and line of sight, coupled movements (such as carrying objects), and learning (mimicry, forming habits, detecting repetition, etc.).
Multiagent Work Practice Simulation: Progress and Challenges
NASA Technical Reports Server (NTRS)
Clancey, William J.; Sierhuis, Maarten
2002-01-01
Modeling and simulating complex human-system interactions requires going beyond formal procedures and information flows to analyze how people interact with each other. Such work practices include conversations, modes of communication, informal assistance, impromptu meetings, workarounds, and so on. To make these social processes visible, we have developed a multiagent simulation tool, called Brahms, for modeling the activities of people belonging to multiple groups, situated in a physical environment (geographic regions, buildings, transport vehicles, etc.) consisting of tools, documents, and computer systems. We are finding many useful applications of Brahms for system requirements analysis, instruction, implementing software agents, and as a workbench for relating cognitive and social theories of human behavior. Many challenges remain for representing work practices, including modeling: memory over multiple days, scheduled activities combining physical objects, groups, and locations on a timeline (such as a Space Shuttle mission), habitat vehicles with trajectories (such as the Shuttle), agent movement in 3d space (e.g., inside the International Space Station), agent posture and line of sight, coupled movements (such as carrying objects), and learning (mimicry, forming habits, detecting repetition, etc.).
Atlas, Lauren Y; Doll, Bradley B; Li, Jian; Daw, Nathaniel D; Phelps, Elizabeth A
2016-01-01
Socially-conveyed rules and instructions strongly shape expectations and emotions. Yet most neuroscientific studies of learning consider reinforcement history alone, irrespective of knowledge acquired through other means. We examined fear conditioning and reversal in humans to test whether instructed knowledge modulates the neural mechanisms of feedback-driven learning. One group was informed about contingencies and reversals. A second group learned only from reinforcement. We combined quantitative models with functional magnetic resonance imaging and found that instructions induced dissociations in the neural systems of aversive learning. Responses in striatum and orbitofrontal cortex updated with instructions and correlated with prefrontal responses to instructions. Amygdala responses were influenced by reinforcement similarly in both groups and did not update with instructions. Results extend work on instructed reward learning and reveal novel dissociations that have not been observed with punishments or rewards. Findings support theories of specialized threat-detection and may have implications for fear maintenance in anxiety. DOI: http://dx.doi.org/10.7554/eLife.15192.001 PMID:27171199
Flow Navigation by Smart Microswimmers via Reinforcement Learning
NASA Astrophysics Data System (ADS)
Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca
2017-04-01
Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.
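A heavily simplified sketch of the approach: Q-learning for a swimmer in a steady Taylor-Green vortex, where the state is a coarse observation of the local flow, the action sets the preferred swimming direction, and the reward is the altitude gained. The actual study uses richer orientation dynamics and flows; all quantities here are illustrative.

    # Minimal sketch: Q-learning for a "smart" swimmer in a steady Taylor-Green
    # vortex flow, rewarded for gaining altitude.
    import numpy as np

    rng = np.random.default_rng(7)
    dirs = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]], dtype=float)   # up, down, right, left
    Q = np.zeros((4, len(dirs)))                                       # 4 coarse flow states
    alpha, gamma, eps, v_swim, dt = 0.1, 0.95, 0.1, 0.3, 0.05

    def flow(p):
        x, y = p
        return np.array([np.sin(x) * np.cos(y), -np.cos(x) * np.sin(y)])

    def observe(p):
        x, y = p
        vort = 2.0 * np.sin(x) * np.sin(y)
        u = flow(p)
        return (vort > 0.0) * 2 + (u[1] > 0.0)        # 4 discrete states from local flow cues

    for episode in range(500):
        pos = rng.uniform(0.0, 2.0 * np.pi, size=2)
        s = observe(pos)
        for step in range(400):
            a = rng.integers(len(dirs)) if rng.random() < eps else int(np.argmax(Q[s]))
            new_pos = pos + dt * (flow(pos) + v_swim * dirs[a])
            r = new_pos[1] - pos[1]                   # reward: altitude gained this step
            s_next = observe(new_pos)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            pos, s = new_pos, s_next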
Learning and tuning fuzzy logic controllers through reinforcements
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1992-01-01
A new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system is presented. In particular, our Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture: (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto, Sutton, and Anderson to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and has demonstrated significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
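The reinforcement-learning component can be sketched as an AHC-style actor-critic that learns from nothing more than a binary failure signal on a toy unstable plant; GARIC's fuzzy inference network and membership-function tuning are omitted, and the plant, discretisation, and parameters are invented.

    # Minimal sketch: actor-critic learning from only a binary failure signal
    # (reward -1 at failure, 0 otherwise) on a toy unstable 1-D plant.
    import numpy as np

    rng = np.random.default_rng(8)
    n_bins = 10                                    # coarse state discretisation over [-1, 1]
    v = np.zeros(n_bins)                           # critic: predicted return per state bin
    mu = np.zeros(n_bins)                          # actor: mean control per state bin
    alpha_v, alpha_mu, gamma, sigma = 0.1, 0.05, 0.95, 0.5

    def bin_of(x):
        return int(np.clip((x + 1.0) / 2.0 * n_bins, 0, n_bins - 1))

    for episode in range(2000):
        x = rng.uniform(-0.1, 0.1)
        for step in range(200):
            s = bin_of(x)
            u = mu[s] + sigma * rng.normal()       # exploratory real-valued action
            x_next = 1.05 * x + 0.1 * u            # unstable plant
            failed = abs(x_next) > 1.0
            r = -1.0 if failed else 0.0            # binary failure signal only
            target = r if failed else r + gamma * v[bin_of(x_next)]
            td = target - v[s]                     # internal reinforcement from the critic
            v[s] += alpha_v * td
            mu[s] += alpha_mu * td * (u - mu[s])   # push the mean toward actions that helped
            if failed:
                break
            x = x_next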
Predictive Control of Networked Multiagent Systems via Cloud Computing.
Liu, Guo-Ping
2017-01-18
This paper studies the design and analysis of networked multiagent predictive control systems via cloud computing. A cloud predictive control scheme for networked multiagent systems (NMASs) is proposed to achieve consensus and stability simultaneously and to compensate for network delays actively. The design of the cloud predictive controller for NMASs is detailed. The analysis of the cloud predictive control scheme gives the necessary and sufficient conditions of stability and consensus of closed-loop networked multiagent control systems. The proposed scheme is verified to characterize the dynamical behavior and control performance of NMASs through simulations. The outcome provides a foundation for the development of cooperative and coordinative control of NMASs and its applications.
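The delay-compensation idea can be sketched for single-integrator agents reaching consensus: the controller predicts each agent's state forward through the controls already in flight (here, a known delay of d steps) and computes the new control from the predicted states. The graph, gains, and delay below are hypothetical and much simpler than the paper's setting.

    # Minimal sketch: prediction-based delay compensation for consensus of
    # single-integrator agents, with controls arriving d steps after computation.
    import numpy as np
    from collections import deque

    n, d, k, steps, dt = 4, 3, 0.4, 60, 0.2
    x = np.array([1.0, -2.0, 0.5, 3.0])              # agent states
    L = np.array([[ 2, -1,  0, -1],                  # Laplacian of a ring graph
                  [-1,  2, -1,  0],
                  [ 0, -1,  2, -1],
                  [-1,  0, -1,  2]], dtype=float)
    in_flight = deque([np.zeros(n) for _ in range(d)])   # controls computed but not yet applied

    for t in range(steps):
        x_pred = x + dt * sum(in_flight)             # predicted state when the new control will arrive
        u_new = -k * (L @ x_pred)                    # consensus law applied to the predicted states
        u_applied = in_flight.popleft()              # control computed d steps ago arrives now
        in_flight.append(u_new)
        x = x + dt * u_applied

    print(np.round(x, 3))                            # states end up close to a common value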
Multi-agent cooperation rescue algorithm based on influence degree and state prediction
NASA Astrophysics Data System (ADS)
Zheng, Yanbin; Ma, Guangfu; Wang, Linlin; Xi, Pengxue
2018-04-01
Aiming at multi-agent cooperative rescue in disasters, a multi-agent cooperative rescue algorithm based on influence degree and state prediction is proposed. Firstly, based on the influence that information in the scene has on the collaborative task, an influence degree function is used to filter the information. Secondly, the selected information is used to predict the state of the system and of agent behavior. Finally, according to the result of the forecast, the cooperative behavior of the agents is guided, improving the efficiency of individual collaboration. The simulation results show that this algorithm can effectively solve the cooperative rescue problem for multiple agents and ensure the efficient completion of the task.
A Quantum Approach to Multi-Agent Systems (MAS), Organizations, and Control
2003-06-01
Snippet from the report (W.F. Lawless, Paine College, Augusta, GA): interdependent interactions between individuals are represented approximately as vocal harmonic resonators, and the growth rate of an organization then fits ...
ERIC Educational Resources Information Center
Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman
2011-01-01
By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…
Reinforcement Learning in Young Adults with Developmental Language Impairment
ERIC Educational Resources Information Center
Lee, Joanna C.; Tomblin, J. Bruce
2012-01-01
The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…
Effective Reinforcement Techniques in Elementary Physical Education: The Key to Behavior Management
ERIC Educational Resources Information Center
Downing, John; Keating, Tedd; Bennett, Carl
2005-01-01
The ability to shape appropriate behavior while extinguishing misbehavior is critical to teaching and learning in physical education. The scientific principles that affect student learning in the gymnasium also apply to the methods teachers use to influence social behaviors. Research indicates that reinforcement strategies are more effective than…
A neural model of hierarchical reinforcement learning
Rasmussen, Daniel; Eliasmith, Chris
2017-01-01
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions. PMID:28683111
Reinforcement Learning Trees
Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.
2015-01-01
In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
Reinforcement learning state estimator.
Morimoto, Jun; Doya, Kenji
2007-03-01
In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single pendulum dynamics and show that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.
Reciprocity Family Counseling: A Multi-Ethnic Model.
ERIC Educational Resources Information Center
Penrose, David M.
The Reciprocity Family Counseling Method involves learning principles of behavior modification including selective reinforcement, behavioral contracting, self-correction, and over-correction. Selective reinforcement refers to the recognition and modification of parent/child responses and reinforcers. Parents and children are asked to identify…
Reinforcement learning: Solving two case studies
NASA Astrophysics Data System (ADS)
Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto
2012-09-01
Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms are applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.
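A minimal tabular Q-learning sketch of a Crawler-style problem follows: a two-joint arm whose coordinated joint moves drag the robot forward. The simplified reward model (forward displacement when the "foot" is down and the arm sweeps back) and all constants are illustrative assumptions, not the setup of the paper.

# Tabular Q-learning on a toy crawler with two discretized joints.
import random

N_POS = 4                                        # discrete positions per joint
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # move one joint up or down

def step(state, action):
    j1, j2 = state
    d1, d2 = action
    n1 = min(N_POS - 1, max(0, j1 + d1))
    n2 = min(N_POS - 1, max(0, j2 + d2))
    # crude physics: sweeping the outer joint back while the arm touches the
    # ground (inner joint lowered) pushes the body forward
    reward = 1.0 if (n1 == 0 and n2 < j2) else 0.0
    return (n1, n2), reward

Q = {((i, j), a): 0.0 for i in range(N_POS) for j in range(N_POS) for a in ACTIONS}
alpha, gamma, eps = 0.2, 0.9, 0.1
state = (0, 0)
for t in range(20000):
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt, r = step(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
    state = nxt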
Reinforcement active learning in the vibrissae system: optimal object localization.
Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud
2013-01-01
Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.
An intelligent agent for optimal river-reservoir system management
NASA Astrophysics Data System (ADS)
Rieker, Jeffrey D.; Labadie, John W.
2012-09-01
A generalized software package is presented for developing an intelligent agent for stochastic optimization of complex river-reservoir system management and operations. Reinforcement learning is an approach to artificial intelligence for developing a decision-making agent that learns the best operational policies without the need for explicit probabilistic models of hydrologic system behavior. The agent learns these strategies experientially in a Markov decision process through observational interaction with the environment and simulation of the river-reservoir system using well-calibrated models. The graphical user interface for the reinforcement learning process controller includes numerous learning method options and dynamic displays for visualizing the adaptive behavior of the agent. As a case study, the generalized reinforcement learning software is applied to developing an intelligent agent for optimal management of water stored in the Truckee river-reservoir system of California and Nevada for the purpose of streamflow augmentation for water quality enhancement. The intelligent agent successfully learns long-term reservoir operational policies that specifically focus on mitigating water temperature extremes during persistent drought periods that jeopardize the survival of threatened and endangered fish species.
Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning
Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien
2015-01-01
Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in offline manners. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without basing on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for rule discovery task by generating new rules from the old ones. This increases the scalability of a RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction. PMID:26065018
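The paper's GBML system is not specified in the abstract, so the following is a generic, hedged sketch of a genetic-based reinforcement learner: a small population of condition-to-action rules is scored by the reward it earns in the environment, and new rules are bred from the fittest ones. The toy reward model and all GA parameters are illustrative assumptions.

# Generic genetic-based rule discovery driven by environment reward.
import random

N_STATES, N_ACTIONS, POP = 8, 3, 20

def random_rule():
    # a rule maps every state to one action (a complete policy chromosome)
    return [random.randrange(N_ACTIONS) for _ in range(N_STATES)]

def fitness(rule, episodes=20):
    # reward model (assumption): action (state mod 3) is the tactic that pays off
    total = 0
    for _ in range(episodes):
        s = random.randrange(N_STATES)
        total += 1 if rule[s] == s % N_ACTIONS else 0
    return total

def crossover(a, b):
    cut = random.randrange(1, N_STATES)
    return a[:cut] + b[cut:]

def mutate(rule, p=0.1):
    return [random.randrange(N_ACTIONS) if random.random() < p else g for g in rule]

population = [random_rule() for _ in range(POP)]
for generation in range(50):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:POP // 2]                    # selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children                # online rule discovery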
A comparison of differential reinforcement procedures with children with autism.
Boudreau, Brittany A; Vladescu, Jason C; Kodak, Tiffany M; Argott, Paul J; Kisamore, April N
2015-12-01
The current evaluation compared the effects of 2 differential reinforcement arrangements and a nondifferential reinforcement arrangement on the acquisition of tacts for 3 children with autism. Participants learned in all reinforcement-based conditions, and we discuss areas for future research in light of these findings and potential limitations. © Society for the Experimental Analysis of Behavior.
Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis
2013-01-01
Background Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the contribution of model-based reinforcement learning meta-analysis for dissecting anhedonic behavior. PMID:23782813
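A minimal sketch of the kind of model used to separate reward sensitivity from learning rate in such analyses is shown below: expected values are updated with learning rate alpha toward a reward scaled by sensitivity rho, and choices follow a softmax. The parameter names and the simulation loop are illustrative assumptions, not the meta-analysis' exact model.

# Q-learning with separate learning-rate and reward-sensitivity parameters.
import math, random

def simulate(alpha, rho, n_trials=200, p_reward=(0.7, 0.3)):
    Q = [0.0, 0.0]
    choices, rewards = [], []
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + math.exp(-(Q[0] - Q[1])))      # softmax over two options
        c = 0 if random.random() < p0 else 1
        r = 1.0 if random.random() < p_reward[c] else 0.0
        Q[c] += alpha * (rho * r - Q[c])                  # sensitivity scales the reward
        choices.append(c); rewards.append(r)
    return choices, rewards

# blunted reward sensitivity (low rho) and a slow learning rate (low alpha) can
# yield similar overall accuracy but different learning curves, which is why a
# model-based analysis is needed to tell them apart
simulate(alpha=0.1, rho=1.0)
simulate(alpha=0.4, rho=0.3)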
Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P
2007-11-21
The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.
Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki
2016-07-01
Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated in every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
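A minimal sketch of aspiration learning as described above: the agent keeps a single cooperation propensity, is satisfied when its payoff exceeds a fixed aspiration level, and reinforces (or anti-reinforces) the action it just took accordingly. The payoff function and update size are illustrative assumptions.

# Aspiration learning: reinforce satisfying actions, anti-reinforce the rest.
import random

ASPIRATION = 1.5      # fixed aspiration level
LEARNING_RATE = 0.2

def payoff(my_action, others_coop_fraction):
    # toy public-goods-like payoff (assumption): cooperation is costly but
    # everyone benefits from the group's cooperation level
    return 2.0 * others_coop_fraction - (1.0 if my_action == "C" else 0.0)

p_cooperate = 0.5
for round_ in range(1000):
    action = "C" if random.random() < p_cooperate else "D"
    others = random.random()                      # stand-in for the group's behaviour
    satisfied = payoff(action, others) >= ASPIRATION
    # reinforce the action just taken if satisfied, anti-reinforce it otherwise
    target = 1.0 if action == "C" else 0.0
    if satisfied:
        p_cooperate += LEARNING_RATE * (target - p_cooperate)
    else:
        p_cooperate += LEARNING_RATE * ((1.0 - target) - p_cooperate)
    p_cooperate = min(1.0, max(0.0, p_cooperate))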
The Potential for Double-Loop Learning to Enable Landscape Conservation Efforts
NASA Astrophysics Data System (ADS)
Petersen, Brian; Montambault, Jensen; Koopman, Marni
2014-10-01
As conservation increases its emphasis on implementing change at landscape-level scales, multi-agency, cross-boundary, and multi-stakeholder networks become more important. These elements complicate traditional notions of learning. To investigate this further, we examined structures of learning in the Landscape Conservation Cooperatives (LCCs), which include the entire US and its territories, as well as parts of Canada, Mexico, and Caribbean and Pacific island states. We used semi-structured interviews, transcribed and analyzed using NVivo, as well as a charrette-style workshop to understand the difference between the original stated goals of individual LCCs and the values and purposes expressed as the collaboration matured. We suggest double-loop learning as a theoretical framework appropriate to landscape-scale conservation, recognizing that concerns about accountability are among the valid points of view that must be considered in multi-stakeholder collaborations. Methods from the social sciences and public health sectors provide insights on how such learning might be actualized.
Social stress reactivity alters reward and punishment learning.
Cavanagh, James F; Frank, Michael J; Allen, John J B
2011-06-01
To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.
Strauss, Gregory P; Thaler, Nicholas S; Matveeva, Tatyana M; Vogel, Sally J; Sutton, Griffin P; Lee, Bern G; Allen, Daniel N
2015-08-01
There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD-; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant's trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor-critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor-critic is "augmented" by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD-, and HC groups. The BD+ and BD- groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor-critic model where learning is driven by prediction error signaling alone. In contrast, BD-, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis. (c) 2015 APA, all rights reserved).
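A hedged sketch of the three model classes compared above, on a two-armed probabilistic task, follows: a pure actor-critic (driven by prediction error alone), a pure Q-learner (driven by expected value), and a hybrid in which Q-values augment the actor. The parameter names, mixing weight, and softmax are illustrative assumptions, not the study's fitted models.

# Actor-critic, Q-learning, and a hybrid of the two on a two-armed bandit.
import math, random

def softmax_choice(a, b, beta=3.0):
    p = 1.0 / (1.0 + math.exp(-beta * (a - b)))
    return 0 if random.random() < p else 1

def run(model, n_trials=300, p_reward=(0.8, 0.2), alpha=0.2, mix=0.5):
    V = 0.0                     # critic: state value
    W = [0.0, 0.0]              # actor: action weights
    Q = [0.0, 0.0]              # Q-learning: expected reward per action
    correct = 0
    for _ in range(n_trials):
        if model == "actor_critic":
            a = softmax_choice(W[0], W[1])
        elif model == "q_learning":
            a = softmax_choice(Q[0], Q[1])
        else:                   # hybrid: actor weights augmented by Q-values
            a = softmax_choice(W[0] + mix * Q[0], W[1] + mix * Q[1])
        r = 1.0 if random.random() < p_reward[a] else 0.0
        delta = r - V                 # prediction error signalled by the critic
        V += alpha * delta
        W[a] += alpha * delta         # actor learns from the prediction error alone
        Q[a] += alpha * (r - Q[a])    # Q-learning tracks expected value directly
        correct += (a == 0)
    return correct / n_trials

for m in ("actor_critic", "q_learning", "hybrid"):
    print(m, round(run(m), 2))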
Motor Learning Enhances Use-Dependent Plasticity
2017-01-01
Motor behaviors are shaped not only by current sensory signals but also by the history of recent experiences. For instance, repeated movements toward a particular target bias the subsequent movements toward that target direction. This process, called use-dependent plasticity (UDP), is considered a basic and goal-independent way of forming motor memories. Most studies consider movement history as the critical component that leads to UDP (Classen et al., 1998; Verstynen and Sabes, 2011). However, the effects of learning (i.e., improved performance) on UDP during movement repetition have not been investigated. Here, we used transcranial magnetic stimulation in two experiments to assess plasticity changes occurring in the primary motor cortex after individuals repeated reinforced and nonreinforced actions. The first experiment assessed whether learning a skill task modulates UDP. We found that a group that successfully learned the skill task showed greater UDP than a group that did not accumulate learning, but made comparable repeated actions. The second experiment aimed to understand the role of reinforcement learning in UDP while controlling for reward magnitude and action kinematics. We found that providing subjects with a binary reward without visual feedback of the cursor led to increased UDP effects. Subjects in the group that received comparable reward not associated with their actions maintained the previously induced UDP. Our findings illustrate how reinforcing consistent actions strengthens use-dependent memories and provide insight into operant mechanisms that modulate plastic changes in the motor cortex. SIGNIFICANCE STATEMENT Performing consistent motor actions induces use-dependent plastic changes in the motor cortex. This plasticity reflects one of the basic forms of human motor learning. Past studies assumed that this form of learning is exclusively affected by repetition of actions. However, here we showed that success-based reinforcement signals could affect the human use-dependent plasticity (UDP) process. Our results indicate that learning augments and interacts with UDP. This effect is important to the understanding of the interplay between the different forms of motor learning and suggests that reinforcement is not only important to learning new behaviors, but can shape our subsequent behavior via its interaction with UDP. PMID:28143961
Salvador, Alexandre; Worbe, Yulia; Delorme, Cécile; Coricelli, Giorgio; Gaillard, Raphaël; Robbins, Trevor W; Hartmann, Andreas; Palminteri, Stefano
2017-07-24
The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
Somatosensory Contribution to the Initial Stages of Human Motor Learning
Bernardi, Nicolò F.; Darainy, Mohammad
2015-01-01
The early stages of motor skill acquisition are often marked by uncertainty about the sensory and motor goals of the task, as is the case in learning to speak or learning the feel of a good tennis serve. Here we present an experimental model of this early learning process, in which targets are acquired by exploration and reinforcement rather than sensory error. We use this model to investigate the relative contribution of motor and sensory factors to human motor learning. Participants make active reaching movements or matched passive movements to an unseen target using a robot arm. We find that learning through passive movements paired with reinforcement is comparable with learning associated with active movement, both in terms of magnitude and durability, with improvements due to training still observable at a 1 week retest. Motor learning is also accompanied by changes in somatosensory perceptual acuity. No stable changes in motor performance are observed for participants that train, actively or passively, in the absence of reinforcement, or for participants who are given explicit information about target position in the absence of somatosensory experience. These findings indicate that the somatosensory system dominates learning in the early stages of motor skill acquisition. SIGNIFICANCE STATEMENT The research focuses on the initial stages of human motor learning, introducing a new experimental model that closely approximates the key features of motor learning outside of the laboratory. The finding indicates that it is the somatosensory system rather than the motor system that dominates learning in the early stages of motor skill acquisition. This is important given that most of our computational models of motor learning are based on the idea that learning is motoric in origin. This is also a valuable finding for rehabilitation of patients with limited mobility as it shows that reinforcement in conjunction with passive movement results in benefits to motor learning that are as great as those observed for active movement training. PMID:26490869
Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.
Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J
2016-11-16
As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration. Copyright © 2016 the authors 0270-6474/16/3611682-11$15.00/0.
What is the optimal task difficulty for reinforcement learning of brain self-regulation?
Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza
2016-09-01
The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent to β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets (p<0.001), indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
The Computational Development of Reinforcement Learning during Adolescence
Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne
2016-01-01
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574
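A minimal sketch of the counterfactual-learning component described above: with complete feedback, the learner updates the unchosen option from its forgone outcome as well as the chosen option from its obtained outcome. The learning rates and task probabilities are illustrative assumptions.

# Factual and counterfactual value updates with complete feedback.
import random

def trial(Q, p_reward, alpha_factual=0.3, alpha_counterfactual=0.3, complete_feedback=True):
    c = 0 if Q[0] >= Q[1] else 1           # greedy choice, for illustration
    u = 1 - c
    r_chosen = 1.0 if random.random() < p_reward[c] else 0.0
    Q[c] += alpha_factual * (r_chosen - Q[c])
    if complete_feedback:                  # the forgone outcome is also shown
        r_unchosen = 1.0 if random.random() < p_reward[u] else 0.0
        Q[u] += alpha_counterfactual * (r_unchosen - Q[u])
    return Q

Q = [0.0, 0.0]
for _ in range(100):
    trial(Q, p_reward=(0.75, 0.25))
# setting alpha_counterfactual = 0 mimics a learner who, like the adolescents
# described above, does not benefit from the forgone outcomes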
Robust reinforcement learning.
Morimoto, Jun; Doya, Kenji
2005-02-01
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H(infinity) control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H(infinity) control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
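One common way to write the min-max objective described above is sketched below in LaTeX; the discount factor, the weight on the disturbance norm, and the expectation over trajectories are notational assumptions, and the paper's exact formulation may differ.

\[
V(x) \;=\; \max_{u(\cdot)}\,\min_{w(\cdot)}\; \mathbb{E}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\Big(r(x_t,u_t)\;-\;\tfrac{1}{\eta}\,\lVert w_t\rVert^{2}\Big)\right],
\]

where the control agent chooses \(u\) to maximize the disturbance-penalized return while the disturbing agent chooses \(w\) to minimize it.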
Kerr, Robert R.; Grayden, David B.; Thomas, Doreen A.; Gilson, Matthieu; Burkitt, Anthony N.
2014-01-01
A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments. PMID:24475240
Fee, Michale S.
2012-01-01
In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources. PMID:22754501
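A hedged sketch of the gated learning rule proposed above: a medium spiny neuron receives context inputs and an efference copy of the selected action, and only the context synapses are potentiated, in proportion to reward and gated by the efference-copy activity. The sizes, rates, and random "policy" are illustrative assumptions, not the model's full circuit.

# Efference-copy-gated potentiation of context inputs after rewarded actions.
import random

N_CONTEXT, N_ACTIONS = 10, 4
w_context = [[0.0] * N_CONTEXT for _ in range(N_ACTIONS)]   # context -> MSN weights
eta = 0.1

def trial(rewarded_action_for_context):
    context = random.randrange(N_CONTEXT)                    # "time" in the motor sequence
    action = random.randrange(N_ACTIONS)                     # variability generator
    reward = 1.0 if action == rewarded_action_for_context[context] else 0.0
    # efference copy of 'action' gates potentiation of context inputs onto the
    # corresponding MSN; the efference-copy synapses themselves are not modified
    w_context[action][context] += eta * reward
    return reward

target = [random.randrange(N_ACTIONS) for _ in range(N_CONTEXT)]
for _ in range(2000):
    trial(target)
# after learning, the weights encode which action pays off in each context
best = [max(range(N_ACTIONS), key=lambda a: w_context[a][c]) for c in range(N_CONTEXT)]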
Dere, Ekrem; De Souza-Silva, Maria A; Topic, Bianca; Spieler, Richard E; Haas, Helmut L; Huston, Joseph P
2003-01-01
The brain's histaminergic system has been implicated in hippocampal synaptic plasticity, learning, and memory, as well as brain reward and reinforcement. Our past pharmacological and lesion studies indicated that the brain's histamine system exerts inhibitory effects on the brain's reinforcement and reward systems, reciprocal to mesolimbic dopamine systems, thereby modulating learning and memory performance. Given the close functional relationship between brain reinforcement and memory processes, the total disruption of brain histamine synthesis via genetic disruption of its synthesizing enzyme, histidine decarboxylase (HDC), in the mouse might have differential effects on learning dependent on the task-inherent reinforcement contingencies. Here, we investigated the effects of an HDC gene disruption in the mouse in a nonreinforced object exploration task and a negatively reinforced water-maze task as well as on neo- and ventro-striatal dopamine systems known to be involved in brain reward and reinforcement. Histidine decarboxylase knockout (HDC-KO) mice had higher dihydroxyphenylacetic acid concentrations and a higher dihydroxyphenylacetic acid/dopamine ratio in the neostriatum. In the ventral striatum, dihydroxyphenylacetic acid/dopamine and 3-methoxytyramine/dopamine ratios were higher in HDC-KO mice. Furthermore, the HDC-KO mice showed improved water-maze performance during both hidden and cued platform tasks, but deficient object discrimination based on temporal relationships. Our data imply that disruption of brain histamine synthesis can have both memory-promoting and memory-suppressive effects via distinct and independent mechanisms and further indicate that these opposed effects are related to the task-inherent reinforcement contingencies.
ERIC Educational Resources Information Center
Kahnt, Thorsten; Park, Soyoung Q.; Cohen, Michael X.; Beck, Anne; Heinz, Andreas; Wrase, Jana
2009-01-01
It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to…
Autonomous Inter-Task Transfer in Reinforcement Learning Domains
2008-08-01
… [Laird et al., 1986; Choi et al., 2007]. However, TL for RL tasks has only recently been gaining attention in the artificial intelligence …
A look at Behaviourism and Perceptual Control Theory in Interface Design
1998-02-01
… behaviours such as response variability, instinctive drift, autoshaping, etc. Perceptual Control Theory (PCT) postulates that behaviours result from the … internal variables. Behaviourism, on the other hand, cannot account for variability in responses, instinctive drift, autoshaping, etc. Researchers … Autoshaping: animals appear to learn without reinforcement; however, conditioning theory speculates that learning results only when reinforcement …
Locomotion training of legged robots using hybrid machine learning techniques
NASA Technical Reports Server (NTRS)
Simon, William E.; Doerschuk, Peggy I.; Zhang, Wen-Ran; Li, Andrew L.
1995-01-01
In this study, artificial neural networks and fuzzy logic are used to control the jumping behavior of a three-link uniped robot. The biped locomotion control problem is an extension of uniped locomotion control. Study of legged locomotion dynamics indicates that a hierarchical controller is required to control the behavior of a legged robot. A structured control strategy is suggested which includes navigator, motion planner, biped coordinator and uniped controllers. A three-link uniped robot simulation is developed to be used as the plant. Neurocontrollers were trained both online and offline. In the case of on-line training, a reinforcement learning technique was used to train the neurocontroller to make the robot jump to a specified height. After several hundred iterations of training, the plant output achieved an accuracy of 7.4%. However, when jump distance and body angular momentum were also included in the control objectives, training time became impractically long. In the case of off-line training, a three-layered backpropagation (BP) network was first used with three inputs, three outputs and 15 to 40 hidden nodes. Pre-generated data were presented to the network with a learning rate as low as 0.003 in order to reach convergence. The low learning rate required for convergence resulted in a very slow training process which took weeks to learn 460 examples. After training, performance of the neurocontroller was rather poor. Consequently, the BP network was replaced by a Cerebellar Model Articulation Controller (CMAC) network. Subsequent experiments described in this document show that the CMAC network is more suitable for the solution of uniped locomotion control problems in terms of both learning efficiency and performance. A new approach is introduced in this report, viz., a self-organizing multiagent cerebellar model for fuzzy-neural control of uniped locomotion is suggested to improve training efficiency. This is currently being evaluated for a possible patent by NASA, Johnson Space Center. An alternative modular approach is also developed which uses separate controllers for each stage of the running stride. A self-organizing fuzzy-neural controller controls the height, distance and angular momentum of the stride. A CMAC-based controller controls the movement of the leg from the time the foot leaves the ground to the time of landing. Because the leg joints are controlled at each time step during flight, movement is smooth and obstacles can be avoided. Initial results indicate that this approach can yield fast, accurate results.
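A hedged, minimal sketch of a CMAC (Cerebellar Model Articulation Controller) function approximator of the kind mentioned above: several overlapping, offset tilings map a continuous input to a small set of active cells whose weights are summed, and training adjusts only the active weights. The tiling counts, resolution, and toy target function are illustrative assumptions.

# Minimal one-dimensional CMAC with overlapping tilings.
import math

N_TILINGS, N_TILES = 8, 16        # coarse-coding parameters (assumptions)
LOW, HIGH = 0.0, 1.0

def active_cells(x):
    width = (HIGH - LOW) / N_TILES
    cells = []
    for t in range(N_TILINGS):
        offset = t * width / N_TILINGS                 # each tiling is shifted slightly
        idx = int((x - LOW + offset) / width) % N_TILES
        cells.append(t * N_TILES + idx)
    return cells

weights = [0.0] * (N_TILINGS * N_TILES)

def predict(x):
    return sum(weights[c] for c in active_cells(x))

def train(x, target, lr=0.1):
    err = target - predict(x)
    for c in active_cells(x):
        weights[c] += lr * err / N_TILINGS             # spread the correction over active cells

# usage: learn a toy joint-angle trajectory y = sin(2*pi*x) from samples
for epoch in range(200):
    for i in range(50):
        x = i / 50.0
        train(x, math.sin(2 * math.pi * x))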
BEHAVIORAL MECHANISMS UNDERLYING NICOTINE REINFORCEMENT
Rupprecht, Laura E.; Smith, Tracy T.; Schassburger, Rachel L.; Buffalari, Deanne M.; Sved, Alan F.; Donny, Eric C.
2015-01-01
Cigarette smoking is the leading cause of preventable deaths worldwide and nicotine, the primary psychoactive constituent in tobacco, drives sustained use. The behavioral actions of nicotine are complex and extend well beyond the actions of the drug as a primary reinforcer. Stimuli that are consistently paired with nicotine can, through associative learning, take on reinforcing properties as conditioned stimuli. These conditioned stimuli can then impact the rate and probability of behavior and even function as conditioning reinforcers that maintain behavior in the absence of nicotine. Nicotine can also act as a conditioned stimulus, predicting the delivery of other reinforcers, which may allow nicotine to acquire value as a conditioned reinforcer. These associative effects, establishing non-nicotine stimuli as conditioned stimuli with discriminative stimulus and conditioned reinforcing properties as well as establishing nicotine as a conditioned stimulus, are predicted by basic conditioning principles. However, nicotine can also act non-associatively. Nicotine directly enhances the reinforcing efficacy of other reinforcing stimuli in the environment, an effect that does not require a temporal or predictive relationship between nicotine and either the stimulus or the behavior. Hence, the reinforcing actions of nicotine stem both from the primary reinforcing actions of the drug (and the subsequent associative learning effects) as well as the reinforcement enhancement action of nicotine which is non-associative in nature. Gaining a better understanding of how nicotine impacts behavior will allow for maximally effective tobacco control efforts aimed at reducing the harm associated with tobacco use by reducing and/or treating its addictiveness. PMID:25638333
Multi-Agency Radiological Laboratory Analytical Protocols Manual (MARLAP)
The Multi-Agency Radiological Laboratory Analytical Protocols Manual (MARLAP) provides guidance for the planning, implementation and assessment phases of projects that require laboratory analysis of radionuclides.
Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter
2013-11-01
The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier Ltd. All rights reserved.
Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.
Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E
2017-03-15
Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning. Copyright © 2017 the authors 0270-6474/17/373009-09$15.00/0.
Feature Reinforcement Learning: Part I. Unstructured MDPs
NASA Astrophysics Data System (ADS)
Hutter, Marcus
2009-12-01
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II (Hutter, 2009c). The role of POMDPs is also considered there.
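For orientation, the small finite MDP setting that the abstract treats as well developed can be illustrated with plain tabular Q-learning; the toy chain environment, reward structure, and hyperparameters below are illustrative assumptions and do not represent Hutter's cost criterion or feature-reduction procedure itself.

```python
import random

# Tabular Q-learning on a toy 5-state chain MDP (illustrative assumptions:
# reward 1 only for reaching the rightmost state, epsilon-greedy exploration).
N_STATES, ACTIONS = 5, [-1, +1]              # actions: move left / move right
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1    # next state, reward, terminal flag

for _ in range(300):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # TD update
        s = s2

print({s: greedy(s) for s in range(N_STATES)})   # learned policy: mostly +1
```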
The role of within-compound associations in learning about absent cues.
Witnauer, James E; Miller, Ralph R
2011-05-01
When two cues are reinforced together (in compound), most associative models assume that animals learn an associative network that includes direct cue-outcome associations and a within-compound association. All models of associative learning subscribe to the importance of cue-outcome associations, but most models assume that within-compound associations are irrelevant to each cue's subsequent behavioral control. In the present article, we present an extension of Van Hamme and Wasserman's (Learning and Motivation 25:127-151, 1994) model of retrospective revaluation based on learning about absent cues that are retrieved through within-compound associations. The model was compared with a model lacking retrieval through within-compound associations. Simulations showed that within-compound associations are necessary for the model to explain higher-order retrospective revaluation and the observed greater retrospective revaluation after partial reinforcement than after continuous reinforcement alone. These simulations suggest that the associability of an absent stimulus is determined by the extent to which the stimulus is activated through the within-compound association.
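A compact way to see the mechanism is a Van Hamme and Wasserman style delta rule in which an absent cue acquires a negative associability scaled by how strongly it is retrieved through the within-compound association; the parameterization below is a simplified sketch for illustration, not the authors' exact equations.

\[
\Delta V_{X} =
\begin{cases}
\alpha_{X}\,\beta\,(\lambda - \Sigma V), & X \text{ present on the trial},\\[4pt]
-\,\kappa\,V_{A\text{-}X}\,\beta\,(\lambda - \Sigma V), & X \text{ absent but retrieved via its associate } A,
\end{cases}
\]

where \(V_{A\text{-}X}\) is the strength of the within-compound association between the presented cue \(A\) and the absent cue \(X\), and \(\kappa\) scales retrieval into (negative) associability, so an absent cue is revalued only to the extent that it is activated through its within-compound associate.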
Pleasurable music affects reinforcement learning according to the listener
Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira
2013-01-01
Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875
Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.
Blake, David T
2017-06-18
The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, by the strength of response within that activated area, and by the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests other factors also initiate changes. Studies that control learning find support of reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to a strengthening of cortical response strength and enlarging of response area with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning. © 2017 American Physiological Society. Compr Physiol 7:977-1008, 2017. Copyright © 2017 John Wiley & Sons, Inc.
Pfeifer, Gaby; Garfinkel, Sarah N; Gould van Praag, Cassandra D; Sahota, Kuljit; Betka, Sophie; Critchley, Hugo D
2017-05-01
Feedback processing is critical to trial-and-error learning. Here, we examined whether interoceptive signals concerning the state of cardiovascular arousal influence the processing of reinforcing feedback during the learning of 'emotional' face-name pairs, with subsequent effects on retrieval. Participants (N=29) engaged in a learning task of face-name pairs (fearful, neutral, happy faces). Correct and incorrect learning decisions were reinforced by auditory feedback, which was delivered either at cardiac systole (on the heartbeat, when baroreceptors signal the contraction of the heart to the brain), or at diastole (between heartbeats during baroreceptor quiescence). We discovered a cardiac influence on feedback processing that enhanced the learning of fearful faces in people with heightened interoceptive ability. Individuals with enhanced accuracy on a heartbeat counting task learned fearful face-name pairs better when feedback was given at systole than at diastole. This effect was not present for neutral and happy faces. At retrieval, we also observed related effects of personality: First, individuals scoring higher for extraversion showed poorer retrieval accuracy. These individuals additionally manifested lower resting heart rate and lower state anxiety, suggesting that attenuated levels of cardiovascular arousal in extraverts underlies poorer performance. Second, higher extraversion scores predicted higher emotional intensity ratings of fearful faces reinforced at systole. Third, individuals scoring higher for neuroticism showed higher retrieval confidence for fearful faces reinforced at diastole. Our results show that cardiac signals shape feedback processing to influence learning of fearful faces, an effect underpinned by personality differences linked to psychophysiological arousal. Copyright © 2017 Elsevier B.V. All rights reserved.
Coordination of fractional-order nonlinear multi-agent systems via distributed impulsive control
NASA Astrophysics Data System (ADS)
Ma, Tiedong; Li, Teng; Cui, Bing
2018-01-01
The coordination of fractional-order nonlinear multi-agent systems via distributed impulsive control method is studied in this paper. Based on the theory of impulsive differential equations, algebraic graph theory, Lyapunov stability theory and Mittag-Leffler function, two novel sufficient conditions for achieving the cooperative control of a class of fractional-order nonlinear multi-agent systems are derived. Finally, two numerical simulations are verified to illustrate the effectiveness and feasibility of the proposed method.
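For readers unfamiliar with this problem class, a generic schematic of fractional-order agent dynamics with impulsive coupling is sketched below; the precise system, nonlinearity, and gain structure analyzed in the paper may differ, so this is only an orienting sketch.

\[
{}^{C}\!D^{\alpha} x_i(t) = f\bigl(t, x_i(t)\bigr), \quad t \neq t_k, \qquad
x_i(t_k^{+}) - x_i(t_k^{-}) = \mu \sum_{j \in \mathcal{N}_i} a_{ij}\bigl(x_j(t_k^{-}) - x_i(t_k^{-})\bigr),
\]

with \(0 < \alpha < 1\), adjacency weights \(a_{ij}\) of the communication graph, and impulsive gain \(\mu\); the consensus analysis then combines the Mittag-Leffler function with a Lyapunov argument at the impulse instants \(t_k\).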
Massive Multi-Agent Systems Control
NASA Technical Reports Server (NTRS)
Campagne, Jean-Charles; Gardon, Alain; Collomb, Etienne; Nishida, Toyoaki
2004-01-01
In order to build massive multi-agent systems, considered as complex and dynamic systems, one needs a method to analyze and control the system. We suggest an approach using morphology to represent and control the state of large organizations composed of a great number of light software agents. Morphology is understood as representing the state of the multi-agent system as shapes in an abstract geometrical space; this notion is close to the notion of phase space in physics.
Wang, Xinghu; Hong, Yiguang; Yi, Peng; Ji, Haibo; Kang, Yu
2017-05-24
In this paper, a distributed optimization problem is studied for continuous-time multiagent systems with unknown-frequency disturbances. A distributed gradient-based control is proposed for the agents to achieve the optimal consensus with estimating unknown frequencies and rejecting the bounded disturbance in the semi-global sense. Based on convex optimization analysis and adaptive internal model approach, the exact optimization solution can be obtained for the multiagent system disturbed by exogenous disturbances with uncertain parameters.
Zou, Lei; Wang, Zidong; Gao, Huijun; Alsaadi, Fuad E
2017-03-31
This paper is concerned with the distributed H∞ consensus control problem for a discrete time-varying multiagent system with the stochastic communication protocol (SCP). A directed graph is used to characterize the communication topology of the multiagent network. The data transmission between each agent and the neighboring ones is implemented via a constrained communication channel where only one neighboring agent is allowed to transmit data at each time instant. The SCP is applied to schedule the signal transmission of the multiagent system. A sequence of random variables is utilized to capture the scheduling behavior of the SCP. By using the mapping technology combined with the Hadamard product, the closed-loop multiagent system is modeled as a time-varying system with a stochastic parameter matrix. The purpose of the addressed problem is to design a cooperative controller for each agent such that, for all probabilistic scheduling behaviors, the H∞ consensus performance is achieved over a given finite horizon for the closed-loop multiagent system. A necessary and sufficient condition is derived to ensure the H∞ consensus performance based on the completing squares approach and the stochastic analysis technique. Then, the controller parameters are obtained by solving two coupled backward recursive Riccati difference equations. Finally, a numerical example is given to illustrate the effectiveness of the proposed controller design scheme.
Multiagent data warehousing and multiagent data mining for cerebrum/cerebellum modeling
NASA Astrophysics Data System (ADS)
Zhang, Wen-Ran
2002-03-01
An algorithm named Neighbor-Miner is outlined for multiagent data warehousing and multiagent data mining. The algorithm is defined in an evolving dynamic environment with autonomous or semiautonomous agents. Instead of mining frequent itemsets from customer transactions, the new algorithm discovers new agents and mines agent associations in first-order logic from agent attributes and actions. While the Apriori algorithm uses frequency as an a priori threshold, the new algorithm uses agent similarity as a priori knowledge. The concept of agent similarity leads to the notions of agent cuboid, orthogonal multiagent data warehousing (MADWH), and multiagent data mining (MADM). Based on agent similarities and action similarities, Neighbor-Miner is proposed and illustrated in a MADWH/MADM approach to cerebrum/cerebellum modeling. It is shown that (1) semiautonomous neurofuzzy agents can be identified for uniped locomotion and gymnastic training based on attribute relevance analysis; (2) new agents can be discovered and agent cuboids can be dynamically constructed in an orthogonal MADWH, which resembles an evolving cerebrum/cerebellum system; and (3) dynamic motion laws can be discovered as association rules in first-order logic. Although examples in legged robot gymnastics are used to illustrate the basic ideas, the new approach is generally suitable for a broad category of data mining tasks where knowledge can be discovered collectively by a set of agents from a geographically or geometrically distributed but relevant environment, especially in scientific and engineering data environments.
A reinforcement learning-based architecture for fuzzy logic control
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1992-01-01
This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.
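The temporal-difference element of such an architecture can be illustrated with a minimal critic update; this sketch assumes a linear state evaluation and a generic reinforcement signal, and it is not the actual ARIC implementation.

```python
# Minimal TD(0) critic sketch for an actor-critic style controller
# (illustrative assumptions: linear critic weights over state features).
import numpy as np

def td_critic_update(w, phi_t, phi_next, r, gamma=0.95, beta=0.05):
    """Update linear critic weights w from one transition.

    phi_t, phi_next : feature vectors of the current and next state
    r               : external reinforcement (e.g. 0, or -1 on failure)
    Returns the updated weights and the TD error, the scalar signal that
    would also tune the actor (here, the fuzzy rule base).
    """
    v_t, v_next = w @ phi_t, w @ phi_next
    td_error = r + gamma * v_next - v_t      # prediction error on system behaviour
    w = w + beta * td_error * phi_t          # fine-tune the critic's prediction
    return w, td_error

w = np.zeros(4)
w, delta = td_critic_update(w, np.array([1.0, 0.0, 0.0, 0.0]),
                            np.array([0.0, 1.0, 0.0, 0.0]), r=0.0)
print(w, delta)
```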
Franklin, Nicholas T; Frank, Michael J
2015-12-25
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
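One way to make the proposed mechanism concrete is a learning-rate gain that tracks outcome uncertainty, standing in for the TAN pause duration; the gain function, uncertainty estimate, and parameters below are assumptions for illustration rather than the published neural or algorithmic model.

```python
# Sketch of an uncertainty-modulated learning rate (a stand-in for TAN pauses).
# Assumptions: Bernoulli rewards, uncertainty tracked as a running estimate of
# unsigned prediction error; higher uncertainty raises the effective learning rate.
import random

def run(p_reward=0.8, change_point=200, trials=400):
    V, unc = 0.5, 0.5
    history = []
    for t in range(trials):
        p = p_reward if t < change_point else 1.0 - p_reward   # contingency flips
        r = 1.0 if random.random() < p else 0.0
        delta = r - V
        unc += 0.05 * (abs(delta) - unc)        # track expected surprise
        lr = 0.05 + 0.4 * unc                   # uncertainty raises the learning rate
        V += lr * delta
        history.append(V)
    return history

values = run()
print(round(values[199], 2), round(values[-1], 2))  # value before vs after the change-point
```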
Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S
2011-01-01
As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.
Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane
2017-01-01
Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.
The mechanisms of labor division from the perspective of individual optimization
NASA Astrophysics Data System (ADS)
Zhu, Lirong; Chen, Jiawei; Di, Zengru; Chen, Liujun; Liu, Yan; Stanley, H. Eugene
2017-12-01
Although the tools of complexity research have been applied to the phenomenon of labor division, its underlying mechanisms are still unclear. Researchers have used evolutionary models to study labor division in terms of global optimization, but focusing on individual optimization is a more realistic, real-world approach. We do this by first developing a multi-agent model that takes into account information-sharing and learning-by-doing and by using simulations to demonstrate the emergence of labor division. We then use a master equation method and find that the computational results are consistent with the results of the simulation. Finally we find that the core underlying mechanisms that cause labor division are learning-by-doing, information cost, and random fluctuation.
NASA Technical Reports Server (NTRS)
Jani, Yashvant
1992-01-01
The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably to mean the same thing. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is the deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. This report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The shuttle translational controller and its implementation in the fuzzy learning architecture are described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and a summary for the project.
Learning and tuning fuzzy logic controllers through reinforcements
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap
1992-01-01
This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.
Learning the specific quality of taste reinforcement in larval Drosophila
Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram
2015-01-01
The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing—in any brain. DOI: http://dx.doi.org/10.7554/eLife.04711.001 PMID:25622533
Evidence for a neural law of effect.
Athalye, Vivek R; Santos, Fernando J; Carmena, Jose M; Costa, Rui M
2018-03-02
Thorndike's law of effect states that actions that lead to reinforcements tend to be repeated more often. Accordingly, neural activity patterns leading to reinforcement are also reentered more frequently. Reinforcement relies on dopaminergic activity in the ventral tegmental area (VTA), and animals shape their behavior to receive dopaminergic stimulation. Seeking evidence for a neural law of effect, we found that mice learn to reenter more frequently motor cortical activity patterns that trigger optogenetic VTA self-stimulation. Learning was accompanied by gradual shaping of these patterns, with participating neurons progressively increasing and aligning their covariance to that of the target pattern. Motor cortex patterns that lead to phasic dopaminergic VTA activity are progressively reinforced and shaped, suggesting a mechanism by which animals select and shape actions to reliably achieve reinforcement. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Boctor, Lisa
2013-03-01
The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that they did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies, such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.
Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf
2016-02-01
Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) and model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.
Incorporating Dispositional Traits into the Treatment of Anorexia Nervosa
Herzog, David; Moskovich, Ashley; Merwin, Rhonda; Lin, Tammy
2014-01-01
We provide a general framework to guide the development of interventions that aim to address persistent features in eating disorders that may preclude effective treatment. Using perfectionism as an exemplar, we draw from research in cognitive neuroscience regarding attention and reinforcement learning, from learning theory and social psychology regarding vicarious learning and implications for the role modeling of significant others, and from clinical psychology on the importance of verbal narratives as barriers that may influence expectations and shape reinforcement schedules. PMID:21243482
Hybrid learning in signalling games
NASA Astrophysics Data System (ADS)
Barrett, Jeffrey A.; Cochran, Calvin T.; Huttegger, Simon; Fujiwara, Naoki
2017-09-01
Lewis-Skyrms signalling games have been studied under a variety of low-rationality learning dynamics. Reinforcement dynamics are stable but slow and prone to evolving suboptimal signalling conventions. A low-inertia trial-and-error dynamic such as win-stay/lose-randomise is fast and reliable at finding perfect signalling conventions but unstable in the context of noise or agent error. Here we consider a low-rationality hybrid of reinforcement and win-stay/lose-randomise learning that exhibits the virtues of both. This hybrid dynamics is reliable, stable and exceptionally fast.
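A minimal sketch of such a hybrid dynamic in a two-state, two-signal, two-act Lewis signalling game is given below; the particular mixing rule (urn reinforcement after success, randomisation of the urns just used after failure) and the parameter values are illustrative assumptions rather than the paper's exact specification.

```python
# Hybrid of urn reinforcement and win-stay/lose-randomise in a 2-state,
# 2-signal, 2-act Lewis signalling game (illustrative parameter choices).
import random

N = 2
sender = [[1.0, 1.0] for _ in range(N)]    # sender urns: state -> signal weights
receiver = [[1.0, 1.0] for _ in range(N)]  # receiver urns: signal -> act weights

def draw(weights):
    r = random.random() * sum(weights)
    return 0 if r < weights[0] else 1

for _ in range(5000):
    state = random.randrange(N)
    signal = draw(sender[state])
    act = draw(receiver[signal])
    if act == state:                       # success: reinforce (Roth-Erev style)
        sender[state][signal] += 1.0
        receiver[signal][act] += 1.0
    else:                                  # failure: lose-randomise the urns just used
        sender[state] = [1.0, 1.0]
        receiver[signal] = [1.0, 1.0]

# Evolved sender convention: each row should concentrate on a distinct signal
print([[round(w / sum(urn), 2) for w in urn] for urn in sender])
```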
Sitaraman, Divya; Kramer, Elizabeth F.; Kahsai, Lily; Ostrowski, Daniela; Zars, Troy
2017-01-01
Feedback mechanisms in operant learning are critical for animals to increase reward or reduce punishment. However, not all conditions have a behavior that can readily resolve an event. Animals must then try out different behaviors to better their situation through outcome learning. This form of learning allows for novel solutions and with positive experience can lead to unexpected behavioral routines. Learned helplessness, as a type of outcome learning, manifests in part as increases in escape latency in the face of repeated unpredicted shocks. Little is known about the mechanisms of outcome learning. When fruit fly Drosophila melanogaster are exposed to unpredicted high temperatures in a place learning paradigm, flies both increase escape latencies and have a higher memory when given control of a place/temperature contingency. Here we describe discrete serotonin neuronal circuits that mediate aversive reinforcement, escape latencies, and memory levels after place learning in the presence and absence of unexpected aversive events. The results show that two features of learned helplessness depend on the same modulatory system as aversive reinforcement. Moreover, changes in aversive reinforcement and escape latency depend on local neural circuit modulation, while memory enhancement requires larger modulation of multiple behavioral control circuits. PMID:29321732
Ellwood, Ian T.; Patel, Tosha; Wadia, Varun; Lee, Anthony T.; Liptak, Alayna T.
2017-01-01
Dopamine neurons in the ventral tegmental area (VTA) encode reward prediction errors and can drive reinforcement learning through their projections to striatum, but much less is known about their projections to prefrontal cortex (PFC). Here, we studied these projections and observed phasic VTA–PFC fiber photometry signals after the delivery of rewards. Next, we studied how optogenetic stimulation of these projections affects behavior using conditioned place preference and a task in which mice learn associations between cues and food rewards and then use those associations to make choices. Neither phasic nor tonic stimulation of dopaminergic VTA–PFC projections elicited place preference. Furthermore, substituting phasic VTA–PFC stimulation for food rewards was not sufficient to reinforce new cue–reward associations nor maintain previously learned ones. However, the same patterns of stimulation that failed to reinforce place preference or cue–reward associations were able to modify behavior in other ways. First, continuous tonic stimulation maintained previously learned cue–reward associations even after they ceased being valid. Second, delivering phasic stimulation either continuously or after choices not previously associated with reward induced mice to make choices that deviated from previously learned associations. In summary, despite the fact that dopaminergic VTA–PFC projections exhibit phasic increases in activity that are time locked to the delivery of rewards, phasic activation of these projections does not necessarily reinforce specific actions. Rather, dopaminergic VTA–PFC activity can control whether mice maintain or deviate from previously learned cue–reward associations. SIGNIFICANCE STATEMENT Dopaminergic inputs from ventral tegmental area (VTA) to striatum encode reward prediction errors and reinforce specific actions; however, it is currently unknown whether dopaminergic inputs to prefrontal cortex (PFC) play similar or distinct roles. Here, we used bulk Ca2+ imaging to show that unexpected rewards or reward-predicting cues elicit phasic increases in the activity of dopaminergic VTA–PFC fibers. However, in multiple behavioral paradigms, we failed to observe reinforcing effects after stimulation of these fibers. In these same experiments, we did find that tonic or phasic patterns of stimulation caused mice to maintain or deviate from previously learned cue–reward associations, respectively. Therefore, although they may exhibit similar patterns of activity, dopaminergic inputs to striatum and PFC can elicit divergent behavioral effects. PMID:28739583
ERIC Educational Resources Information Center
Galbreath, Joy; Feldman, David
The relationship of reading comprehension accuracy and a contingently administered token reinforcement program used with an elementary level learning disabled student in the classroom was examined. The S earned points for each correct answer made after oral reading sessions. At the conclusion of the class he could exchange his points for rewards.…
Cocaine addiction as a homeostatic reinforcement learning disorder.
Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H
2017-03-01
Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
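The core of a homeostatic reinforcement learning account can be summarised by defining reward as drive reduction; the formulation below is a sketch of that idea rather than the paper's exact equations.

\[
D(h_t) = \Bigl(\sum_i \bigl| h_i^{*} - h_{i,t} \bigr|^{\,n}\Bigr)^{1/m}, \qquad
r_t = D(h_t) - D(h_{t+1}),
\]

where \(h_t\) is the internal state, \(h^{*}\) the homeostatic setpoint, and \(D\) a drive function; a drug with a rapid corrective effect yields a large positive \(r_t\), while chronic use is modelled as a slow, long-lasting drift of the setpoint itself.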
Challenges in the Verification of Reinforcement Learning Algorithms
NASA Technical Reports Server (NTRS)
Van Wesel, Perry; Goodloe, Alwyn E.
2017-01-01
Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.
Refining fuzzy logic controllers with machine learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.
1994-01-01
In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.
Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang
2017-01-01
We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-Learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall score.
Investigation of a Reinforcement-Based Toilet Training Procedure for Children with Autism.
ERIC Educational Resources Information Center
Cicero, Frank R.; Pfadt, Al
2002-01-01
This study evaluated the effectiveness of a reinforcement-based toilet training intervention with three children with autism. Procedures included positive reinforcement, graduated guidance, scheduled practice trials, and forward prompting. All three children reduced urination accidents to zero and learned to request bathroom use spontaneously…
Sex Differences in Reinforcement and Punishment on Prime-Time Television.
ERIC Educational Resources Information Center
Downs, A. Chris; Gowan, Darryl C.
1980-01-01
Television programs were analyzed for frequencies of positive reinforcement and punishment exchanged among performers varying in age and sex. Females were found to more often exhibit and receive reinforcement, whereas males more often exhibited and received punishment. These findings have implications for children's learning of positive and…
Separation of Time-Based and Trial-Based Accounts of the Partial Reinforcement Extinction Effect
Bouton, Mark E.; Woods, Amanda M.; Todd, Travis P.
2013-01-01
Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or “time-accumulation” view (e.g., Gallistel & Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially- and continuously-reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. PMID:23962669
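A small worked example, using illustrative numbers rather than the experiments' actual parameters, makes the contrast between the two accounts concrete.

\[
\text{Continuous (} p = 1,\ T_{\mathrm{CS}} = 30\,\mathrm{s}\text{):}\quad
\mathbb{E}[\text{trials to reinforcer}] = \tfrac{1}{p} = 1, \qquad
\mathbb{E}[\text{CS time to reinforcer}] = \tfrac{T_{\mathrm{CS}}}{p} = 30\,\mathrm{s};
\]
\[
\text{Partial (} p = 0.5,\ T_{\mathrm{CS}} = 30\,\mathrm{s}\text{):}\quad
\mathbb{E}[\text{trials to reinforcer}] = 2, \qquad
\mathbb{E}[\text{CS time to reinforcer}] = 60\,\mathrm{s}.
\]

A time-based account predicts slower extinction after partial reinforcement because more CS time must accumulate before the expected reinforcer, whereas a trial-based account attributes it to the larger expected number of CS presentations; equating expected times to reinforcement during conditioning, as in Experiment 1, is what allows the two predictions to be pulled apart.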
Schifani, Christin; Sukhanov, Ilya; Dorofeikova, Mariia; Bespalov, Anton
2017-07-28
There is a need to develop cognitive tasks that address valid neuropsychological constructs implicated in disease mechanisms and can be used in animals and humans to guide novel drug discovery. Present experiments aimed to characterize a novel reinforcement learning task based on a classical operant behavioral phenomenon observed in multiple species - differences in response patterning under variable (VI) vs fixed interval (FI) schedules of reinforcement. Wistar rats were trained to press a lever for food under VI30s and later weekly test sessions were introduced with reinforcement schedule switched to FI30s. During the FI30s test session, post-reinforcement pauses (PRPs) gradually grew towards the end of the session reaching 22-43% of the initial values. Animals could be retrained under VI30s conditions, and FI30s test sessions were repeated over a period of several months without appreciable signs of a practice effect. Administration of the non-competitive N-methyl-d-aspartate (NMDA) receptor antagonist MK-801 ((5S,10R)-(+)-5-Methyl-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5,10-imine maleate) prior to FI30s sessions prevented adjustment of PRPs associated with the change from VI to FI schedule. This effect was most pronounced at the highest tested dose of MK-801 and appeared to be independent of the effects of this dose on response rates. These results provide initial evidence for the possibility to use different response patterning under VI and FI schedules with equivalent reinforcement density for studying effects of drug treatment on reinforcement learning. Copyright © 2017 Elsevier B.V. All rights reserved.
Preliminary Work for Examining the Scalability of Reinforcement Learning
NASA Technical Reports Server (NTRS)
Clouse, Jeff
1998-01-01
Researchers began studying automated agents that learn to perform multiple-step tasks early in the history of artificial intelligence (Samuel, 1963; Samuel, 1967; Waterman, 1970; Fikes, Hart & Nilsson, 1972). Multiple-step tasks are tasks that can only be solved via a sequence of decisions, such as control problems, robotics problems, classic problem-solving, and game-playing. The objective of agents attempting to learn such tasks is to use the resources they have available in order to become more proficient at the tasks. In particular, each agent attempts to develop a good policy, a mapping from states to actions, that allows it to select actions that optimize a measure of its performance on the task; for example, reducing the number of steps necessary to complete the task successfully. Our study focuses on reinforcement learning, a set of learning techniques where the learner performs trial-and-error experiments in the task and adapts its policy based on the outcome of those experiments. Much of the work in reinforcement learning has focused on a particular, simple representation, where every problem state is represented explicitly in a table, and associated with each state are the actions that can be chosen in that state. A major advantage of this table lookup representation is that one can prove that certain reinforcement learning techniques will develop an optimal policy for the current task. The drawback is that the representation limits the application of reinforcement learning to multiple-step tasks with relatively small state-spaces. There has been a little theoretical work that proves that convergence to optimal solutions can be obtained when using generalization structures, but the structures are quite simple. The theory says little about complex structures, such as multi-layer, feedforward artificial neural networks (Rumelhart & McClelland, 1986), but empirical results indicate that the use of reinforcement learning with such structures is promising. These empirical results make no theoretical claims, nor compare the policies produced to optimal policies. A goal of our work is to be able to make the comparison between an optimal policy and one stored in an artificial neural network. A difficulty of performing such a study is finding a multiple-step task that is small enough that one can find an optimal policy using table lookup, yet large enough that, for practical purposes, an artificial neural network is really required. We have identified a limited form of the game OTHELLO as satisfying these requirements. The work we report here is in the very preliminary stages of research, but this paper provides background for the problem being studied and a description of our initial approach to examining the problem. In the remainder of this paper, we first describe reinforcement learning in more detail. Next, we present the game OTHELLO. Finally, we argue that a restricted form of the game meets the requirements of our study, and describe our preliminary approach to finding an optimal solution to the problem.
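To make the representational contrast concrete, the sketch below places a dictionary-backed Q-table alongside a simple linear function approximator behind the same update interface; the states, features, and parameters are illustrative assumptions and are not drawn from the OTHELLO study itself.

```python
# Two value-function representations behind one TD-style update interface:
# exact table lookup vs. linear function approximation (illustrative sketch).
import numpy as np

class TableQ:
    def __init__(self):
        self.q = {}                                   # (state, action) -> value
    def value(self, s, a):
        return self.q.get((s, a), 0.0)
    def update(self, s, a, target, alpha=0.1):
        self.q[(s, a)] = self.value(s, a) + alpha * (target - self.value(s, a))

class LinearQ:
    def __init__(self, n_features):
        self.w = np.zeros(n_features)                 # shared weights generalise across states
    def value(self, phi, a=None):
        return float(self.w @ phi)
    def update(self, phi, a, target, alpha=0.1):
        self.w += alpha * (target - self.value(phi)) * phi   # semi-gradient step

table, approx = TableQ(), LinearQ(n_features=3)
table.update(s="corner", a="take", target=1.0)
approx.update(phi=np.array([1.0, 0.0, 1.0]), a="take", target=1.0)
print(table.value("corner", "take"), approx.value(np.array([1.0, 0.0, 1.0])))
```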
Brain Research: Implications for Learning.
ERIC Educational Resources Information Center
Soares, Louise M.; Soares, Anthony T.
Brain research has illuminated several areas of the learning process: (1) learning as association; (2) learning as reinforcement; (3) learning as perception; (4) learning as imitation; (5) learning as organization; (6) learning as individual style; and (7) learning as brain activity. The classic conditioning model developed by Pavlov advanced…
Deep imitation learning for 3D navigation tasks.
Hussein, Ahmed; Elyan, Eyad; Gaber, Mohamed Medhat; Jayne, Chrisina
2018-01-01
Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning is recently gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep-Q-networks and Asynchronous actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input while learning from experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
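As a minimal illustration of the imitation-learning side of this comparison, the sketch below fits a policy by supervised learning on demonstrated state-action pairs (behavioural cloning); the toy demonstrations, the softmax-linear policy, and the training settings are assumptions for illustration and not the paper's deep convolutional architecture.

```python
# Behavioural cloning sketch: fit a policy by supervised learning on
# demonstrated (state, action) pairs. Illustrative assumptions: random toy
# demonstrations, a softmax-linear policy, plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n_demos, n_features, n_actions = 500, 8, 4
states = rng.normal(size=(n_demos, n_features))
actions = (states[:, 0] > 0).astype(int) * 2 + (states[:, 1] > 0).astype(int)

W = np.zeros((n_features, n_actions))
for _ in range(300):
    logits = states @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    onehot = np.eye(n_actions)[actions]
    grad = states.T @ (probs - onehot) / n_demos      # cross-entropy gradient
    W -= 0.5 * grad

accuracy = (np.argmax(states @ W, axis=1) == actions).mean()
print(f"imitation accuracy on the demonstrations: {accuracy:.2f}")
```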
Multi-Agency Radiation Survey and Site Investigation Manual (MARSSIM)
The Multi-Agency Radiation Survey and Site Investigation Manual (MARSSIM) provides detailed guidance on how to demonstrate that a site is in compliance with a radiation dose- or risk-based regulation.
NASA Astrophysics Data System (ADS)
Hashimoto, Ryoji; Matsumura, Tomoya; Nozato, Yoshihiro; Watanabe, Kenji; Onoye, Takao
A multi-agent object attention system is proposed, which is based on a biologically inspired attractor selection model. Object attention is facilitated by using a video sequence and a depth map obtained through a compound-eye image sensor TOMBO. Robustness of the multi-agent system over environmental changes is enhanced by utilizing the biological model of adaptive response by attractor selection. To implement the proposed system, an efficient VLSI architecture is employed, reducing the enormous computational costs and memory accesses required for depth map processing and the multi-agent attractor selection process. In an FPGA implementation of the proposed object attention system using 7,063 slices, 640×512 pixel input images can be processed in real time with three agents at a rate of 9 fps at 48 MHz operation.
Hassani, S. A.; Oemisch, M.; Balcarras, M.; Westendorff, S.; Ardid, S.; van der Meer, M. A.; Tiesinga, P.; Womelsdorf, T.
2017-01-01
Noradrenaline is believed to support cognitive flexibility through the alpha 2A noradrenergic receptor (a2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the a2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task evaluating performance using Bayesian and Reinforcement learning models. In an initial dose-testing phase we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose we examined the faster feature-based reversal learning with Guanfacine with single-subject computational modeling. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework. PMID:28091572
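The reported parameter interpretation can be sketched as a feature-value update with separate terms for chosen and non-chosen features; the model family fitted in the study is richer, so the update below is a simplified assumption for illustration only.

\[
V_{c} \leftarrow V_{c} + \eta\,(r - V_{c}), \qquad
V_{nc} \leftarrow (1 - \delta)\,V_{nc},
\]

where \(V_c\) and \(V_{nc}\) are the values of chosen and non-chosen features, \(\eta\) is the learning rate and \(\delta\) the decay of non-chosen feature information; the reported drug effect corresponds to a larger \(\eta\) together with a larger \(\delta\).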
Electrophysiological correlates of observational learning in children.
Rodriguez Buritica, Julia M; Eppinger, Ben; Schuck, Nicolas W; Heekeren, Hauke R; Li, Shu-Chen
2016-09-01
Observational learning is an important mechanism for cognitive and social development. However, the neurophysiological mechanisms underlying observational learning in children are not well understood. In this study, we used a probabilistic reward-based observational learning paradigm to compare behavioral and electrophysiological markers of individual and observational reinforcement learning in 8- to 10-year-old children. Specifically, we manipulated the amount of observable information as well as children's similarity in age to the observed person (same-aged child vs. adult) to examine the effects of similarity in age on the integration of observed information in children. We show that the feedback-related negativity (FRN) during individual reinforcement learning reflects the valence of outcomes of own actions. Furthermore, we found that the feedback-related negativity during observational reinforcement learning (oFRN) showed a similar distinction between outcome valences of observed actions. This suggests that the oFRN can serve as a measure of observational learning in middle childhood. Moreover, during observational learning children profited from the additional social information and imitated the choices of their own peers more than those of adults, indicating that children have a tendency to conform more with similar others (e.g. their own peers) compared to dissimilar others (adults). Taken together, our results show that children can benefit from integrating observable information and that oFRN may serve as a measure of observational learning in children. © 2015 John Wiley & Sons Ltd.
Mechanisms and time course of vocal learning and consolidation in the adult songbird.
Warren, Timothy L; Tumer, Evren C; Charlesworth, Jonathan D; Brainard, Michael S
2011-10-01
In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry.
Mechanisms and time course of vocal learning and consolidation in the adult songbird
Tumer, Evren C.; Charlesworth, Jonathan D.; Brainard, Michael S.
2011-01-01
In songbirds, the basal ganglia outflow nucleus LMAN is a cortical analog that is required for several forms of song plasticity and learning. Moreover, in adults, inactivating LMAN can reverse the initial expression of learning driven via aversive reinforcement. In the present study, we investigated how LMAN contributes to both reinforcement-driven learning and a self-driven recovery process in adult Bengalese finches. We first drove changes in the fundamental frequency of targeted song syllables and compared the effects of inactivating LMAN with the effects of interfering with N-methyl-d-aspartate (NMDA) receptor-dependent transmission from LMAN to one of its principal targets, the song premotor nucleus RA. Inactivating LMAN and blocking NMDA receptors in RA caused indistinguishable reversions in the expression of learning, indicating that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA. We next assessed how LMAN's role evolves over time by maintaining learned changes to song while periodically inactivating LMAN. The expression of learning consolidated to become LMAN independent over multiple days, indicating that this form of consolidation is not completed over one night, as previously suggested, and instead may occur gradually during singing. Subsequent cessation of reinforcement was followed by a gradual self-driven recovery of original song structure, indicating that consolidation does not correspond with the lasting retention of changes to song. Finally, for self-driven recovery, as for reinforcement-driven learning, LMAN was required for the expression of initial, but not later, changes to song. Our results indicate that NMDA receptor-dependent transmission from LMAN to RA plays an essential role in the initial expression of two distinct forms of vocal learning and that this role gradually wanes over a multiday process of consolidation. The results support an emerging view that cortical-basal ganglia circuits can direct the initial expression of learning via top-down influences on primary motor circuitry. PMID:21734110
Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang
2015-05-01
Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interaction and to complete the task without requiring desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions when the trial's initial timing is provided. However, movements in BMI applications can be quite complicated, and the action timing explicitly reveals when the user intends to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings and resting), owing to its efficient weight updating. We apply AGREL to neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence and reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL improved the target acquisition rate to 90.16% on average, with faster convergence and greater stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
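The distinguishing feature of AGREL is that only the weights feeding the selected (attended) action are updated, scaled by a reward prediction error. A heavily simplified single-layer sketch of that update rule follows; the dimensions, the softmax expectation used as the predicted reward, and the toy data are illustrative rather than the study's recorded M1 activity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_actions = 64, 7               # firing-rate features -> 7 movement classes
W = 0.01 * rng.standard_normal((n_actions, n_neurons))

def agrel_step(x, correct_action, W, lr=0.05):
    """One attention-gated reinforcement learning update (simplified)."""
    q = W @ x                                   # action activations
    p = np.exp(q - q.max()); p /= p.sum()       # softmax action selection
    a = rng.choice(n_actions, p=p)              # stochastic winner
    reward = 1.0 if a == correct_action else 0.0
    delta = reward - p[a]                       # reward prediction error
    W[a] += lr * delta * x                      # update only the attended (chosen) action's weights
    return W, reward

# Toy use: learn a hidden linear mapping from "neural states" to 7 actions.
true_W = rng.standard_normal((n_actions, n_neurons))
for _ in range(2000):
    x = rng.standard_normal(n_neurons)
    W, _ = agrel_step(x, correct_action=int(np.argmax(true_W @ x)), W=W)
```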
ERIC Educational Resources Information Center
Kral, Paul A.; And Others
Investigates the effect of delay of reinforcement upon human discrimination learning with particular emphasis on the form of the gradient within the first few seconds of delay. In previous studies subjects are usually required to make an instrumental response to a stimulus, this is followed by the delay interval, and finally, the reinforcement…
Distributed Optimal Consensus Control for Multiagent Systems With Input Delay.
Zhang, Huaipin; Yue, Dong; Zhao, Wei; Hu, Songlin; Dou, Chunxia
2018-06-01
This paper addresses the problem of distributed optimal consensus control for a continuous-time heterogeneous linear multiagent system subject to time-varying input delays. First, by discretization and model transformation, the continuous-time input-delayed system is converted into a discrete-time delay-free system. Two carefully constructed performance index functions are defined for these two systems. It is shown that the performance index functions are equivalent and that the optimal consensus control problem of the input-delayed system can be cast into that of the delay-free system. Second, by virtue of the Hamilton-Jacobi-Bellman (HJB) equations, an optimal control policy for each agent is designed based on the delay-free system, and a novel value iteration algorithm is proposed to learn the solutions to the HJB equations online. The proposed adaptive dynamic programming algorithm is implemented on the basis of a critic-action neural network (NN) structure. Third, it is proved that the local consensus errors of the two systems and the weight estimation errors of the critic-action NNs are uniformly ultimately bounded while the approximated control policies converge to their target values. Finally, two simulation examples are presented to illustrate the effectiveness of the developed method.
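For a single discrete-time linear system with quadratic cost, the value-iteration idea behind such HJB-based designs reduces to the familiar Riccati recursion. The sketch below iterates that recursion offline on an assumed, known toy model; the paper instead learns the solution online with critic-action networks for the delay-transformed multiagent system, so this shows only the underlying principle, not their algorithm.

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, iters=200):
    """Value iteration P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA for cost x'Qx + u'Ru."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        G = np.linalg.inv(R + B.T @ P @ B) @ (B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    K = np.linalg.inv(R + B.T @ P @ B) @ (B.T @ P @ A)   # optimal feedback gain, u = -K x
    return P, K

# Toy double-integrator-like model (all values invented for illustration).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
P, K = lqr_value_iteration(A, B, Q=np.eye(2), R=np.array([[1.0]]))
```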
Learning Theory and the Typewriter Teacher
ERIC Educational Resources Information Center
Wakin, B. Bertha
1974-01-01
Eight basic principles of learning are described and discussed in terms of practical learning strategies for typewriting. Described are goal setting, preassessment, active participation, individual differences, reinforcement, practice, transfer of learning, and evaluation. (SC)
Pechtel, Pia; Pizzagalli, Diego A.
2013-01-01
Context: Childhood sexual abuse (CSA) has been associated with psychopathology, particularly major depressive disorder (MDD), and high-risk behaviors. Despite grave epidemiological data, the mechanisms underlying these maladaptive outcomes remain poorly understood. Objective: We examined whether CSA history, particularly in conjunction with past MDD, is associated with behavioral and neural dysfunction in reinforcement learning, and whether such dysfunction is linked to maladaptive behavior. Design: Participants completed a clinical evaluation and a probabilistic reinforcement task while 128-channel event-related potentials were recorded. Setting: Academic setting; participants recruited from the community. Participants: Fifteen remitted depressed females with CSA history (CSA+rMDD), 16 remitted depressed females without CSA history (rMDD), and 18 healthy females. Main Outcome Measures: Participants' preference for choosing the most rewarded stimulus and avoiding the most punished stimulus was evaluated. The feedback-related negativity (FRN) and error-related negativity (ERN), hypothesized to reflect activation in the anterior cingulate cortex, were used as electrophysiological indices of reinforcement learning. Results: No group differences emerged in the acquisition of reinforcement contingencies. In trials that required relying partially or exclusively on previously rewarded information, the CSA+rMDD group showed (1) lower accuracy (relative to both controls and rMDD), (2) blunted electrophysiological differentiation between correct and incorrect responses (relative to controls), and (3) increased activation in the subgenual anterior cingulate cortex (relative to rMDD). CSA history was not associated with impairments in avoiding the most punished stimulus. Self-harm and suicidal behaviors correlated with poorer performance on previously rewarded, but not previously punished, trials. Conclusions: Irrespective of past MDD, women with CSA histories showed neural and behavioral deficits in utilizing previous reinforcement to optimize decision-making in the absence of feedback (blunted "Go learning"). While the current study provides initial evidence for reward-specific deficits associated with CSA, future research is warranted to determine if disrupted positive reinforcement learning predicts high-risk behavior following CSA. PMID:23487253
Reinforcement Learning Deficits in People with Schizophrenia Persist after Extended Trials
Cicero, David C.; Martin, Elizabeth A.; Becker, Theresa M.; Kerns, John G.
2014-01-01
Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited amount of acquisition trials to learn the reward contingencies and then tested about what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingences. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive and negative feedback. These reward learning deficits persisted even if people with schizophrenia are given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. PMID:25172610
Reinforcement learning deficits in people with schizophrenia persist after extended trials.
Cicero, David C; Martin, Elizabeth A; Becker, Theresa M; Kerns, John G
2014-12-30
Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited amount of acquisition trials to learn the reward contingencies and then tested about what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingences. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive feedback and negative feedback. These reward learning deficits persisted even if people with schizophrenia are given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Reinforcement learning of periodical gaits in locomotion robots
NASA Astrophysics Data System (ADS)
Svinin, Mikhail; Yamada, Kazuyaki; Ushio, S.; Ueda, Kanji
1999-08-01
Emergence of stable gaits in locomotion robots is studied in this paper. A classifier system, implementing an instance-based reinforcement learning scheme, is used for sensory-motor control of an eight-legged mobile robot. An important feature of the classifier system is its ability to work with a continuous sensor space. The robot has no prior knowledge of the environment, its own internal model, or the goal coordinates. It is only assumed that the robot can acquire stable gaits by learning how to reach a light source. During the learning process the control system is self-organized by reinforcement signals. Reaching the light source defines a global reward. Forward motion gets a local reward, while stepping back and falling down get a local punishment. Feasibility of the proposed self-organized system is tested in simulation and experiment. The control actions are specified at the leg level. It is shown that, as learning progresses, the number of action rules in the classifier system stabilizes at a certain level, corresponding to the acquired gait patterns.
Biases in probabilistic category learning in relation to social anxiety
Abraham, Anna; Hermann, Christiane
2015-01-01
Instrumental learning paradigms are rarely employed to investigate the mechanisms underlying acquired fear responses in social anxiety. Here, we adapted a probabilistic category learning paradigm to assess information processing biases as a function of the degree of social anxiety traits in a sample of healthy individuals without a diagnosis of social phobia. Participants were presented with three pairs of neutral faces with differing probabilistic accuracy contingencies (A/B: 80/20, C/D: 70/30, E/F: 60/40). Upon making their choice, negative and positive feedback was conveyed using angry and happy faces, respectively. The highly socially anxious group showed a strong tendency to be more accurate at learning the probability contingency associated with the most ambiguous stimulus pair (E/F: 60/40). Moreover, when pairing the most positively reinforced stimulus or the most negatively reinforced stimulus with all the other stimuli in a test phase, the highly socially anxious group avoided the most negatively reinforced stimulus significantly more than the control group. The results are discussed with reference to avoidance learning and hypersensitivity to negative socially evaluative information associated with social anxiety. PMID:26347685
Surprise beyond prediction error
Chumbley, Justin R; Burke, Christopher J; Stephan, Klaas E; Friston, Karl J; Tobler, Philippe N; Fehr, Ernst
2014-01-01
Surprise drives learning. Various neural “prediction error” signals are believed to underpin surprise-based reinforcement learning. Here, we report a surprise signal that reflects reinforcement learning but is neither un/signed reward prediction error (RPE) nor un/signed state prediction error (SPE). To exclude these alternatives, we measured surprise responses in the absence of RPE and accounted for a host of potential SPE confounds. This new surprise signal was evident in ventral striatum, primary sensory cortex, frontal poles, and amygdala. We interpret these findings via a normative model of surprise. PMID:24700400
Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.
Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C
2013-12-01
Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
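At its core, HRL uses a three-factor update: the weight change is Hebbian (presynaptic activity times postsynaptic activity) but gated by a binary evaluative signal. A stripped-down, hypothetical sketch of one such controller update is shown below; the dimensions and the toy target rule standing in for the user's feedback are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_out = 32, 4
W = 0.01 * rng.standard_normal((n_out, n_in))

def hrl_update(x, target_action, W, lr=0.02, noise=0.1):
    """One Hebbian reinforcement learning step gated by binary evaluative feedback."""
    y = np.tanh(W @ x) + noise * rng.standard_normal(n_out)   # noisy controller output
    chosen = int(y.argmax())
    feedback = 1.0 if chosen == target_action else -1.0       # binary desirable/undesirable signal
    post = np.zeros(n_out); post[chosen] = 1.0                 # activity of the winning unit
    W += lr * feedback * np.outer(post, x)                     # three-factor Hebbian update
    return W, chosen

# Toy use: map random "neural states" to 4 prosthetic actions defined by a hidden rule.
for _ in range(500):
    x = rng.standard_normal(n_in)
    W, _ = hrl_update(x, target_action=int(x[0] > 0) + 2 * int(x[1] > 0), W=W)
```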
Suppression of Striatal Prediction Errors by the Prefrontal Cortex in Placebo Hypoalgesia.
Schenk, Lieven A; Sprenger, Christian; Onat, Selim; Colloca, Luana; Büchel, Christian
2017-10-04
Classical learning theories predict extinction after the discontinuation of reinforcement through prediction errors. However, placebo hypoalgesia, although mediated by associative learning, has been shown to be resistant to extinction. We tested the hypothesis that this is mediated by the suppression of prediction error processing through the prefrontal cortex (PFC). We compared pain modulation through treatment cues (placebo hypoalgesia, treatment context) with pain modulation through stimulus intensity cues (stimulus context) during functional magnetic resonance imaging in 48 male and female healthy volunteers. During acquisition, our data show that expectations are correctly learned and that this is associated with prediction error signals in the ventral striatum (VS) in both contexts. However, in the nonreinforced test phase, pain modulation and expectations of pain relief persisted to a larger degree in the treatment context, indicating that the expectations were not correctly updated in the treatment context. Consistently, we observed significantly stronger neural prediction error signals in the VS in the stimulus context compared with the treatment context. A connectivity analysis revealed negative coupling between the anterior PFC and the VS in the treatment context, suggesting that the PFC can suppress the expression of prediction errors in the VS. Consistent with this, a participant's conceptual views and beliefs about treatments influenced the pain modulation only in the treatment context. Our results indicate that in placebo hypoalgesia contextual treatment information engages prefrontal conceptual processes, which can suppress prediction error processing in the VS and lead to reduced updating of treatment expectancies, resulting in less extinction of placebo hypoalgesia. SIGNIFICANCE STATEMENT In aversive and appetitive reinforcement learning, learned effects show extinction when reinforcement is discontinued. This is thought to be mediated by prediction errors (i.e., the difference between expectations and outcome). Although reinforcement learning has been central in explaining placebo hypoalgesia, placebo hypoalgesic effects show little extinction and persist after the discontinuation of reinforcement. Our results support the idea that conceptual treatment beliefs bias the neural processing of expectations in a treatment context compared with a more stimulus-driven processing of expectations with stimulus intensity cues. We provide evidence that this is associated with the suppression of prediction error processing in the ventral striatum by the prefrontal cortex. This provides a neural basis for persisting effects in reinforcement learning and placebo hypoalgesia. Copyright © 2017 the authors 0270-6474/17/379715-09$15.00/0.
Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia
Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.
2014-01-01
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
Bublitz, Alexander; Weinhold, Severine R.; Strobel, Sophia; Dehnhardt, Guido; Hanke, Frederike D.
2017-01-01
Octopuses (Octopus vulgaris) are generally considered to possess extraordinary cognitive abilities including the ability to successfully perform in a serial reversal learning task. During reversal learning, an animal is presented with a discrimination problem and after reaching a learning criterion, the signs of the stimuli are reversed: the former positive becomes the negative stimulus and vice versa. If an animal improves its performance over reversals, it is ascribed advanced cognitive abilities. Reversal learning has been tested in octopus in a number of studies. However, the experimental procedures adopted in these studies involved pre-training on the new positive stimulus after a reversal, strong negative reinforcement or might have enabled secondary cueing by the experimenter. These procedures could have all affected the outcome of reversal learning. Thus, in this study, serial visual reversal learning was revisited in octopus. We trained four common octopuses (O. vulgaris) to discriminate between 2-dimensional stimuli presented on a monitor in a simultaneous visual discrimination task and reversed the signs of the stimuli each time the animals reached the learning criterion of ≥80% in two consecutive sessions. The animals were trained using operant conditioning techniques including a secondary reinforcer, a rod that was pushed up and down the feeding tube, which signaled the correctness of a response and preceded the subsequent primary reinforcement of food. The experimental protocol did not involve negative reinforcement. One animal completed four reversals and showed progressive improvement, i.e., it decreased its errors to criterion the more reversals it experienced. This animal developed a generalized response strategy. In contrast, another animal completed only one reversal, whereas two animals did not learn to reverse during the first reversal. In conclusion, some octopus individuals can learn to reverse in a visual task demonstrating behavioral flexibility even with a refined methodology. PMID:28223940
Dao, Tien Tuan; Hoang, Tuan Nha; Ta, Xuan Hien; Tho, Marie Christine Ho Ba
2013-02-01
Human musculoskeletal system resources (HMSR) are valuable for learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot respond to the need for useful, accurate, reliable and good-quality human musculoskeletal resources related to medical processes, pathological knowledge and practical expertise. In the present work, an advanced knowledge-based personalized search engine was developed. Our search engine is based on a client-server, multi-layer, multi-agent architecture and the principle of semantic web services, dynamically acquiring accurate and reliable HMSR information through a semantic processing and visualization approach. A security-enhanced mechanism was applied to protect the medical information. A multi-agent crawler was implemented to build a content-based database of HMSR information. A new semantic-based PageRank score and its related mathematical formulas were also defined and implemented. As a result, semantic web service descriptions were presented in OWL, WSDL and OWL-S formats. Operational scenarios with related web-based interfaces for personal computers and mobile devices were presented and analyzed. A functional comparison between our knowledge-based search engine, a conventional search engine and a semantic search engine showed the originality and robustness of our knowledge-based personalized search engine. In fact, our knowledge-based personalized search engine allows different users, such as orthopedic patients and experts, healthcare system managers, or medical students, to remotely access useful, accurate, reliable and good-quality HMSR information for their learning and medical purposes. Copyright © 2012 Elsevier Inc. All rights reserved.
Learning in Mental Retardation: A Comprehensive Bibliography.
ERIC Educational Resources Information Center
Gardner, James M.; And Others
The bibliography on learning in mentally handicapped persons is divided into the following topic categories: applied behavior change, classical conditioning, discrimination, generalization, motor learning, reinforcement, verbal learning, and miscellaneous. An author index is included. (KW)
Asynchronous Gossip for Averaging and Spectral Ranking
NASA Astrophysics Data System (ADS)
Borkar, Vivek S.; Makhijani, Rahul; Sundaresan, Rajesh
2014-08-01
We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
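The building block analyzed in the first variant is a pairwise averaging step with a diminishing stochastic-approximation step size. The toy simulation below shows the classical symmetric gossip update; the paper's point is that the naive asynchronous variant can fail to reach the true average, which motivates their reinforcement-learning correction, not reproduced here. All quantities are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random(20)              # each node holds one value; the goal is their average
true_avg = x.mean()

for k in range(1, 20001):
    i, j = rng.choice(len(x), size=2, replace=False)   # a random edge "wakes up"
    step = 1.0 / k**0.6                                # diminishing step size
    d = step * (x[j] - x[i])
    x[i] += d                                          # symmetric update preserves the sum,
    x[j] -= d                                          # so the network mean is conserved

print(abs(x.mean() - true_avg), x.std())   # mean preserved; disagreement shrinks
```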
Reinforcement learning in professional basketball players
Neiman, Tal; Loewenstein, Yonatan
2011-01-01
Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3 point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. PMID:22146388
Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games
NASA Astrophysics Data System (ADS)
Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong
2012-06-01
One of the prototypical mechanisms for understanding the ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model (a reinforcement learning model with a dynamic aspiration level) is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information about the outcomes of previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and the public goods game, the most noteworthy finding is that moderate greediness (i.e., a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction networks as well. We also make comparisons with two recently proposed models to highlight the importance of the adaptive aspiration mechanism in supporting cooperation in structured populations.
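In models of this family, the probability of cooperating is nudged up or down according to whether the last payoff exceeded an aspiration level that itself tracks recent payoffs. A minimal Bush-Mosteller-style sketch of one agent's update is given below; the parameter names, payoff values and partner behavior are chosen purely for illustration and do not reproduce the paper's spatial game.

```python
import numpy as np

def aspiration_update(p_coop, cooperated, payoff, aspiration, learn=0.2, habituation=0.1):
    """Win-stay/lose-shift with a dynamic aspiration level (simplified)."""
    stimulus = np.tanh(payoff - aspiration)            # satisfied (>0) or frustrated (<0)
    if cooperated:                                     # reinforce or weaken cooperation
        p_coop += learn * stimulus * ((1 - p_coop) if stimulus >= 0 else p_coop)
    else:                                              # satisfaction after defecting lowers p_coop
        p_coop -= learn * stimulus * (p_coop if stimulus >= 0 else (1 - p_coop))
    aspiration = (1 - habituation) * aspiration + habituation * payoff   # aspiration tracks payoffs
    return float(np.clip(p_coop, 0.0, 1.0)), aspiration

# Toy use: repeated prisoner's dilemma against a partner who cooperates half the time.
rng = np.random.default_rng(9)
payoff_of = {(True, True): 3.0, (True, False): 0.0, (False, True): 5.0, (False, False): 1.0}
p, A = 0.5, 2.0
for _ in range(200):
    me, other = bool(rng.random() < p), bool(rng.random() < 0.5)
    p, A = aspiration_update(p, me, payoff_of[(me, other)], A)
```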
Decker, Johannes H.; Otto, A. Ross; Daw, Nathaniel D.; Hartley, Catherine A.
2016-01-01
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults, performed a sequential reinforcement-learning task that enables estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was evident in choice behavior across all age groups, evidence of a model-based strategy only emerged during adolescence and continued to increase into adulthood. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. PMID:27084852
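Tasks of this kind are usually analyzed by fitting a hybrid learner in which a model-free value learned by a delta rule is mixed with a model-based value computed from the known transition structure; the mixing weight w indexes model-based control. A compressed, hypothetical sketch of that hybrid on a simplified two-stage structure follows, with all parameter values invented.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, w = 0.3, 5.0, 0.5            # learning rate, inverse temperature, model-based weight
T = np.array([[0.7, 0.3],                 # P(second-stage state | first-stage action), assumed known
              [0.3, 0.7]])
Q_mf = np.zeros(2)                        # model-free first-stage action values
Q_s2 = np.zeros(2)                        # second-stage state values
reward_prob = np.array([0.8, 0.2])        # latent reward probabilities of the second-stage states

for trial in range(1000):
    Q_mb = T @ Q_s2                                   # model-based values: expected second-stage value
    Q_net = w * Q_mb + (1 - w) * Q_mf                 # hybrid valuation
    p = np.exp(beta * Q_net); p /= p.sum()
    a = rng.choice(2, p=p)                            # first-stage choice
    s2 = rng.choice(2, p=T[a])                        # transition (common or rare)
    r = float(rng.random() < reward_prob[s2])         # second-stage outcome
    Q_s2[s2] += alpha * (r - Q_s2[s2])                # update second-stage value
    Q_mf[a] += alpha * (r - Q_mf[a])                  # model-free update of the chosen action
```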
Reinforcement learning or active inference?
Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J
2009-07-29
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
Franklin, Nicholas T; Frank, Michael J
2015-01-01
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments. DOI: http://dx.doi.org/10.7554/eLife.12029.001 PMID:26705698
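At the algorithmic level, the proposed TAN mechanism amounts to an effective learning rate that rises when outcomes become surprising (possible change-points) and falls when they are merely noisy. A crude scalar sketch of that idea follows; it is not the paper's spiking-network or feedback-control model, and the parameters are invented.

```python
import numpy as np

def adaptive_delta_rule(outcomes, base_lr=0.1, meta_lr=0.2):
    """Delta rule whose learning rate tracks recent unsigned prediction errors."""
    v, surprise, trace = 0.5, 0.0, []
    for r in outcomes:
        delta = r - v
        surprise = (1 - meta_lr) * surprise + meta_lr * abs(delta)   # proxy for outcome uncertainty
        lr = base_lr + (1 - base_lr) * surprise                      # larger recent errors -> faster updating
        v += lr * delta
        trace.append(v)
    return np.array(trace)

# Reward probability jumps from 0.8 to 0.2 halfway through (a change-point).
rng = np.random.default_rng(5)
outcomes = np.concatenate([rng.random(100) < 0.8, rng.random(100) < 0.2]).astype(float)
values = adaptive_delta_rule(outcomes)
```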
Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.
Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu; Weirong Liu; Peng Zhuang; Hao Liang; Jun Peng; Zhiwu Huang; Liu, Weirong; Liang, Hao; Peng, Jun; Zhuang, Peng; Huang, Zhiwu
2018-06-01
Microgrids incorporated with distributed generation (DG) units and energy storage (ES) devices are expected to play more and more important roles in the future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Utilizing the learning algorithm can avoid the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, the function approximation is leveraged to deal with the large and continuous state spaces. And a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in microgrids only needs to communicate with its local neighbors, without relying on any centralized controllers. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
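One generic reading of "function approximation plus a diffusion strategy" is that each node performs a local temporal-difference update on its own parameter vector and then averages parameters with its neighbors. The skeleton below shows only that coordination pattern on placeholder signals; it is not the paper's dispatch formulation, and every quantity in it is invented.

```python
import numpy as np

rng = np.random.default_rng(6)
n_agents, n_features = 4, 8
W = 0.01 * rng.standard_normal((n_agents, n_features))    # one linear value approximator per node

# Row-stochastic combination matrix over a ring of neighbors.
C = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    C[i, [i, (i - 1) % n_agents, (i + 1) % n_agents]] = 1 / 3

def local_td_step(w, phi, phi_next, cost, gamma=0.95, lr=0.05):
    """One TD(0) update of a linear cost-to-go approximation V(s) = w . phi(s)."""
    delta = cost + gamma * (w @ phi_next) - (w @ phi)
    return w + lr * delta * phi

for t in range(2000):
    phi, phi_next = rng.standard_normal(n_features), rng.standard_normal(n_features)
    costs = rng.random(n_agents)                           # placeholder local generation costs
    W = np.array([local_td_step(W[i], phi, phi_next, costs[i]) for i in range(n_agents)])
    W = C @ W                                              # diffusion: combine neighbors' parameters
```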
Conflict resolution in multi-agent hybrid systems
DOT National Transportation Integrated Search
1996-12-01
A conflict resolution architecture for multi-agent hybrid systems, with emphasis on Air Traffic Management Systems (ATMS), is presented. In such systems, conflicts arise in the form of potential collisions, which are resolved locally by inter-agent coordination.
Distributed Consensus of Stochastic Delayed Multi-agent Systems Under Asynchronous Switching.
Wu, Xiaotai; Tang, Yang; Cao, Jinde; Zhang, Wenbing
2016-08-01
In this paper, the distributed exponential consensus of stochastic delayed multi-agent systems with nonlinear dynamics is investigated under asynchronous switching. The asynchronous switching considered here is to account for the time of identifying the active modes of multi-agent systems. After receipt of confirmation of mode's switching, the matched controller can be applied, which means that the switching time of the matched controller in each node usually lags behind that of system switching. In order to handle the coexistence of switched signals and stochastic disturbances, a comparison principle of stochastic switched delayed systems is first proved. By means of this extended comparison principle, several easy to verified conditions for the existence of an asynchronously switched distributed controller are derived such that stochastic delayed multi-agent systems with asynchronous switching and nonlinear dynamics can achieve global exponential consensus. Two examples are given to illustrate the effectiveness of the proposed method.
Research progress of microbial corrosion of reinforced concrete structure
NASA Astrophysics Data System (ADS)
Li, Shengli; Li, Dawang; Jiang, Nan; Wang, Dongwei
2011-04-01
Microbial corrosion of reinforced concrete structures is a new branch of study. It is an interdisciplinary area spanning civil engineering, environmental engineering, biology, chemistry, materials science and related fields. Research progress on the causes, research methods and scope of microbial corrosion of reinforced concrete structures is described. Research in this field is only beginning, and concerted effort is needed to probe further into the corrosion mechanism, to assess the safety and service life of reinforced concrete structures under these special conditions, and to put forward protective methods.
Reinforcement learning with Marr.
Niv, Yael; Langdon, Angela
2016-10-01
To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning-a computational theory of reward optimization, which readily prescribes algorithmic solutions that evidence striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.
Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J
2017-10-01
Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior-avoiding social situations for fear of embarrassment, for instance-is a core feature of such anxiety. However, as for many other psychiatric symptoms the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
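The bias the authors describe is commonly modeled by adding a Pavlovian term to the "go" action weight, so that a learned negative state value suppresses responding. A condensed sketch in the spirit of standard go/no-go reinforcement learning models is shown below; the contingency structure and parameter values are toy stand-ins, not the study's task or fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, beta, pav_bias, go_bias = 0.2, 4.0, 0.8, 0.3
Q = np.zeros((4, 2))      # 4 task conditions x {go, nogo} instrumental values
V = np.zeros(4)           # Pavlovian state values

def reward_if(s, a):
    """Toy contingencies: conditions 0-1 reward 'go'; 2-3 punish an incorrect 'go'."""
    correct = 0 if s < 2 else 1
    good = (a == correct)
    return (1.0 if good else 0.0) if s < 2 else (0.0 if good else -1.0)

def choose_and_learn(s):
    w_go = Q[s, 0] + go_bias + pav_bias * V[s]         # Pavlovian value biases 'go' up or down
    w_ng = Q[s, 1]
    p_go = 1.0 / (1.0 + np.exp(-beta * (w_go - w_ng)))
    a = 0 if rng.random() < p_go else 1                # 0 = go, 1 = nogo
    r = reward_if(s, a)
    Q[s, a] += alpha * (r - Q[s, a])                   # instrumental update
    V[s] += alpha * (r - V[s])                         # state value learned regardless of action

for _ in range(400):
    choose_and_learn(int(rng.integers(4)))
```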
Arnold, Megan A; Newland, M Christopher
2018-06-16
Behavioral inflexibility is often assessed using reversal learning tasks, which require a relatively low degree of response variability. No studies have assessed sensitivity to reinforcement contingencies that specifically select highly variable response patterns in mice, let alone in models of neurodevelopmental disorders involving limited response variation. Operant variability and incremental repeated acquisition (IRA) tasks were used to assess distinct aspects of behavioral variability in two mouse strains: BALB/c, a model of some deficits in ASD, and C57Bl/6. On the operant variability task, BALB/c mice responded more repetitively during adolescence than C57Bl/6 mice when reinforcement did not require variability, but responded more variably when reinforcement required variability. During IRA testing in adulthood, both strains acquired an unchanging performance sequence equally well. Strain differences emerged, however, after novel learning sequences began alternating with the performance sequence: BALB/c mice substantially outperformed C57Bl/6 mice. Using litter-mate controls, it was found that adolescent experience with variability did not affect either learning or performance on the IRA task in adulthood. These findings constrain the use of BALB/c mice as a model of ASD, but once again reveal that this strain is highly sensitive to reinforcement contingencies and that these mice are fast and robust learners. Copyright © 2018. Published by Elsevier B.V.
Preventing Learned Helplessness.
ERIC Educational Resources Information Center
Hoy, Cheri
1986-01-01
To prevent learned helplessness in learning disabled students, teachers can share responsibilities with the students, train students to reinforce themselves for effort and self control, and introduce opportunities for changing counterproductive attitudes. (CL)
Bouton, Mark E.; Winterbauer, Neil E.; Todd, Travis P.
2012-01-01
It is widely recognized that extinction (the procedure in which a Pavlovian conditioned stimulus or an instrumental action is repeatedly presented without its reinforcer) weakens behavior without erasing the original learning. Most of the experiments that support this claim have focused on several “relapse” effects that occur after Pavlovian extinction, which collectively suggest that the original learning is saved through extinction. However, although such effects do occur after instrumental extinction, they have not been explored there in as much detail. This article reviews recent research in our laboratory that has investigated three relapse effects that occur after the extinction of instrumental (operant) learning. In renewal, responding returns after extinction when the behavior is tested in a different context; in resurgence, responding recovers when a second response that has been reinforced during extinction of the first is itself put on extinction; and in rapid reacquisition, extinguished responding returns rapidly when the response is reinforced again. The results provide new insights into extinction and relapse, and are consistent with principles that have been developed to explain extinction and relapse as they occur after Pavlovian conditioning. Extinction of instrumental learning, like Pavlovian learning, involves new learning that is relatively dependent on the context for expression. PMID:22450305
Enhanced appetitive learning and reversal learning in a mouse model for Prader-Willi syndrome.
Relkovic, Dinko; Humby, Trevor; Hagan, Jim J; Wilkinson, Lawrence S; Isles, Anthony R
2012-06-01
Prader-Willi syndrome (PWS) is caused by lack of paternally derived gene expression from the imprinted gene cluster on human chromosome 15q11-q13. PWS is characterized by severe hypotonia, a failure to thrive in infancy and, on emerging from infancy, evidence of learning disabilities and overeating behavior due to an abnormal satiety response and increased motivation by food. We have previously shown that an imprinting center deletion mouse model (PWS-IC) is quicker to acquire a preference for, and consume more of a palatable food. Here we examined how the use of this palatable food as a reinforcer influences learning in PWS-IC mice performing a simple appetitive learning task. On a nonspatial maze-based task, PWS-IC mice acquired criteria much quicker, making fewer errors during initial acquisition and also reversal learning. A manipulation where the reinforcer was devalued impaired wild-type performance but had no effect on PWS-IC mice. This suggests that increased motivation for the reinforcer in PWS-IC mice may underlie their enhanced learning. This supports previous findings in PWS patients and is the first behavioral study of an animal model of PWS in which the motivation of behavior by food rewards has been examined. © 2012 American Psychological Association
Economic decision-making in the ultimatum game by smokers.
Takahashi, Taiki
2007-10-01
No study to date compared degrees of inequity aversion in economic decision-making in the ultimatum game between non-addictive and addictive reinforcers. The comparison is potentially important in neuroeconomics and reinforcement learning theory of addiction. We compared the degrees of inequity aversion in the ultimatum game between money and cigarettes in habitual smokers. Smokers avoided inequity in the ultimatum game more dramatically for money than for cigarettes; i.e., there was a "domain effect" in decision-making in the ultimatum game. Reward-processing neural activities in the brain for non-addictive and addictive reinforcers may be distinct and the insula activation due to cue-induced craving may conflict with unfair offer-induced insula activation. Future studies in neuroeconomics of addiction should employ game-theoretic decision tasks for elucidating reinforcement learning processes in dopaminergic neural circuits.
Chang, Li-Chiu; Chen, Pin-An; Chang, Fi-John
2012-08-01
A reliable forecast of future events possesses great value. The main purpose of this paper is to propose an innovative learning technique for reinforcing the accuracy of two-step-ahead (2SA) forecasts. The real-time recurrent learning (RTRL) algorithm for recurrent neural networks (RNNs) can effectively model the dynamics of complex processes and has been used successfully in one-step-ahead forecasts for various time series. A reinforced RTRL algorithm for 2SA forecasts using RNNs is proposed in this paper, and its performance is investigated by two famous benchmark time series and a streamflow during flood events in Taiwan. Results demonstrate that the proposed reinforced 2SA RTRL algorithm for RNNs can adequately forecast the benchmark (theoretical) time series, significantly improve the accuracy of flood forecasts, and effectively reduce time-lag effects.
Fun While Learning and Earning. A Look Into Chattanooga Public Schools' Token Reinforcement Program.
ERIC Educational Resources Information Center
Smith, William F.; Sanders, Frank J.
A token reinforcement program was used by the Piney Woods Research and Demonstration Center in Chattanooga, Tennessee. Children who were from economically deprived homes received tokens for positive behavior. The tokens were redeemable for recess privileges, ice cream, candy, and other such reinforcers. All tokens were spent on the day earned so…
ERIC Educational Resources Information Center
Raska, David; Keller, Eileen Weisenbach; Shaw, Doris
2014-01-01
Curriculum-Faculty-Reinforcement (CFR) alignment is an alignment between fundamental marketing concepts that are integral to the mastery of knowledge expected of our marketing graduates, their perceived importance by the faculty, and their level of reinforcement throughout core marketing courses required to obtain a marketing degree. This research…
ERIC Educational Resources Information Center
Diegelmann, Soeren; Zars, Melissa; Zars, Troy
2006-01-01
Memories can have different strengths, largely dependent on the intensity of reinforcers encountered. The relationship between reinforcement and memory strength is evident in asymptotic memory curves, with the level of the asymptote related to the intensity of the reinforcer. Although this is likely a fundamental property of memory formation,…
ERIC Educational Resources Information Center
Moreno-Fernandez, Maria M.; Abad, Maria J. F.; Ramos-Alvarez, Manuel M.; Rosas, Juan M.
2011-01-01
Predictive value for continuously reinforced cues is affected by context changes when they are trained within a context in which a different cue undergoes partial reinforcement. An experiment was conducted with the goal of exploring the mechanisms underlying this context-switch effect. Human participants were trained in a predictive learning…
The Use of Reinforcement Procedures in Teaching Reading to Rural Culturally Deprived Children.
ERIC Educational Resources Information Center
Egeland, Byron
A group of culturally deprived children with severe reading and behavior problems was systematically given tangible reinforcers while learning to read. Twelve second-grade and 12 third-grade boys from a rural and lower socioeconomic background were taught reading with the use of tangible reinforcers (E group). Four similar control groups (C group)…
Multi-Agent Information Classification Using Dynamic Acquaintance Lists.
ERIC Educational Resources Information Center
Mukhopadhyay, Snehasis; Peng, Shengquan; Raje, Rajeev; Palakal, Mathew; Mostafa, Javed
2003-01-01
Discussion of automated information services focuses on information classification and collaborative agents, i.e. intelligent computer programs. Highlights include multi-agent systems; distributed artificial intelligence; thesauri; document representation and classification; agent modeling; acquaintances, or remote agents discovered through…
Taylor, Jordan A; Ivry, Richard B
2014-01-01
Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.
Katnani, Husam A; Patel, Shaun R; Kwon, Churl-Su; Abdel-Aziz, Samer; Gale, John T; Eskandar, Emad N
2016-01-04
The primate brain has the remarkable ability of mapping sensory stimuli into motor behaviors that can lead to positive outcomes. We have previously shown that during the reinforcement of visual-motor behavior, activity in the caudate nucleus is correlated with the rate of learning. Moreover, phasic microstimulation in the caudate during the reinforcement period was shown to enhance associative learning, demonstrating the importance of temporal specificity to manipulate learning related changes. Here we present evidence that extends upon our previous finding by demonstrating that temporally coordinated phasic deep brain stimulation across both the nucleus accumbens and caudate can further enhance associative learning. Monkeys performed a visual-motor associative learning task and received stimulation at time points critical to learning related changes. Resulting performance revealed an enhancement in the rate, ceiling, and reaction times of learning. Stimulation of each brain region alone or at different time points did not generate the same effect.
Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning
Taylor, Jordan A.; Ivry, Richard B.
2014-01-01
Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295
Modeling the behavioral substrates of associate learning and memory - Adaptive neural models
NASA Technical Reports Server (NTRS)
Lee, Chuen-Chien
1991-01-01
Three adaptive single-neuron models based on neural analogies of behavior modification episodes are proposed, which attempt to bridge the gap between psychology and neurophysiology. The proposed models capture the predictive nature of Pavlovian conditioning, which is essential to the theory of adaptive/learning systems. The models learn to anticipate the occurrence of a conditioned response before the presence of a reinforcing stimulus when training is complete. Furthermore, each model can find the most nonredundant and earliest predictor of reinforcement. The behavior of the models accounts for several aspects of basic animal learning phenomena in Pavlovian conditioning beyond previous related models. Computer simulations show how well the models fit empirical data from various animal learning paradigms.
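The "most nonredundant predictor" property mentioned here is also exhibited by the Rescorla-Wagner delta rule in a blocking protocol: a pre-trained cue absorbs the prediction, so a redundantly added cue gains little associative weight. A tiny demonstration of that effect follows (not the authors' single-neuron models themselves).

```python
import numpy as np

def rescorla_wagner(trials, n_cues=2, lr=0.2):
    """trials: iterable of (cue_vector, reinforcement) pairs; returns associative weights."""
    w = np.zeros(n_cues)
    for x, r in trials:
        x = np.asarray(x, dtype=float)
        w += lr * (r - w @ x) * x          # delta-rule update shared by all present cues
    return w

# Phase 1: cue A alone is reinforced. Phase 2: the compound A+B is reinforced.
phase1 = [([1, 0], 1.0)] * 50
phase2 = [([1, 1], 1.0)] * 50
w = rescorla_wagner(phase1 + phase2)
print(w)   # w[0] near 1 (A predicts), w[1] near 0 (B is blocked as a redundant predictor)
```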
Stimulus discriminability may bias value-based probabilistic learning.
Schutte, Iris; Slagter, Heleen A; Collins, Anne G E; Frank, Michael J; Kenemans, J Leon
2017-01-01
Reinforcement learning tasks are often used to assess participants' tendency to learn more from the positive or more from the negative consequences of one's action. However, this assessment often requires comparison in learning performance across different task conditions, which may differ in the relative salience or discriminability of the stimuli associated with more and less rewarding outcomes, respectively. To address this issue, in a first set of studies, participants were subjected to two versions of a common probabilistic learning task. The two versions differed with respect to the stimulus (Hiragana) characters associated with reward probability. The assignment of character to reward probability was fixed within version but reversed between versions. We found that performance was highly influenced by task version, which could be explained by the relative perceptual discriminability of characters assigned to high or low reward probabilities, as assessed by a separate discrimination experiment. Participants were more reliable in selecting rewarding characters that were more discriminable, leading to differences in learning curves and their sensitivity to reward probability. This difference in experienced reinforcement history was accompanied by performance biases in a test phase assessing ability to learn from positive vs. negative outcomes. In a subsequent large-scale web-based experiment, this impact of task version on learning and test measures was replicated and extended. Collectively, these findings imply a key role for perceptual factors in guiding reward learning and underscore the need to control stimulus discriminability when making inferences about individual differences in reinforcement learning.
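Acquisition in such probabilistic tasks is commonly modeled as delta-rule learning of each character's value with softmax choice between the two members of a pair; test-phase measures then compare preference for the best stimulus with avoidance of the worst. A bare-bones sketch of that acquisition model follows, using the standard 80/20, 70/30 and 60/40 pairings and invented parameter values.

```python
import numpy as np

rng = np.random.default_rng(8)
reward_prob = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3, "E": 0.6, "F": 0.4}
pairs = [("A", "B"), ("C", "D"), ("E", "F")]
Q = {s: 0.5 for s in reward_prob}
alpha, beta = 0.1, 3.0

for trial in range(900):
    left, right = pairs[rng.integers(3)]
    p_left = 1.0 / (1.0 + np.exp(-beta * (Q[left] - Q[right])))   # softmax over the presented pair
    choice = left if rng.random() < p_left else right
    r = float(rng.random() < reward_prob[choice])
    Q[choice] += alpha * (r - Q[choice])                          # delta-rule value update

# Test-phase style readout: prefer A in novel pairings ("choose A"), avoid B ("avoid B").
choose_A = np.mean([Q["A"] > Q[s] for s in "CDEF"])
avoid_B = np.mean([Q["B"] < Q[s] for s in "CDEF"])
```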